
A top-down approach to classify enzyme functional classes and sub-classes using random forest

Advances in sequencing technologies have led to an exponential rise in the number of newly found enzymes. Enzymes are proteins that catalyze biochemical reactions and play an important role in metabolic pathways. Commonly, the function of such enzymes is determined by experiments that can be tim...


Bibliographic Details
Main authors:	Kumar, Chetan, Choudhary, Alok
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351021/ https://www.ncbi.nlm.nih.gov/pubmed/22376768 http://dx.doi.org/10.1186/1687-4153-2012-1

author	Kumar, Chetan Choudhary, Alok
collection	PubMed
description	Advances in sequencing technologies have led to an exponential rise in the number of newly found enzymes. Enzymes are proteins that catalyze biochemical reactions and play an important role in metabolic pathways. Commonly, the function of such enzymes is determined by experiments that can be time-consuming and costly. Hence, there is a need for a computational method that can distinguish enzyme sequences from non-enzyme sequences and reliably predict the function of the former. To address this problem, approaches that cluster enzymes based on their sequence and structural similarity have been presented. However, these approaches are known to fail for proteins that perform the same function but are dissimilar in sequence and structure. In this article, we present a supervised machine learning model to predict the function class and sub-class of enzymes based on a set of 73 sequence-derived features. The functional classes are as defined by the International Union of Biochemistry and Molecular Biology. Using an efficient data mining algorithm called random forest, we construct a top-down three-layer model in which the top layer classifies a query protein sequence as an enzyme or non-enzyme, the second layer predicts the main function class, and the bottom layer predicts the sub-function class. The model achieved an overall classification accuracy of 94.87% for the first level, 87.7% for the second, and 84.25% for the bottom level. Our results compare very well with existing methods and in many cases show better performance. Using feature selection methods, we show the biological relevance of a few of the top-ranked attributes.
format	Online Article Text
id	pubmed-3351021
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3351021 2012-05-15 EURASIP J Bioinform Syst Biol BioMed Central 2012 2012-02-29 /pmc/articles/PMC3351021/ /pubmed/22376768 http://dx.doi.org/10.1186/1687-4153-2012-1 Text en Copyright ©2012 Kumar and Choudhary; licensee Springer. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A top-down approach to classify enzyme functional classes and sub-classes using random forest
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3351021/ https://www.ncbi.nlm.nih.gov/pubmed/22376768 http://dx.doi.org/10.1186/1687-4153-2012-1
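
The description in this record outlines a top-down, three-layer model: the first layer separates enzymes from non-enzymes, the second predicts the main function class, and the third predicts the sub-class. The following is a minimal Python sketch of that routing logic only, not the authors' implementation. The function name `predict_top_down`, the toy classifiers, their thresholds, the example labels, and the choice of one sub-class model per main class are all assumptions made for illustration. In the article each layer is a random forest trained on 73 sequence-derived features; here simple rule-based stand-ins take their place so the example runs without external libraries.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]  # maps a feature vector to a label


def predict_top_down(
    x: Features,
    layer1: Classifier,
    layer2: Classifier,
    layer3: Dict[str, Classifier],
) -> Tuple[str, Optional[str], Optional[str]]:
    """Route one feature vector through the three layers.

    Returns (enzyme/non-enzyme label, main class, sub-class). The last two
    entries are None when the first layer does not answer "enzyme".
    """
    kind = layer1(x)
    if kind != "enzyme":
        # Non-enzymes stop at the top layer.
        return kind, None, None
    main_class = layer2(x)
    # Assumption of this sketch: one sub-class model per main class.
    sub_model = layer3.get(main_class)
    sub_class = sub_model(x) if sub_model is not None else None
    return kind, main_class, sub_class


# Toy stand-ins for trained models (hypothetical thresholds and labels,
# not taken from the article).
def toy_layer1(x: Features) -> str:
    return "enzyme" if x[0] > 0.5 else "non-enzyme"


def toy_layer2(x: Features) -> str:
    return "EC 1" if x[1] > 0.5 else "EC 2"


toy_layer3: Dict[str, Classifier] = {
    "EC 1": lambda x: "EC 1.1" if x[2] > 0.5 else "EC 1.2",
    "EC 2": lambda x: "EC 2.1" if x[2] > 0.5 else "EC 2.2",
}

print(predict_top_down([0.9, 0.7, 0.1], toy_layer1, toy_layer2, toy_layer3))
print(predict_top_down([0.2, 0.7, 0.1], toy_layer1, toy_layer2, toy_layer3))
```

Because each layer is passed in as a plain callable, a trained model could be substituted by wrapping its prediction method in a function with the same shape. In a cascade like this, a mistake at an upper layer cannot be corrected further down, which is one reason to look at accuracy separately for each level.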
