
Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context

BACKGROUND: Enzymes are known as the largest class of proteins, and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify enzyme function. Automatically categorizing enzymes into the EC hierar...


Bibliographic Details
Main authors: Wang, Yong-Cui, Wang, Yong, Yang, Zhi-Xia, Deng, Nai-Yang
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121122/
https://www.ncbi.nlm.nih.gov/pubmed/21689481
http://dx.doi.org/10.1186/1752-0509-5-S1-S6
_version_ 1782206801006034944
author Wang, Yong-Cui
Wang, Yong
Yang, Zhi-Xia
Deng, Nai-Yang
author_facet Wang, Yong-Cui
Wang, Yong
Yang, Zhi-Xia
Deng, Nai-Yang
author_sort Wang, Yong-Cui
collection PubMed
description BACKGROUND: Enzymes are known as the largest class of proteins, and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchical structure, i.e., four numbers separated by periods, to classify enzyme function. Automatically categorizing enzymes into the EC hierarchy is crucial to understanding their specific molecular mechanisms. RESULTS: In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. The first is an efficient sequence encoding method for representing the given proteins. The second is a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences, considering not only the composition of amino acids but also the neighbor relationships in the sequence. We then develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), which outputs enzyme function by fully considering the hierarchical structure of EC. The experimental results show that SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature in both predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF achieves accuracy ranging from 81% to 98% and MCC ranging from 0.82 to 0.98 when predicting the first three EC digits on a low-homology enzyme dataset. We further demonstrate that our method outperforms methods that do not take account of the hierarchical relationships among enzyme categories, as well as alternative methods that incorporate prior knowledge about inter-class relationships. CONCLUSIONS: Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Our new method should therefore be a useful tool for the enzyme function prediction community.
format Online
Article
Text
id pubmed-3121122
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3121122 2011-06-23 Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context Wang, Yong-Cui Wang, Yong Yang, Zhi-Xia Deng, Nai-Yang BMC Syst Biol Report BioMed Central 2011-06-20 /pmc/articles/PMC3121122/ /pubmed/21689481 http://dx.doi.org/10.1186/1752-0509-5-S1-S6 Text en Copyright ©2011 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Report
Wang, Yong-Cui
Wang, Yong
Yang, Zhi-Xia
Deng, Nai-Yang
Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
title Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
title_full Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
title_fullStr Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
title_full_unstemmed Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
title_short Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
title_sort support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context
topic Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121122/
https://www.ncbi.nlm.nih.gov/pubmed/21689481
http://dx.doi.org/10.1186/1752-0509-5-S1-S6
work_keys_str_mv AT wangyongcui supportvectormachinepredictionofenzymefunctionwithconjointtriadfeatureandhierarchicalcontext
AT wangyong supportvectormachinepredictionofenzymefunctionwithconjointtriadfeatureandhierarchicalcontext
AT yangzhixia supportvectormachinepredictionofenzymefunctionwithconjointtriadfeatureandhierarchicalcontext
AT dengnaiyang supportvectormachinepredictionofenzymefunctionwithconjointtriadfeatureandhierarchicalcontext
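
The abstract describes the conjoint triad feature (CTF) as an encoding that captures both amino acid composition and neighbor relationships in the sequence. A minimal sketch of that idea follows. This record does not give the amino acid grouping or the normalization, so both are assumptions here: the seven-class grouping is the one commonly used for conjoint triads in the literature, and the vector is normalized by the total triad count. The name `conjoint_triad` is hypothetical and is not taken from the paper.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids
# (not specified in this record).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 ordered triads of classes, each mapped to a vector index.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional list of triad frequencies for a protein sequence."""
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    counts = [0] * len(TRIADS)
    # Slide a window of three consecutive residues along the sequence.
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:  # skip windows containing a non-standard residue
            continue
        counts[TRIAD_INDEX[window]] += 1
    total = sum(counts)
    # Assumed normalization: divide by the number of counted triads.
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

A vector produced this way could be passed to any SVM implementation as the feature input; the hierarchical label handling of SVMHL is not sketched here because the record does not describe it in enough detail.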