Enzyme classification with peptide programs: a comparative study
BACKGROUND: Efficient and accurate prediction of protein function from sequence is one of the standing problems in Biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning method...
Main authors: | Faria, Daniel; Ferreira, António EN; Falcão, André O |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2724424/ https://www.ncbi.nlm.nih.gov/pubmed/19630945 http://dx.doi.org/10.1186/1471-2105-10-231 |
_version_ | 1782170414605139968 |
---|---|
author | Faria, Daniel Ferreira, António EN Falcão, André O |
author_facet | Faria, Daniel Ferreira, António EN Falcão, André O |
author_sort | Faria, Daniel |
collection | PubMed |
description | BACKGROUND: Efficient and accurate prediction of protein function from sequence is one of the standing problems in Biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them to obtain data of fixed length. RESULTS: We have developed a machine learning methodology, called peptide programs (PPs), to deal directly with protein sequences and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs had a similar performance in terms of Matthews Correlation Coefficient, but the PPs had generally a higher precision. BLAST performed globally better than both methodologies, but the PPs had better results than BLAST and SVMs for the smaller datasets. CONCLUSION: The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy. |
format | Text |
id | pubmed-2724424 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27244242009-08-11 Enzyme classification with peptide programs: a comparative study Faria, Daniel Ferreira, António EN Falcão, André O BMC Bioinformatics Research Article BACKGROUND: Efficient and accurate prediction of protein function from sequence is one of the standing problems in Biology. The generalised use of sequence alignments for inferring function promotes the propagation of errors, and there are limits to its applicability. Several machine learning methods have been applied to predict protein function, but they lose much of the information encoded by protein sequences because they need to transform them to obtain data of fixed length. RESULTS: We have developed a machine learning methodology, called peptide programs (PPs), to deal directly with protein sequences and compared its performance with that of Support Vector Machines (SVMs) and BLAST in detailed enzyme classification tasks. Overall, the PPs and SVMs had a similar performance in terms of Matthews Correlation Coefficient, but the PPs had generally a higher precision. BLAST performed globally better than both methodologies, but the PPs had better results than BLAST and SVMs for the smaller datasets. CONCLUSION: The higher precision of the PPs in comparison to the SVMs suggests that dealing with sequences is advantageous for detailed protein classification, as precision is essential to avoid annotation errors. The fact that the PPs performed better than BLAST for the smaller datasets demonstrates the potential of the methodology, but the drop in performance observed for the larger datasets indicates that further development is required. Possible strategies to address this issue include partitioning the datasets into smaller subsets and training individual PPs for each subset, or training several PPs for each dataset and combining them using a bagging strategy. 
BioMed Central 2009-07-24 /pmc/articles/PMC2724424/ /pubmed/19630945 http://dx.doi.org/10.1186/1471-2105-10-231 Text en Copyright © 2009 Faria et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Faria, Daniel Ferreira, António EN Falcão, André O Enzyme classification with peptide programs: a comparative study |
title | Enzyme classification with peptide programs: a comparative study |
title_full | Enzyme classification with peptide programs: a comparative study |
title_fullStr | Enzyme classification with peptide programs: a comparative study |
title_full_unstemmed | Enzyme classification with peptide programs: a comparative study |
title_short | Enzyme classification with peptide programs: a comparative study |
title_sort | enzyme classification with peptide programs: a comparative study |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2724424/ https://www.ncbi.nlm.nih.gov/pubmed/19630945 http://dx.doi.org/10.1186/1471-2105-10-231 |
work_keys_str_mv | AT fariadaniel enzymeclassificationwithpeptideprogramsacomparativestudy AT ferreiraantonioen enzymeclassificationwithpeptideprogramsacomparativestudy AT falcaoandreo enzymeclassificationwithpeptideprogramsacomparativestudy |