
Predicting protein function by machine learning on amino acid sequences – a critical evaluation


Bibliographic Details
Main authors: Al-Shahib, Ali; Breitling, Rainer; Gilbert, David R
Format: Text
Language: English
Published: BioMed Central 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1847686/
https://www.ncbi.nlm.nih.gov/pubmed/17374164
http://dx.doi.org/10.1186/1471-2164-8-78
author Al-Shahib, Ali
Breitling, Rainer
Gilbert, David R
collection PubMed
description BACKGROUND: Predicting the function of newly discovered proteins by simply inspecting their amino acid sequence is one of the major challenges of post-genomic computational biology, especially when done without recourse to experimentation or homology information. Machine learning classifiers are able to discriminate between proteins belonging to different functional classes. Until now, however, it has been unclear whether this ability would transfer to proteins of unknown function, which may show distinct biases compared to experimentally more tractable proteins. RESULTS: Here we show that proteins with known and unknown function do indeed differ significantly. We then show that proteins from different bacterial species also differ, to an even larger and very surprising extent, but that functional classifiers nonetheless generalize successfully across species boundaries. We also show that, in the case of highly specialized proteomes, classifiers from a different but more conventional species may in fact outperform the endogenous species-specific classifier. CONCLUSION: We conclude that there is a very good prospect of successfully predicting the function of as yet uncharacterized proteins using machine learning classifiers trained on proteins of known function.
format Text
id pubmed-1847686
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1847686 2007-04-05 BMC Genomics Research Article BioMed Central 2007-03-20 Text en Copyright © 2007 Al-Shahib et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting protein function by machine learning on amino acid sequences – a critical evaluation
topic Research Article