Incorporating functional inter-relationships into protein function prediction algorithms
BACKGROUND: Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algor...
Main authors: | Pandey, Gaurav; Myers, Chad L; Kumar, Vipin |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2693438/ https://www.ncbi.nlm.nih.gov/pubmed/19435516 http://dx.doi.org/10.1186/1471-2105-10-142 |
_version_ | 1782167957362704384 |
---|---|
author | Pandey, Gaurav Myers, Chad L Kumar, Vipin |
author_sort | Pandey, Gaurav |
collection | PubMed |
description | BACKGROUND: Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have utilized more than the protein-to-functional class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. For standard classification-based approaches, these inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as a harder-to-learn distinction between similar GO terms. RESULTS: We propose a method to enhance the performance of classification-based protein function prediction algorithms by exploiting the inter-relationships between the functional classes that make up functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify these inter-relationships and incorporate them into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and also that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used for integrating information from the entire GO hierarchy to improve the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1. CONCLUSION: We implemented and evaluated a methodology for incorporating inter-relationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at . |
format | Text |
id | pubmed-2693438 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2693438 2009-06-08 Incorporating functional inter-relationships into protein function prediction algorithms Pandey, Gaurav Myers, Chad L Kumar, Vipin BMC Bioinformatics Methodology Article BioMed Central 2009-05-12 /pmc/articles/PMC2693438/ /pubmed/19435516 http://dx.doi.org/10.1186/1471-2105-10-142 Text en Copyright © 2009 Pandey et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Pandey, Gaurav Myers, Chad L Kumar, Vipin Incorporating functional inter-relationships into protein function prediction algorithms |
title | Incorporating functional inter-relationships into protein function prediction algorithms |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2693438/ https://www.ncbi.nlm.nih.gov/pubmed/19435516 http://dx.doi.org/10.1186/1471-2105-10-142 |
work_keys_str_mv | AT pandeygaurav incorporatingfunctionalinterrelationshipsintoproteinfunctionpredictionalgorithms AT myerschadl incorporatingfunctionalinterrelationshipsintoproteinfunctionpredictionalgorithms AT kumarvipin incorporatingfunctionalinterrelationshipsintoproteinfunctionpredictionalgorithms |
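
The abstract describes folding a semantic similarity between ontology classes into a k-nearest neighbor classifier. The following is a minimal sketch of that general idea, not the authors' implementation: the scoring rule, the cosine feature similarity, the function names (`cosine`, `class_score`) and the toy similarity values are all assumptions made for illustration. The record does not name the semantic similarity measure used, so the class-to-class similarities are simply passed in as a precomputed dictionary.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_score(query, train, target, class_sim, k=3):
    """Score `query` for class `target` from its k most similar training proteins.

    train:     list of (feature_vector, set_of_class_labels) pairs
    class_sim: dict mapping (class_a, class_b) to a similarity in [0, 1];
               hypothetical precomputed values, missing pairs count as 0
    """
    neighbors = sorted(train, key=lambda ex: cosine(query, ex[0]), reverse=True)[:k]
    score = 0.0
    for features, labels in neighbors:
        # Weight of this neighbor's vote: 1.0 if it carries the target label,
        # otherwise the best similarity between the target and any of its labels.
        # With an empty class_sim this reduces to ordinary weighted k-NN voting.
        label_weight = max(
            (
                1.0
                if c == target
                else class_sim.get((target, c), class_sim.get((c, target), 0.0))
                for c in labels
            ),
            default=0.0,
        )
        score += cosine(query, features) * label_weight
    return score


# Toy data (assumed): three proteins, two features, one label each.
train = [
    ([1.0, 0.0], {"A"}),
    ([0.9, 0.1], {"B"}),
    ([0.0, 1.0], {"C"}),
]
class_sim = {("A", "B"): 0.8}  # assumed: classes A and B are closely related
query = [1.0, 0.0]

plain = class_score(query, train, "A", {}, k=2)
augmented = class_score(query, train, "A", class_sim, k=2)
print(plain, augmented)
```

In this toy setup the neighbor labeled with the related class `B` adds to the score for class `A` only when the class similarity is supplied, which mirrors the abstract's point that small classes can borrow training evidence from related classes.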