
Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks

Bibliographic Details
Main authors: Daraselia, Nikolai; Yuryev, Anton; Egorov, Sergei; Mazo, Ilya; Ispolatov, Iaroslav
Format: Text
Language: English
Published: BioMed Central 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1940026/
https://www.ncbi.nlm.nih.gov/pubmed/17620146
http://dx.doi.org/10.1186/1471-2105-8-243
collection PubMed
description BACKGROUND: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work as well as, often, sophisticated data mining and processing tools. Protein functions, often referred to as their annotations, are believed to manifest themselves through the topology of networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins that have other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compile protein annotation and protein-protein interaction datasets.
RESULTS: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of our presentation, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form network clusters that are significantly more densely linked than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups without any noticeable decrease in the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We observed that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller.
CONCLUSION: Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence of their functional similarity.
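The abstract reports that proteins sharing a GO annotation form network clusters that are "significantly more densely linked than expected by chance." As a rough illustration of that kind of comparison, the sketch below computes the link density of one protein group and compares it with random groups of the same size drawn from the whole network. It is a minimal sketch, not the authors' method: the undirected edge-set representation, the uniform random-sampling null model, the toy network, and the function names `link_density` and `density_enrichment` are all hypothetical.

```python
import random
from itertools import combinations

def link_density(group, edges):
    """Fraction of possible protein pairs in `group` that are linked in `edges`.

    Assumption: `edges` is a set of frozenset({a, b}) pairs (undirected network).
    """
    members = sorted(set(group))
    n = len(members)
    if n < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(members, 2) if frozenset((a, b)) in edges)
    return linked / (n * (n - 1) / 2)

def density_enrichment(group, edges, proteins, trials=1000, seed=0):
    """Compare a group's link density with random same-size groups (hypothetical null model).

    Returns (observed density, mean random density, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = link_density(group, edges)
    size = len(set(group))
    pool = sorted(proteins)
    random_densities = [link_density(rng.sample(pool, size), edges) for _ in range(trials)]
    # Count random groups at least as dense as the observed one; the +1 terms keep p > 0.
    as_extreme = sum(1 for d in random_densities if d >= observed)
    p_value = (as_extreme + 1) / (trials + 1)
    return observed, sum(random_densities) / trials, p_value

# Toy undirected network (hypothetical): a 4-protein clique A-D plus sparse links elsewhere.
TOY_PROTEINS = [chr(c) for c in range(ord("A"), ord("L") + 1)]  # A..L, 12 proteins
TOY_EDGES = {frozenset(pair) for pair in [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"), ("G", "H"), ("I", "J"), ("K", "L"),
]}

observed, mean_random, p = density_enrichment({"A", "B", "C", "D"}, TOY_EDGES, TOY_PROTEINS)
print(f"observed={observed:.2f} random_mean={mean_random:.3f} p={p:.4f}")
```

Sampling random groups of the same size keeps the comparison fair, because density values for small groups are noisier than for large ones. A real analysis would likely need a null model that also preserves node degrees, since highly connected proteins inflate density regardless of function.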
format Text
id pubmed-1940026
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1940026 2007-08-07 BMC Bioinformatics Research Article BioMed Central 2007-07-10 /pmc/articles/PMC1940026/ /pubmed/17620146 http://dx.doi.org/10.1186/1471-2105-8-243 Text en Copyright © 2007 Daraselia et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article