Predicting the functions of a protein from its ability to associate with other molecules

Bibliographic Details
Main authors:	Taha, Kamal; Yoo, Paul D.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4714473/ https://www.ncbi.nlm.nih.gov/pubmed/26767846 http://dx.doi.org/10.1186/s12859-016-0882-3

collection	PubMed
description	BACKGROUND: All proteins associate with other molecules, and these associated molecules are highly predictive of a protein's potential functions. The association between a protein and a molecule can be determined from their co-occurrences in biomedical abstracts: extensive, semantically related co-occurrences of a protein's name and a molecule's name in the sentences of biomedical abstracts can be considered indicative of an association between the two. Dependency parsers extract textual relations from text by determining the grammatical relations between the words in a sentence, so they can be used to determine the textual relations between proteins and molecules. Despite their success, they may extract textual relations with low precision, because they do not consider the semantic relationships between the terms in a sentence (i.e., they consider only the structural relationships between the terms). Moreover, they may not be well suited to complex sentences or to long-distance textual relations.
RESULTS: We introduce an information extraction system called PPFBM that predicts the functions of unannotated proteins from the molecules that associate with them. PPFBM represents each protein by the other molecules that associate with it in the abstracts referenced in the protein's entries in reliable biological databases. It automatically extracts each co-occurrence of a protein-molecule pair that represents a semantic relationship between the pair. To this end, we present novel semantic rules that use the syntactic structures of sentences and linguistic theories to identify the semantic relationship in each co-occurrence of a protein-molecule pair. PPFBM determines the functions of an unannotated protein p as follows. First, it determines the set S(r) of annotated proteins that are semantically similar to p by matching the molecules representing p against the molecules representing the annotated proteins. Then, it assigns p the functional category FC if the significance of the frequency of occurrences of S(r) in abstracts associated with proteins annotated with FC is statistically significantly different from the significance of the frequency of occurrences of S(r) in abstracts associated with proteins annotated with all other functional categories. We evaluated the quality of PPFBM by comparing it experimentally with two other systems; the results showed a marked improvement over both.
CONCLUSIONS: The experimental results demonstrated that PPFBM outperforms other systems that predict protein function from the textual information found in biomedical abstracts, because these systems do not consider the semantic relationships between the terms in a sentence (i.e., they consider only the structural relationships between the terms). PPFBM's performance relative to these systems improves steadily as the number of training proteins increases; that is, its predictions become steadily more accurate as the training set grows. This is because each new set of test proteins is added to the current set of training proteins. A demo of PPFBM that annotates each input yeast protein (from SGD, the Saccharomyces Genome Database, available at http://www.yeastgenome.org/download-data/curation) with the functions of Gene Ontology terms is available at http://ecesrvr.kustar.ac.ae:8080/PPFBM/ (see the Appendix for more details about the demo).
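The two prediction steps in the RESULTS paragraph (represent each protein by its associated molecules, collect the similar annotated proteins S(r), then test whether S(r) occurs unusually often in the abstracts of a functional category FC) can be illustrated with a minimal Python sketch. It is not PPFBM's implementation and rests on loudly stated assumptions: plain sentence-level co-occurrence stands in for the paper's semantic rules; molecule-set matching uses Jaccard similarity with a hypothetical 0.5 threshold (the abstract does not name the measure); and the significance comparison is a two-proportion z-test with a hypothetical 1.96 cutoff (the abstract does not name the test). All function names, protein names, and data are hypothetical toy values.

```python
import math
import re
from typing import Dict, List, Set


def molecules_for_protein(protein: str, abstracts: List[str], molecules: Set[str]) -> Set[str]:
    """Represent a protein by the molecules whose names share a sentence with it.

    Assumption: plain sentence-level co-occurrence stands in for PPFBM's
    semantic rules, which this sketch does not implement.
    """
    found: Set[str] = set()
    for abstract in abstracts:
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", abstract):
            if not re.search(rf"\b{re.escape(protein)}\b", sentence):
                continue
            for mol in molecules:
                if re.search(rf"\b{re.escape(mol)}\b", sentence):
                    found.add(mol)
    return found


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two molecule sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_proteins(query: Set[str], annotated: Dict[str, Set[str]],
                     threshold: float = 0.5) -> Set[str]:
    """S(r): annotated proteins whose molecule sets are at least `threshold` similar."""
    return {p for p, mols in annotated.items() if jaccard(query, mols) >= threshold}


def two_proportion_z(hits1: int, n1: int, hits2: int, n2: int) -> float:
    """z statistic for H0: p1 == p2, using the pooled proportion."""
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (hits1 / n1 - hits2 / n2) / se


def predict_categories(s_r: Set[str], abstracts_by_fc: Dict[str, List[Set[str]]],
                       z_crit: float = 1.96) -> List[str]:
    """Return every category FC whose abstracts mention S(r) significantly more often.

    Each abstract is given as the set of protein names it is associated with.
    """
    predicted: List[str] = []
    for fc, fc_abstracts in abstracts_by_fc.items():
        others = [a for other, group in abstracts_by_fc.items() if other != fc for a in group]
        if not fc_abstracts or not others:
            continue
        hits_fc = sum(1 for a in fc_abstracts if a & s_r)
        hits_other = sum(1 for a in others if a & s_r)
        z = two_proportion_z(hits_fc, len(fc_abstracts), hits_other, len(others))
        # Two-sided 5% critical value, but only a higher FC frequency leads to assignment.
        if z > z_crit:
            predicted.append(fc)
    return predicted


# Toy usage with hypothetical proteins, molecules, and abstracts.
MOLECULES = {"ATP", "cyclin", "glucose", "phosphate"}
ANNOTATED = {
    "ProtA": {"ATP", "cyclin"},
    "ProtB": {"ATP", "cyclin", "phosphate"},
    "ProtC": {"glucose", "ATP"},
}
ABSTRACTS_BY_FC = {
    "cell cycle": [{"ProtA"}] * 8 + [{"ProtC"}] * 2,
    "glycolysis": [{"ProtC"}] * 9 + [{"ProtB"}],
}

query = molecules_for_protein(
    "ProtX", ["ProtX binds cyclin in vivo. ProtX hydrolyzes ATP. Glucose was absent."], MOLECULES
)
s_r = similar_proteins(query, ANNOTATED)
print(sorted(query), sorted(s_r), predict_categories(s_r, ABSTRACTS_BY_FC))
```

On this toy data the query protein ProtX is represented by {ATP, cyclin}, S(r) is {ProtA, ProtB}, and only "cell cycle" is assigned, since 8 of its 10 abstracts mention S(r) versus 1 of the 10 other abstracts. Requiring a positive z, rather than any significant difference, is a design choice of this sketch: a category in which similar proteins are rarer than elsewhere is not evidence for that category.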
id	pubmed-4714473
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics (BioMed Central), published 2016-01-15
rights	© Taha and Yoo 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Similar items