Cargando…

Predicting protein functions by applying predicate logic to biomedical literature

BACKGROUND: A large number of computational methods have been proposed for predicting protein functions. The underlying techniques adopted by most of these methods revolve around predicting the functions of an unannotated protein p from already annotated proteins that have similar characteristics as...

Descripción completa

Detalles Bibliográficos
Autores principales: Taha, Kamal, Iraqi, Youssef, Al Aamri, Amira
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368809/
https://www.ncbi.nlm.nih.gov/pubmed/30736739
http://dx.doi.org/10.1186/s12859-019-2594-y
_version_ 1783394069154627584
author Taha, Kamal
Iraqi, Youssef
Al Aamri, Amira
author_facet Taha, Kamal
Iraqi, Youssef
Al Aamri, Amira
author_sort Taha, Kamal
collection PubMed
description BACKGROUND: A large number of computational methods have been proposed for predicting protein functions. The underlying techniques adopted by most of these methods revolve around predicting the functions of an unannotated protein p from already annotated proteins that have similar characteristics as p. Recent Information Extraction methods take advantage of the huge growth of biomedical literature to predict protein functions. They extract biological molecule terms that directly describe protein functions from biomedical texts. However, they consider only explicitly mentioned terms that co-occur with proteins in texts. We observe that some important biological molecule terms pertaining functional categories may implicitly co-occur with proteins in texts. Therefore, the methods that rely solely on explicitly mentioned terms in texts may miss vital functional information implicitly mentioned in the texts. RESULTS: To overcome the limitations of methods that rely solely on explicitly mentioned terms in texts to predict protein functions, we propose in this paper an Information Extraction system called PL-PPF. The proposed system employs techniques for predicting the functions of proteins based on their co-occurrences with explicitly and implicitly mentioned biological molecule terms that pertain functional categories in biomedical literature. That is, PL-PPF employs a combination of statistical-based explicit term extraction techniques and logic-based implicit term extraction techniques. The statistical component of PL-PPF predicts some of the functions of a protein by extracting the explicitly mentioned functional terms that directly describe the functions of the protein from the biomedical texts associated with the protein. The logic-based component of PL-PPF predicts additional functions of the protein by inferring the functional terms that co-occur implicitly with the protein in the biomedical texts associated with it. First, the system employs its statistical-based component to extract the explicitly mentioned functional terms. Then, it employs its logic-based component to infer additional functions of the protein. Our hypothesis is that important biological molecule terms pertaining functional categories of proteins are likely to co-occur implicitly with the proteins in biomedical texts. We evaluated PL-PPF experimentally and compared it with five systems. Results revealed better prediction performance. CONCLUSIONS: The experimental results showed that PL-PPF outperformed the other five systems. This is an indication of the effectiveness and practical viability of PL-PPF’s combination of explicit and implicit techniques. We also evaluated two versions of PL-PPF: one adopting the complete techniques (i.e., adopting both the implicit and explicit techniques) and the other adopting only the explicit terms co-occurrence extraction techniques (i.e., without the inference rules for predicate logic). The experimental results showed that the complete version outperformed significantly the other version. This is attributed to the effectiveness of the rules of predicate logic to infer functional terms that co-occur implicitly with proteins in biomedical texts. A demo application of PL-PPF can be accessed through the following link: http://ecesrvr.kustar.ac.ae:8080/plppf/ ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2594-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6368809
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-63688092019-02-15 Predicting protein functions by applying predicate logic to biomedical literature Taha, Kamal Iraqi, Youssef Al Aamri, Amira BMC Bioinformatics Research Article BACKGROUND: A large number of computational methods have been proposed for predicting protein functions. The underlying techniques adopted by most of these methods revolve around predicting the functions of an unannotated protein p from already annotated proteins that have similar characteristics as p. Recent Information Extraction methods take advantage of the huge growth of biomedical literature to predict protein functions. They extract biological molecule terms that directly describe protein functions from biomedical texts. However, they consider only explicitly mentioned terms that co-occur with proteins in texts. We observe that some important biological molecule terms pertaining functional categories may implicitly co-occur with proteins in texts. Therefore, the methods that rely solely on explicitly mentioned terms in texts may miss vital functional information implicitly mentioned in the texts. RESULTS: To overcome the limitations of methods that rely solely on explicitly mentioned terms in texts to predict protein functions, we propose in this paper an Information Extraction system called PL-PPF. The proposed system employs techniques for predicting the functions of proteins based on their co-occurrences with explicitly and implicitly mentioned biological molecule terms that pertain functional categories in biomedical literature. That is, PL-PPF employs a combination of statistical-based explicit term extraction techniques and logic-based implicit term extraction techniques. The statistical component of PL-PPF predicts some of the functions of a protein by extracting the explicitly mentioned functional terms that directly describe the functions of the protein from the biomedical texts associated with the protein. The logic-based component of PL-PPF predicts additional functions of the protein by inferring the functional terms that co-occur implicitly with the protein in the biomedical texts associated with it. First, the system employs its statistical-based component to extract the explicitly mentioned functional terms. Then, it employs its logic-based component to infer additional functions of the protein. Our hypothesis is that important biological molecule terms pertaining functional categories of proteins are likely to co-occur implicitly with the proteins in biomedical texts. We evaluated PL-PPF experimentally and compared it with five systems. Results revealed better prediction performance. CONCLUSIONS: The experimental results showed that PL-PPF outperformed the other five systems. This is an indication of the effectiveness and practical viability of PL-PPF’s combination of explicit and implicit techniques. We also evaluated two versions of PL-PPF: one adopting the complete techniques (i.e., adopting both the implicit and explicit techniques) and the other adopting only the explicit terms co-occurrence extraction techniques (i.e., without the inference rules for predicate logic). The experimental results showed that the complete version outperformed significantly the other version. This is attributed to the effectiveness of the rules of predicate logic to infer functional terms that co-occur implicitly with proteins in biomedical texts. A demo application of PL-PPF can be accessed through the following link: http://ecesrvr.kustar.ac.ae:8080/plppf/ ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2594-y) contains supplementary material, which is available to authorized users. BioMed Central 2019-02-08 /pmc/articles/PMC6368809/ /pubmed/30736739 http://dx.doi.org/10.1186/s12859-019-2594-y Text en © The Author(s). 2019 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Taha, Kamal
Iraqi, Youssef
Al Aamri, Amira
Predicting protein functions by applying predicate logic to biomedical literature
title Predicting protein functions by applying predicate logic to biomedical literature
title_full Predicting protein functions by applying predicate logic to biomedical literature
title_fullStr Predicting protein functions by applying predicate logic to biomedical literature
title_full_unstemmed Predicting protein functions by applying predicate logic to biomedical literature
title_short Predicting protein functions by applying predicate logic to biomedical literature
title_sort predicting protein functions by applying predicate logic to biomedical literature
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6368809/
https://www.ncbi.nlm.nih.gov/pubmed/30736739
http://dx.doi.org/10.1186/s12859-019-2594-y
work_keys_str_mv AT tahakamal predictingproteinfunctionsbyapplyingpredicatelogictobiomedicalliterature
AT iraqiyoussef predictingproteinfunctionsbyapplyingpredicatelogictobiomedicalliterature
AT alaamriamira predictingproteinfunctionsbyapplyingpredicatelogictobiomedicalliterature