DextMP: deep dive into text for predicting moonlighting proteins


Bibliographic Details
Main authors: Khan, Ishita K; Bhuiyan, Mansurul; Kihara, Daisuke
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870774/
https://www.ncbi.nlm.nih.gov/pubmed/28881966
http://dx.doi.org/10.1093/bioinformatics/btx231
author Khan, Ishita K
Bhuiyan, Mansurul
Kihara, Daisuke
collection PubMed
description MOTIVATION: Moonlighting proteins (MPs) are an important class of proteins that perform more than one independent cellular function. MPs have gained attention in recent years because they have been found to play important roles in various systems, including disease development. MPs also have a significant impact on computational function prediction and annotation in databases. Currently, MPs are not labeled as such in biological databases, even when multiple distinct functions are known for the proteins. In this work, we propose a novel method named DextMP, which predicts whether a protein is an MP based on textual features extracted from the scientific literature and the UniProt database. RESULTS: DextMP extracts three categories of textual information for a protein: titles, abstracts from the literature, and the function description in UniProt. Three language models were applied and compared: a state-of-the-art deep unsupervised learning algorithm, along with two other language models of different types, Term Frequency-Inverse Document Frequency in the bag-of-words category and Latent Dirichlet Allocation in the topic modeling category. Cross-validation on a dataset of known MPs and non-MPs showed that DextMP predicted MPs with over 91% accuracy, a significant improvement over existing MP prediction methods. Lastly, we ran DextMP with the best-performing language models and text-based feature combinations on three genomes (human, yeast and Xenopus laevis) and found that about 2.5–35% of the proteomes are potential MPs. AVAILABILITY AND IMPLEMENTATION: Code available at http://kiharalab.org/DextMP.
format Online
Article
Text
id pubmed-5870774
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5870774 2018-03-29 DextMP: deep dive into text for predicting moonlighting proteins Khan, Ishita K Bhuiyan, Mansurul Kihara, Daisuke Bioinformatics ISMB/ECCB 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 Oxford University Press 2017-07-15 2017-07-12 /pmc/articles/PMC5870774/ /pubmed/28881966 http://dx.doi.org/10.1093/bioinformatics/btx231 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DextMP: deep dive into text for predicting moonlighting proteins
topic ISMB/ECCB 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870774/
https://www.ncbi.nlm.nih.gov/pubmed/28881966
http://dx.doi.org/10.1093/bioinformatics/btx231