
Text Mining for Protein Docking

The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins a...


Bibliographic Details
Main Authors:	Badal, Varsha D., Kundrotas, Petras J., Vakser, Ilya A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674139/ https://www.ncbi.nlm.nih.gov/pubmed/26650466 http://dx.doi.org/10.1371/journal.pcbi.1004630

author	Badal, Varsha D. Kundrotas, Petras J. Vakser, Ilya A.
collection	PubMed
description	The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which can still be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for about 25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
format	Online Article Text
id	pubmed-4674139
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4674139 2015-12-23 Text Mining for Protein Docking Badal, Varsha D. Kundrotas, Petras J. Vakser, Ilya A. PLoS Comput Biol Research Article Public Library of Science 2015-12-09 /pmc/articles/PMC4674139/ /pubmed/26650466 http://dx.doi.org/10.1371/journal.pcbi.1004630 Text en © 2015 Badal et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Text Mining for Protein Docking
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4674139/ https://www.ncbi.nlm.nih.gov/pubmed/26650466 http://dx.doi.org/10.1371/journal.pcbi.1004630
