
Improved mutation tagging with gene identifiers applied to membrane protein stability prediction

BACKGROUND: The automated retrieval and integration of information about protein point mutations in combination with structure, domain and interaction data from literature and databases promises to be a valuable approach to study structure-function relationships in biomedical data sets. RESULTS: We...


Bibliographic Details
Main Authors: Winnenburg, Rainer, Plake, Conrad, Schroeder, Michael
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745585/
https://www.ncbi.nlm.nih.gov/pubmed/19758467
http://dx.doi.org/10.1186/1471-2105-10-S8-S3
author Winnenburg, Rainer
Plake, Conrad
Schroeder, Michael
collection PubMed
description BACKGROUND: The automated retrieval and integration of information about protein point mutations in combination with structure, domain and interaction data from literature and databases promises to be a valuable approach to study structure-function relationships in biomedical data sets. RESULTS: We developed a rule- and regular expression-based protein point mutation retrieval pipeline for PubMed abstracts, which shows an F-measure of 87% for the mutation retrieval task on a benchmark dataset. In order to link mutations to their proteins, we utilize a named entity recognition algorithm for the identification of gene names co-occurring in the abstract, and establish links based on sequence checks. Conversely, we could show that gene recognition improved from 77% to 91% F-measure when considering mutation information given in the text. To demonstrate practical relevance, we utilize mutation information from text to evaluate a novel solvation-energy-based model for the prediction of stabilizing regions in membrane proteins. For five G protein-coupled receptors we identified 35 relevant single mutations and associated phenotypes, none of which had been annotated in the UniProt or PDB databases. In 71% of cases, the reported phenotypes were in compliance with the model predictions, supporting a relation between mutations and stability issues in membrane proteins. CONCLUSION: We present a reliable approach for the retrieval of protein mutations from PubMed abstracts for any set of genes or proteins of interest. We further demonstrate how amino acid substitution information from text can be utilized for protein structure stability studies on the basis of a novel energy model.
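The abstract above mentions three technical steps: regular-expression-based retrieval of point mutations, a sequence check that links a mutation to a candidate protein, and evaluation by F-measure. The following is a minimal illustrative sketch of those ideas, not the authors' pipeline. The patterns, the function names (`find_mutations`, `matches_sequence`, `f_measure`) and the toy sequence are all assumptions made for this example; the record does not describe the actual rules used.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE = "ACDEFGHIKLMNPQRSTVWY"
THREE = "|".join(THREE_TO_ONE)

# Assumed patterns for this sketch only: "A123T" and "Ala123Thr".
MUTATION_RE = re.compile(
    rf"\b(?:([{ONE}])(\d+)([{ONE}])|({THREE})(\d+)({THREE}))\b"
)


def find_mutations(text):
    """Return (wild_type, position, mutant) tuples in one-letter code."""
    hits = []
    for m in MUTATION_RE.finditer(text):
        if m.group(1):  # one-letter form, e.g. A123T
            wt, pos, mut = m.group(1), m.group(2), m.group(3)
        else:  # three-letter form, e.g. Ala123Thr
            wt = THREE_TO_ONE[m.group(4)]
            pos = m.group(5)
            mut = THREE_TO_ONE[m.group(6)]
        hits.append((wt, int(pos), mut))
    return hits


def matches_sequence(mutation, sequence):
    """Sequence check: the wild-type residue must sit at the 1-based position."""
    wt, pos, _ = mutation
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == wt


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall) over two sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, `find_mutations("The A3T and Gly5Ser variants")` yields two candidate mutations, and `matches_sequence` keeps only those whose wild-type residue agrees with a candidate protein sequence. A real system would need many more patterns (for instance, mutations described in running text) and a gene name recognizer to propose candidate sequences.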
format Text
id pubmed-2745585
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2745585 2009-09-18 Improved mutation tagging with gene identifiers applied to membrane protein stability prediction Winnenburg, Rainer Plake, Conrad Schroeder, Michael BMC Bioinformatics Research BioMed Central 2009-08-27 /pmc/articles/PMC2745585/ /pubmed/19758467 http://dx.doi.org/10.1186/1471-2105-10-S8-S3 Text en Copyright © 2009 Winnenburg et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved mutation tagging with gene identifiers applied to membrane protein stability prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2745585/
https://www.ncbi.nlm.nih.gov/pubmed/19758467
http://dx.doi.org/10.1186/1471-2105-10-S8-S3