Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association

Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are desc...

Bibliographic Details
Main Authors:	Lee, Lawrence C; Horn, Florence; Cohen, Fred E
Format:	Text
Language:	English
Published:	Public Library of Science 2007
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1794323/ https://www.ncbi.nlm.nih.gov/pubmed/17274683 http://dx.doi.org/10.1371/journal.pcbi.0030016

author	Lee, Lawrence C Horn, Florence Cohen, Fred E
collection	PubMed
description	Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are described in the peer-reviewed published literature. We describe an application, Mutation GraB (Graph Bigram), that identifies, extracts, and verifies point mutations from biomedical literature. The principal problem of point mutation extraction is to link the point mutation with its associated protein and organism of origin. Our algorithm uses a graph-based bigram traversal to identify these relevant associations and exploits the Swiss-Prot protein database to verify this information. The graph bigram method is different from other models for point mutation extraction in that it incorporates frequency and positional data of all terms in an article to drive the point mutation–protein association. Our method was tested on 589 articles describing point mutations from the G protein–coupled receptor (GPCR), tyrosine kinase, and ion channel protein families. We evaluated our graph bigram metric against a word-proximity metric for term association on datasets of full-text literature in these three different protein families. Our testing shows that the graph bigram metric achieves a higher F-measure for the GPCRs (0.79 versus 0.76), protein tyrosine kinases (0.72 versus 0.69), and ion channel transporters (0.76 versus 0.74). Importantly, in situations where more than one protein can be assigned to a point mutation and disambiguation is required, the graph bigram metric achieves a precision of 0.84 compared with the word distance metric precision of 0.73. We believe the graph bigram search metric to be a significant improvement over previous search metrics for point mutation extraction and to be applicable to text-mining applications requiring the association of words.
format	Text
id	pubmed-1794323
institution	National Center for Biotechnology Information
language	English
publishDate	2007
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-1794323 2007-02-07 Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association Lee, Lawrence C Horn, Florence Cohen, Fred E PLoS Comput Biol Research Article Public Library of Science 2007-02 2007-02-02 /pmc/articles/PMC1794323/ /pubmed/17274683 http://dx.doi.org/10.1371/journal.pcbi.0030016 Text en © 2007 Lee et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1794323/ https://www.ncbi.nlm.nih.gov/pubmed/17274683 http://dx.doi.org/10.1371/journal.pcbi.0030016
