Literature mining of protein-residue associations with graph rules learned through distant supervision

BACKGROUND: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. Th...

Bibliographic Details
Main authors: Ravikumar, KE; Liu, Haibin; Cohn, Judith D; Wall, Michael E; Verspoor, Karin
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465209/
https://www.ncbi.nlm.nih.gov/pubmed/23046792
http://dx.doi.org/10.1186/2041-1480-3-S3-S2
author Ravikumar, KE
Liu, Haibin
Cohn, Judith D
Wall, Michael E
Verspoor, Karin
author_sort Ravikumar, KE
collection PubMed
description BACKGROUND: We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from articles, ultimately supporting protein function prediction. Our method made use of linguistic patterns for identifying the amino acid residue mentions in text. Further, we applied an automated graph-based method to learn syntactic patterns corresponding to protein-residue pairs mentioned in the text. Finally, we present an approach to automated construction of relevant training and test data using the distant supervision model. RESULTS: The performance of the method was assessed by extracting protein-residue relations from a new, automatically generated test set of sentences containing high-confidence examples found using distant supervision. It achieved an F-measure of 0.84 on an automatically created silver corpus and 0.79 on a manually annotated gold data set for this task, outperforming previous methods. CONCLUSIONS: The primary contributions of this work are to (1) demonstrate the effectiveness of distant supervision for automatic creation of training data for protein-residue relation extraction, substantially reducing the effort and time involved in manual annotation of a data set, and (2) show that the graph-based relation extraction approach we used generalizes well to the problem of protein-residue association extraction. This work paves the way towards effective extraction of protein functional residues from the literature.
format Online
Article
Text
id pubmed-3465209
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3465209 2012-10-18 Literature mining of protein-residue associations with graph rules learned through distant supervision Ravikumar, KE Liu, Haibin Cohn, Judith D Wall, Michael E Verspoor, Karin J Biomed Semantics Research
BioMed Central 2012-10-05 /pmc/articles/PMC3465209/ /pubmed/23046792 http://dx.doi.org/10.1186/2041-1480-3-S3-S2 Text en Copyright ©2012 Ravikumar et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Literature mining of protein-residue associations with graph rules learned through distant supervision
topic Research