
Large-scale extraction of gene interactions from full-text literature using DeepDive

Motivation: A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response. These gene–gene interactions include both protein–protein interactions and transcription factor interactions. The majority of known interactions are found in the...


Bibliographic Details
Main authors: Mallory, Emily K.; Zhang, Ce; Ré, Christopher; Altman, Russ B.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681986/
https://www.ncbi.nlm.nih.gov/pubmed/26338771
http://dx.doi.org/10.1093/bioinformatics/btv476
_version_ 1782405810892046336
author Mallory, Emily K.
Zhang, Ce
Ré, Christopher
Altman, Russ B.
author_facet Mallory, Emily K.
Zhang, Ce
Ré, Christopher
Altman, Russ B.
author_sort Mallory, Emily K.
collection PubMed
description Motivation: A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response. These gene–gene interactions include both protein–protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene–gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein–protein and transcription factor interactions from over 100 000 full-text PLOS articles. Methods: We built an extractor for gene–gene interactions that identified candidate gene–gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. Results: Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100 000 full-text articles. Availability and implementation: Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app Contact: russ.altman@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4681986
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4681986 2015-12-18 Large-scale extraction of gene interactions from full-text literature using DeepDive Mallory, Emily K. Zhang, Ce Ré, Christopher Altman, Russ B. Bioinformatics Original Papers
Oxford University Press 2016-01-01 2015-09-03 /pmc/articles/PMC4681986/ /pubmed/26338771 http://dx.doi.org/10.1093/bioinformatics/btv476 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Mallory, Emily K.
Zhang, Ce
Ré, Christopher
Altman, Russ B.
Large-scale extraction of gene interactions from full-text literature using DeepDive
title Large-scale extraction of gene interactions from full-text literature using DeepDive
title_full Large-scale extraction of gene interactions from full-text literature using DeepDive
title_fullStr Large-scale extraction of gene interactions from full-text literature using DeepDive
title_full_unstemmed Large-scale extraction of gene interactions from full-text literature using DeepDive
title_short Large-scale extraction of gene interactions from full-text literature using DeepDive
title_sort large-scale extraction of gene interactions from full-text literature using deepdive
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681986/
https://www.ncbi.nlm.nih.gov/pubmed/26338771
http://dx.doi.org/10.1093/bioinformatics/btv476