Large-scale extraction of gene interactions from full-text literature using DeepDive
Motivation: A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response. These gene–gene interactions include both protein–protein interactions and transcription factor interactions. The majority of known interactions are found in the...
Main Authors: | Mallory, Emily K.; Zhang, Ce; Ré, Christopher; Altman, Russ B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2016 |
Subjects: | Original Papers |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681986/ https://www.ncbi.nlm.nih.gov/pubmed/26338771 http://dx.doi.org/10.1093/bioinformatics/btv476 |
_version_ | 1782405810892046336 |
---|---|
author | Mallory, Emily K.; Zhang, Ce; Ré, Christopher; Altman, Russ B. |
author_sort | Mallory, Emily K. |
collection | PubMed |
description | Motivation: A complete repository of gene–gene interactions is key for understanding cellular processes, human disease and drug response. These gene–gene interactions include both protein–protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene–gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein–protein and transcription factor interactions from over 100 000 full-text PLOS articles. Methods: We built an extractor for gene–gene interactions that identified candidate gene–gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. Results: Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100 000 full-text articles. Availability and implementation: Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app Contact: russ.altman@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4681986 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-4681986 2015-12-18 Large-scale extraction of gene interactions from full-text literature using DeepDive Mallory, Emily K. Zhang, Ce Ré, Christopher Altman, Russ B. Bioinformatics Original Papers
Oxford University Press 2016-01-01 2015-09-03 /pmc/articles/PMC4681986/ /pubmed/26338771 http://dx.doi.org/10.1093/bioinformatics/btv476 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Large-scale extraction of gene interactions from full-text literature using DeepDive |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681986/ https://www.ncbi.nlm.nih.gov/pubmed/26338771 http://dx.doi.org/10.1093/bioinformatics/btv476 |
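
The description in this record mentions two ideas that a small example can make concrete: candidate gene–gene relations drawn from gene symbols that co-occur in a sentence, and precision and recall measured against a reference set. The sketch below is a minimal Python illustration of those two ideas only. It is not the authors' DeepDive application and does not compute DeepDive probabilities. The function names, the gene list, the sentences and the reference pairs are all hypothetical toy data chosen for illustration.

```python
from itertools import combinations


def candidate_pairs(sentence_tokens, gene_symbols):
    """Return unordered pairs of distinct gene symbols found in one sentence."""
    mentioned = sorted({tok for tok in sentence_tokens if tok in gene_symbols})
    return [frozenset(pair) for pair in combinations(mentioned, 2)]


def precision_recall(extracted, reference):
    """Compute precision and recall of an extracted set against a reference set."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy inputs (assumptions, not data from the paper).
genes = {"BRCA1", "BARD1", "TP53", "MDM2"}
sentences = [
    "BRCA1 binds BARD1 in vivo".split(),
    "MDM2 regulates TP53 and BRCA1".split(),
]

# Collect every co-occurring pair across all sentences.
extracted = set()
for tokens in sentences:
    extracted.update(candidate_pairs(tokens, genes))

# Hypothetical reference set standing in for a curated interaction database.
reference = {
    frozenset({"BRCA1", "BARD1"}),
    frozenset({"MDM2", "TP53"}),
    frozenset({"TP53", "BARD1"}),
}

precision, recall = precision_recall(extracted, reference)
print(f"candidates={len(extracted)} precision={precision:.2f} recall={recall:.2f}")
```

Treating every co-occurring pair as an extraction, as this sketch does, tends to lower precision; according to the description above, the published system instead assigns each candidate a probability of being a correct interaction.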