
A global network of biomedical relationships derived from text

MOTIVATION: The biomedical community’s collective understanding of how chemicals, genes and phenotypes interact is distributed across the text of over 24 million research articles. These interactions offer insights into the mechanisms behind higher order biochemical phenomena, such as drug-drug inte...

Bibliographic Details
Main authors: Percha, Bethany; Altman, Russ B
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061699/
https://www.ncbi.nlm.nih.gov/pubmed/29490008
http://dx.doi.org/10.1093/bioinformatics/bty114
_version_ 1783342275320872960
author Percha, Bethany
Altman, Russ B
author_facet Percha, Bethany
Altman, Russ B
author_sort Percha, Bethany
collection PubMed
description MOTIVATION: The biomedical community’s collective understanding of how chemicals, genes and phenotypes interact is distributed across the text of over 24 million research articles. These interactions offer insights into the mechanisms behind higher order biochemical phenomena, such as drug-drug interactions and variations in drug response across individuals. To assist their curation at scale, we must understand what relationship types are possible and map unstructured natural language descriptions onto these structured classes. We used NCBI’s PubTator annotations to identify instances of chemical, gene and disease names in Medline abstracts and applied the Stanford dependency parser to find connecting dependency paths between pairs of entities in single sentences. We combined a published ensemble biclustering algorithm (EBC) with hierarchical clustering to group the dependency paths into semantically-related categories, which we annotated with labels, or ‘themes’ (‘inhibition’ and ‘activation’, for example). We evaluated our theme assignments against six human-curated databases: DrugBank, Reactome, SIDER, the Therapeutic Target Database, OMIM and PharmGKB. RESULTS: Clustering revealed 10 broad themes for chemical-gene relationships, 7 for chemical-disease, 10 for gene-disease and 9 for gene–gene. In most cases, enriched themes corresponded directly to known database relationships. Our final dataset, represented as a network, contained 37 491 thematically-labeled chemical-gene edges, 2 021 192 chemical-disease edges, 136 206 gene-disease edges and 41 418 gene–gene edges, each representing a single-sentence description of an interaction from somewhere in the literature. AVAILABILITY AND IMPLEMENTATION: The complete network is available on Zenodo (https://zenodo.org/record/1035500). 
We have also provided the full set of dependency paths connecting biomedical entities in Medline abstracts, with associated sentences, for future use by the biomedical research community. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
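The description above mentions finding connecting dependency paths between pairs of entities in single sentences. As a rough illustration only, the sketch below finds the shortest path between two tokens in a small, hand-written dependency parse using breadth-first search. The function name `dependency_path`, the triple format and the toy sentence are assumptions made for this example; they are not the authors' code and not output from the Stanford dependency parser.

```python
from collections import deque


def dependency_path(edges, source, target):
    """Return the shortest token/relation path between source and target.

    `edges` is a list of (head, relation, dependent) triples for one sentence.
    The parse is treated as undirected, so a path may go up and down the tree.
    Tokens are identified by their text, which is a simplification: a real
    parse would use token indices to tell repeated words apart.
    """
    # Build an undirected adjacency map that remembers each relation label.
    neighbors = {}
    for head, relation, dependent in edges:
        neighbors.setdefault(head, []).append((dependent, relation))
        neighbors.setdefault(dependent, []).append((head, relation))

    # Breadth-first search from source, recording how each token was reached.
    previous = {source: None}
    queue = deque([source])
    while queue:
        token = queue.popleft()
        if token == target:
            break
        for neighbor, relation in neighbors.get(token, []):
            if neighbor not in previous:
                previous[neighbor] = (token, relation)
                queue.append(neighbor)

    if target not in previous:
        return None  # the two entities are not connected in this sentence

    # Walk back from target to source to recover the path.
    path = [target]
    step = previous[target]
    while step is not None:
        token, relation = step
        path.append(relation)
        path.append(token)
        step = previous[token]
    path.reverse()
    return path


# Hand-written toy parse of "Aspirin inhibits COX1" (hypothetical example data).
toy_edges = [
    ("inhibits", "nsubj", "Aspirin"),
    ("inhibits", "dobj", "COX1"),
]
toy_path = dependency_path(toy_edges, "Aspirin", "COX1")
```

In this sketch the path alternates tokens and relation labels. Clustering such paths into themes, as the description outlines, would be a separate step that is not shown here.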
format Online
Article
Text
id pubmed-6061699
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-60616992018-08-07 A global network of biomedical relationships derived from text Percha, Bethany Altman, Russ B Bioinformatics Original Papers MOTIVATION: The biomedical community’s collective understanding of how chemicals, genes and phenotypes interact is distributed across the text of over 24 million research articles. These interactions offer insights into the mechanisms behind higher order biochemical phenomena, such as drug-drug interactions and variations in drug response across individuals. To assist their curation at scale, we must understand what relationship types are possible and map unstructured natural language descriptions onto these structured classes. We used NCBI’s PubTator annotations to identify instances of chemical, gene and disease names in Medline abstracts and applied the Stanford dependency parser to find connecting dependency paths between pairs of entities in single sentences. We combined a published ensemble biclustering algorithm (EBC) with hierarchical clustering to group the dependency paths into semantically-related categories, which we annotated with labels, or ‘themes’ (‘inhibition’ and ‘activation’, for example). We evaluated our theme assignments against six human-curated databases: DrugBank, Reactome, SIDER, the Therapeutic Target Database, OMIM and PharmGKB. RESULTS: Clustering revealed 10 broad themes for chemical-gene relationships, 7 for chemical-disease, 10 for gene-disease and 9 for gene–gene. In most cases, enriched themes corresponded directly to known database relationships. Our final dataset, represented as a network, contained 37 491 thematically-labeled chemical-gene edges, 2 021 192 chemical-disease edges, 136 206 gene-disease edges and 41 418 gene–gene edges, each representing a single-sentence description of an interaction from somewhere in the literature. AVAILABILITY AND IMPLEMENTATION: The complete network is available on Zenodo (https://zenodo.org/record/1035500). 
We have also provided the full set of dependency paths connecting biomedical entities in Medline abstracts, with associated sentences, for future use by the biomedical research community. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-08-01 2018-02-27 /pmc/articles/PMC6061699/ /pubmed/29490008 http://dx.doi.org/10.1093/bioinformatics/bty114 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Percha, Bethany
Altman, Russ B
A global network of biomedical relationships derived from text
title A global network of biomedical relationships derived from text
title_full A global network of biomedical relationships derived from text
title_fullStr A global network of biomedical relationships derived from text
title_full_unstemmed A global network of biomedical relationships derived from text
title_short A global network of biomedical relationships derived from text
title_sort global network of biomedical relationships derived from text
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6061699/
https://www.ncbi.nlm.nih.gov/pubmed/29490008
http://dx.doi.org/10.1093/bioinformatics/bty114
work_keys_str_mv AT perchabethany aglobalnetworkofbiomedicalrelationshipsderivedfromtext
AT altmanrussb aglobalnetworkofbiomedicalrelationshipsderivedfromtext
AT perchabethany globalnetworkofbiomedicalrelationshipsderivedfromtext
AT altmanrussb globalnetworkofbiomedicalrelationshipsderivedfromtext