
Learning the Structure of Biomedical Relationships from Unstructured Text



Bibliographic Details
Main authors: Percha, Bethany, Altman, Russ B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4517797/
https://www.ncbi.nlm.nih.gov/pubmed/26219079
http://dx.doi.org/10.1371/journal.pcbi.1004216
_version_ 1782383242887823360
author Percha, Bethany
Altman, Russ B.
author_facet Percha, Bethany
Altman, Russ B.
author_sort Percha, Bethany
collection PubMed
description The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining.
format Online
Article
Text
id pubmed-4517797
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4517797 2015-07-31 Learning the Structure of Biomedical Relationships from Unstructured Text Percha, Bethany Altman, Russ B. PLoS Comput Biol Research Article
Public Library of Science 2015-07-28 /pmc/articles/PMC4517797/ /pubmed/26219079 http://dx.doi.org/10.1371/journal.pcbi.1004216 Text en © 2015 Percha, Altman http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Percha, Bethany
Altman, Russ B.
Learning the Structure of Biomedical Relationships from Unstructured Text
title Learning the Structure of Biomedical Relationships from Unstructured Text
title_full Learning the Structure of Biomedical Relationships from Unstructured Text
title_fullStr Learning the Structure of Biomedical Relationships from Unstructured Text
title_full_unstemmed Learning the Structure of Biomedical Relationships from Unstructured Text
title_short Learning the Structure of Biomedical Relationships from Unstructured Text
title_sort learning the structure of biomedical relationships from unstructured text
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4517797/
https://www.ncbi.nlm.nih.gov/pubmed/26219079
http://dx.doi.org/10.1371/journal.pcbi.1004216
work_keys_str_mv AT perchabethany learningthestructureofbiomedicalrelationshipsfromunstructuredtext
AT altmanrussb learningthestructureofbiomedicalrelationshipsfromunstructuredtext