Learning the Structure of Biomedical Relationships from Unstructured Text
Main Authors: | Percha, Bethany; Altman, Russ B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4517797/ https://www.ncbi.nlm.nih.gov/pubmed/26219079 http://dx.doi.org/10.1371/journal.pcbi.1004216 |
_version_ | 1782383242887823360 |
---|---|
author | Percha, Bethany Altman, Russ B. |
author_facet | Percha, Bethany Altman, Russ B. |
author_sort | Percha, Bethany |
collection | PubMed |
description | The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually-curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining. |
format | Online Article Text |
id | pubmed-4517797 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4517797 2015-07-31 Learning the Structure of Biomedical Relationships from Unstructured Text Percha, Bethany Altman, Russ B. PLoS Comput Biol Research Article Public Library of Science 2015-07-28 /pmc/articles/PMC4517797/ /pubmed/26219079 http://dx.doi.org/10.1371/journal.pcbi.1004216 Text en © 2015 Percha, Altman http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Percha, Bethany Altman, Russ B. Learning the Structure of Biomedical Relationships from Unstructured Text |
title | Learning the Structure of Biomedical Relationships from Unstructured Text |
title_full | Learning the Structure of Biomedical Relationships from Unstructured Text |
title_fullStr | Learning the Structure of Biomedical Relationships from Unstructured Text |
title_full_unstemmed | Learning the Structure of Biomedical Relationships from Unstructured Text |
title_short | Learning the Structure of Biomedical Relationships from Unstructured Text |
title_sort | learning the structure of biomedical relationships from unstructured text |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4517797/ https://www.ncbi.nlm.nih.gov/pubmed/26219079 http://dx.doi.org/10.1371/journal.pcbi.1004216 |
work_keys_str_mv | AT perchabethany learningthestructureofbiomedicalrelationshipsfromunstructuredtext AT altmanrussb learningthestructureofbiomedicalrelationshipsfromunstructuredtext |