dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text
Main authors: | Xu, Rong; Li, Li; Wang, QuanQiu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998061/ https://www.ncbi.nlm.nih.gov/pubmed/24725842 http://dx.doi.org/10.1186/1471-2105-15-105 |
author | Xu, Rong; Li, Li; Wang, QuanQiu |
---|---|
collection | PubMed |
description | BACKGROUND: Discerning the genetic contributions to complex human diseases is a challenging task that demands new types of data and new computational approaches for uncovering disease etiology. Systems approaches to studying observable phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repositioning. Currently, systematic study of disease relationships on a phenome-wide scale is limited by the lack of large-scale, machine-understandable knowledge bases of disease phenotype relationships. Our study introduces a semi-supervised iterative pattern learning approach, which we use to build a precise, large-scale disease-disease risk relationship (D1 → D2) knowledge base (dRiskKB) from a vast corpus of published free-text biomedical literature. RESULTS: The text corpus under study comprised 21,354,075 MEDLINE records. First, we used one typical disease risk-specific syntactic pattern (i.e., “D1 due to D2”) as a seed to automatically discover other patterns specifying similar semantic relationships among diseases. We then extracted D1 → D2 risk pairs from MEDLINE using the learned patterns. We manually evaluated the precision of the learned patterns and of the extracted pairs. Finally, we analyzed the correlations between disease-disease risk pairs and their associated genes and drugs. The newly created dRiskKB consists of 34,448 unique D1 → D2 pairs, representing risk-specific semantic relationships among 12,981 diseases, with each disease linked to its associated genes and drugs. The identified patterns are highly precise (average precision of 0.99) in specifying risk-specific relationships among diseases. The precision of the extracted pairs is 0.919 for exactly matched pairs and 0.988 for partially matched pairs. By comparing runs of the iterative pattern approach started from different seeds, we demonstrated that our algorithm is robust to the choice of seed. We show that diseases and their risk diseases, as well as diseases with similar risk profiles, tend to share both genes and drugs. CONCLUSIONS: This unique dRiskKB, when combined with existing phenotypic, genetic, and genomic datasets, can have profound implications for our understanding of disease etiology and for drug repositioning. |
format | Online Article Text |
id | pubmed-3998061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Bioinformatics (Research Article), BioMed Central, published 2014-04-12 |
rights | Copyright © 2014 Xu et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text |
topic | Research Article |
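The description outlines a semi-supervised iterative loop: a single seed pattern ("D1 due to D2") extracts disease pairs, those pairs reveal new patterns, and the learned patterns extract more pairs. The sketch below is a toy, surface-string approximation of that bootstrapping idea, not the authors' implementation. Everything in it is an assumption for illustration: the disease lexicon, the example sentences, treating the literal text between two adjacent disease mentions as the "pattern", and the hypothetical `min_support`, `max_gap_words`, and `max_iter` parameters.

```python
import re

# Hypothetical toy inputs: a disease lexicon and a few sentences.
# They stand in for a disease vocabulary and MEDLINE text; they are not real data.
LEXICON = {"pneumonia", "influenza", "sepsis", "stroke", "hypertension", "anemia"}
SENTENCES = [
    "pneumonia due to influenza was common",
    "pneumonia caused by influenza was reported",
    "sepsis due to pneumonia",
    "sepsis caused by pneumonia",
    "stroke caused by hypertension",
    "anemia and influenza",
]


def find_mentions(sentence, lexicon):
    """Return sorted, non-overlapping (start, end, term) spans, preferring longer terms."""
    spans = []
    for term in sorted(lexicon, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), term))
    return sorted(spans)


def candidate_triples(sentence, lexicon, max_gap_words=4):
    """Yield (D1, connector, D2) for each pair of adjacent disease mentions.

    The connector (the text between the two mentions) plays the role of a pattern.
    """
    spans = find_mentions(sentence, lexicon)
    for (_, e1, d1), (s2, _, d2) in zip(spans, spans[1:]):
        connector = sentence[e1:s2].strip()
        if connector and len(connector.split()) <= max_gap_words:
            yield d1, connector, d2


def bootstrap(sentences, lexicon, seed="due to", min_support=2, max_iter=5):
    """Alternate pair extraction and pattern learning, starting from one seed pattern."""
    triples = [t for s in sentences for t in candidate_triples(s, lexicon)]
    patterns = {seed}
    for _ in range(max_iter):
        # Extraction: ordered (D1, D2) pairs matched by any accepted pattern.
        pairs = {(d1, d2) for d1, c, d2 in triples if c in patterns}
        # Learning: a new connector is accepted if it links enough known pairs.
        support = {}
        for d1, c, d2 in triples:
            if c not in patterns and (d1, d2) in pairs:
                support.setdefault(c, set()).add((d1, d2))
        new_patterns = {c for c, ps in support.items() if len(ps) >= min_support}
        if not new_patterns:
            break
        patterns |= new_patterns
    # Final extraction with every accepted pattern.
    pairs = {(d1, d2) for d1, c, d2 in triples if c in patterns}
    return patterns, pairs


if __name__ == "__main__":
    learned, extracted = bootstrap(SENTENCES, LEXICON)
    print(sorted(learned))
    print(sorted(extracted))
```

With these toy inputs, the seed "due to" yields two pairs, both of which also occur with "caused by", so "caused by" is accepted and then adds (stroke, hypertension); "and" is never accepted because its only pair is not already known. The support threshold is what keeps a loop like this precise: a connector must co-occur with several already-trusted pairs before it can extract new ones.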
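The description also reports that diseases and their risk diseases, and diseases with similar risk profiles, tend to share genes. The record does not say which statistic was used, so the sketch below assumes Jaccard set similarity as a simple way to make "share genes" and "similar risk profiles" concrete. The toy pairs, the made-up gene sets, and the `threshold` parameter are hypothetical; a real analysis would use curated disease-gene links and a significance test.

```python
from itertools import combinations

# Hypothetical toy data: ordered (D1, D2) risk pairs and made-up gene sets.
PAIRS = {("A", "X"), ("B", "X"), ("C", "Y")}
GENES = {"A": {"g1", "g2"}, "B": {"g1", "g2"}, "C": {"g5"}, "X": {"g1"}, "Y": {"g9"}}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_or_zero(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def gene_overlap_risk_vs_other(pairs, genes):
    """Mean gene Jaccard for disease pairs linked by a risk pair vs. all other pairs."""
    risk, other = [], []
    for a, b in combinations(sorted(genes), 2):
        linked = (a, b) in pairs or (b, a) in pairs
        (risk if linked else other).append(jaccard(genes[a], genes[b]))
    return mean_or_zero(risk), mean_or_zero(other)


def gene_overlap_by_risk_profile(pairs, genes, threshold=0.5):
    """Mean gene Jaccard for diseases with similar vs. dissimilar risk profiles.

    A disease's risk profile is the set of D2 values paired with it as D1.
    """
    profiles = {}
    for d1, d2 in pairs:
        profiles.setdefault(d1, set()).add(d2)
    similar, other = [], []
    for a, b in combinations(sorted(profiles), 2):
        g = jaccard(genes.get(a, set()), genes.get(b, set()))
        (similar if jaccard(profiles[a], profiles[b]) >= threshold else other).append(g)
    return mean_or_zero(similar), mean_or_zero(other)


if __name__ == "__main__":
    print(gene_overlap_risk_vs_other(PAIRS, GENES))
    print(gene_overlap_by_risk_profile(PAIRS, GENES))
```

On this toy data, linked pairs average a gene overlap of 1/3 against 1/7 for unlinked pairs, and A and B (identical risk profiles) overlap fully while the dissimilar pairs share nothing. The same comparison could be repeated with drug sets in place of gene sets.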