dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text

Bibliographic details
Main authors:	Xu, Rong; Li, Li; Wang, QuanQiu
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998061/ https://www.ncbi.nlm.nih.gov/pubmed/24725842 http://dx.doi.org/10.1186/1471-2105-15-105

author	Xu, Rong; Li, Li; Wang, QuanQiu
collection	PubMed
description	BACKGROUND: Discerning the genetic contributions to complex human diseases is a challenging mandate that demands new types of data and calls for new avenues for advancing the state of the art in computational approaches to uncovering disease etiology. Systems approaches to studying observable phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repositioning. Currently, systematic study of disease relationships on a phenome-wide scale is limited by the lack of large-scale, machine-understandable knowledge bases of disease phenotype relationships. Our study introduces a semi-supervised iterative pattern learning approach that is used to build a precise, large-scale disease-disease risk relationship (D1 → D2) knowledge base (dRiskKB) from a vast corpus of published free-text biomedical literature. RESULTS: The text corpus under study comprised 21,354,075 MEDLINE records. First, we used one typical disease risk-specific syntactic pattern (i.e., “D1 due to D2”) as a seed to automatically discover other patterns specifying similar semantic relationships among diseases. We then extracted D1 → D2 risk pairs from MEDLINE using the learned patterns. We manually evaluated the precision of the learned patterns and of the extracted pairs. Finally, we analyzed the correlations between disease-disease risk pairs and their associated genes and drugs. The newly created dRiskKB consists of a total of 34,448 unique D1 → D2 pairs, representing the risk-specific semantic relationships among 12,981 diseases, with each disease linked to its associated genes and drugs. The identified patterns are highly precise (average precision of 0.99) in specifying the risk-specific relationships among diseases. The precision of the extracted pairs is 0.919 for exact matches and 0.988 for partial matches. By comparing runs of the iterative pattern approach started from different seeds, we demonstrated that our algorithm is robust to the choice of seed. We show that diseases and their risk diseases, as well as diseases with similar risk profiles, tend to share both genes and drugs. CONCLUSIONS: When combined with existing phenotypic, genetic, and genomic datasets, this unique dRiskKB can have profound implications for a deeper understanding of disease etiology and for drug repositioning.
format	Online Article Text
id	pubmed-3998061
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
publicationDate	2014-04-12
rights	Copyright © 2014 Xu et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998061/ https://www.ncbi.nlm.nih.gov/pubmed/24725842 http://dx.doi.org/10.1186/1471-2105-15-105
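The description outlines a semi-supervised iterative pattern-learning loop: a seed pattern (“D1 due to D2”) extracts disease pairs, and text that links already-known pairs becomes a candidate for a new pattern. The Python sketch below illustrates that general bootstrapping idea on a toy corpus. It is not the authors' implementation. The disease lexicon, the sentences, the rule that a pattern is the literal text between two adjacent disease mentions, and the `min_support` threshold for accepting new patterns are all illustrative assumptions. This record does not describe how the published method scores or filters patterns.

```python
import re
from collections import Counter

# Hypothetical toy lexicon and corpus (illustrative only, not MEDLINE data).
DISEASES = {"heart failure", "hypertension", "stroke",
            "atrial fibrillation", "anemia", "kidney disease"}
SENTENCES = [
    "heart failure due to hypertension was common.",
    "heart failure caused by hypertension.",
    "stroke due to atrial fibrillation.",
    "stroke caused by atrial fibrillation.",
    "anemia caused by kidney disease.",
    "hypertension and stroke were recorded.",
]


def adjacent_pairs(sentence, diseases):
    """Yield (d1, middle_text, d2) for each pair of consecutive disease mentions."""
    # Try longer names first so a multi-word name is matched as one mention.
    names = sorted(diseases, key=len, reverse=True)
    regex = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    mentions = list(regex.finditer(sentence))
    for m1, m2 in zip(mentions, mentions[1:]):
        yield m1.group(1), sentence[m1.end():m2.start()].strip(), m2.group(1)


def extract_pairs(sentences, diseases, patterns):
    """Return (D1, D2) pairs, in slot order, whose connecting text is a known pattern."""
    return {(d1, d2)
            for s in sentences
            for d1, middle, d2 in adjacent_pairs(s, diseases)
            if middle in patterns}


def bootstrap(sentences, diseases, seed="due to", min_support=2, max_iter=10):
    """Iteratively grow the pattern set and the pair set from a single seed pattern."""
    patterns = {seed}
    pairs = extract_pairs(sentences, diseases, patterns)
    for _ in range(max_iter):
        # Count how often each unseen connecting text links an already-known pair.
        support = Counter(
            middle
            for s in sentences
            for d1, middle, d2 in adjacent_pairs(s, diseases)
            if (d1, d2) in pairs and middle and middle not in patterns
        )
        new_patterns = {p for p, n in support.items() if n >= min_support}
        if not new_patterns:
            break  # converged: no candidate reached the support threshold
        patterns |= new_patterns
        # Re-apply the enlarged pattern set to harvest additional pairs.
        pairs |= extract_pairs(sentences, diseases, patterns)
    return patterns, pairs


if __name__ == "__main__":
    learned, found = bootstrap(SENTENCES, DISEASES)
    print(sorted(learned))  # ['caused by', 'due to']
    print(sorted(found))
```

On this toy corpus, the seed “due to” yields two pairs; “caused by” links both of them and is promoted, which then adds ('anemia', 'kidney disease'). The coordination “and” is never promoted because the pair it connects was never confirmed by an accepted pattern. A realistic version would also have to handle reversed slot order (e.g. “D2 causes D1”), sentence splitting, and pattern-quality assessment, which this sketch reduces to a simple support count.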