Systematic identification of latent disease-gene associations from PubMed articles

Bibliographic Details
Main authors: Zhang, Yuji; Shen, Feichen; Mojarad, Majid Rastegar; Li, Dingcheng; Liu, Sijia; Tao, Cui; Yu, Yue; Liu, Hongfang
Format: Online Article Text
Language: English
Published: Public Library of Science, 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5786305/
https://www.ncbi.nlm.nih.gov/pubmed/29373609
http://dx.doi.org/10.1371/journal.pone.0191568
author Zhang, Yuji
Shen, Feichen
Mojarad, Majid Rastegar
Li, Dingcheng
Liu, Sijia
Tao, Cui
Yu, Yue
Liu, Hongfang
collection PubMed
description Recent scientific advances have accumulated a tremendous amount of biomedical knowledge, providing novel insights into the relationships between molecular and cellular processes and diseases. Literature mining is a commonly used method to retrieve and extract information from scientific publications for understanding these associations. However, because of the large data volume and the complicated, noisy associations, interpreting such association data for semantic knowledge discovery is challenging. In this study, we describe an integrative computational framework aimed at expediting the discovery of latent disease mechanisms by dissecting 146,245 disease-gene associations from over 25 million PubMed-indexed articles. We take advantage of both Latent Dirichlet Allocation (LDA) modeling and network-based analysis for their respective capabilities of detecting latent associations and reducing noise in large-volume data. Our results demonstrate that (1) LDA-based modeling is able to group similar diseases into disease topics; (2) the disease-specific association networks follow the scale-free network property; (3) certain subnetwork patterns are enriched in the disease-specific association networks; and (4) genes are enriched in topic-specific biological processes. Our approach offers promising opportunities for latent disease-gene knowledge discovery in biomedical research.
format Online
Article
Text
id pubmed-5786305
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
PLoS One, Research Article. Public Library of Science, 2018-01-26. © 2018 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Systematic identification of latent disease-gene associations from PubMed articles
topic Research Article