
Fusing literature and full network data improves disease similarity computation

Bibliographic Details
Main Authors: Li, Ping; Nie, Yaling; Yu, Jingkai
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006367/
https://www.ncbi.nlm.nih.gov/pubmed/27578323
http://dx.doi.org/10.1186/s12859-016-1205-4
author Li, Ping
Nie, Yaling
Yu, Jingkai
collection PubMed
description BACKGROUND: Identifying relatedness among diseases could help deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them was designed to use information from the entire protein interaction network, relying instead only on those interactions involving disease-causing genes. Most previously published methods required gene-disease association data; unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating the medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. RESULTS: Among function-based methods, NetSim achieved the best performance, with an average AUC (area under the receiver operating characteristic curve) of 95.2 %. MedSim, whose performance was comparable even to that of some function-based methods, achieved the highest average AUC among all semantic-based methods. Integrating MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4 %. We further studied the effectiveness of different data sources and found that the quality of protein interaction data was more important than its volume. By contrast, a higher volume of gene-disease association data was more beneficial, even at lower reliability. Using a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5 % and 96.7 %, respectively. CONCLUSIONS: Integrating the biomedical literature and the protein interaction network can be an effective way to compute disease similarity.
When sufficient disease-related gene data are lacking, literature-based methods such as MedSim can be a valuable addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1205-4) contains supplementary material, which is available to authorized users.
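The abstract reports results as average AUC and describes combining a network-based score with a literature-based one. A minimal Python sketch of both ideas follows, under stated assumptions: `roc_auc` computes the standard ROC AUC as the Mann-Whitney probability that a related disease pair outscores an unrelated one, while `fuse` is a hypothetical weighted average standing in for the integration step. The actual MedNetSim combination rule, the averaging scheme behind "average AUC", and all scores below are not given in this record and are illustrative assumptions only.

```python
from itertools import product


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive (related) disease pair scores higher than a randomly chosen
    negative (unrelated) pair; ties count as half a win. This equals the
    Mann-Whitney U statistic divided by n_pos * n_neg."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def fuse(net_score, med_score, weight=0.5):
    """Hypothetical fusion rule: weighted average of a network-based score and
    a literature-based score, both assumed to lie in [0, 1]. This is an
    illustrative stand-in, not the published MedNetSim combination."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * net_score + (1.0 - weight) * med_score


if __name__ == "__main__":
    # Toy scores for three related (positive) and three unrelated (negative)
    # disease pairs; these numbers are invented, not taken from the paper.
    net_pos, net_neg = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]
    med_pos, med_neg = [0.5, 0.6, 0.9], [0.4, 0.8, 0.1]
    fused_pos = [fuse(a, b) for a, b in zip(net_pos, med_pos)]
    fused_neg = [fuse(a, b) for a, b in zip(net_neg, med_neg)]
    print("network-only AUC:", roc_auc(net_pos, net_neg))
    print("literature-only AUC:", roc_auc(med_pos, med_neg))
    print("fused AUC:", roc_auc(fused_pos, fused_neg))
```

The pairwise form of `roc_auc` is quadratic in the number of pairs but makes the definition explicit; a rank-based computation gives the same value faster on large benchmarks. A weighted average is only one possible fusion rule, chosen here because it is easy to inspect.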
format Online
Article
Text
id pubmed-5006367
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5006367 2016-09-07 Fusing literature and full network data improves disease similarity computation Li, Ping Nie, Yaling Yu, Jingkai BMC Bioinformatics Research Article BioMed Central 2016-08-30 /pmc/articles/PMC5006367/ /pubmed/27578323 http://dx.doi.org/10.1186/s12859-016-1205-4 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Fusing literature and full network data improves disease similarity computation
topic Research Article