Fusing literature and full network data improves disease similarity computation

Bibliographic Details
Main authors:	Li, Ping; Nie, Yaling; Yu, Jingkai
Format:	Online Article Text
Language:	English
Published:	BioMed Central, 2016
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006367/ https://www.ncbi.nlm.nih.gov/pubmed/27578323 http://dx.doi.org/10.1186/s12859-016-1205-4

author	Li, Ping; Nie, Yaling; Yu, Jingkai
collection	PubMed
description	BACKGROUND: Identifying relatedness among diseases could deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them was designed to use information from the entire protein interaction network, relying instead only on interactions involving disease-causing genes. Most previously published methods required gene-disease association data; unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. RESULTS: Among function-based methods, NetSim achieved the best performance, with an average AUC (area under the receiver operating characteristic curve) of 95.2%. MedSim, whose performance was comparable even to that of some function-based methods, achieved the highest average AUC among all semantic-based methods. Integrating MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4%. We also studied the effectiveness of different data sources. The quality of protein interaction data was found to be more important than its volume. In contrast, a higher volume of gene-disease association data was more beneficial, even at lower reliability. Using a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5% and 96.7%, respectively. CONCLUSIONS: Integrating biomedical literature and the protein interaction network can be an effective way to compute disease similarity. When disease-related gene data are insufficient, literature-based methods such as MedSim can be a valuable complement to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1205-4) contains supplementary material, which is available to authorized users.
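The abstract does not say how MedSim and NetSim scores are combined, nor which evaluation protocol lies behind the reported average AUC values. As a rough illustration only, the Python sketch below assumes a simple weighted average of a literature score and a network score, falling back to whichever score exists when a disease has no network data (the situation the abstract describes for diseases with few or no known genes), and computes ROC AUC from its pairwise definition. The names `fuse_scores`, `roc_auc`, `alpha`, and the disease identifiers `D1`..`D4` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical key for a disease pair: (disease_a, disease_b). Identifiers are placeholders.
Pair = Tuple[str, str]


def fuse_scores(med: Dict[Pair, float], net: Dict[Pair, float],
                alpha: float = 0.5) -> Dict[Pair, float]:
    """Hypothetical fusion rule (not the paper's): alpha * literature + (1 - alpha) * network.

    A pair missing from one source (e.g. a disease with no known genes has no
    network score) keeps the score from the source that does have it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused: Dict[Pair, float] = {}
    for pair in set(med) | set(net):
        m: Optional[float] = med.get(pair)
        n: Optional[float] = net.get(pair)
        if m is not None and n is not None:
            fused[pair] = alpha * m + (1.0 - alpha) * n
        elif m is not None:
            fused[pair] = m  # literature-only fallback
        elif n is not None:
            fused[pair] = n  # network-only fallback
    return fused


def roc_auc(scores: Dict[Pair, float], related: Set[Pair]) -> float:
    """ROC AUC via its pairwise definition: the probability that a related pair
    scores higher than an unrelated pair, counting ties as 1/2."""
    pos = [s for pair, s in scores.items() if pair in related]
    neg = [s for pair, s in scores.items() if pair not in related]
    if not pos or not neg:
        raise ValueError("need at least one related and one unrelated pair")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up scores; D4 has no network score, so its pair uses the literature score.
    med = {("D1", "D2"): 0.8, ("D1", "D3"): 0.3, ("D2", "D3"): 0.4, ("D3", "D4"): 0.7}
    net = {("D1", "D2"): 0.9, ("D1", "D3"): 0.2, ("D2", "D3"): 0.5}
    related = {("D1", "D2"), ("D3", "D4")}
    print(roc_auc(fuse_scores(med, net), related))  # → 1.0
```

The pairwise AUC loop is quadratic in the number of pairs; for large evaluations a rank-based computation (or a library routine) gives the same value more efficiently.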
format	Online Article Text
id	pubmed-5006367
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics (BioMed Central, published 2016-08-30)
rights	© The Author(s). 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Fusing literature and full network data improves disease similarity computation
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006367/ https://www.ncbi.nlm.nih.gov/pubmed/27578323 http://dx.doi.org/10.1186/s12859-016-1205-4