Fusing literature and full network data improves disease similarity computation
BACKGROUND: Identifying relatedness among diseases could help deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them were designed to utilize...
Main Authors: | Li, Ping; Nie, Yaling; Yu, Jingkai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006367/ https://www.ncbi.nlm.nih.gov/pubmed/27578323 http://dx.doi.org/10.1186/s12859-016-1205-4 |
_version_ | 1782451050042621952 |
---|---|
author | Li, Ping Nie, Yaling Yu, Jingkai |
author_facet | Li, Ping Nie, Yaling Yu, Jingkai |
author_sort | Li, Ping |
collection | PubMed |
description | BACKGROUND: Identifying relatedness among diseases could help deepen understanding of the underlying pathogenic mechanisms of diseases and facilitate drug repositioning projects. A number of methods for computing disease similarity have been developed; however, none of them was designed to use information from the entire protein interaction network, relying instead only on those interactions that involve disease-causing genes. Most previously published methods require gene-disease association data; unfortunately, many diseases still have very few or no associated genes, which has impeded broad adoption of those methods. In this study, we propose a new method (MedNetSim) for computing disease similarity by integrating the medical literature and the protein interaction network. MedNetSim consists of a network-based method (NetSim), which employs the entire protein interaction network, and a MEDLINE-based method (MedSim), which computes disease similarity by mining the biomedical literature. RESULTS: Among function-based methods, NetSim achieved the best performance. Its average AUC (area under the receiver operating characteristic curve) reached 95.2%. MedSim, whose performance was comparable even to some function-based methods, achieved the highest average AUC among all semantic-based methods. Integration of MedSim and NetSim (MedNetSim) further improved the average AUC to 96.4%. We further studied the effectiveness of different data sources. The quality of protein interaction data was found to be more important than its volume. In contrast, a higher volume of gene-disease association data was more beneficial, even at lower reliability. Using a higher volume of disease-related gene data further improved the average AUC of MedNetSim and NetSim to 97.5% and 96.7%, respectively. CONCLUSIONS: Integrating the biomedical literature and the protein interaction network can be an effective way to compute disease similarity. When disease-related gene data are insufficient, literature-based methods such as MedSim can be a valuable addition to function-based algorithms. It may be beneficial to steer more resources toward studying gene-disease associations and improving the quality of protein interaction data. Disease similarities can be computed using the proposed methods at http://www.digintelli.com:8000/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1205-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5006367 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5006367 2016-09-07 Fusing literature and full network data improves disease similarity computation Li, Ping Nie, Yaling Yu, Jingkai BMC Bioinformatics Research Article BioMed Central 2016-08-30 /pmc/articles/PMC5006367/ /pubmed/27578323 http://dx.doi.org/10.1186/s12859-016-1205-4 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Li, Ping Nie, Yaling Yu, Jingkai Fusing literature and full network data improves disease similarity computation |
title | Fusing literature and full network data improves disease similarity computation |
title_full | Fusing literature and full network data improves disease similarity computation |
title_fullStr | Fusing literature and full network data improves disease similarity computation |
title_full_unstemmed | Fusing literature and full network data improves disease similarity computation |
title_short | Fusing literature and full network data improves disease similarity computation |
title_sort | fusing literature and full network data improves disease similarity computation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5006367/ https://www.ncbi.nlm.nih.gov/pubmed/27578323 http://dx.doi.org/10.1186/s12859-016-1205-4 |
work_keys_str_mv | AT liping fusingliteratureandfullnetworkdataimprovesdiseasesimilaritycomputation AT nieyaling fusingliteratureandfullnetworkdataimprovesdiseasesimilaritycomputation AT yujingkai fusingliteratureandfullnetworkdataimprovesdiseasesimilaritycomputation |
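
The description above reports every result as an average AUC and states that integrating MedSim with NetSim raised it. As a rough illustration of what that metric measures, the following Python sketch computes AUC in its rank form: the probability that a truly similar disease pair scores higher than a dissimilar pair. It is a minimal sketch under stated assumptions, not the authors' implementation. The record does not say how MedNetSim combines the two scores, so the weighted average in `fuse` is a hypothetical stand-in, and all scores and labels are made-up toy values.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC in its rank (Mann-Whitney) form.

    Returns the fraction of (positive, negative) pairs in which the
    positive item scores higher, counting ties as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fuse(a: Sequence[float], b: Sequence[float], weight: float = 0.5) -> List[float]:
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    a weighted average of two similarity scores for the same disease pairs."""
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# Toy data, invented for illustration: six disease pairs,
# label 1 = pair regarded as similar, 0 = pair regarded as dissimilar.
labels = [1, 1, 1, 0, 0, 0]
net_scores = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # stand-in for a network-based score
med_scores = [0.6, 0.9, 0.3, 0.2, 0.4, 0.1]  # stand-in for a literature-based score

auc_net = auc(net_scores, labels)
auc_med = auc(med_scores, labels)
auc_fused = auc(fuse(net_scores, med_scores), labels)
print(auc_net, auc_med, auc_fused)
```

In this toy case each individual score misranks one positive pair against one negative pair, while the averaged score ranks every positive pair above every negative pair. That shows how two complementary signals can yield a higher AUC together than either does alone; it says nothing about the magnitudes reported in the article.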