
Adapting Community Detection Algorithms for Disease Module Identification in Heterogeneous Biological Networks


Bibliographic Details
Main authors: Tripathi, Beethika; Parthasarathy, Srinivasan; Sinha, Himanshu; Raman, Karthik; Ravindran, Balaraman
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6424898/
https://www.ncbi.nlm.nih.gov/pubmed/30918511
http://dx.doi.org/10.3389/fgene.2019.00164
Collection: PubMed
Description: Biological networks catalog the complex web of interactions between different molecules, typically proteins, within a cell. These networks are known to be highly modular, with groups of proteins associated with specific biological functions. Human diseases often arise from the dysfunction of one or more proteins in such a functional group. The ability to identify and automatically extract these modules has implications for understanding the etiology of different diseases as well as the functional roles of different protein modules in disease. The recent DREAM challenge posed the problem of identifying disease modules from six heterogeneous networks of proteins/genes. Many community detection algorithms exist, but not all of them can be adapted to the biological context, as these networks are densely connected and biologically relevant modules are quite small. The contribution of this study is threefold: first, we present a comprehensive assessment of many classic community detection algorithms for identifying non-overlapping communities in biological networks, and propose heuristics to identify small, structurally well-defined communities, termed core modules. We evaluated our approach on 180 GWAS datasets. Compared with traditional approaches, our proposed approach identified 50% more disease-relevant modules. Thus, we show that identifying more compact modules is important for better performance. Next, we sought to understand the peculiar characteristics of disease-enriched modules and what causes standard community detection algorithms to detect so few of them. We performed a comprehensive analysis of the interaction patterns of known disease genes to understand the structure of disease modules, and show that merely treating the set of known disease genes as a module does not yield good-quality clusters, as measured by typical metrics such as modularity and conductance.
We go on to present a methodology that leverages these known disease genes and also includes their neighboring nodes in a module, forming good-quality clusters from which we subsequently extract a “gold-standard set” of disease modules. Lastly, we demonstrate, with justification, that “overlapping” community detection algorithms should be the preferred choice for disease module identification, since several genes participate in multiple biological functions.
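The description judges candidate modules by modularity and conductance, and proposes adding the neighbors of known disease genes to a module. The following is a minimal pure-Python sketch of both ideas, assuming an unweighted, undirected graph stored as a dict of neighbor sets. The toy graph, the node names, and the `expand_with_neighbors` rule with its `min_links` threshold are hypothetical illustrations; the record does not specify the authors' actual expansion procedure.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency as {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def conductance(adj, S):
    """cut(S, V-S) / min(vol(S), vol(V-S)); lower means better separated."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else 0.0

def modularity(adj, communities):
    """Newman-Girvan Q = sum_c [L_c/m - (d_c/(2m))^2] for a partition."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for c in communities:
        c = set(c)
        l_c = sum(1 for u in c for v in adj[u] if v in c) / 2  # internal edges
        d_c = sum(len(adj[u]) for u in c)                       # total degree
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q

def expand_with_neighbors(adj, seeds, min_links=2):
    """Hypothetical rule: add each non-seed neighbor that has at least
    min_links edges into the seed set."""
    seeds = set(seeds)
    added = {v for u in seeds for v in adj[u]
             if v not in seeds and len(adj[v] & seeds) >= min_links}
    return seeds | added

# Hypothetical toy network: a 5-node dense module (g1-g3 play the "known
# disease genes", n1-n2 their neighbors) joined by one edge to a second clique.
module = ["g1", "g2", "g3", "n1", "n2"]
other = ["r1", "r2", "r3", "r4", "r5"]
edges = list(combinations(module, 2)) + list(combinations(other, 2)) + [("n1", "r1")]
adj = build_adj(edges)

seeds = {"g1", "g2", "g3"}
expanded = expand_with_neighbors(adj, seeds)
for name, s in [("seeds only", seeds), ("seeds + neighbors", expanded)]:
    print(name, round(conductance(adj, s), 3),
          round(modularity(adj, [s, set(adj) - s]), 3))
```

On this toy graph the three seed genes alone form a poorly separated set (conductance 0.5), while the expanded five-node set is cut by a single edge (conductance about 0.048), and the corresponding two-way partition has higher modularity (about 0.452 versus 0.122).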
ID: pubmed-6424898
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Frontiers in Genetics (Front Genet)
Publication date: 2019-03-13
Copyright © 2019 Tripathi, Parthasarathy, Sinha, Raman and Ravindran. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.