Cargando…

Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs

BACKGROUND: Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general...

Descripción completa

Detalles Bibliográficos
Autores principales: Hartsperger, Mara L, Blöchl, Florian, Stümpflen, Volker, Theis, Fabian J
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2010
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3247861/
https://www.ncbi.nlm.nih.gov/pubmed/20961418
http://dx.doi.org/10.1186/1471-2105-11-522
_version_ 1782220181585526784
author Hartsperger, Mara L
Blöchl, Florian
Stümpflen, Volker
Theis, Fabian J
author_facet Hartsperger, Mara L
Blöchl, Florian
Stümpflen, Volker
Theis, Fabian J
author_sort Hartsperger, Mara L
collection PubMed
description BACKGROUND: Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general but common situation of k-partite graphs. These graphs contain k different node types and links are only allowed between nodes of different types. In order to reveal their structural organization and describe the contained information in a more coarse-grained fashion, we ask how to detect clusters within each node type. RESULTS: Since entities in biological networks regularly have more than one function and hence participate in more than one cluster, we developed a k-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a weighted k-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures on various resolution levels. Applying our algorithm to a tripartite disease-gene-protein complex network, we were able to structure this graph on a large scale into clusters that are functionally correlated and biologically meaningful. Locally, smaller clusters enabled reclassification or annotation of the clusters' elements. We exemplified this for the transcription factor MECP2. CONCLUSIONS: In order to cope with the overwhelming amount of information available from biomedical literature, we need to tackle the challenge of finding structures in large networks with nodes of multiple types. To this end, we presented a novel fuzzy k-partite graph partitioning algorithm that allows the decomposition of these objects in a comprehensive fashion. We validated our approach both on artificial and real-world data. It is readily applicable to any further problem.
format Online
Article
Text
id pubmed-3247861
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-32478612011-12-30 Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs Hartsperger, Mara L Blöchl, Florian Stümpflen, Volker Theis, Fabian J BMC Bioinformatics Methodology Article BACKGROUND: Extensive and automated data integration in bioinformatics facilitates the construction of large, complex biological networks. However, the challenge lies in the interpretation of these networks. While most research focuses on the unipartite or bipartite case, we address the more general but common situation of k-partite graphs. These graphs contain k different node types and links are only allowed between nodes of different types. In order to reveal their structural organization and describe the contained information in a more coarse-grained fashion, we ask how to detect clusters within each node type. RESULTS: Since entities in biological networks regularly have more than one function and hence participate in more than one cluster, we developed a k-partite graph partitioning algorithm that allows for overlapping (fuzzy) clusters. It determines for each node a degree of membership to each cluster. Moreover, the algorithm estimates a weighted k-partite graph that connects the extracted clusters. Our method is fast and efficient, mimicking the multiplicative update rules commonly employed in algorithms for non-negative matrix factorization. It facilitates the decomposition of networks on a chosen scale and therefore allows for analysis and interpretation of structures on various resolution levels. Applying our algorithm to a tripartite disease-gene-protein complex network, we were able to structure this graph on a large scale into clusters that are functionally correlated and biologically meaningful. Locally, smaller clusters enabled reclassification or annotation of the clusters' elements. We exemplified this for the transcription factor MECP2. CONCLUSIONS: In order to cope with the overwhelming amount of information available from biomedical literature, we need to tackle the challenge of finding structures in large networks with nodes of multiple types. To this end, we presented a novel fuzzy k-partite graph partitioning algorithm that allows the decomposition of these objects in a comprehensive fashion. We validated our approach both on artificial and real-world data. It is readily applicable to any further problem. BioMed Central 2010-10-20 /pmc/articles/PMC3247861/ /pubmed/20961418 http://dx.doi.org/10.1186/1471-2105-11-522 Text en Copyright ©2010 Hartsperger et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Hartsperger, Mara L
Blöchl, Florian
Stümpflen, Volker
Theis, Fabian J
Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
title Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
title_full Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
title_fullStr Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
title_full_unstemmed Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
title_short Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
title_sort structuring heterogeneous biological information using fuzzy clustering of k-partite graphs
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3247861/
https://www.ncbi.nlm.nih.gov/pubmed/20961418
http://dx.doi.org/10.1186/1471-2105-11-522
work_keys_str_mv AT hartspergermaral structuringheterogeneousbiologicalinformationusingfuzzyclusteringofkpartitegraphs
AT blochlflorian structuringheterogeneousbiologicalinformationusingfuzzyclusteringofkpartitegraphs
AT stumpflenvolker structuringheterogeneousbiologicalinformationusingfuzzyclusteringofkpartitegraphs
AT theisfabianj structuringheterogeneousbiologicalinformationusingfuzzyclusteringofkpartitegraphs