Clustering approaches for visual knowledge exploration in molecular interaction networks

BACKGROUND: Biomedical knowledge grows in complexity, and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations and established ontologies. Combining these structured information sources is an important computational chal...

Bibliographic Details
Main authors: Ostaszewski, Marek; Kieffer, Emmanuel; Danoy, Grégoire; Schneider, Reinhard; Bouvry, Pascal
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116538/
https://www.ncbi.nlm.nih.gov/pubmed/30157777
http://dx.doi.org/10.1186/s12859-018-2314-z
_version_ 1783351624518860800
author Ostaszewski, Marek
Kieffer, Emmanuel
Danoy, Grégoire
Schneider, Reinhard
Bouvry, Pascal
author_facet Ostaszewski, Marek
Kieffer, Emmanuel
Danoy, Grégoire
Schneider, Reinhard
Bouvry, Pascal
author_sort Ostaszewski, Marek
collection PubMed
description BACKGROUND: Biomedical knowledge grows in complexity, and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually. RESULTS: We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content, we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest path distance using the underlying network and iii) ontology-based distance. We employ clustering with these metrics used separately and in pairwise combinations. We propose a novel bi-level optimization approach together with an evolutionary algorithm for informative combination of distance metrics. We compare the enrichment of the obtained clusters between the solutions and with expert knowledge. We calculate the number of Gene and Disease Ontology terms discovered by different solutions as a measure of cluster quality. Our results show that combining distance metrics can improve clustering accuracy, based on the comparison with expert-provided clusters. Also, the performance of specific combinations of distance functions depends on the clustering depth (number of clusters). By employing a bi-level optimization approach, we evaluated the relative importance of the distance functions and found that the order in which they are combined indeed affects clustering performance. Next, through enrichment analysis of the clustering results, we found that both hierarchical and bi-level clustering schemes discovered more Gene and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps.
CONCLUSIONS: In this work we examined the impact of different distance functions on clustering of a visual biomedical knowledge repository. We found that combining distance functions may benefit clustering and improve exploration of such repositories. We proposed bi-level optimization to evaluate the importance of the order in which the distance functions are combined. Both the combination and the order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2314-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6116538
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6116538 2018-10-02 Clustering approaches for visual knowledge exploration in molecular interaction networks Ostaszewski, Marek Kieffer, Emmanuel Danoy, Grégoire Schneider, Reinhard Bouvry, Pascal BMC Bioinformatics Research Article BioMed Central 2018-08-29 /pmc/articles/PMC6116538/ /pubmed/30157777 http://dx.doi.org/10.1186/s12859-018-2314-z Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Ostaszewski, Marek
Kieffer, Emmanuel
Danoy, Grégoire
Schneider, Reinhard
Bouvry, Pascal
Clustering approaches for visual knowledge exploration in molecular interaction networks
title Clustering approaches for visual knowledge exploration in molecular interaction networks
title_full Clustering approaches for visual knowledge exploration in molecular interaction networks
title_fullStr Clustering approaches for visual knowledge exploration in molecular interaction networks
title_full_unstemmed Clustering approaches for visual knowledge exploration in molecular interaction networks
title_short Clustering approaches for visual knowledge exploration in molecular interaction networks
title_sort clustering approaches for visual knowledge exploration in molecular interaction networks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6116538/
https://www.ncbi.nlm.nih.gov/pubmed/30157777
http://dx.doi.org/10.1186/s12859-018-2314-z
work_keys_str_mv AT ostaszewskimarek clusteringapproachesforvisualknowledgeexplorationinmolecularinteractionnetworks
AT kiefferemmanuel clusteringapproachesforvisualknowledgeexplorationinmolecularinteractionnetworks
AT danoygregoire clusteringapproachesforvisualknowledgeexplorationinmolecularinteractionnetworks
AT schneiderreinhard clusteringapproachesforvisualknowledgeexplorationinmolecularinteractionnetworks
AT bouvrypascal clusteringapproachesforvisualknowledgeexplorationinmolecularinteractionnetworks