
A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data

BACKGROUND: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gath...


Bibliographic Details
Main Authors: Hulot, Audrey; Laloë, Denis; Jaffrézic, Florence
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336092/
https://www.ncbi.nlm.nih.gov/pubmed/34348641
http://dx.doi.org/10.1186/s12859-021-04303-4
author Hulot, Audrey
Laloë, Denis
Jaffrézic, Florence
collection PubMed
description BACKGROUND: Integrating data from different sources is a recurring question in computational biology. Much effort has been devoted to the integration of data sets of the same type, typically multiple numerical data tables. However, data types are generally heterogeneous: it is commonplace to gather data in the form of trees, networks or factorial maps, as these representations all have an appealing visual interpretation that helps to study grouping patterns and interactions between entities. The question we aim to answer in this paper is that of the integration of such representations. RESULTS: To this end, we provide a simple procedure to compare data of various types, in particular trees or networks, that relies essentially on two steps: the first step projects the representations into a common coordinate system; the second step then uses a multi-table integration approach to compare the projected data. We rely on efficient and well-known methodologies for each step: the projection step is achieved by retrieving a distance matrix for each representation form and then applying multidimensional scaling to provide a new set of coordinates from all the pairwise distances. The integration step is then achieved by applying a multiple factor analysis to the multiple tables of the new coordinates. This procedure provides tools to integrate and compare data available, for instance, as tree or network structures. Our approach is complementary to kernel methods, traditionally used to answer the same question. CONCLUSION: Our approach is evaluated on simulations and used to analyze two real-world data sets: first, we compare several clusterings for different cell-types obtained from a transcriptomics single-cell data set in mouse embryos; second, we use our procedure to aggregate a multi-table data set from the TCGA breast cancer database, in order to compare several protein networks inferred for different breast cancer subtypes.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04303-4.
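The two-step procedure summarized in the description (a distance matrix per representation, multidimensional scaling, then a multiple factor analysis on the resulting coordinate tables) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`shortest_path_distances`, `classical_mds`, `mfa`), the toy graphs, and the use of shortest-path length as the network distance are assumptions made for the example. The MFA step is reduced to its core idea, namely scaling each centered table by its first singular value and then running a PCA on the concatenated tables.

```python
import numpy as np

def shortest_path_distances(adj):
    """All-pairs shortest-path lengths (Floyd-Warshall) from an adjacency matrix.
    Assumes a connected, undirected graph with positive edge weights."""
    n = adj.shape[0]
    d = np.where(adj > 0, adj, np.inf).astype(float)
    np.fill_diagonal(d, 0.0)
    for k in range(n):
        # Allow paths that pass through node k.
        d = np.minimum(d, d[:, [k]] + d[[k], :])
    return d

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates from a pairwise distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(gram)
    order = np.argsort(eigval)[::-1][:n_components]    # largest eigenvalues first
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0.0, None))

def mfa(tables, n_components=2):
    """Simplified MFA: weight each table by 1 / first singular value, then PCA."""
    weighted = []
    for t in tables:
        t = t - t.mean(axis=0)                         # center each column
        s1 = np.linalg.svd(t, compute_uv=False)[0]     # largest singular value
        weighted.append(t / s1)
    stacked = np.hstack(weighted)
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return u[:, :n_components] * s[:n_components]      # scores of the entities

# Toy input (hypothetical): two representations of the same 5 entities.
path = np.diag(np.ones(4), k=1)
path = path + path.T                                   # chain 0-1-2-3-4
star = np.zeros((5, 5))
star[0, 1:] = 1.0
star[1:, 0] = 1.0                                      # node 0 linked to all others

tables = [classical_mds(shortest_path_distances(a)) for a in (path, star)]
scores = mfa(tables)
print(scores.shape)  # (5, 2)
```

For a hierarchical clustering, the same pipeline would apply with a tree-based distance (for instance the cophenetic distance) in place of `shortest_path_distances`; that substitution is likewise an assumption of this sketch.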
format Online
Article
Text
id pubmed-8336092
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8336092 2021-08-04 BMC Bioinformatics Methodology Article BioMed Central 2021-08-04 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A unified framework for the integration of multiple hierarchical clusterings or networks from multi-source data
topic Methodology Article