Clustering of samples and variables with mixed-type data

Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, beco...

Bibliographic Details
Main Authors:	Hummel, Manuela, Edelmann, Dominic, Kopp-Schneider, Annette
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5705083/ https://www.ncbi.nlm.nih.gov/pubmed/29182671 http://dx.doi.org/10.1371/journal.pone.0188274
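
The abstract of this record describes clustering mixed-type variables through dissimilarity matrices, with one approach based on distance correlation. The following is a minimal sketch of that general idea, not the CluMix implementation: it computes the sample distance correlation between pairs of variables and turns it into a dissimilarity. The function names, the toy data, the choice of a 0/1 mismatch distance for categorical values, and the use of 1 minus the distance correlation as the dissimilarity are all assumptions made for illustration.

```python
import math


def pairwise_distances(values):
    """Return the n x n distance matrix for one variable.

    Assumption (not taken from the paper): numeric values use the
    absolute difference; any other values (e.g. category labels)
    use a 0/1 mismatch distance.
    """
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        return [[abs(a - b) for b in values] for a in values]
    return [[0.0 if a == b else 1.0 for b in values] for a in values]


def double_center(dist):
    """Subtract row and column means, then add back the grand mean."""
    n = len(dist)
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # A distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equally long variables."""
    if len(x) != len(y):
        raise ValueError("variables must have the same number of samples")
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(v * v for row in a for v in row) / n**2
    dvar_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # a constant variable shows no association
    # Clamp to [0, 1] to absorb floating-point rounding.
    return min(1.0, math.sqrt(max(dcov2, 0.0) / denom))


def dissimilarity_matrix(variables):
    """Map {name: values} to (names, matrix) with entries 1 - dCor."""
    names = list(variables)
    matrix = [
        [
            0.0 if p == q
            else 1.0 - distance_correlation(variables[p], variables[q])
            for q in names
        ]
        for p in names
    ]
    return names, matrix


# Hypothetical toy data: two quantitative variables and one categorical.
toy = {
    "expression": [1.2, 2.3, 3.1, 4.8, 5.0, 6.4],
    "dose": [1, 2, 3, 4, 5, 6],
    "mutation": ["wt", "wt", "wt", "mut", "mut", "mut"],
}
names, dissim = dissimilarity_matrix(toy)
for name, row in zip(names, dissim):
    print(name, [round(v, 3) for v in row])
```

A matrix of this kind could then be passed to any standard hierarchical clustering routine to group variables, which is the sort of use the abstract has in mind when it notes that dissimilarity matrices are convenient for visualization.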

_version_	1783281998683439104
author	Hummel, Manuela Edelmann, Dominic Kopp-Schneider, Annette
collection	PubMed
description	Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix.
format	Online Article Text
id	pubmed-5705083
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5705083 2017-12-08 Clustering of samples and variables with mixed-type data Hummel, Manuela Edelmann, Dominic Kopp-Schneider, Annette PLoS One Research Article Public Library of Science 2017-11-28 /pmc/articles/PMC5705083/ /pubmed/29182671 http://dx.doi.org/10.1371/journal.pone.0188274 Text en © 2017 Hummel et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Clustering of samples and variables with mixed-type data
topic	Research Article
