Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations

BACKGROUND: Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be genera...

Bibliographic Details
Main Authors: Nguyen, Lan Huong; Holmes, Susan
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5606221/
https://www.ncbi.nlm.nih.gov/pubmed/28929970
http://dx.doi.org/10.1186/s12859-017-1790-x
author Nguyen, Lan Huong
Holmes, Susan
collection PubMed
description BACKGROUND: Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous, often unknown, process. Estimating data points’ ‘natural ordering’ and their corresponding uncertainties can help researchers draw insights about the mechanisms involved. RESULTS: We introduce a Bayesian Unidimensional Scaling (BUDS) technique which extracts dominant sources of variation in high-dimensional datasets and produces visual summaries of them, facilitating the exploration of a hidden continuum. The method maps multivariate data points to latent one-dimensional coordinates along their underlying trajectory, and provides estimated uncertainty bounds. By statistically modeling dissimilarities and applying a DiSTATIS registration method to their posterior samples, we are able to visualize uncertainties in the estimated data trajectory across different regions using confidence contours for individual data points. We also illustrate the estimated overall data density across different areas by including density clouds. One-dimensional coordinates recovered by BUDS help researchers discover sample attributes or covariates that drive the main variability in a dataset. We demonstrate the usefulness and accuracy of BUDS on published microbiome 16S, RNA-seq, and roll call datasets. CONCLUSIONS: Our method effectively recovers and visualizes natural orderings present in datasets. Automatic visualization tools for data exploration and analysis are available at: https://nlhuong.shinyapps.io/visTrajectory/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1790-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5606221
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5606221 2017-09-24 Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations Nguyen, Lan Huong Holmes, Susan BMC Bioinformatics Research BioMed Central 2017-09-13 /pmc/articles/PMC5606221/ /pubmed/28929970 http://dx.doi.org/10.1186/s12859-017-1790-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations
topic Research