Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations
BACKGROUND: Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be genera...
Main authors: | Nguyen, Lan Huong; Holmes, Susan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5606221/ https://www.ncbi.nlm.nih.gov/pubmed/28929970 http://dx.doi.org/10.1186/s12859-017-1790-x |
_version_ | 1783265120744374272 |
---|---|
author | Nguyen, Lan Huong Holmes, Susan |
author_facet | Nguyen, Lan Huong Holmes, Susan |
author_sort | Nguyen, Lan Huong |
collection | PubMed |
description | BACKGROUND: Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous process, often unknown. Estimating data points’ ‘natural ordering’ and their corresponding uncertainties can help researchers draw insights about the mechanisms involved. RESULTS: We introduce a Bayesian Unidimensional Scaling (BUDS) technique which extracts dominant sources of variation in high dimensional datasets and produces their visual data summaries, facilitating the exploration of a hidden continuum. The method maps multivariate data points to latent one-dimensional coordinates along their underlying trajectory, and provides estimated uncertainty bounds. By statistically modeling dissimilarities and applying a DiSTATIS registration method to their posterior samples, we are able to incorporate visualizations of uncertainties in the estimated data trajectory across different regions using confidence contours for individual data points. We also illustrate the estimated overall data density across different areas by including density clouds. One-dimensional coordinates recovered by BUDS help researchers discover sample attributes or covariates that are factors driving the main variability in a dataset. We demonstrated usefulness and accuracy of BUDS on a set of published microbiome 16S and RNA-seq and roll call data. CONCLUSIONS: Our method effectively recovers and visualizes natural orderings present in datasets. Automatic visualization tools for data exploration and analysis are available at: https://nlhuong.shinyapps.io/visTrajectory/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1790-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5606221 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-56062212017-09-24 Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations Nguyen, Lan Huong Holmes, Susan BMC Bioinformatics Research BACKGROUND: Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous process, often unknown. Estimating data points’ ‘natural ordering’ and their corresponding uncertainties can help researchers draw insights about the mechanisms involved. RESULTS: We introduce a Bayesian Unidimensional Scaling (BUDS) technique which extracts dominant sources of variation in high dimensional datasets and produces their visual data summaries, facilitating the exploration of a hidden continuum. The method maps multivariate data points to latent one-dimensional coordinates along their underlying trajectory, and provides estimated uncertainty bounds. By statistically modeling dissimilarities and applying a DiSTATIS registration method to their posterior samples, we are able to incorporate visualizations of uncertainties in the estimated data trajectory across different regions using confidence contours for individual data points. We also illustrate the estimated overall data density across different areas by including density clouds. One-dimensional coordinates recovered by BUDS help researchers discover sample attributes or covariates that are factors driving the main variability in a dataset. We demonstrated usefulness and accuracy of BUDS on a set of published microbiome 16S and RNA-seq and roll call data. CONCLUSIONS: Our method effectively recovers and visualizes natural orderings present in datasets. 
Automatic visualization tools for data exploration and analysis are available at: https://nlhuong.shinyapps.io/visTrajectory/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1790-x) contains supplementary material, which is available to authorized users. BioMed Central 2017-09-13 /pmc/articles/PMC5606221/ /pubmed/28929970 http://dx.doi.org/10.1186/s12859-017-1790-x Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Nguyen, Lan Huong Holmes, Susan Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations |
title | Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5606221/ https://www.ncbi.nlm.nih.gov/pubmed/28929970 http://dx.doi.org/10.1186/s12859-017-1790-x |