Searching for best lower dimensional visualization angles for high dimensional RNA-Seq data

Bibliographic Details
Main authors: Zhang, Wanli; Di, Yanming
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6046202/
https://www.ncbi.nlm.nih.gov/pubmed/30013849
http://dx.doi.org/10.7717/peerj.5199
Collection: PubMed
Abstract: The accumulation of RNA sequencing (RNA-Seq) gene expression data in recent years has resulted in large and complex data sets of high dimensions. Exploratory analysis, including data mining and visualization, reveals hidden patterns and potential outliers in such data, but is often challenged by the high dimensional nature of the data. The scatterplot matrix is a commonly used tool for visualizing multivariate data, and allows us to view multiple bivariate relationships simultaneously. However, the scatterplot matrix becomes less effective for high dimensional data because the number of bivariate displays increases quadratically with data dimensionality. In this study, we introduce a selection criterion for each bivariate scatterplot and design and implement an algorithm that automatically scans and ranks all possible scatterplots, with the goal of identifying the plots in which separation between two pre-defined groups is maximized. By applying our method to a multi-experiment Arabidopsis RNA-Seq data set, we were able to successfully pinpoint the visualization angles where genes from two biological pathways are the most separated, as well as identify potential outliers.
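The abstract does not state the authors' selection criterion, so the Python sketch below only illustrates the general scan-and-rank idea under stated assumptions. It scores every pair of dimensions with a stand-in criterion (distance between the two group centroids divided by the within-group root-mean-square spread), which is a swapped-in measure and not the criterion from the paper. The function names `centroid_separation` and `rank_scatterplots`, the layout (genes as rows, dimensions such as experimental conditions as columns), and the 0/1 group labels are all hypothetical.

```python
from itertools import combinations
from math import comb, hypot, inf, sqrt


def centroid_separation(points, labels):
    """Hypothetical stand-in score for one 2D scatterplot (NOT the authors'
    criterion): distance between the two group centroids divided by the
    root-mean-square distance of each point to its own group's centroid."""
    groups = {0: [], 1: []}
    for point, group in zip(points, labels):
        groups[group].append(point)  # labels are assumed to be 0 or 1
    if not groups[0] or not groups[1]:
        raise ValueError("both groups need at least one point")
    centroids = {
        g: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for g, pts in groups.items()
    }
    (x0, y0), (x1, y1) = centroids[0], centroids[1]
    between = hypot(x1 - x0, y1 - y0)
    squared = 0.0
    for (x, y), group in zip(points, labels):
        cx, cy = centroids[group]
        squared += (x - cx) ** 2 + (y - cy) ** 2
    within = sqrt(squared / len(points))
    # Perfectly tight groups that are apart get an infinite score.
    return inf if within == 0 else between / within


def rank_scatterplots(data, labels):
    """Scan every pair of dimensions (each scatterplot-matrix panel, counted
    once) and return (score, i, j) tuples, best-separated pair first."""
    n_dims = len(data[0])
    ranked = []
    for i, j in combinations(range(n_dims), 2):
        points = [(row[i], row[j]) for row in data]
        ranked.append((centroid_separation(points, labels), i, j))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


if __name__ == "__main__":
    # Hypothetical toy data: 4 genes (rows) x 3 dimensions (columns);
    # genes 0-1 form group 0 and genes 2-3 form group 1.
    data = [
        [0.0, 1.0, 0.0],
        [0.2, 0.0, 0.1],
        [5.0, 0.5, 4.9],
        [5.1, 0.6, 5.2],
    ]
    labels = [0, 0, 1, 1]
    ranked = rank_scatterplots(data, labels)
    print(len(ranked), "of", comb(3, 2), "scatterplots scored")
    score, i, j = ranked[0]
    print(f"most separated: dimensions {i} and {j}")  # dimensions 0 and 2
```

With p dimensions the loop visits p(p-1)/2 pairs, which is the quadratic growth in bivariate displays mentioned in the abstract. Ranking the panels by a score and inspecting only the top few is what keeps such an exhaustive scan usable when p is large.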
Record ID: pubmed-6046202
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PeerJ (Plant Science)
Publication date: 2018-07-12
Rights: © 2018 Zhang & Di. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.