D-EE: Distributed software for visualizing intrinsic structure of large-scale single-cell data
BACKGROUND: Dimensionality reduction and visualization play vital roles in single-cell RNA sequencing (scRNA-seq) data analysis. While they have been extensively studied, state-of-the-art dimensionality reduction algorithms are often unable to preserve the global structures underlying data. Elastic...
Main authors: | An, Shaokun; Huang, Jizu; Wan, Lin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7657844/ https://www.ncbi.nlm.nih.gov/pubmed/33179041 http://dx.doi.org/10.1093/gigascience/giaa126 |
_version_ | 1783608558420492288 |
---|---|
author | An, Shaokun Huang, Jizu Wan, Lin |
author_facet | An, Shaokun Huang, Jizu Wan, Lin |
author_sort | An, Shaokun |
collection | PubMed |
description | BACKGROUND: Dimensionality reduction and visualization play vital roles in single-cell RNA sequencing (scRNA-seq) data analysis. While they have been extensively studied, state-of-the-art dimensionality reduction algorithms are often unable to preserve the global structures underlying data. Elastic embedding (EE), a nonlinear dimensionality reduction method, has shown promise in revealing low-dimensional intrinsic local and global data structure. However, the current implementation of the EE algorithm lacks scalability to large-scale scRNA-seq data. RESULTS: We present a distributed optimization implementation of the EE algorithm, termed distributed elastic embedding (D-EE). D-EE reveals the low-dimensional intrinsic structures of data with accuracy equal to that of elastic embedding, and it is scalable to large-scale scRNA-seq data. It leverages distributed storage and distributed computation, achieving memory efficiency and high-performance computing simultaneously. In addition, an extended version of D-EE, termed distributed optimization implementation of time-series elastic embedding (D-TSEE), enables the user to visualize large-scale time-series scRNA-seq data by incorporating experimentally temporal information. Results with large-scale scRNA-seq data indicate that D-TSEE can uncover oscillatory gene expression patterns by using experimentally temporal information. CONCLUSIONS: D-EE is a distributed dimensionality reduction and visualization tool. Its distributed storage and distributed computation technique allow us to efficiently analyze large-scale single-cell data at the cost of constant time speedup. The source code for D-EE algorithm based on C and MPI tailored to a high-performance computing cluster is available at https://github.com/ShaokunAn/D-EE. |
format | Online Article Text |
id | pubmed-7657844 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-76578442020-11-18 D-EE: Distributed software for visualizing intrinsic structure of large-scale single-cell data An, Shaokun Huang, Jizu Wan, Lin Gigascience Technical Note BACKGROUND: Dimensionality reduction and visualization play vital roles in single-cell RNA sequencing (scRNA-seq) data analysis. While they have been extensively studied, state-of-the-art dimensionality reduction algorithms are often unable to preserve the global structures underlying data. Elastic embedding (EE), a nonlinear dimensionality reduction method, has shown promise in revealing low-dimensional intrinsic local and global data structure. However, the current implementation of the EE algorithm lacks scalability to large-scale scRNA-seq data. RESULTS: We present a distributed optimization implementation of the EE algorithm, termed distributed elastic embedding (D-EE). D-EE reveals the low-dimensional intrinsic structures of data with accuracy equal to that of elastic embedding, and it is scalable to large-scale scRNA-seq data. It leverages distributed storage and distributed computation, achieving memory efficiency and high-performance computing simultaneously. In addition, an extended version of D-EE, termed distributed optimization implementation of time-series elastic embedding (D-TSEE), enables the user to visualize large-scale time-series scRNA-seq data by incorporating experimentally temporal information. Results with large-scale scRNA-seq data indicate that D-TSEE can uncover oscillatory gene expression patterns by using experimentally temporal information. CONCLUSIONS: D-EE is a distributed dimensionality reduction and visualization tool. Its distributed storage and distributed computation technique allow us to efficiently analyze large-scale single-cell data at the cost of constant time speedup. The source code for D-EE algorithm based on C and MPI tailored to a high-performance computing cluster is available at https://github.com/ShaokunAn/D-EE. 
Oxford University Press 2020-11-11 /pmc/articles/PMC7657844/ /pubmed/33179041 http://dx.doi.org/10.1093/gigascience/giaa126 Text en © The Author(s) 2020. Published by Oxford University Press GigaScience. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note An, Shaokun Huang, Jizu Wan, Lin D-EE: Distributed software for visualizing intrinsic structure of large-scale single-cell data |
title | D-EE: Distributed software for visualizing intrinsic structure of large-scale single-cell data |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7657844/ https://www.ncbi.nlm.nih.gov/pubmed/33179041 http://dx.doi.org/10.1093/gigascience/giaa126 |