Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data
Main authors: | Angerer, Philipp; Fischer, David S; Theis, Fabian J; Scialdone, Antonio; Marr, Carsten |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7520047/ https://www.ncbi.nlm.nih.gov/pubmed/32207520 http://dx.doi.org/10.1093/bioinformatics/btaa198 |
_version_ | 1783587699357122560 |
---|---|
author | Angerer, Philipp; Fischer, David S; Theis, Fabian J; Scialdone, Antonio; Marr, Carsten |
author_sort | Angerer, Philipp |
collection | PubMed |
description | MOTIVATION: Dimensionality reduction is a key step in the analysis of single-cell RNA-sequencing data. It produces a low-dimensional embedding for visualization and as a calculation base for downstream analysis. Nonlinear techniques are most suitable to handle the intrinsic complexity of large, heterogeneous single-cell data. However, with no linear relation between gene and embedding coordinate, there is no way to extract the identity of genes driving any cell’s position in the low-dimensional embedding, making it difficult to characterize the underlying biological processes. RESULTS: In this article, we introduce the concepts of local and global gene relevance to compute an equivalent of principal component analysis loadings for non-linear low-dimensional embeddings. Global gene relevance identifies drivers of the overall embedding, while local gene relevance identifies those of a defined sub-region. We apply our method to single-cell RNA-seq datasets from different experimental protocols and to different low-dimensional embedding techniques. This shows our method’s versatility to identify key genes for a variety of biological processes. AVAILABILITY AND IMPLEMENTATION: To ensure reproducibility and ease of use, our method is released as part of destiny 3.0, a popular R package for building diffusion maps from single-cell transcriptomic data. It is readily available through Bioconductor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7520047 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7520047 2020-09-30 Bioinformatics Original Papers Oxford University Press 2020-03-24 /pmc/articles/PMC7520047/ /pubmed/32207520 http://dx.doi.org/10.1093/bioinformatics/btaa198 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Automatic identification of relevant genes from low-dimensional embeddings of single-cell RNA-seq data |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7520047/ https://www.ncbi.nlm.nih.gov/pubmed/32207520 http://dx.doi.org/10.1093/bioinformatics/btaa198 |