Deterministic column subset selection for single-cell RNA-Seq


Bibliographic Details
Main authors: McCurdy, Shannon R.; Ntranos, Vasilis; Pachter, Lior
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2019
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6347249/
https://www.ncbi.nlm.nih.gov/pubmed/30682053
http://dx.doi.org/10.1371/journal.pone.0210571

Abstract: Analysis of single-cell RNA sequencing (scRNA-Seq) data often involves filtering out uninteresting or poorly measured genes and dimensionality reduction to reduce noise and simplify data visualization. However, techniques such as principal components analysis (PCA) fail to preserve non-negativity and sparsity structures present in the original matrices, and the coordinates of projected cells are not easily interpretable. Commonly used thresholding methods to filter genes avoid those pitfalls, but ignore collinearity and covariance in the original matrix. We show that a deterministic column subset selection (DCSS) method possesses many of the favorable properties of common thresholding methods and PCA, while avoiding pitfalls from both. We derive new spectral bounds for DCSS. We apply DCSS to two measures of gene expression from two scRNA-Seq experiments with different clustering workflows, and compare to three thresholding methods. In each case study, the clusters based on the small subset of the complete gene expression profile selected by DCSS are similar to clusters produced from the full set. The resulting clusters are informative for cell type.
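
The abstract describes DCSS only at a high level. The following is a minimal sketch of one common deterministic column subset selection rule, offered as an assumption rather than as the authors' code. It sorts the gene columns of a cells x genes matrix by their rank-k leverage scores and keeps the smallest leading set whose scores sum past a threshold theta. Because the output is a set of original gene columns, the reduced matrix keeps the non-negativity and sparsity of the input, which is the property the abstract contrasts with PCA. The function names `leverage_scores`, `dcss` and `spectral_residual`, the parameters `k` and `theta`, and the toy data are all hypothetical. The paper's exact selection rule, preprocessing and parameter choices may differ.

```python
import numpy as np


def leverage_scores(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k leverage score of every column of A (cells x genes)."""
    # Rows of Vt are right singular vectors ordered by singular value.
    # Column j's score is the squared norm of column j of the top-k rows;
    # since those rows are orthonormal, the scores sum to k.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(Vt[:k, :] ** 2, axis=0)


def dcss(A: np.ndarray, k: int, theta: float) -> np.ndarray:
    """Hypothetical DCSS sketch: indices of the selected columns (genes).

    Assumed rule: sort columns by decreasing rank-k leverage score and keep
    the smallest prefix whose scores sum to more than theta (0 < theta < k).
    """
    scores = leverage_scores(A, k)
    order = np.argsort(scores)[::-1]          # highest leverage first
    cumulative = np.cumsum(scores[order])
    # Index of the first running sum strictly above theta, converted to a count.
    c = int(np.searchsorted(cumulative, theta, side="right")) + 1
    return order[: min(c, A.shape[1])]        # never more than all columns


def spectral_residual(A: np.ndarray, cols: np.ndarray) -> float:
    """Spectral norm of A minus its projection onto the selected columns."""
    C = A[:, cols]
    return float(np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic non-negative toy matrix (not data from the paper):
    # 50 cells x 6 genes, genes 0 and 1 carry almost all of the signal.
    A = rng.random((50, 6)) * 1e-3
    A[:, :2] = rng.random((50, 2)) * 10
    cols = dcss(A, k=2, theta=1.5)
    print("selected genes:", sorted(cols.tolist()))
    print("best rank-2 spectral error:", np.linalg.svd(A, compute_uv=False)[2])
    print("DCSS residual:", spectral_residual(A, cols))
```

On the toy matrix the sketch keeps genes 0 and 1, and the residual can be compared with the third singular value, the best spectral error any rank-2 approximation can reach. Real scRNA-Seq matrices would first need normalization and filtering decisions that this sketch does not make.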

Record Details
Record ID: pubmed-6347249
Collection: PubMed
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One
Article type: Research Article
Publication date: 2019-01-25
DOI: 10.1371/journal.pone.0210571
License: © 2019 McCurdy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.