Sparsity-Penalized Stacked Denoising Autoencoders for Imputing Single-Cell RNA-seq Data

Single-cell RNA-seq (scRNA-seq) is quite prevalent in studying transcriptomes, but it suffers from excessive zeros, some of which are true, but others are false. False zeros, which can be seen as missing data, obstruct the downstream analysis of single-cell RNA-seq data. How to distinguish true zero...

Bibliographic Details
Main authors: Chi, Weilai; Deng, Minghua
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7291078/
https://www.ncbi.nlm.nih.gov/pubmed/32403260
http://dx.doi.org/10.3390/genes11050532
_version_ 1783545825944666112
author Chi, Weilai
Deng, Minghua
author_facet Chi, Weilai
Deng, Minghua
author_sort Chi, Weilai
collection PubMed
description Single-cell RNA-seq (scRNA-seq) is quite prevalent in studying transcriptomes, but it suffers from excessive zeros, some of which are true, but others are false. False zeros, which can be seen as missing data, obstruct the downstream analysis of single-cell RNA-seq data. How to distinguish true zeros from false ones is the key point of this problem. Here, we propose sparsity-penalized stacked denoising autoencoders (scSDAEs) to impute scRNA-seq data. scSDAEs adopt stacked denoising autoencoders with a sparsity penalty, as well as a layer-wise pretraining procedure to improve model fitting. scSDAEs can capture nonlinear relationships among the data and incorporate information about the observed zeros. We tested the imputation efficiency of scSDAEs on recovering the true values of gene expression and helping downstream analysis. First, we show that scSDAEs can recover the true values and the sample–sample correlations of bulk sequencing data with simulated noise. Next, we demonstrate that scSDAEs accurately impute an RNA mixture dataset with different dilutions and spike-in RNA concentrations affected by technical zeros, and improve the consistency of RNA and protein levels in CITE-seq data. Finally, we show that scSDAEs can help downstream clustering analysis. In this study, we develop a deep learning-based method, scSDAE, to impute single-cell RNA-seq data affected by technical zeros. Furthermore, we show that scSDAEs can recover the true values, to some extent, and help downstream analysis.
format Online
Article
Text
id pubmed-7291078
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7291078 2020-06-19 Sparsity-Penalized Stacked Denoising Autoencoders for Imputing Single-Cell RNA-seq Data Chi, Weilai Deng, Minghua Genes (Basel) Article MDPI 2020-05-11 /pmc/articles/PMC7291078/ /pubmed/32403260 http://dx.doi.org/10.3390/genes11050532 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Chi, Weilai
Deng, Minghua
Sparsity-Penalized Stacked Denoising Autoencoders for Imputing Single-Cell RNA-seq Data
title Sparsity-Penalized Stacked Denoising Autoencoders for Imputing Single-Cell RNA-seq Data
title_sort sparsity-penalized stacked denoising autoencoders for imputing single-cell rna-seq data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7291078/
https://www.ncbi.nlm.nih.gov/pubmed/32403260
http://dx.doi.org/10.3390/genes11050532
work_keys_str_mv AT chiweilai sparsitypenalizedstackeddenoisingautoencodersforimputingsinglecellrnaseqdata
AT dengminghua sparsitypenalizedstackeddenoisingautoencodersforimputingsinglecellrnaseqdata