AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction

Single-cell RNA sequencing (scRNA-seq) technology has become an effective tool for high-throughput transcriptomic study, which circumvents the averaging artifacts associated with bulk RNA-seq technology, yielding new perspectives on the cellular diversity of potential superficially homogeneous popu...

Bibliographic Details
Main authors: Zhao, Shuchang; Zhang, Li; Liu, Xuejun
Format: Online Article Text
Language: English
Published: Higher Education Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9607720/
https://www.ncbi.nlm.nih.gov/pubmed/36320820
http://dx.doi.org/10.1007/s11704-022-2011-y
author Zhao, Shuchang
Zhang, Li
Liu, Xuejun
collection PubMed
description Single-cell RNA sequencing (scRNA-seq) technology has become an effective tool for high-throughput transcriptomic study. It circumvents the averaging artifacts associated with bulk RNA-seq technology, yielding new perspectives on the cellular diversity of populations that may only superficially appear homogeneous. Although various sequencing techniques have reduced the amplification bias caused by the low amount of starting material and have improved capture efficiency, technical noise and biological variation are inevitably introduced into the experimental process, resulting in frequent dropout events that greatly hinder downstream analysis. Considering the bimodal expression pattern and the right-skewed characteristics of normalized scRNA-seq data, we propose a customized autoencoder based on a two-part-generalized-gamma distribution (AE-TPGG) for scRNA-seq data analysis. The model accounts for the mixed discrete-continuous random variables in scRNA-seq data with a two-part model and uses the generalized gamma (GG) distribution to fit the positive, right-skewed continuous data. The adopted autoencoder enables AE-TPGG to capture the inherent relationships between genes. In addition to producing a low-dimensional representation, the AE-TPGG model also provides a denoised imputation according to the statistical characteristics of gene expression. Results on real datasets demonstrate that our proposed model is competitive with current imputation methods and improves a diverse set of typical scRNA-seq data analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: Supplementary material is available in the online version of this article at 10.1007/s11704-022-2011-y.
format Online
Article
Text
id pubmed-9607720
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Higher Education Press
record_format MEDLINE/PubMed
spelling pubmed-9607720 2022-10-28 AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction Zhao, Shuchang Zhang, Li Liu, Xuejun Front Comput Sci Research Article
Higher Education Press 2022-10-26 2023 /pmc/articles/PMC9607720/ /pubmed/36320820 http://dx.doi.org/10.1007/s11704-022-2011-y Text en © Higher Education Press 2023 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title AE-TPGG: a novel autoencoder-based approach for single-cell RNA-seq data imputation and dimensionality reduction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9607720/
https://www.ncbi.nlm.nih.gov/pubmed/36320820
http://dx.doi.org/10.1007/s11704-022-2011-y
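The two-part generalized gamma idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Stacy parametrization of the generalized gamma density, the parameter names `a`, `d`, `p`, and the zero-probability name `pi_zero` are all assumptions chosen for illustration, and the autoencoder that would produce these parameters per gene and cell is not shown. The sketch only shows how a single expression value contributes to a likelihood that places a point mass at zero and a generalized gamma density on positive values.

```python
import math


def gg_logpdf(x, a, d, p):
    """Log-density of a generalized gamma distribution at x > 0.

    Assumed (Stacy) form, not necessarily the one used in the paper:
        f(x) = (p / a**d) * x**(d - 1) * exp(-(x / a)**p) / Gamma(d / p)
    with scale a > 0 and shape parameters d > 0, p > 0.
    """
    if x <= 0 or a <= 0 or d <= 0 or p <= 0:
        raise ValueError("x, a, d and p must all be positive")
    return (
        math.log(p)
        - d * math.log(a)
        + (d - 1) * math.log(x)
        - (x / a) ** p
        - math.lgamma(d / p)
    )


def tpgg_loglik(x, pi_zero, a, d, p):
    """Two-part log-likelihood of one non-negative expression value.

    pi_zero (hypothetical name) is the probability of an exact zero;
    positive values follow the generalized gamma density above.
    """
    if not 0.0 < pi_zero < 1.0:
        raise ValueError("pi_zero must lie strictly between 0 and 1")
    if x == 0:
        # Discrete part: point mass at zero.
        return math.log(pi_zero)
    # Continuous part: weight (1 - pi_zero) times the GG density.
    return math.log1p(-pi_zero) + gg_logpdf(x, a, d, p)


if __name__ == "__main__":
    # Hypothetical normalized expression values for one gene across cells.
    values = [0.0, 0.0, 1.2, 3.5, 0.7]
    total = sum(tpgg_loglik(v, pi_zero=0.4, a=1.0, d=2.0, p=1.0) for v in values)
    print(f"total log-likelihood: {total:.4f}")
```

With `p = 1` the assumed density reduces to an ordinary gamma distribution, and with `d = p = 1` to an exponential distribution, which gives a quick sanity check on the formula.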