Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Bibliographic Details
Main authors: Islam, Md Tauhidul; Wang, Jen-Yeu; Ren, Hongyi; Li, Xiaomeng; Khuzani, Masoud Badiei; Sang, Shengtian; Yu, Lequan; Shen, Liyue; Zhao, Wei; Xing, Lei
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681852/
https://www.ncbi.nlm.nih.gov/pubmed/36414658
http://dx.doi.org/10.1038/s41467-022-34595-w
Description
Summary: Single cell RNA sequencing is a promising technique to determine the states of individual cells and classify novel cell subtypes. In current sequence data analysis, however, genes with low expressions are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values presents a challenge because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as self-consistent expression recovery machine (SERM), to impute the missing expressions. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the overall expression data by imposing a self-consistency on the expression matrix, thus ensuring that the expression levels are similarly distributed in different parts of the matrix. We show that SERM improves the accuracy of gene imputation with orders of magnitude enhancement in computational efficiency in comparison to the state-of-the-art imputation techniques.
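
The summary describes two steps: learn a distribution from a subset of the data, then make every part of the expression matrix follow that distribution. The record gives no implementation details, so the following is only an illustrative sketch under stated assumptions, not the authors' method. Empirical quantiles of the nonzero values stand in for the neural network, block-wise quantile mapping stands in for the self-consistency step, and all function names, the block size, and the synthetic data are hypothetical.

```python
import numpy as np


def learn_reference_quantiles(matrix, subset_rows, n_quantiles=101):
    """Stand-in for the learning step (assumption: no neural network here).

    Summarizes the nonzero values of a subset of cells by empirical quantiles.
    """
    values = matrix[subset_rows].ravel()
    values = values[values > 0]
    probs = np.linspace(0.0, 1.0, n_quantiles)
    return probs, np.quantile(values, probs)


def match_block(block, probs, ref_quantiles):
    """Map a block's values onto the reference distribution by rank."""
    flat = block.ravel()
    # Rank of each entry within the block (ties broken by position).
    ranks = flat.argsort(kind="stable").argsort(kind="stable")
    block_probs = ranks / max(flat.size - 1, 1)
    return np.interp(block_probs, probs, ref_quantiles).reshape(block.shape)


def recover(matrix, probs, ref_quantiles, block_size):
    """Apply the same reference distribution to every block of the matrix."""
    out = np.empty(matrix.shape, dtype=float)
    for r in range(0, matrix.shape[0], block_size):
        for c in range(0, matrix.shape[1], block_size):
            block = matrix[r:r + block_size, c:c + block_size]
            out[r:r + block_size, c:c + block_size] = match_block(
                block, probs, ref_quantiles
            )
    return out


# Synthetic example: 8 cells x 8 genes with roughly 30% simulated dropouts.
rng = np.random.default_rng(0)
truth = rng.gamma(shape=2.0, scale=2.0, size=(8, 8))
noisy = truth * (rng.random((8, 8)) > 0.3)

probs, ref = learn_reference_quantiles(noisy, subset_rows=slice(0, 4))
recovered = recover(noisy, probs, ref, block_size=4)
```

After this runs, every 4 x 4 block of `recovered` spans the same value range as the reference distribution, which is the sense in which the blocks are "similarly distributed" in this toy version. Zeros are replaced by low reference quantiles according to their rank, which is a crude form of imputation chosen here for brevity.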