Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Single cell RNA sequencing is a promising technique to determine the states of individual cells and classify novel cell subtypes. In current sequence data analysis, however, genes with low expressions are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering thes...

Bibliographic Details
Main authors: Islam, Md Tauhidul, Wang, Jen-Yeu, Ren, Hongyi, Li, Xiaomeng, Khuzani, Masoud Badiei, Sang, Shengtian, Yu, Lequan, Shen, Liyue, Zhao, Wei, Xing, Lei
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681852/
https://www.ncbi.nlm.nih.gov/pubmed/36414658
http://dx.doi.org/10.1038/s41467-022-34595-w
author Islam, Md Tauhidul
Wang, Jen-Yeu
Ren, Hongyi
Li, Xiaomeng
Khuzani, Masoud Badiei
Sang, Shengtian
Yu, Lequan
Shen, Liyue
Zhao, Wei
Xing, Lei
author_sort Islam, Md Tauhidul
collection PubMed
description Single cell RNA sequencing is a promising technique to determine the states of individual cells and classify novel cell subtypes. In current sequence data analysis, however, genes with low expressions are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values presents a challenge because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as self-consistent expression recovery machine (SERM), to impute the missing expressions. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the overall expression data by imposing a self-consistency on the expression matrix, thus ensuring that the expression levels are similarly distributed in different parts of the matrix. We show that SERM improves the accuracy of gene imputation with orders of magnitude enhancement in computational efficiency in comparison to the state-of-the-art imputation techniques.
format Online
Article
Text
id pubmed-9681852
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9681852 2022-11-24 Leveraging data-driven self-consistency for high-fidelity gene expression recovery Islam, Md Tauhidul Wang, Jen-Yeu Ren, Hongyi Li, Xiaomeng Khuzani, Masoud Badiei Sang, Shengtian Yu, Lequan Shen, Liyue Zhao, Wei Xing, Lei Nat Commun Article Nature Publishing Group UK 2022-11-21 /pmc/articles/PMC9681852/ /pubmed/36414658 http://dx.doi.org/10.1038/s41467-022-34595-w Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
title Leveraging data-driven self-consistency for high-fidelity gene expression recovery
topic Article