Leveraging data-driven self-consistency for high-fidelity gene expression recovery
Main authors: | Islam, Md Tauhidul; Wang, Jen-Yeu; Ren, Hongyi; Li, Xiaomeng; Khuzani, Masoud Badiei; Sang, Shengtian; Yu, Lequan; Shen, Liyue; Zhao, Wei; Xing, Lei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681852/ https://www.ncbi.nlm.nih.gov/pubmed/36414658 http://dx.doi.org/10.1038/s41467-022-34595-w |
author | Islam, Md Tauhidul; Wang, Jen-Yeu; Ren, Hongyi; Li, Xiaomeng; Khuzani, Masoud Badiei; Sang, Shengtian; Yu, Lequan; Shen, Liyue; Zhao, Wei; Xing, Lei |
---|---|
collection | PubMed |
description | Single cell RNA sequencing is a promising technique for determining the states of individual cells and classifying novel cell subtypes. In current sequencing data analysis, however, genes with low expression are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values is challenging because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, the self-consistent expression recovery machine (SERM), to impute the missing expressions. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the full expression data by imposing self-consistency on the expression matrix, which ensures that expression levels are similarly distributed across different parts of the matrix. We show that SERM improves the accuracy of gene imputation and is orders of magnitude more computationally efficient than state-of-the-art imputation techniques. |
format | Online Article Text |
id | pubmed-9681852 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9681852 (2022-11-24); Nat Commun; Article; Nature Publishing Group UK, 2022-11-21; /pmc/articles/PMC9681852/ /pubmed/36414658 http://dx.doi.org/10.1038/s41467-022-34595-w; Text; en. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Leveraging data-driven self-consistency for high-fidelity gene expression recovery |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9681852/ https://www.ncbi.nlm.nih.gov/pubmed/36414658 http://dx.doi.org/10.1038/s41467-022-34595-w |