Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space

Computational tools for integrative analyses of diverse single-cell experiments are facing formidable new challenges including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-le...

Bibliographic Details
Main Authors: Xiong, Lei; Tian, Kang; Li, Yuzhe; Ning, Weixi; Gao, Xin; Zhang, Qiangfeng Cliff
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574176/
https://www.ncbi.nlm.nih.gov/pubmed/36253379
http://dx.doi.org/10.1038/s41467-022-33758-z
_version_ 1784811048232026112
author Xiong, Lei
Tian, Kang
Li, Yuzhe
Ning, Weixi
Gao, Xin
Zhang, Qiangfeng Cliff
author_sort Xiong, Lei
collection PubMed
description Computational tools for integrative analyses of diverse single-cell experiments are facing formidable new challenges including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., single-cell RNA sequencing, scRNA-seq; single-cell assay for transposase-accessible chromatin using sequencing, scATAC-seq), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with every new dataset. The online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications to build upon previous scientific insights.
format Online
Article
Text
id pubmed-9574176
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9574176 2022-10-17 Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space Xiong, Lei Tian, Kang Li, Yuzhe Ning, Weixi Gao, Xin Zhang, Qiangfeng Cliff Nat Commun Article Computational tools for integrative analyses of diverse single-cell experiments are facing formidable new challenges including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., single-cell RNA sequencing, scRNA-seq; single-cell assay for transposase-accessible chromatin using sequencing, scATAC-seq), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with every new dataset. The online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications to build upon previous scientific insights.
Nature Publishing Group UK 2022-10-17 /pmc/articles/PMC9574176/ /pubmed/36253379 http://dx.doi.org/10.1038/s41467-022-33758-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574176/
https://www.ncbi.nlm.nih.gov/pubmed/36253379
http://dx.doi.org/10.1038/s41467-022-33758-z