Probabilistic tensor decomposition extracts better latent embeddings from single-cell multiomic data

Single-cell sequencing technology enables the simultaneous capture of multiomic data from multiple cells. The captured data can be represented as tensors, i.e. higher-order generalizations of matrices. However, existing analysis tools often treat the data as a collection of second-order matrices, discarding the cor...

Bibliographic Details
Main authors: Wang, Ruo Han; Wang, Jianping; Li, Shuai Cheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10450184/
https://www.ncbi.nlm.nih.gov/pubmed/37403780
http://dx.doi.org/10.1093/nar/gkad570
author Wang, Ruo Han
Wang, Jianping
Li, Shuai Cheng
collection PubMed
description Single-cell sequencing technology enables the simultaneous capture of multiomic data from multiple cells. The captured data can be represented as tensors, i.e. higher-order generalizations of matrices. However, existing analysis tools often treat the data as a collection of second-order matrices, discarding the correspondences among the features. To address this, we propose a probabilistic tensor decomposition framework, SCOIT, to extract embeddings from single-cell multiomic data. SCOIT incorporates various distributions, including Gaussian, Poisson, and negative binomial distributions, to deal with sparse, noisy, and heterogeneous single-cell data. Our framework decomposes a multiomic tensor into a cell embedding matrix, a gene embedding matrix, and an omic embedding matrix, allowing for various downstream analyses. We applied SCOIT to eight single-cell multiomic datasets from different sequencing protocols. With the cell embeddings, SCOIT achieves superior cell-clustering performance compared to nine state-of-the-art tools under various metrics, demonstrating its ability to dissect cellular heterogeneity. With the gene embeddings, SCOIT enables cross-omics gene expression analysis and integrative gene regulatory network study. Furthermore, the embeddings allow simultaneous cross-omics imputation, outperforming current imputation methods with the Pearson correlation coefficient increased by 3.38–39.26%; moreover, SCOIT accommodates scenarios in which only one omic profile is available for a subset of the cells.
format Online
Article
Text
id pubmed-10450184
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10450184 2023-08-26 Nucleic Acids Res Methods Online Oxford University Press 2023-07-05 /pmc/articles/PMC10450184/ /pubmed/37403780 http://dx.doi.org/10.1093/nar/gkad570 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Probabilistic tensor decomposition extracts better latent embeddings from single-cell multiomic data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10450184/
https://www.ncbi.nlm.nih.gov/pubmed/37403780
http://dx.doi.org/10.1093/nar/gkad570