Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT

Recent advances in single-cell technologies enable joint profiling of multiple omics. These profiles can reveal the complex interplay of different regulatory layers in single cells; still, new challenges arise when integrating datasets with some features shared across experiments and others exclusiv...

Bibliographic Details
Main Authors: Du, Jin-Hong, Cai, Zhanrui, Roeder, Kathryn
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9894175/
https://www.ncbi.nlm.nih.gov/pubmed/36459654
http://dx.doi.org/10.1073/pnas.2214414119
_version_ 1784881687180607488
author Du, Jin-Hong
Cai, Zhanrui
Roeder, Kathryn
author_facet Du, Jin-Hong
Cai, Zhanrui
Roeder, Kathryn
author_sort Du, Jin-Hong
collection PubMed
description Recent advances in single-cell technologies enable joint profiling of multiple omics. These profiles can reveal the complex interplay of different regulatory layers in single cells; still, new challenges arise when integrating datasets with some features shared across experiments and others exclusive to a single source; combining information across these sources is called mosaic integration. The difficulties lie in imputing missing molecular layers to build a self-consistent atlas, finding a common latent space, and transferring learning to new data sources robustly. Existing mosaic integration approaches based on matrix factorization cannot efficiently adapt to nonlinear embeddings for the latent cell space and are not designed for accurate imputation of missing molecular layers. By contrast, we propose a probabilistic variational autoencoder model, scVAEIT, to integrate and impute multimodal datasets with mosaic measurements. A key advance is the use of a missing mask for learning the conditional distribution of unobserved modalities and features, which makes scVAEIT flexible to combine different panels of measurements from multimodal datasets accurately and in an end-to-end manner. Imputing the masked features serves as a supervised learning procedure while preventing overfitting by regularization. Focusing on gene expression, protein abundance, and chromatin accessibility, we validate that scVAEIT robustly imputes the missing modalities and features of cells biologically different from the training data. scVAEIT also adjusts for batch effects while maintaining the biological variation, which provides better latent representations for the integrated datasets. We demonstrate that scVAEIT significantly improves integration and imputation across unseen cell types, different technologies, and different tissues.
format Online
Article
Text
id pubmed-9894175
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-9894175 2023-02-03 Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT Du, Jin-Hong Cai, Zhanrui Roeder, Kathryn Proc Natl Acad Sci U S A Biological Sciences National Academy of Sciences 2022-12-02 2022-12-06 /pmc/articles/PMC9894175/ /pubmed/36459654 http://dx.doi.org/10.1073/pnas.2214414119 Text en Copyright © 2022 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Biological Sciences
Du, Jin-Hong
Cai, Zhanrui
Roeder, Kathryn
Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT
title Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT
title_sort robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scvaeit
topic Biological Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9894175/
https://www.ncbi.nlm.nih.gov/pubmed/36459654
http://dx.doi.org/10.1073/pnas.2214414119
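
The description above mentions a missing mask and treating imputation of masked features as a supervised task. The following is a minimal sketch of that general idea only, not the scVAEIT implementation: the function names `hide_entries` and `masked_squared_error`, the hiding probability, the toy values, and the constant stand-in for a decoder output are all assumptions made for illustration.

```python
import random

def hide_entries(observed, p_hide, rng):
    """Return (visible, target) masks for one cell.

    observed: list of bools, True where a feature was actually measured.
    Each measured feature is hidden from the model with probability p_hide;
    hidden features become prediction targets. Unmeasured features are
    never visible and never targets.
    """
    visible, target = [], []
    for is_observed in observed:
        hide = is_observed and rng.random() < p_hide
        visible.append(is_observed and not hide)
        target.append(hide)
    return visible, target

def masked_squared_error(x, x_hat, target):
    """Mean squared error over target entries only (0.0 if there are none)."""
    errors = [(a - b) ** 2 for a, b, t in zip(x, x_hat, target) if t]
    return sum(errors) / len(errors) if errors else 0.0

# Toy cell: four features measured, two features not measured (None).
x = [1.0, 0.0, 3.0, 2.0, None, None]
observed = [v is not None for v in x]
rng = random.Random(0)
visible, target = hide_entries(observed, p_hide=0.5, rng=rng)

# Model input: hidden or unmeasured values are zeroed; the mask travels with it.
model_input = [v if keep else 0.0 for v, keep in zip(x, visible)]

# Stand-in for a decoder output (hypothetical; a trained model would produce this).
x_hat = [0.5] * len(x)
loss = masked_squared_error(x, x_hat, target)
```

Because the loss is computed only on entries that were measured and then deliberately hidden, the true values are available as labels, which is what makes the imputation step supervised in this sketch.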