
Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models

As the number of single‐cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type l...


Bibliographic Details
Main authors: Xu, Chenling; Lopez, Romain; Mehlman, Edouard; Regier, Jeffrey; Jordan, Michael I; Yosef, Nir
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7829634/
https://www.ncbi.nlm.nih.gov/pubmed/33491336
http://dx.doi.org/10.15252/msb.20209620
author Xu, Chenling
Lopez, Romain
Mehlman, Edouard
Regier, Jeffrey
Jordan, Michael I
Yosef, Nir
collection PubMed
description As the number of single‐cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of scRNA‐seq data, while accounting for uncertainty caused by biological and measurement noise. We also introduce single‐cell ANnotation using Variational Inference (scANVI), a semi‐supervised variant of scVI designed to leverage existing cell state annotations. We demonstrate that scVI and scANVI compare favorably to state‐of‐the‐art methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings. In contrast to existing methods, scVI and scANVI integrate multiple datasets with a single generative model that can be directly used for downstream tasks, such as differential expression. Both methods are easily accessible through scvi‐tools.
format Online
Article
Text
id pubmed-7829634
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7829634 2021-01-29 Mol Syst Biol Articles John Wiley and Sons Inc. 2021-01-25 /pmc/articles/PMC7829634/ /pubmed/33491336 http://dx.doi.org/10.15252/msb.20209620 Text en © 2021 The Authors. Published under the terms of the CC BY 4.0 license. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models
topic Articles