
Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data

Bibliographic Details
Main authors: Zhao, Yifan; Cai, Huiyu; Zhang, Zuobai; Tang, Jian; Li, Yue
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8421403/
https://www.ncbi.nlm.nih.gov/pubmed/34489404
http://dx.doi.org/10.1038/s41467-021-25534-2
Collection: PubMed
Description: The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings.
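The description mentions an interpretable linear decoder built from a matrix tri-factorization together with batch-effect linear intercepts. The following is a minimal sketch of such a decoder, not the authors' implementation. It assumes the factors are a cell-by-topic mixture (theta), topic embeddings (alpha) and gene embeddings (rho), that a per-batch gene intercept (lam) is added to the logits, and that a softmax over genes turns each cell's logits into expression proportions. The function names, argument names and shapes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def tri_factor_decoder(theta, alpha, rho, lam, batch):
    """Hypothetical linear decoder illustrating a matrix tri-factorization.

    theta: (n_cells, n_topics)  cell-by-topic mixture, rows sum to 1
    alpha: (n_topics, n_embed)  topic embeddings
    rho:   (n_genes, n_embed)   gene embeddings
    lam:   (n_batches, n_genes) batch-specific gene intercepts (assumed form)
    batch: (n_cells,)           integer batch index of each cell
    Returns an (n_cells, n_genes) array; each row is a distribution over genes.
    """
    # Topic-by-gene scores are the product of the two embedding matrices.
    beta = alpha @ rho.T                  # (n_topics, n_genes)
    # Mix topics per cell, then add the intercept row for that cell's batch.
    logits = theta @ beta + lam[batch]    # (n_cells, n_genes)
    # Normalize over genes so each cell gets expression proportions.
    return softmax(logits, axis=1)


# Toy example with random parameters (illustrative sizes only).
rng = np.random.default_rng(0)
n_cells, n_topics, n_embed, n_genes, n_batches = 5, 3, 4, 10, 2
theta = softmax(rng.normal(size=(n_cells, n_topics)), axis=1)
alpha = rng.normal(size=(n_topics, n_embed))
rho = rng.normal(size=(n_genes, n_embed))
lam = rng.normal(size=(n_batches, n_genes))
batch = rng.integers(0, n_batches, size=n_cells)

recon = tri_factor_decoder(theta, alpha, rho, lam, batch)
print(recon.shape)  # (5, 10)
```

Under these assumptions the decoder is linear in theta before the softmax, so each topic's gene loadings can be read directly from the rows of alpha @ rho.T. This is one plausible reading of the interpretability the description attributes to the linear decoder.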
ID: pubmed-8421403
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nat Commun
Published online: 2021-09-06
Rights: © The Author(s) 2021, corrected publication 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.