Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data
The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the...
Main authors: | Zhao, Yifan; Cai, Huiyu; Zhang, Zuobai; Tang, Jian; Li, Yue |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8421403/ https://www.ncbi.nlm.nih.gov/pubmed/34489404 http://dx.doi.org/10.1038/s41467-021-25534-2 |
_version_ | 1783749075718373376 |
---|---|
author | Zhao, Yifan Cai, Huiyu Zhang, Zuobai Tang, Jian Li, Yue |
author_facet | Zhao, Yifan Cai, Huiyu Zhang, Zuobai Tang, Jian Li, Yue |
author_sort | Zhao, Yifan |
collection | PubMed |
description | The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings. |
format | Online Article Text |
id | pubmed-8421403 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8421403 2021-09-22 Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data Zhao, Yifan Cai, Huiyu Zhang, Zuobai Tang, Jian Li, Yue Nat Commun Article The advent of single-cell RNA sequencing (scRNA-seq) technologies has revolutionized transcriptomic studies. However, large-scale integrative analysis of scRNA-seq data remains a challenge largely due to unwanted batch effects and the limited transferability, interpretability, and scalability of the existing computational methods. We present single-cell Embedded Topic Model (scETM). Our key contribution is the utilization of a transferable neural-network-based encoder while having an interpretable linear decoder via a matrix tri-factorization. In particular, scETM simultaneously learns an encoder network to infer cell type mixture and a set of highly interpretable gene embeddings, topic embeddings, and batch-effect linear intercepts from multiple scRNA-seq datasets. scETM is scalable to over 10^6 cells and confers remarkable cross-tissue and cross-species zero-shot transfer-learning performance. Using gene set enrichment analysis, we find that scETM-learned topics are enriched in biologically meaningful and disease-related pathways. Lastly, scETM enables the incorporation of known gene sets into the gene embeddings, thereby directly learning the associations between pathways and topics via the topic embeddings. 
Nature Publishing Group UK 2021-09-06 /pmc/articles/PMC8421403/ /pubmed/34489404 http://dx.doi.org/10.1038/s41467-021-25534-2 Text en © The Author(s) 2021, corrected publication 2021 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Zhao, Yifan Cai, Huiyu Zhang, Zuobai Tang, Jian Li, Yue Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
title | Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
title_full | Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
title_fullStr | Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
title_full_unstemmed | Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
title_short | Learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
title_sort | learning interpretable cellular and gene signature embeddings from single-cell transcriptomic data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8421403/ https://www.ncbi.nlm.nih.gov/pubmed/34489404 http://dx.doi.org/10.1038/s41467-021-25534-2 |
work_keys_str_mv | AT zhaoyifan learninginterpretablecellularandgenesignatureembeddingsfromsinglecelltranscriptomicdata AT caihuiyu learninginterpretablecellularandgenesignatureembeddingsfromsinglecelltranscriptomicdata AT zhangzuobai learninginterpretablecellularandgenesignatureembeddingsfromsinglecelltranscriptomicdata AT tangjian learninginterpretablecellularandgenesignatureembeddingsfromsinglecelltranscriptomicdata AT liyue learninginterpretablecellularandgenesignatureembeddingsfromsinglecelltranscriptomicdata |
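
The description field above mentions an interpretable linear decoder built from a matrix tri-factorization with batch-effect linear intercepts. The following is a minimal sketch of what such a decoder could look like, not the authors' implementation. It assumes the three factors are a cell-by-topic mixture (`theta`), a topic-embedding matrix (`alpha`), and a gene-embedding matrix (`rho`), and that the result is normalized with a softmax over genes; these names, shapes, the softmax, and the toy random values are all assumptions for illustration and do not come from the record.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode(theta, alpha, rho, batch_intercept):
    """Assumed decoder form: softmax(theta @ alpha @ rho + batch_intercept)."""
    logits = matmul(matmul(theta, alpha), rho)
    return [softmax([v + b for v, b in zip(row, batch_intercept)]) for row in logits]


random.seed(0)
n_cells, n_topics, embed_dim, n_genes = 3, 2, 4, 5

# Hypothetical toy parameters; a trained model would learn these from data.
theta = [softmax([random.gauss(0, 1) for _ in range(n_topics)]) for _ in range(n_cells)]
alpha = [[random.gauss(0, 1) for _ in range(embed_dim)] for _ in range(n_topics)]
rho = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(embed_dim)]
batch_intercept = [random.gauss(0, 0.1) for _ in range(n_genes)]

# One row per cell; each row is a distribution over genes.
rates = decode(theta, alpha, rho, batch_intercept)
```

Because every step before the softmax is linear, each entry of `alpha @ rho` in this sketch can be read directly as a topic-by-gene score, which is one way to understand the interpretability claim in the description.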