Mapping single-cell data to reference atlases by transfer learning

Bibliographic Details
Main authors: Lotfollahi, Mohammad, Naghipourfar, Mohsen, Luecken, Malte D., Khajavi, Matin, Büttner, Maren, Wagenstetter, Marco, Avsec, Žiga, Gayoso, Adam, Yosef, Nir, Interlandi, Marta, Rybakov, Sergei, Misharin, Alexander V., Theis, Fabian J.
Format: Online Article Text
Language: English
Published: Nature Publishing Group US 2021
Subjects: Analysis
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8763644/
https://www.ncbi.nlm.nih.gov/pubmed/34462589
http://dx.doi.org/10.1038/s41587-021-01001-7
author Lotfollahi, Mohammad
Naghipourfar, Mohsen
Luecken, Malte D.
Khajavi, Matin
Büttner, Maren
Wagenstetter, Marco
Avsec, Žiga
Gayoso, Adam
Yosef, Nir
Interlandi, Marta
Rybakov, Sergei
Misharin, Alexander V.
Theis, Fabian J.
collection PubMed
description Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
format Online
Article
Text
id pubmed-8763644
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group US
record_format MEDLINE/PubMed
spelling pubmed-8763644 2022-01-26 Mapping single-cell data to reference atlases by transfer learning Nat Biotechnol Analysis
Nature Publishing Group US 2021-08-30 2022 /pmc/articles/PMC8763644/ /pubmed/34462589 http://dx.doi.org/10.1038/s41587-021-01001-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Mapping single-cell data to reference atlases by transfer learning
topic Analysis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8763644/
https://www.ncbi.nlm.nih.gov/pubmed/34462589
http://dx.doi.org/10.1038/s41587-021-01001-7