
Integration and transfer learning of single-cell transcriptomes via cFIT

Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a...


Bibliographic Details
Main authors: Peng, Minshi; Li, Yue; Wamsley, Brie; Wei, Yuting; Roeder, Kathryn
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2021
Subjects: Biological Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7958425/
https://www.ncbi.nlm.nih.gov/pubmed/33658382
http://dx.doi.org/10.1073/pnas.2024383118
author Peng, Minshi
Li, Yue
Wamsley, Brie
Wei, Yuting
Roeder, Kathryn
collection PubMed
description Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets or transfer knowledge from one to the other to better understand cellular identity and functions. Here, we present a simple yet surprisingly effective method named common factor integration and transfer learning (cFIT) for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the shared information between various datasets by a common factor space while allowing for unique distortions and shifts in genewise expression in each batch. The model parameters are learned under an iterative nonnegative matrix factorization (NMF) framework and then used for synchronized integration from across-domain assays. In addition, the model enables transferring via low-rank matrix from more informative data to allow for precise identification in data of lower quality. Compared with existing approaches, our method imposes weaker assumptions on the cell composition of each individual dataset; however, it is shown to be more reliable in preserving biological variations. We apply cFIT to multiple scRNA-seq datasets of developing brain from human and mouse, varying by technologies and developmental stages. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell-type diversity and provides insights into brain development.
format Online
Article
Text
id pubmed-7958425
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-7958425 2021-03-19 Integration and transfer learning of single-cell transcriptomes via cFIT Peng, Minshi Li, Yue Wamsley, Brie Wei, Yuting Roeder, Kathryn Proc Natl Acad Sci U S A Biological Sciences
National Academy of Sciences 2021-03-09 2021-03-03 /pmc/articles/PMC7958425/ /pubmed/33658382 http://dx.doi.org/10.1073/pnas.2024383118 Text en Copyright © 2021 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Integration and transfer learning of single-cell transcriptomes via cFIT
topic Biological Sciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7958425/
https://www.ncbi.nlm.nih.gov/pubmed/33658382
http://dx.doi.org/10.1073/pnas.2024383118