SCIM: universal single-cell matching with unpaired feature sets
MOTIVATION: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of th...
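Main Authors: | Stark, Stefan G, Ficek, Joanna, Locatello, Francesco, Bonilla, Ximena, Chevrier, Stéphane, Singer, Franziska, Rätsch, Gunnar, Lehmann, Kjong-Van |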
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Data |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773480/ https://www.ncbi.nlm.nih.gov/pubmed/33381818 http://dx.doi.org/10.1093/bioinformatics/btaa843 |
_version_ | 1783630054512656384 |
---|---|
author | Stark, Stefan G Ficek, Joanna Locatello, Francesco Bonilla, Ximena Chevrier, Stéphane Singer, Franziska Rätsch, Gunnar Lehmann, Kjong-Van |
author_facet | Stark, Stefan G Ficek, Joanna Locatello, Francesco Bonilla, Ximena Chevrier, Stéphane Singer, Franziska Rätsch, Gunnar Lehmann, Kjong-Van |
author_sort | Stark, Stefan G |
collection | PubMed |
description | MOTIVATION: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed. RESULTS: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 90% and 78% cell-matching accuracy for each one of the samples, respectively. AVAILABILITY AND IMPLEMENTATION: https://github.com/ratschlab/scim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7773480 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7773480 2021-01-05 SCIM: universal single-cell matching with unpaired feature sets Stark, Stefan G Ficek, Joanna Locatello, Francesco Bonilla, Ximena Chevrier, Stéphane Singer, Franziska Rätsch, Gunnar Lehmann, Kjong-Van Bioinformatics Data MOTIVATION: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed. RESULTS: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 90% and 78% cell-matching accuracy for each one of the samples, respectively. AVAILABILITY AND IMPLEMENTATION: https://github.com/ratschlab/scim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-12-29 /pmc/articles/PMC7773480/ /pubmed/33381818 http://dx.doi.org/10.1093/bioinformatics/btaa843 Text en © The Author(s) 2020. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Data Stark, Stefan G Ficek, Joanna Locatello, Francesco Bonilla, Ximena Chevrier, Stéphane Singer, Franziska Rätsch, Gunnar Lehmann, Kjong-Van SCIM: universal single-cell matching with unpaired feature sets |
title | SCIM: universal single-cell matching with unpaired feature sets |
title_full | SCIM: universal single-cell matching with unpaired feature sets |
title_fullStr | SCIM: universal single-cell matching with unpaired feature sets |
title_full_unstemmed | SCIM: universal single-cell matching with unpaired feature sets |
title_short | SCIM: universal single-cell matching with unpaired feature sets |
title_sort | scim: universal single-cell matching with unpaired feature sets |
topic | Data |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773480/ https://www.ncbi.nlm.nih.gov/pubmed/33381818 http://dx.doi.org/10.1093/bioinformatics/btaa843 |
work_keys_str_mv | AT starkstefang scimuniversalsinglecellmatchingwithunpairedfeaturesets AT ficekjoanna scimuniversalsinglecellmatchingwithunpairedfeaturesets AT locatellofrancesco scimuniversalsinglecellmatchingwithunpairedfeaturesets AT bonillaximena scimuniversalsinglecellmatchingwithunpairedfeaturesets AT chevrierstephane scimuniversalsinglecellmatchingwithunpairedfeaturesets AT singerfranziska scimuniversalsinglecellmatchingwithunpairedfeaturesets AT scimuniversalsinglecellmatchingwithunpairedfeaturesets AT ratschgunnar scimuniversalsinglecellmatchingwithunpairedfeaturesets AT lehmannkjongvan scimuniversalsinglecellmatchingwithunpairedfeaturesets |
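
The abstract in this record describes pairing cells across technologies with a bipartite matching scheme that operates on low-dimensional latent representations. The sketch below illustrates that one step only, under loud assumptions: it is not the authors' implementation (see the repository linked in the record for that), it takes the latent coordinates as already computed, it uses squared Euclidean distance as the matching cost, and it solves a dense assignment problem with the Hungarian (Kuhn-Munkres) algorithm rather than whatever scalable scheme SCIM uses. The names `squared_distance`, `min_cost_assignment`, and `match_cells` are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two latent vectors (assumed cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_cost_assignment(cost: List[List[float]]) -> List[int]:
    """Hungarian algorithm with potentials, O(n^2 * m).

    cost[i][j] is the cost of pairing row i with column j; requires
    len(cost) <= len(cost[0]). Returns, for every row, the index of the
    column assigned to it so that the total cost is minimal.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)   # predecessor column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            # Relax reduced costs from row i0 and pick the closest free column.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so the chosen column becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path back to the root.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match


def match_cells(latent_a: List[Point], latent_b: List[Point]) -> List[int]:
    """Pair each cell of technology A with a distinct cell of technology B."""
    cost = [[squared_distance(a, b) for b in latent_b] for a in latent_a]
    return min_cost_assignment(cost)


if __name__ == "__main__":
    # Toy 2-D latent codes standing in for, e.g., scRNA and CyTOF cells.
    rna = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
    cytof = [(2.1, 0.4), (0.1, -0.1), (0.9, 1.2)]
    for i, j in enumerate(match_cells(rna, cytof)):
        print(f"cell A{i} <-> cell B{j}")
```

A dense cost matrix grows quadratically with the number of cells, so this form is only practical for small examples; at the dataset sizes the abstract mentions, the cost graph would need to be sparsified (for instance to nearest neighbours in the latent space) before matching.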