SCIM: universal single-cell matching with unpaired feature sets

MOTIVATION: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of th...

Bibliographic Details
Main authors: Stark, Stefan G, Ficek, Joanna, Locatello, Francesco, Bonilla, Ximena, Chevrier, Stéphane, Singer, Franziska, Rätsch, Gunnar, Lehmann, Kjong-Van
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773480/
https://www.ncbi.nlm.nih.gov/pubmed/33381818
http://dx.doi.org/10.1093/bioinformatics/btaa843
_version_ 1783630054512656384
author Stark, Stefan G
Ficek, Joanna
Locatello, Francesco
Bonilla, Ximena
Chevrier, Stéphane
Singer, Franziska
Rätsch, Gunnar
Lehmann, Kjong-Van
author_facet Stark, Stefan G
Ficek, Joanna
Locatello, Francesco
Bonilla, Ximena
Chevrier, Stéphane
Singer, Franziska
Rätsch, Gunnar
Lehmann, Kjong-Van
author_sort Stark, Stefan G
collection PubMed
description MOTIVATION: Recent technological advances have led to an increase in the production and availability of single-cell data. The ability to integrate a set of multi-technology measurements would allow the identification of biologically or clinically meaningful observations through the unification of the perspectives afforded by each technology. In most cases, however, profiling technologies consume the used cells and thus pairwise correspondences between datasets are lost. Due to the sheer size single-cell datasets can acquire, scalable algorithms that are able to universally match single-cell measurements carried out in one cell to its corresponding sibling in another technology are needed. RESULTS: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset achieving 90% and 78% cell-matching accuracy for each one of the samples, respectively. AVAILABILITY AND IMPLEMENTATION: https://github.com/ratschlab/scim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-7773480
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7773480 2021-01-05 SCIM: universal single-cell matching with unpaired feature sets Stark, Stefan G Ficek, Joanna Locatello, Francesco Bonilla, Ximena Chevrier, Stéphane Singer, Franziska Rätsch, Gunnar Lehmann, Kjong-Van Bioinformatics Data Oxford University Press 2020-12-29 /pmc/articles/PMC7773480/ /pubmed/33381818 http://dx.doi.org/10.1093/bioinformatics/btaa843 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Data
Stark, Stefan G
Ficek, Joanna
Locatello, Francesco
Bonilla, Ximena
Chevrier, Stéphane
Singer, Franziska
Rätsch, Gunnar
Lehmann, Kjong-Van
SCIM: universal single-cell matching with unpaired feature sets
title SCIM: universal single-cell matching with unpaired feature sets
title_full SCIM: universal single-cell matching with unpaired feature sets
title_fullStr SCIM: universal single-cell matching with unpaired feature sets
title_full_unstemmed SCIM: universal single-cell matching with unpaired feature sets
title_short SCIM: universal single-cell matching with unpaired feature sets
title_sort scim: universal single-cell matching with unpaired feature sets
topic Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7773480/
https://www.ncbi.nlm.nih.gov/pubmed/33381818
http://dx.doi.org/10.1093/bioinformatics/btaa843
work_keys_str_mv AT starkstefang scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT ficekjoanna scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT locatellofrancesco scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT bonillaximena scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT chevrierstephane scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT singerfranziska scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT ratschgunnar scimuniversalsinglecellmatchingwithunpairedfeaturesets
AT lehmannkjongvan scimuniversalsinglecellmatchingwithunpairedfeaturesets
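
Illustrative note on the matching step: the abstract in this record states that cells are paired across technologies by a bipartite matching scheme operating on low-dimensional latent representations. The sketch below is a minimal illustration of that idea only, not the authors' implementation (which is available at https://github.com/ratschlab/scim). It assumes the latent codes have already been produced by some encoder, uses randomly generated toy codes in their place, and swaps in SciPy's generic linear assignment solver with Euclidean cost. The function name match_cells and all variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_cells(latent_a, latent_b):
    """Pair rows of latent_a with distinct rows of latent_b.

    Solves a minimum-cost bipartite assignment where the cost of a pair
    is the Euclidean distance between the two latent codes.
    (Hypothetical helper; not part of the SCIM code base.)
    """
    cost = cdist(latent_a, latent_b, metric="euclidean")
    rows, cols = linear_sum_assignment(cost)
    return rows, cols, cost[rows, cols].sum()


# Toy stand-in for latent codes from two technologies (assumption:
# both technologies observe the same 50 cells, in a different order,
# and the shared latent space aligns them up to small noise).
rng = np.random.default_rng(0)
true_codes = rng.normal(size=(50, 8))
perm = rng.permutation(50)
latent_a = true_codes
latent_b = true_codes[perm] + rng.normal(scale=0.01, size=(50, 8))

rows, cols, total_cost = match_cells(latent_a, latent_b)

# Row j of latent_b is a noisy copy of cell perm[j], so a match
# (rows[k], cols[k]) is correct when perm[cols[k]] == rows[k].
accuracy = np.mean(perm[cols] == rows)
print(f"matched {len(rows)} cells, accuracy = {accuracy:.2f}")
```

A dense assignment of this kind needs the full pairwise cost matrix, so it is only practical for small toy inputs; the scalability emphasized in the abstract would require a sparser formulation, which this sketch does not attempt.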