
LSMMD-MA: scaling multimodal data integration for single-cell genomics data analysis

MOTIVATION: Modality matching in single-cell omics data analysis—i.e. matching cells across datasets collected using different types of genomic assays—has become an important problem, because unifying perspectives across different technologies holds the promise of yielding biological and clinical di...


Bibliographic Details
Main authors: Meng-Papaxanthos, Laetitia; Zhang, Ran; Li, Gang; Cuturi, Marco; Noble, William Stafford; Vert, Jean-Philippe
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336029/
https://www.ncbi.nlm.nih.gov/pubmed/37421399
http://dx.doi.org/10.1093/bioinformatics/btad420
author Meng-Papaxanthos, Laetitia
Zhang, Ran
Li, Gang
Cuturi, Marco
Noble, William Stafford
Vert, Jean-Philippe
collection PubMed
description MOTIVATION: Modality matching in single-cell omics data analysis—i.e. matching cells across datasets collected using different types of genomic assays—has become an important problem, because unifying perspectives across different technologies holds the promise of yielding biological and clinical discoveries. However, single-cell dataset sizes can now reach hundreds of thousands to millions of cells, which remain out of reach for most multimodal computational methods. RESULTS: We propose LSMMD-MA, a large-scale Python implementation of the MMD-MA method for multimodal data integration. In LSMMD-MA, we reformulate the MMD-MA optimization problem using linear algebra and solve it with KeOps, a CUDA framework for symbolic matrix computation in Python. We show that LSMMD-MA scales to a million cells in each modality, two orders of magnitude greater than existing implementations. AVAILABILITY AND IMPLEMENTATION: LSMMD-MA is freely available at https://github.com/google-research/large_scale_mmdma and archived at https://doi.org/10.5281/zenodo.8076311.
format Online
Article
Text
id pubmed-10336029
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10336029 2023-07-13 Bioinformatics Applications Note Oxford University Press 2023-07-08 /pmc/articles/PMC10336029/ /pubmed/37421399 http://dx.doi.org/10.1093/bioinformatics/btad420 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title LSMMD-MA: scaling multimodal data integration for single-cell genomics data analysis
topic Applications Note