LSMMD-MA: scaling multimodal data integration for single-cell genomics data analysis
Main authors: | Meng-Papaxanthos, Laetitia; Zhang, Ran; Li, Gang; Cuturi, Marco; Noble, William Stafford; Vert, Jean-Philippe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336029/ https://www.ncbi.nlm.nih.gov/pubmed/37421399 http://dx.doi.org/10.1093/bioinformatics/btad420 |
author | Meng-Papaxanthos, Laetitia; Zhang, Ran; Li, Gang; Cuturi, Marco; Noble, William Stafford; Vert, Jean-Philippe |
---|---|
collection | PubMed |
description | MOTIVATION: Modality matching in single-cell omics data analysis—i.e. matching cells across datasets collected using different types of genomic assays—has become an important problem, because unifying perspectives across different technologies holds the promise of yielding biological and clinical discoveries. However, single-cell dataset sizes can now reach hundreds of thousands to millions of cells, which remain out of reach for most multimodal computational methods. RESULTS: We propose LSMMD-MA, a large-scale Python implementation of the MMD-MA method for multimodal data integration. In LSMMD-MA, we reformulate the MMD-MA optimization problem using linear algebra and solve it with KeOps, a CUDA framework for symbolic matrix computation in Python. We show that LSMMD-MA scales to a million cells in each modality, two orders of magnitude greater than existing implementations. AVAILABILITY AND IMPLEMENTATION: LSMMD-MA is freely available at https://github.com/google-research/large_scale_mmdma and archived at https://doi.org/10.5281/zenodo.8076311. |
format | Online Article Text |
id | pubmed-10336029 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10336029, 2023-07-13. Bioinformatics, Applications Note. Oxford University Press, 2023-07-08. © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | LSMMD-MA: scaling multimodal data integration for single-cell genomics data analysis |
topic | Applications Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10336029/ https://www.ncbi.nlm.nih.gov/pubmed/37421399 http://dx.doi.org/10.1093/bioinformatics/btad420 |
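The abstract states that LSMMD-MA reformulates the MMD-MA optimization problem using linear algebra and evaluates it with KeOps. The NumPy sketch below illustrates two ideas of that kind under stated assumptions. It is not the LSMMD-MA code and does not use the KeOps API, and every function name in it is hypothetical. First, assume a linear kernel K = X X^T for a cell-by-feature matrix X (n x p) and substitute W = X^T alpha. The embedding K alpha then becomes X W, and a distortion term of the assumed form ||K - K alpha alpha^T K||_F^2 can be evaluated from the p x p Gram matrix X^T X alone, without building any n x n matrix. Second, the squared MMD between embedded cells under a Gaussian kernel (with an assumed bandwidth `sigma`) is computed in row blocks, so that only a slice of the kernel matrix exists at any time. This plain-NumPy blocking stands in for the lazy reductions KeOps performs on the GPU.

```python
import numpy as np


def distortion_dense(X, W):
    """||K - K a a^T K||_F^2 with K = X X^T and W = X^T a (hypothetical helper).

    Materializes n x n matrices, so memory grows quadratically in the cell count.
    """
    K = X @ X.T
    KaaK = X @ W @ W.T @ X.T  # equals K a a^T K under the substitution W = X^T a
    return float(np.sum((K - KaaK) ** 2))


def distortion_gram(X, W):
    """Same value computed from the p x p Gram matrix G = X^T X only.

    Expands to ||G||_F^2 - 2 ||G W||_F^2 + ||W^T G W||_F^2, using that G is symmetric.
    """
    G = X.T @ X
    GW = G @ W
    return float(np.sum(G**2) - 2.0 * np.sum(GW**2) + np.sum((W.T @ GW) ** 2))


def gaussian_kernel_sum(A, B, sigma, block=256):
    """Sum of exp(-||a - b||^2 / (2 sigma^2)) over all pairs (a in A, b in B).

    Rows of A are processed in blocks, so at most a (block x len(B)) slice
    of the kernel matrix is held in memory at any time.
    """
    b_sq = np.sum(B**2, axis=1)
    total = 0.0
    for start in range(0, len(A), block):
        a = A[start:start + block]
        sq_dist = np.sum(a**2, axis=1)[:, None] + b_sq[None, :] - 2.0 * (a @ B.T)
        # Clamp tiny negative values caused by floating-point cancellation.
        total += float(np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2)).sum())
    return total


def mmd2(A, B, sigma, block=256):
    """Biased estimate of the squared MMD between point sets A (n x d) and B (m x d)."""
    n, m = len(A), len(B)
    return (gaussian_kernel_sum(A, A, sigma, block) / n**2
            + gaussian_kernel_sum(B, B, sigma, block) / m**2
            - 2.0 * gaussian_kernel_sum(A, B, sigma, block) / (n * m))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 20))  # synthetic data: 300 cells x 20 features
    W = rng.normal(size=(20, 5))    # hypothetical projection to 5 dimensions
    print(np.isclose(distortion_dense(X, W), distortion_gram(X, W)))  # True
    E1 = X @ W                                 # embedding of modality 1
    E2 = rng.normal(size=(400, 20)) @ W + 1.0  # synthetic, shifted modality 2
    print(mmd2(E1, E2, sigma=10.0, block=64))
```

The Gram-matrix form replaces n x n kernel matrices with a p x p matrix. This helps most when the number of cells n far exceeds the number of features p. The blocked MMD keeps memory linear in the size of the second point set for a fixed block size, while the total work remains quadratic in the number of cells.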