Optimal transport improves cell–cell similarity inference in single-cell omics data

Bibliographic Details
Main authors: Huizing, Geert-Jan; Peyré, Gabriel; Cantini, Laura
Format: Online article (text)
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004651/
https://www.ncbi.nlm.nih.gov/pubmed/35157031
http://dx.doi.org/10.1093/bioinformatics/btac084

Abstract
MOTIVATION: High-throughput single-cell molecular profiling is revolutionizing biology and medicine by unveiling the diversity of cell types and states contributing to development and disease. The identification and characterization of cellular heterogeneity are typically achieved through unsupervised clustering, which crucially relies on a similarity metric. RESULTS: We here propose the use of Optimal Transport (OT) as a cell–cell similarity metric for single-cell omics data. OT defines distances to compare high-dimensional data represented as probability distributions. To speed up computations and cope with the high dimensionality of single-cell data, we consider the entropic regularization of the classical OT distance. We then extensively benchmark OT against state-of-the-art metrics over 13 independent datasets, including simulated, scRNA-seq, scATAC-seq and single-cell DNA methylation data. First, we test the ability of the metrics to detect the similarity between cells belonging to the same groups (e.g. cell types, cell lines of origin). Then, we apply unsupervised clustering and test the quality of the resulting clusters. OT is found to improve cell–cell similarity inference and cell clustering in all simulated and real scRNA-seq data, as well as in scATAC-seq and single-cell DNA methylation data. AVAILABILITY AND IMPLEMENTATION: All our analyses are reproducible through the OT-scOmics Jupyter notebook available at https://github.com/ComputationalSystemsBiology/OT-scOmics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
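To make the idea in the abstract concrete, here is a minimal pure-Python sketch, not the OT-scOmics code: each cell's feature vector is rescaled into a probability histogram over features, a ground cost between features is built, and the entropically regularized OT problem between two cells is solved with Sinkhorn iterations. The ground cost (1 minus the cosine similarity between feature columns, rescaled to [0, 1]), the function names, eps = 0.1, the fixed iteration count and the toy count matrix are all hypothetical choices made for illustration.

```python
import math


def normalize(x):
    """Rescale a non-negative feature vector (e.g. counts) so it sums to 1."""
    total = sum(x)
    return [v / total for v in x]


def cosine_cost(X):
    """Hypothetical ground cost between features: 1 - cosine similarity of columns.

    X is a cells x features list of lists. Assumes every feature is nonzero in
    at least one cell. Costs are rescaled to [0, 1] by the largest entry.
    """
    n_feat = len(X[0])
    cols = [[cell[j] for cell in X] for j in range(n_feat)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    C = [[0.0] * n_feat for _ in range(n_feat)]
    for i in range(n_feat):
        for j in range(n_feat):
            if i != j:  # the diagonal stays exactly 0
                dot = sum(p * q for p, q in zip(cols[i], cols[j]))
                C[i][j] = max(0.0, 1.0 - dot / (norms[i] * norms[j]))
    cmax = max(max(row) for row in C)
    if cmax > 0:
        C = [[c / cmax for c in row] for row in C]
    return C


def sinkhorn_cost(a, b, C, eps=0.1, n_iter=1000):
    """Transport cost <P, C> of the entropic OT plan between histograms a and b."""
    n = len(a)
    K = [[math.exp(-C[i][j] / eps) for j in range(n)] for i in range(n)]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iter):
        # Alternate scalings so that P = diag(u) K diag(v) has row sums a and column sums b.
        Kv = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        u = [a[i] / Kv[i] for i in range(n)]
        Ktu = [sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
        v = [b[j] / Ktu[j] for j in range(n)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(n))


def ot_distance_matrix(X, eps=0.1, n_iter=1000):
    """Pairwise cell-cell OT distances; the diagonal is set to 0 by convention."""
    C = cosine_cost(X)
    hists = [normalize(cell) for cell in X]
    n = len(X)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = sinkhorn_cost(hists[i], hists[j], C, eps, n_iter)
    return D


# Invented toy data: cells 0-1 mostly express genes 0-1, cells 2-3 mostly genes 2-3.
X = [[5, 4, 0, 1], [4, 5, 1, 0], [0, 1, 5, 4], [1, 0, 4, 5]]
D = ot_distance_matrix(X)
for row in D:
    print(["%.3f" % d for d in row])
```

On this toy matrix, cells that share their dominant genes end up much closer to each other than to cells of the other group, and `D` can then be handed to any clustering method that accepts precomputed distances. This sketch reports the transport cost of the regularized plan; other conventions add the entropy term. Two caveats: plain Sinkhorn can underflow or converge slowly for very small eps (log-domain implementations avoid this), and the entropic cost of a cell with itself is small but generally not zero, which is why the diagonal is set to 0 by hand.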

Journal: Bioinformatics (Original Papers)
Publication date: 2022-02-14
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com