How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data

Bibliographic Details
Main authors: Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677483/
https://www.ncbi.nlm.nih.gov/pubmed/36151725
http://dx.doi.org/10.1093/bib/bbac387
author Watson, Ebony Rose
Mora, Ariane
Taherian Fard, Atefeh
Mar, Jessica Cara
collection PubMed
description Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify the highest-performing metric. However, the ‘best-performing’ metric varies between studies, with performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by selecting a proximity metric and neighbourhood size appropriate to the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
format Online
Article
Text
id pubmed-9677483
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9677483 2022-11-21 How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara Brief Bioinform Review Oxford University Press 2022-09-23 /pmc/articles/PMC9677483/ /pubmed/36151725 http://dx.doi.org/10.1093/bib/bbac387 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data
topic Review