How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data
Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. A...
Main authors: | Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677483/ https://www.ncbi.nlm.nih.gov/pubmed/36151725 http://dx.doi.org/10.1093/bib/bbac387 |
_version_ | 1784833821182525440 |
---|---|
author | Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara |
author_facet | Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara |
author_sort | Watson, Ebony Rose |
collection | PubMed |
description | Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify the highest-performing metric. However, the ‘best-performing’ metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies did not factor these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data. |
format | Online Article Text |
id | pubmed-9677483 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9677483 2022-11-21 How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data Watson, Ebony Rose; Mora, Ariane; Taherian Fard, Atefeh; Mar, Jessica Cara Brief Bioinform Review Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify the highest-performing metric. However, the ‘best-performing’ metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies did not factor these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data. Oxford University Press 2022-09-23 /pmc/articles/PMC9677483/ /pubmed/36151725 http://dx.doi.org/10.1093/bib/bbac387 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | How does the structure of data impact cell–cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9677483/ https://www.ncbi.nlm.nih.gov/pubmed/36151725 http://dx.doi.org/10.1093/bib/bbac387 |