Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods

Bibliographic details
Main authors: Krzak, Monika; Raykov, Yordan; Boukouvalas, Alexis; Cutillo, Luisa; Angelini, Claudia
Format: Online article (text)
Language: English
Published: Frontiers Media S.A., 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6918801/
https://www.ncbi.nlm.nih.gov/pubmed/31921297
http://dx.doi.org/10.3389/fgene.2019.01253
Collection: PubMed
Abstract: Single-cell RNA-seq (scRNAseq) is a powerful tool to study cell heterogeneity. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, so the same method can be used in different modes on the same dataset, depending on the user's choices. The large number of possibilities that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large study has investigated the role and impact that these choices can have in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and to describe the range of possibilities offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in dimensionality, number of cell populations, or level of noise. Remarkably, the results presented here show that the great variability in model performance is largely attributable to the user's choice of parameter settings. We describe several performance trends associated with the methods' modes of usage and with different types of dataset, and identify which methods' computational time is strongly affected by data dimensionality. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to identifying the number of clusters.
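The following minimal Python sketch illustrates the abstract's point about parameter sensitivity. It is not the authors' benchmark pipeline. It runs a toy one-dimensional k-means (Lloyd's algorithm) on hypothetical, well-separated data for several values of the user-chosen number of clusters k. Each result is scored against the known labels with the adjusted Rand index (ARI). The following are all illustrative assumptions and do not come from the source:

* the choice of k-means and ARI
* the toy data
* the evenly spaced initialization
* the function names `adjusted_rand_index` and `kmeans_1d`

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two flat labelings of the same points."""
    n = len(truth)
    pairs = comb(n, 2)
    if pairs == 0:
        return 1.0  # fewer than two points: nothing to compare
    # Pairs of points that share a cell of the contingency table.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    # Pairs of points that share a true class / a predicted cluster.
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def kmeans_1d(xs, k, max_iter=100):
    """Lloyd's k-means on 1-D data with a deterministic initialization."""
    s = sorted(xs)
    # Assumption: initial centroids are spread evenly over the sorted values
    # for reproducibility; practical tools usually use randomized restarts.
    if k == 1:
        centroids = [s[0]]
    else:
        centroids = [s[(i * (len(s) - 1)) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in xs]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels


# Hypothetical toy data: three well-separated "populations" with known labels.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Sweep the user-chosen number of clusters and score each result.
scores = {k: adjusted_rand_index(truth, kmeans_1d(values, k)) for k in (2, 3, 4)}
for k, ari in scores.items():
    print(f"k={k}: ARI={ari:.3f}")
# k=2: ARI=0.500
# k=3: ARI=1.000
# k=4: ARI=0.840
```

Only k=3 recovers the simulated populations. With k=2, two populations are merged, and with k=4, one is split. The same method on the same data therefore scores very differently depending on a single parameter. In real scRNAseq analyses the true number of populations is usually unknown. Preprocessing and dimension-reduction steps add further parameters of this kind.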
Record ID: pubmed-6918801
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Front Genet (Frontiers in Genetics), published 2019-12-11 by Frontiers Media S.A.
Copyright: © 2019 Krzak, Raykov, Boukouvalas, Cutillo and Angelini
License: http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Topic: Genetics