The shaky foundations of simulating single-cell RNA sequencing data

BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant—on t...


Bibliographic Details
Main authors: Crowell, Helena L., Morillo Leonardo, Sarah X., Soneson, Charlotte, Robinson, Mark D.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061781/
https://www.ncbi.nlm.nih.gov/pubmed/36991470
http://dx.doi.org/10.1186/s13059-023-02904-1
author Crowell, Helena L.
Morillo Leonardo, Sarah X.
Soneson, Charlotte
Robinson, Mark D.
collection PubMed
description BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard for results to be credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigated the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which quality control summaries can capture reference-simulation similarity, and to what extent. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-02904-1.
format Online
Article
Text
id pubmed-10061781
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10061781 2023-03-31 The shaky foundations of simulating single-cell RNA sequencing data Crowell, Helena L. Morillo Leonardo, Sarah X. Soneson, Charlotte Robinson, Mark D. Genome Biol Research
BioMed Central 2023-03-29 /pmc/articles/PMC10061781/ /pubmed/36991470 http://dx.doi.org/10.1186/s13059-023-02904-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title The shaky foundations of simulating single-cell RNA sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061781/
https://www.ncbi.nlm.nih.gov/pubmed/36991470
http://dx.doi.org/10.1186/s13059-023-02904-1