The shaky foundations of simulating single-cell RNA sequencing data
BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant—on t...
Main Authors: | Crowell, Helena L.; Morillo Leonardo, Sarah X.; Soneson, Charlotte; Robinson, Mark D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061781/ https://www.ncbi.nlm.nih.gov/pubmed/36991470 http://dx.doi.org/10.1186/s13059-023-02904-1 |
_version_ | 1785017362077974528 |
---|---|
author | Crowell, Helena L. Morillo Leonardo, Sarah X. Soneson, Charlotte Robinson, Mark D. |
author_facet | Crowell, Helena L. Morillo Leonardo, Sarah X. Soneson, Charlotte Robinson, Mark D. |
author_sort | Crowell, Helena L. |
collection | PubMed |
description | BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch and cluster level. Secondly, we investigated the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-02904-1. |
format | Online Article Text |
id | pubmed-10061781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10061781 2023-03-31 The shaky foundations of simulating single-cell RNA sequencing data Crowell, Helena L. Morillo Leonardo, Sarah X. Soneson, Charlotte Robinson, Mark D. Genome Biol Research BACKGROUND: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch and cluster level. Secondly, we investigated the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. CONCLUSIONS: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13059-023-02904-1.
BioMed Central 2023-03-29 /pmc/articles/PMC10061781/ /pubmed/36991470 http://dx.doi.org/10.1186/s13059-023-02904-1 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Crowell, Helena L. Morillo Leonardo, Sarah X. Soneson, Charlotte Robinson, Mark D. The shaky foundations of simulating single-cell RNA sequencing data |
title | The shaky foundations of simulating single-cell RNA sequencing data |
title_full | The shaky foundations of simulating single-cell RNA sequencing data |
title_fullStr | The shaky foundations of simulating single-cell RNA sequencing data |
title_full_unstemmed | The shaky foundations of simulating single-cell RNA sequencing data |
title_short | The shaky foundations of simulating single-cell RNA sequencing data |
title_sort | shaky foundations of simulating single-cell rna sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061781/ https://www.ncbi.nlm.nih.gov/pubmed/36991470 http://dx.doi.org/10.1186/s13059-023-02904-1 |
work_keys_str_mv | AT crowellhelenal theshakyfoundationsofsimulatingsinglecellrnasequencingdata AT morilloleonardosarahx theshakyfoundationsofsimulatingsinglecellrnasequencingdata AT sonesoncharlotte theshakyfoundationsofsimulatingsinglecellrnasequencingdata AT robinsonmarkd theshakyfoundationsofsimulatingsinglecellrnasequencingdata AT crowellhelenal shakyfoundationsofsimulatingsinglecellrnasequencingdata AT morilloleonardosarahx shakyfoundationsofsimulatingsinglecellrnasequencingdata AT sonesoncharlotte shakyfoundationsofsimulatingsinglecellrnasequencingdata AT robinsonmarkd shakyfoundationsofsimulatingsinglecellrnasequencingdata |