Abstract

Background: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to render results credible and transferable to real data.

Results: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. First, we compared gene- and cell-level quality control summaries in both one- and two-dimensional settings, and further quantified these at the batch and cluster level. Second, we investigated the effect of simulators on comparisons of clustering and batch correction methods, and, third, which quality control summaries can capture reference-simulation similarity, and to what extent.

Conclusions: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects; that they yield over-optimistic performance of integration methods and potentially unreliable rankings of clustering methods; and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.
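The core evaluation idea above can be illustrated with a minimal sketch: compare a single gene-level quality control summary (here, mean log-expression per gene) between a reference and a simulated count matrix, using the Kolmogorov-Smirnov statistic as a distance. This is a hedged illustration, not the paper's actual pipeline; the toy matrices, the choice of summary, and the KS distance are all assumptions for demonstration only.

```python
# Illustrative sketch (not the paper's exact method): quantify how well a
# simulated scRNA-seq count matrix mimics a reference one, using a single
# gene-level quality control summary. All data below are synthetic toys.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Toy count matrices: rows = genes, columns = cells.
reference = rng.negative_binomial(n=2, p=0.30, size=(500, 200))
simulated = rng.negative_binomial(n=2, p=0.35, size=(500, 200))

def gene_summary(counts):
    """Gene-level QC summary: mean log1p expression across cells."""
    return np.log1p(counts).mean(axis=1)

# A smaller KS statistic means the simulated summary distribution is
# closer to the reference, i.e. the simulator mimics the data better.
stat, pval = ks_2samp(gene_summary(reference), gene_summary(simulated))
print(f"KS statistic: {stat:.3f} (p = {pval:.2e})")
```

The same pattern extends to cell-level summaries (e.g. library size per cell) by summarizing along the other axis, and to two-dimensional settings by comparing joint distributions of two summaries at once.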


Sources:
- https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02904-1 (Genome Biology)
- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061781 (PMC)
- http://dx.doi.org/10.1186/s13059-023-02904-1 (DOI)
