Abstract
Background: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data.
Results: Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. First, besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch and cluster level. Second, we investigated the effect of simulators on clustering and batch correction method comparisons. Third, we assessed which quality control summaries can capture reference-simulation similarity, and to what extent.
Conclusions: Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration methods and potentially unreliable rankings of clustering methods, and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.
| Download full-text PDF | Link | Source |
|---|---|---|
| Download Source 1 | https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02904-1 | Web Search |
| Download Source 2 | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10061781 | PMC |
| Download Source 3 | http://dx.doi.org/10.1186/s13059-023-02904-1 | DOI Listing |