Abstract

Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, the computation takes a long time and consumes large amounts of memory.

Results: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms.

Conclusion: We develop a guideline for selecting an appropriate PCA implementation according to the computational environment of users and developers.
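
To illustrate the randomized singular value decomposition approach named in the Results, the following is a minimal sketch in Python with NumPy. It is not one of the implementations benchmarked in the paper; the function name `randomized_pca` and the parameters `n_oversamples` and `n_iter` are hypothetical and chosen for illustration. The toy matrix is synthetic, and the dense mean-centering step is an assumption that is only reasonable for a small example; a large-scale scRNA-seq matrix would need sparse or out-of-core handling.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top principal components of X (rows = cells, columns = genes).

    Hypothetical helper for illustration: random range finder with power
    iterations, followed by an exact SVD of a small projected matrix.
    """
    rng = np.random.default_rng(seed)

    # Dense centering: acceptable for this toy example only.
    Xc = X - X.mean(axis=0)

    # Sketch size: requested components plus a few extra for accuracy.
    k = min(n_components + n_oversamples, min(Xc.shape))

    # Sample the column space of Xc with a random Gaussian test matrix.
    Q = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))[0]

    # Power iterations sharpen the subspace; QR keeps it numerically stable.
    for _ in range(n_iter):
        Q = np.linalg.qr(Xc.T @ Q)[0]
        Q = np.linalg.qr(Xc @ Q)[0]

    # Project onto the small subspace and decompose it exactly.
    B = Q.T @ Xc  # shape (k, n_genes)
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small

    scores = U[:, :n_components] * s[:n_components]  # cell coordinates
    return scores, Vt[:n_components], s[:n_components]


# Synthetic low-rank "expression" matrix with a little noise (not real data).
demo_rng = np.random.default_rng(1)
X = demo_rng.standard_normal((500, 5)) @ demo_rng.standard_normal((5, 200))
X += 0.01 * demo_rng.standard_normal((500, 200))

scores, components, singular_values = randomized_pca(X, n_components=5)

# Reference: full SVD of the centered matrix, feasible only at this small size.
exact = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[1][:5]
```

The sketch only ever decomposes a matrix with `k` rows, which is why this family of methods can be cheaper than a full SVD when few components are needed. The comparison against `exact` is there to check the approximation on the toy data, not to reproduce any result from the benchmark.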

Sources
Source 1 (publisher): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1900-3
Source 2 (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6970290
Source 3 (DOI): http://dx.doi.org/10.1186/s13059-019-1900-3
