How can I speed up steps in the integration workflow when working with large datasets, or when integrating many different datasets?
Working with multiple or large datasets can slow down the standard Seurat integration workflow. We offer three strategies, which can be combined, to help users speed up these steps. First, Seurat allows reciprocal PCA (‘rpca’) to be used instead of canonical correlation analysis (CCA) when identifying anchors, which substantially improves speed and reduces memory usage. Second, when working with many datasets, users can specify one dataset (or a subset) as a baseline (or ‘reference’), so that anchors do not have to be found between every pair of datasets. Finally, anchor-finding between dataset pairs can be run in parallel by setting a parallelization plan (see the future vignette for details). We describe these steps in a vignette, which also demonstrates how to integrate multiple datasets totaling more than 200,000 cells in about 30 minutes.
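To see why the reference strategy saves time, the sketch below simply counts the dataset pairs that need an anchor search. It is a minimal Python illustration, not Seurat code: `anchor_pairs` is a hypothetical helper, and it assumes the behavior described for the `reference` argument of `FindIntegrationAnchors`, where only reference-reference and reference-query pairs are compared. The choice of 12 datasets is purely illustrative.

```python
from itertools import combinations


def anchor_pairs(n_datasets, reference=None):
    """Return the (0-based) dataset pairs that would be searched for anchors.

    Hypothetical helper. Assumption: with no reference, every pair is compared;
    with a reference set, pairs where neither dataset is a reference are skipped.
    """
    pairs = list(combinations(range(n_datasets), 2))
    if reference is None:
        return pairs
    ref = set(reference)
    # Keep only reference-reference and reference-query comparisons
    return [(i, j) for i, j in pairs if i in ref or j in ref]


# Illustrative example: 12 datasets, all pairwise vs. two references
print(len(anchor_pairs(12)))          # 66
print(len(anchor_pairs(12, [0, 1])))  # 21
```

With no reference the number of searches grows as n(n-1)/2; with k references it is k(n-k) + k(k-1)/2, which grows only linearly in n for a fixed k. Because each pair is searched independently, these pairs are also natural units of work for a parallel plan to spread across workers.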