Our preprocessing of the datasets closely follows prior work. The FindVariableFeatures() function from Seurat was used to preprocess the small datasets, whereas the scanpy package was used to preprocess the large dataset.
For Stuart and Kazer, we followed the preprocessing steps from the Seurat website. For Zheng ERCC, Zheng Monocyte, Duo4eq, and Duo8eq, we used the preprocessing script from Townes et al. (2019) directly. For the other datasets, we largely followed the preprocessing in the Seurat vignettes: https://satijalab.org/seurat/archive/v3.1/immune_alignment.html and https://satijalab.org/seurat/articles/integration_introduction.html.
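To illustrate the idea behind the variable-feature selection step mentioned above, the following is a minimal sketch that ranks genes by a simple dispersion score (variance divided by mean) and keeps the top-scoring ones. It is a simplified stand-in written for illustration only: it is not the procedure implemented by Seurat's FindVariableFeatures() or by scanpy, and the function name `top_variable_genes`, the toy count matrix, and the gene names are all hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(counts, gene_names, n_top):
    """Rank genes by dispersion (variance / mean) and return the n_top names.

    counts: list of rows, one row per cell and one column per gene.
    This is an illustrative simplification, not Seurat's or scanpy's method.
    """
    scored = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in counts]
        mu = mean(column)
        # A gene with zero mean expression carries no signal; score it 0.
        dispersion = pvariance(column) / mu if mu > 0 else 0.0
        scored.append((dispersion, name))
    # Highest dispersion first; ties are broken by gene name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:n_top]]


# Toy count matrix: 4 cells x 3 genes (hypothetical values).
counts = [
    [0, 5, 1],
    [10, 5, 1],
    [0, 5, 2],
    [10, 5, 1],
]
genes = ["geneA", "geneB", "geneC"]
selected = top_variable_genes(counts, genes, n_top=2)
print(selected)
```

In this toy example, the gene with constant expression across cells (`geneB`) has zero dispersion and is dropped, while the two genes whose counts vary across cells are retained. Real pipelines additionally normalize the counts and correct for the mean-variance relationship before ranking.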