Hello,

First of all, thanks for your great work! I've been working with your tutorials and have a few questions about preprocessing the data.
In the zero-shot tutorials it looks like the raw count data isn't normalized or log1p-transformed, only binned. Why not, and does it matter?
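To make the question concrete, this is roughly how I read the zero-shot setup (a sketch, not copied from the notebooks; I'm assuming the `Preprocessor` class from `scgpt.preprocess`, and `binning=51` is just the bin count I believe the tutorials use):

```python
import scanpy as sc
from scgpt.preprocess import Preprocessor

adata = sc.datasets.pbmc3k()  # stand-in raw-count dataset for illustration

# Zero-shot style as I understand it: no library-size normalization,
# no log1p, only value binning applied directly to the raw counts.
preprocessor = Preprocessor(
    use_key="X",            # process adata.X (raw counts)
    normalize_total=False,  # skip normalization
    log1p=False,            # skip log1p
    binning=51,             # assumed bin count; please correct me if wrong
    result_binned_key="X_binned",
)
preprocessor(adata)
```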
In Tutorial_Annotation.ipynb the ms data is already preprocessed (data_is_raw=False), but during the preprocessing step the data is still normalized (normalize_total=1e4). Is this necessary, or does it have no influence on the binning?
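For reference, this is the notebook cell as I recall it (paraphrased, not verbatim; `n_hvg`, `n_bins`, and the file path are my placeholders for the notebook's own values):

```python
import scanpy as sc
from scgpt.preprocess import Preprocessor

adata = sc.read_h5ad("ms_data.h5ad")  # placeholder path for the ms data

data_is_raw = False       # the ms data is already normalized and log1p'd
n_hvg, n_bins = 3000, 51  # values as I recall them from the notebook

preprocessor = Preprocessor(
    use_key="X",
    filter_gene_by_counts=3,
    filter_cell_by_counts=False,
    normalize_total=1e4,    # still applied even though data_is_raw=False
    result_normed_key="X_normed",
    log1p=data_is_raw,      # skipped here, since the data is already log1p'd
    result_log1p_key="X_log1p",
    subset_hvg=n_hvg,
    hvg_flavor="seurat_v3" if data_is_raw else "cell_ranger",
    binning=n_bins,
    result_binned_key="X_binned",
)
preprocessor(adata, batch_key=None)
```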
If I understood correctly, filter_gene_by_counts and filter_cell_by_counts depend on the dataset, while normalization and log1p are always applied before feeding in the data. When do we need subset_hvg? Only when the number of genes is larger than the max sequence length?
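In other words, is the intent something like this guard? (My own sketch; `MAX_SEQ_LEN` is a placeholder name for the model's input gene budget, not a real scGPT constant.)

```python
import scanpy as sc

adata = sc.datasets.pbmc3k()  # stand-in dataset for illustration
MAX_SEQ_LEN = 1200            # placeholder for the model's max sequence length

# Subset to HVGs only when there are more genes than the model can take.
subset_hvg = MAX_SEQ_LEN if adata.n_vars > MAX_SEQ_LEN else False
print(f"{adata.n_vars} genes -> subset_hvg={subset_hvg}")
```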
Does the hvg_flavor matter much? When working with the Myeloid data I can't get subset_hvg to work with 'cell_ranger' (probably because of this issue). How did you preprocess the Myeloid data?
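This is what I tried (a sketch with a stand-in dataset, since I can't share the Myeloid object here): scanpy's 'cell_ranger' flavor expects log-normalized input, which is where it errors for me, while 'seurat_v3' runs on raw counts instead (it additionally requires the scikit-misc package).

```python
import scanpy as sc

adata = sc.datasets.pbmc3k()  # stand-in for the Myeloid data

try:
    # 'cell_ranger' assumes log-normalized data; fails on my Myeloid object.
    sc.pp.highly_variable_genes(adata, n_top_genes=1200, flavor="cell_ranger")
except Exception as e:
    print(f"cell_ranger failed: {e}")
    # 'seurat_v3' works directly on raw counts (needs scikit-misc installed).
    sc.pp.highly_variable_genes(adata, n_top_genes=1200, flavor="seurat_v3")
```

Is switching the flavor like this the right workaround, or did you preprocess the Myeloid data differently?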
Best regards,
nc1m