Sharing compressed data #48
Comments
Hi @adamgayoso, thanks a lot for these suggestions! I will regenerate the datasets for the website and notify here as soon as they are up.
Thank you @emdann!! It would also be beneficial if the genes used to run the scvi model were included (in |
The batch key used is a concatenation of |
The new version seems to give reasonably similar results to the old version, which is good. I also noticed that Scanpy UMAP plotting of the celltype is not working for some reason.
The .h5ad objects for download should now be updated. Could I ask you to try again to check if the plotting problem persists?
Notes from troubleshooting attempts:
Part of the problem could be the NaNs (scverse/scanpy#2133): I found the maternal contaminants were not flagged correctly in this object, these are cells with
After filtering out NaNs I still get all gray, so it might indeed be a problem with scanpy trying to handle too many categories (and possibly a pandas update?). Also setting
I usually plot annotations by lineage, using the assignment saved here. The best workaround I can suggest for now is trying something like:
This tells me I should probably save the annotation groups in |
Hello, this is such a cool project!
I was wondering if the compressed anndata objects could be shared on the website. For example, for the full dataset, saving like
adata.write_h5ad(path, compression="gzip")
reduces the file size from 15 GB to ~5 GB. While it takes a bit longer to save with compression, reading is still pretty fast.
I also noticed an issue with adata.obs["donor"], where it's mixed string and float types, so also saving it with
adata.obs["donor"] = adata.obs["donor"].astype(str)
would be appreciated.
We are working on faster implementations of scvi-tools using JAX. In this notebook we can process 150k cells in <5 minutes on Colab. I was hoping to create a new tutorial with your dataset to show that we can process 900k cells in <1 hr (integration + visualization, all for free!).