Memory consumption #14
Comments
Hi Jacob,
I suggest using an HDF5 file for your cell-gene matrix. Figure 1.D of https://www.biorxiv.org/content/10.1101/2021.08.02.453487v1 shows how RAM consumption decreases on the same dataset (TENxBrainData) when it is stored as an HDF5-based DelayedArray instead of a dense matrix. The run time will increase, but I think it is a good trade-off.
You need a SingleCellExperiment whose counts assay is a DelayedArray. The usage of newWave is the same, and you can find an example in the section "NewWave on DelayedArray" here: https://fedeago.github.io/SurfingNewWave/articles/vignette.html#newwave-on-delayedarray-1
Another tip: if a different dispersion parameter for each gene is acceptable for you, I suggest using these parameters:
I hope this information is useful.
Best regards,
Federico
Hi Federico, Many thanks for your suggestions. I will try these now. Best wishes
Dear Federico, My SCE object
Transforming the "batch" field from colData to a factor
Converting the assay (counts) to a DelayedArray
Running
newWaveA1Bzinb <- newWave(A1Bmtx_sce, K=9, X="~batch", children=24, n_gene_par=1500, n_cell_par=30000, verbose=FALSE, commondispersion=FALSE)
I submitted this code as a job script to a large-memory node (768 GB) on my cluster, and it stopped after ~30 min with SGE exit status 37: "failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit". It reached a maxvmem of 487 GB, although I had requested 720 GB. Do you have any tips for getting around this issue? Thanks in advance.
Hi Elton, thank you for your interest in NewWave. I think that converting the assay to a DelayedArray object is not enough, because it is still backed by an in-memory object rather than an on-disk one. You should save it as an HDF5 file and then read it back using the DelayedArray backend: https://bioconductor.org/packages/release/bioc/html/DelayedArray.html Please let me know if you have any other questions.
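The save-then-reopen step described in this comment could look roughly like the sketch below. It assumes the Bioconductor packages HDF5Array and SingleCellExperiment are installed; the toy matrix, the temporary file path, the dataset name "counts" and the batch labels are placeholders that do not come from the thread. It only builds an on-disk counts assay and does not call newWave.

```r
library(HDF5Array)             # also attaches DelayedArray
library(SingleCellExperiment)

# Toy count matrix (genes x cells) standing in for the real data; hypothetical.
set.seed(1)
counts_mem <- matrix(rpois(200 * 50, lambda = 2), nrow = 200, ncol = 50)

# Write the matrix to an HDF5 file on disk.
h5_path <- tempfile(fileext = ".h5")
writeHDF5Array(counts_mem, filepath = h5_path, name = "counts")

# Reopen the on-disk dataset; the values are read in blocks on demand
# rather than being loaded into RAM up front.
counts_h5 <- HDF5Array(h5_path, name = "counts")

# Build the SingleCellExperiment around the on-disk assay.
sce <- SingleCellExperiment(
  assays  = list(counts = counts_h5),
  colData = DataFrame(batch = factor(rep(c("a", "b"), each = 25)))
)

# The assay is a DelayedArray subclass backed by the HDF5 file.
is(counts(sce), "DelayedArray")
```

For real data, a permanent file path would replace `tempfile()` so that the HDF5 file outlives the R session.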
Thanks for your directions, Federico. Here's what I've done following your suggestion:
The problem now appears to be that, even though HDF5Matrix and HDF5Array are kinds of DelayedMatrix and DelayedArray, respectively, newWave doesn't seem to recognise them as such:
I tried converting those HDF5 Matrix and Array classes to DelayedArray ones with:
Any light you could shed on this would be much appreciated, as it seems I'm nearly there.
Dear Federico, Following up on the issue above, I wonder whether there's a possibility/capability for newWave to read HDF5Matrix and HDF5Array classes, which are generated by the HDF5Array package that uses DelayedArray. Thanks,
Hi Federico,
Slot "name": Slot "dim":
Hi there,
Thank you for releasing the code for your algorithm. I have a question regarding its memory efficiency.
My cell-gene matrix is approximately 600,000 cells x 6,000 genes. Assuming 32-bit floats, this should be around 15 GB.
Even so, I am running out of memory with 128 GB of RAM. How can I run this algorithm in a memory-efficient way?
The command I'm using so far is:
res <- newWave(sce,X = "~site.Site", K=10, verbose = FALSE, children=6, n_gene_disp = 100, n_gene_par = 100, n_cell_par = 100)
Many thanks
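The size estimate in the post above can be checked with a few lines of base R. This is only back-of-the-envelope arithmetic for a single dense copy of the matrix. Note that R stores ordinary numeric matrices as 8-byte doubles, so a dense numeric matrix would take about twice the 32-bit estimate; the helper `gb` is introduced here for illustration.

```r
n_cells  <- 600000
n_genes  <- 6000
n_values <- n_cells * n_genes        # 3.6e9 matrix entries

bytes_32bit  <- n_values * 4         # 32-bit storage, as assumed in the post
bytes_double <- n_values * 8         # R's default numeric (double) storage

# Convert bytes to decimal gigabytes (hypothetical helper).
gb <- function(bytes) bytes / 1e9

gb(bytes_32bit)    # 14.4
gb(bytes_double)   # 28.8
```

These figures cover one in-memory copy only; they say nothing about additional copies or intermediate objects created during fitting.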