Excessive memory required in MergeBsseqObjects #29

Open
Nick-Eagles opened this issue Sep 15, 2023 · 0 comments

Merging bsseq objects is currently extremely memory-inefficient. In initial tests, merging a list of bsseq objects (even a list of only 2 elements!) often results in peak memory exceeding 10 times the summed memory footprint of the individual objects. Even in the worst case, a reasonable merging algorithm should peak at no more than about 2 times this sum. Note that these tests quantified only the in-memory portions of the objects (not HDF5-backed assays).
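
For reference, the ratio described above can be approximated from within R. The following is only a sketch: `peak_extra_mb` is a hypothetical helper (not part of bsseq or any package), the matrices are toy stand-ins for real bsseq objects, and `gc()`'s "max used" statistic is an approximation of true peak usage rather than what the original tests necessarily measured.

```r
# Hypothetical helper: approximate extra peak memory (in MB) used while
# evaluating `expr`, based on the "max used" statistics reported by gc().
peak_extra_mb <- function(expr) {
  before <- gc(reset = TRUE)   # collect, then reset "max used" to current usage
  result <- expr               # the lazily evaluated argument is forced here
  after <- gc()
  # The "(Mb)" column immediately follows each named count column
  used_mb <- which(colnames(before) == "used") + 1
  max_mb <- which(colnames(after) == "max used") + 1
  sum(after[, max_mb]) - sum(before[, used_mb])
}

# Toy stand-ins for the objects being merged (NOT real bsseq objects)
objs <- list(
  matrix(runif(5e5), ncol = 10),
  matrix(runif(5e5), ncol = 10)
)

# Summed in-memory size of the inputs, in MB
input_mb <- sum(vapply(objs, function(o) as.numeric(object.size(o)), numeric(1))) / 2^20

# Peak memory during the merge, relative to the summed input size
peak_mb <- peak_extra_mb(do.call(rbind, objs))
ratio <- peak_mb / input_mb
print(ratio)
```

Replacing the toy matrices with the actual list of bsseq objects would give a number comparable to the "10 times" figure, though only for the in-memory portions of the objects.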

The end goal here is to (1) fix any inefficient code on my end (e.g. is do.call even expected to do things in a memory-efficient way?), and/or (2) raise GitHub issues for bugs in the packages this code depends on.

(1) Questionable pieces of my code include:

  • using do.call on a list of objects: should we expect do.call to iterate over the list in a memory-efficient way?
  • using rbind instead of combine or combineList (the officially intended methods for this purpose), though I'm only using rbind because of this open bug, and a bsseq contributor claims rbind is suitable for our case
  • possibly not using required HDF5Array or DelayedArray settings, such as setAutoRealizationBackend("HDF5Array")
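
On the do.call question: do.call does not itself iterate over the list. It builds a single call with every list element as an argument, so memory behavior is determined by the rbind method that receives them. A pairwise alternative such as Reduce makes n - 1 separate calls and creates an intermediate result at each step. A minimal sketch of the difference, using plain matrices as stand-ins (not bsseq objects):

```r
# Toy stand-ins for the per-sample objects (plain matrices, not bsseq objects)
chunks <- lapply(1:4, function(i) matrix(i, nrow = 2, ncol = 3))

# One call, equivalent to:
#   rbind(chunks[[1]], chunks[[2]], chunks[[3]], chunks[[4]])
# The rbind method sees all inputs at once.
all_at_once <- do.call(rbind, chunks)

# n - 1 calls: each step binds the accumulated result to the next element,
# so earlier rows are copied again at every step.
pairwise <- Reduce(rbind, chunks)

# Both approaches produce the same merged object
same_result <- identical(all_at_once, pairwise)
print(same_result)
```

Whether the single-call form is actually memory-efficient for bsseq objects still depends on how the dispatched rbind method is implemented, which this sketch does not test.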

(2) Regarding probable issues in dependencies: combining just the rowRanges of two bsseq objects results in peak memory usage of ~4 times the summed memory size of the individual ranges. This is arguably a bug, for which I'll need to make a reprex and open an issue on the GenomicRanges GitHub. But, as mentioned above, merging 2 bsseq objects is even more memory-inefficient, so this GenomicRanges bug is only part of the problem. It's also possible that the HDF5 backend is not being used when merging assays, as noted in this (currently) open issue.
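
A possible starting point for that reprex is sketched below. It assumes the Bioconductor package GenomicRanges is installed; the synthetic ranges, the size `n`, and the use of `gc()`'s "max used" statistic as a proxy for peak memory are all illustrative assumptions, not the setup used in the original tests.

```r
# Sketch of a reprex; skipped entirely if GenomicRanges is not installed.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)

  n <- 1e6  # arbitrary illustrative size
  # Synthetic stand-ins for the rowRanges of two bsseq objects
  gr1 <- GRanges("chr1", IRanges(start = seq_len(n), width = 1))
  gr2 <- GRanges("chr2", IRanges(start = seq_len(n), width = 1))

  # Summed in-memory size of the two inputs, in MB
  input_mb <- (as.numeric(object.size(gr1)) + as.numeric(object.size(gr2))) / 2^20

  before <- gc(reset = TRUE)  # reset the "max used" statistics
  merged <- c(gr1, gr2)       # combine the ranges
  after <- gc()

  # The "(Mb)" column immediately follows each named count column
  used_mb <- which(colnames(before) == "used") + 1
  max_mb <- which(colnames(after) == "max used") + 1
  peak_mb <- sum(after[, max_mb]) - sum(before[, used_mb])

  # Approximate peak memory relative to the summed input size
  print(peak_mb / input_mb)
  print(sessionInfo())
}
```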
