Request: increase memory allocation for repeatmasker_wrapper #839

fubar2 · 2024-09-17T10:14:13Z

Currently, repeatmasker_wrapper has

Cores Allocated: 16
Memory Allocated (MB): 59392

A single chromosome works, but a whole VGP haplotype fails with an out-of-memory (OOM) error.
I'm currently trying to get a RAM usage graph by rerunning the same job, but that will take a while.


mvdbeek · 2024-09-17T10:17:41Z

It might be quite reasonable to split by chromosomes; you should be able to do this with: https://usegalaxy.org/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fbgruening%2Fsplit_file_to_collection%2Fsplit_file_to_collection%2F0.5.2&version=latest
That'll be both faster and more memory-efficient.
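Splitting by chromosome is the "map" half of a map-reduce: each FASTA record becomes its own input, RepeatMasker runs once per record, and the per-record outputs are merged afterwards. As a rough illustration of the map step only, here is a minimal Python sketch that writes each record of a FASTA file to its own file. It is not how split_file_to_collection is implemented; the function names and the output naming scheme are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) per record, holding one record at a time."""
    header: Optional[str] = None
    seq_lines: List[str] = []
    with path.open() as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, seq_lines
                header, seq_lines = line[1:], []
            elif header is not None and line:
                seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def split_fasta(path: Path, outdir: Path) -> List[Path]:
    """Write each FASTA record to its own file in outdir (hypothetical naming)."""
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (header, seq_lines) in enumerate(read_fasta(path)):
        # Use the first word of the header as the record ID, made filesystem-safe.
        words = header.split()
        record_id = words[0] if words else "record"
        safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
        out = outdir / f"{i:05d}_{safe_id}.fasta"
        body = "\n".join(seq_lines)
        out.write_text(f">{header}\n{body}\n")
        written.append(out)
    return written
```

Each output file would then be one element of a collection that RepeatMasker is mapped over, which is why the per-record GFF3 outputs have to be merged back into one file afterwards.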

fubar2 · 2024-09-17T11:25:15Z

That's an interesting option to pursue, @mvdbeek. Thanks!
FASTA contigs can be concatenated, but joining dozens of GFF files, each with its own headers, will probably need a new tool, so it's probably not practicable for me. If someone wants to take care of that and prepare a demonstration, it could be a solution.

Since it works fine on EU and there are other things to do, I'll remove it from the workflow for now, until that's done.

fubar2 · 2024-09-17T21:52:49Z

@mvdbeek: Here's why that job failed OOM with 59 GB. I ran it on a local Galaxy with 12 cores: RAM use stays below 1 GB for most of the run, but right at the end it blows out, peaking at roughly 63 GB. Just a few more GB would probably work on .org.

fubar2 · 2024-09-26T06:11:03Z

@mvdbeek: TreeValGal ignores the FASTA output you might be assuming it uses; it only uses the GFF3.

A test at the GMOD GFF3 tester shows that concatenating two or more GFF3 files, each with correct headers, creates an invalid GFF3. The message explains that it can be fixed and correctly ordered with one of their tools. If someone wants to wrap that tool, it could be a solution, but that sounds like more work than getting the allocation right.
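Much of the concatenation problem is about directives: the GFF3 specification requires `##gff-version 3` as the first line, so plain concatenation leaves extra version lines in the middle of the file. Below is a minimal Python sketch of a header-aware merge, assuming per-chromosome GFF3 inputs with no `##FASTA` sections worth keeping and no colliding `ID` values. The function name is hypothetical, and this is neither the GMOD/GenomeTools fixer nor the merge-gff3 workflow linked in this thread. It keeps one version line, deduplicates `##sequence-region` directives into the header, and appends the remaining lines in input order.

```python
from pathlib import Path
from typing import Iterable, List, Set


def merge_gff3(inputs: Iterable[Path], out: Path) -> None:
    """Merge GFF3 files under a single header (sketch: IDs and order untouched)."""
    regions: List[str] = []
    seen_regions: Set[str] = set()
    body: List[str] = []
    for path in inputs:
        with path.open() as fh:
            for raw in fh:
                line = raw.rstrip("\n")
                if line.startswith("##FASTA"):
                    break  # embedded sequences are not carried over in this sketch
                if line.startswith("##gff-version"):
                    continue  # written exactly once, as the first output line
                if line.startswith("##sequence-region"):
                    if line not in seen_regions:
                        seen_regions.add(line)
                        regions.append(line)
                    continue
                if line == "###" or not line.strip():
                    continue  # drop forward-reference markers and blank lines
                body.append(line)  # features, comments and other directives
    with out.open("w") as fh:
        fh.write("##gff-version 3\n")
        for line in regions + body:
            fh.write(line + "\n")
```

The output is not re-sorted, so a validator may still ask for ordering, and colliding `ID` values across inputs would additionally need `ID` and `Parent` attributes rewritten consistently, which this sketch does not attempt.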

mvdbeek · 2024-09-26T08:45:19Z

Do you maybe have the top 100 lines of two valid GFF files? Nothing I find on the web actually validates against https://genometools.org/cgi-bin/gff3validator.cgi. https://usegalaxy.org/u/marius/w/merge-gff3 probably works, but it's hard to test if nothing actually validates. And the one file I fixed up manually complains about overlapping IDs when I duplicate it 😆

mvdbeek · 2024-09-26T09:11:11Z

Ugh, this was hard, but finally I got 2 input files that actually validate. Here's an example run https://usegalaxy.org/workflows/invocations/84e15596bd4fc608?from_panel=true

fubar2 · 2024-09-26T09:43:15Z

@mvdbeek: Thanks! Will give that a try tomorrow.

fubar2 · 2024-09-27T07:56:32Z

@mvdbeek: More and more layers; of course it's not that simple. Ignoring the GFF fixer for a moment for simplicity, a contig-split RepeatMasker test with a 500 MB fish FASTA fails (red) on usegalaxy.org.

natefoo · 2024-10-02T14:35:40Z

I can increase this, of course, but I'm very confused, since as far as I can tell EU allocates only 40 GB (it is in their local tools.yml, but it doesn't look like they override memory).

@fubar do you have a run on EU you can check the memory allocation/usage of?

natefoo · 2024-10-02T14:37:30Z

Ah I forgot about their automatic resubmission.

natefoo · 2024-10-02T21:46:11Z

Bumped to 76GB.

fubar2 · 2024-10-03T11:23:16Z

For efficiency, @mvdbeek's solution for getting a valid GFF after splitting into contigs could be very helpful. Now that it seems to have enough RAM, the workflow starts and some parts run, but it does not end well. RepeatMasker is a very unruly tool, and I'm not sure how much more effort it deserves, unless this stress test provides a useful edge case for workflow job submission.

fubar2 · 2024-10-05T04:12:12Z

@natefoo: Sadly, the job at https://vgp.usegalaxy.org/datasets/f9cad7b01a4721353343582b8c4d1cc2/preview ended green but with empty outputs, ~28 hours after starting with the mongo RAM allocation. See @mvdbeek's sensible map-reduce suggestion above and the conclusion of my attempt at implementing it.

No need for more effort trying to tame this unruly tool for VGP-scale operation. TreeValGal still has a WindowMasker model-free repeat density bigwig, so it's not crucial.

OTOH: if RepeatMasker's dodgy code is effectively and properly isolated as a tool, maybe the failing workflow here is useful as an edge case for testing extremely resource-hungry hammering during workflow invocation over a collection.

nekrut mentioned this issue Oct 8, 2024

TreeValGal status and issues galaxyproject/iwc#553
