long staging #196

Open
jordandekraker opened this issue Jun 14, 2022 · 1 comment
Labels: documentation (Improvements or additions to documentation)

Comments

@jordandekraker (Collaborator)

Lately I've been submitting hippunfold jobs from within a `regularSubmit -j LongSkinny` job, because building the DAG and staging takes so long.
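Roughly, the workaround looks like this (a sketch only: paths are placeholders, and I'm assuming regularSubmit just wraps the trailing command into an sbatch job):

```bash
# placeholder paths; regularSubmit wraps the trailing command into a long-running cluster job,
# so the slow DAG building / staging happens inside that job instead of interactively
regularSubmit -j LongSkinny hippunfold /path/to/bids /path/to/output participant
```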
Possible bottlenecks:

  • If my understanding is correct, separate jobs are being submitted for most (all?) rules, but I thought we had planned to submit one job per subject using snakemake's group: "subj" system. I think this must not be working; I wonder if there is some interaction due to the different resources or containers for each rule?
  • Searching the input BIDS directory for the required files could be taking quite long. I think this relies on snakebids and pybids, but I wonder if it could be sped up?
  • Printing every job to the terminal is a lot to look at and might be slowing things down a bit; maybe we could suppress this.
  • Maybe I should experiment more with the --group-components flag, and we should recommend it in the readthedocs.
@akhanf added the documentation label on Oct 20, 2022
@akhanf (Member) commented Oct 20, 2022

I think we connected offline about this, but closing the loop here too:

The group: subj directive does make sure only one cluster job is submitted per subject, but Snakemake still needs to do all the accounting for all the rules. The long delay when running on graham is usually related to the slow network file system (/project, /scratch, /home), especially when running on a large dataset. Snakemake writing to the .snakemake folder can also be slow if it is on the network file system.

I don't think the printing itself slows things down, and I agree it is a lot of text, but I'm not sure there is an easy way to suppress it (if it's not possible via a snakemake option) without also suppressing other necessary information.

The --group-components flag is mainly useful if you want more than one subject per cluster job (e.g. --group-components subj=5 will group 5 subjects per job), which helps when you have e.g. >1000 subjects, since only 1000 jobs can be submitted at a time on graham. But it won't speed up the submission itself (though it might save some time overall if jobs spend less time waiting in the queue). Note: group-components is mentioned in the docs, but under Contributing to HippUnfold -> Instructions for Compute Canada.
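For example, something like this (a sketch: paths and the profile name are placeholders, and it assumes extra Snakemake options like --group-components are passed straight through by the hippunfold CLI):

```bash
# ~1000 participants with 5 subjects packed per "subj" group job -> ~200 cluster jobs,
# which stays under graham's limit on simultaneously submitted jobs
hippunfold /path/to/bids /path/to/output participant \
    --profile cc-slurm --group-components subj=5
```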

That said, the most efficient way to run hippunfold for a large number of subjects is with a wrapper script; this wrapper was made for exactly that purpose: https://github.com/akhanf/batch-hippunfold
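The general idea of such a wrapper is just to submit one (or a few) subjects per cluster job yourself, so Snakemake only ever builds a small DAG per job. A minimal sketch of that idea (not the actual batch-hippunfold code), assuming a SLURM cluster and that hippunfold accepts --participant_label like other BIDS apps:

```bash
#!/bin/bash
# Hypothetical sketch (not the actual batch-hippunfold implementation):
# submit one hippunfold cluster job per participant label listed in subjects.txt.
# Paths, resource requests, and the --participant_label flag are assumptions.
while read -r subj; do
  sbatch --time=12:00:00 --cpus-per-task=8 --mem=32G \
    --wrap="hippunfold /path/to/bids /path/to/output participant --participant_label ${subj} --cores 8"
done < subjects.txt
```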

I'm leaving this issue open as a reminder to point to this wrapper in the hippunfold docs.
