
can't run Raw.filter with multiple cores #2

Open
kalenkovich opened this issue Jul 9, 2021 · 6 comments

Comments

@kalenkovich
Owner

Attempting to run the filtering rule with multiple cores leads to a "Python has stopped working" error, and nothing gets filtered. There is a rule parameter, threads, that limits the number of threads per rule. I think setting it to 1 should work, but I haven't tested it yet.
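For the record, that would look something like this in the Snakefile (a sketch; the rule body is elided):

```
rule apply_linear_filter:
    ...
    threads: 1
```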

As to why this happens at all: no idea. I don't even know what we could check or who we could ask (Snakemake, mne-python?). What would a minimal reproducible example look like?
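A minimal reproducible example might be a tiny Snakefile with two independent jobs that each call Raw.filter, run with snakemake -j2 so they execute in parallel (a sketch; the file names and filter settings are made up):

```
# Sketch of an MRE Snakefile: two independent filtering jobs
# that `snakemake -j2` would try to run at the same time.
rule all:
    input: "out_a.fif", "out_b.fif"

rule filter_one:
    input: "{name}.fif"
    output: "out_{name}.fif"
    run:
        import mne
        raw = mne.io.read_raw_fif(input[0], preload=True)
        raw.filter(l_freq=1, h_freq=40)
        raw.save(output[0])
```

If this crashes the same way, it would at least tell us whether the problem is in the Snakemake/mne-python combination rather than in our own code.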

@kalenkovich
Owner Author

Other rules having problems when running with multiple cores:

  • make_artifact_epochs, probably because the data are filtered before looking for artifacts,
  • select_artifact_components, results in Segmentation Fault, no idea why.

@kalenkovich
Owner Author

I believe now that the threads parameter determines the number of threads per job, not per rule. This SO answer points to using resources to restrict the number of parallel jobs. snakemake would then need to be run with the --resources flag set to the resource limit. If the limit coincides with the number of "resource" units needed by a job, then no two such jobs will run in parallel.
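As I understand the mechanism, each job declares how many units of a named resource it needs, and a job can only start while the running jobs' combined usage stays within the --resources limit. A toy illustration of that accounting (my own sketch, not Snakemake's actual scheduler code):

```python
# Toy model of resource-constrained scheduling (illustration only,
# not Snakemake internals). A job may start only if the running jobs'
# total resource usage plus its own need stays within the limit.

def can_start(job_need, running_needs, limit):
    """Return True if a job needing `job_need` units can start
    while jobs needing `running_needs` units are already running."""
    return sum(running_needs) + job_need <= limit

# With --resources filtering_process=1 and each filtering job
# declaring filtering_process=1, no two filtering jobs overlap:
print(can_start(1, [], 1))   # True: the first filtering job starts
print(can_start(1, [1], 1))  # False: the second one has to wait
```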

@kalenkovich
Owner Author

So, it kind of works. I added the following to the rules apply_linear_filter and make_artifact_epochs:

rule ...:
    ...
    resources:
        filtering_process = 1

And then called snakemake with

snakemake -j16 --resources filtering_process=1

There is even a way to do this without modifying Snakefile:

snakemake -j16 --resources filtering_process=1 --set-resources apply_linear_filter:filtering_process=1

Not sure whether there should be a --set-resources entry for each rule or not. Not even sure how to test it.

@kalenkovich
Owner Author

So, the reason it only kind of works: while the filtering jobs are no longer run in parallel with each other, the segmentation fault still happens when one of them runs in parallel with any other job.

I do have a shitty workaround for that: we could set resources: filtering_process = (1/(workflow.cores + 1)) for every other rule. This way, there aren't enough resources to run any other job when a job with filtering_process = 1 is run.

I do not like this workaround though. First, I am not even sure it would work 😄 Second, it pollutes Snakefile and makes it confusing for newcomers.

@levchenkoegor, thoughts?

@kalenkovich
Owner Author

Actually, filtering_process = (1/(workflow.cores + 1)) won't work: resources must be either ints or strs. The actual workaround is then a bit different from what I described above:

  • set resources: filtering_process = 1 for every non-annoying rule,
  • set resources: filtering_process = workflow.cores for the annoying rules,
  • run snakemake -j16 --resources filtering_process=16
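A quick sanity check of the arithmetic (a toy calculation, not Snakemake's scheduler): with a limit of 16, an annoying job declaring filtering_process = 16 uses the whole budget, so nothing else fits alongside it, while up to 16 ordinary jobs at 1 unit each can still run together:

```python
# Toy check of the resource arithmetic (not Snakemake internals).
CORES = 16
LIMIT = CORES      # snakemake -j16 --resources filtering_process=16
ANNOYING = CORES   # resources: filtering_process = workflow.cores
ORDINARY = 1       # resources: filtering_process = 1

def fits(needs, limit=LIMIT):
    """Can this set of jobs run concurrently within the limit?"""
    return sum(needs) <= limit

print(fits([ANNOYING]))            # True: an annoying job runs alone
print(fits([ANNOYING, ORDINARY]))  # False: nothing runs alongside it
print(fits([ORDINARY] * CORES))    # True: 16 ordinary jobs in parallel
```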

@kalenkovich
Owner Author

Found another flag that might help: --default-resources. We can then do

snakemake -j16 --resources filtering_process=16 --default-resources filtering_process=1

Again, not sure if this would work, but it's worth a try. If it does, we would only have to set resources for the annoying rules, or even set them through --set-resources so that we don't have to change the Snakefile at all.
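In that case the whole setup could live on the command line (a sketch, untested; assumes apply_linear_filter and make_artifact_epochs are the only annoying rules):

```
snakemake -j16 \
    --resources filtering_process=16 \
    --default-resources filtering_process=1 \
    --set-resources apply_linear_filter:filtering_process=16 \
                    make_artifact_epochs:filtering_process=16
```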
