pbs provider configurability vs partial-allocated nodes #3616

Open
benclifford opened this issue Sep 13, 2024 · 0 comments
Describe the bug
See LSST PR lsst/ctrl_bps_parsl#36.

PBS at IPMU requires a user to specify how many cores they want on a node, and allocates nodes fractionally based on this.

This LSST PR tries to do that by setting tasks_per_node on the PBS provider, which is not the right thing to do: setting it does indeed request more cores, but it also causes the launcher layer to run that many copies of the worker pool.
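For context, here is a minimal sketch (not Parsl's actual implementation) of how a single-node launcher typically consumes tasks_per_node: it fans the worker-pool command out that many times, which is exactly the unwanted behaviour here.

```python
def wrap_command(command: str, tasks_per_node: int, nodes_per_block: int) -> str:
    """Hypothetical launcher wrapper: replicate the worker-pool command
    tasks_per_node times per node, so tasks_per_node=8 starts 8 pools."""
    copies = tasks_per_node * nodes_per_block
    background = "\n".join(f"{command} &" for _ in range(copies))
    return background + "\nwait"
```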

This value is hard-coded to 1 in htex:

# the second positional argument to provider.submit() is tasks_per_node
job_id = self.provider.submit(launch_cmd, 1, job_name)

and the htex code mostly assumes that the job will be allocated an entire node (cf. the Slurm provider, which has a default-on exclusive flag to request an entire node).
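For comparison, a sketch of the Slurm analogue (the exclusive parameter exists on Parsl's SlurmProvider; the partition name here is made up):

```python
from parsl.providers import SlurmProvider

# exclusive=True (the default) adds an "--exclusive" directive to the
# batch script, so the job gets a whole node regardless of core count.
provider = SlurmProvider(
    partition="debug",  # hypothetical partition name
    exclusive=True,
)
```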

The current provider/launcher abstraction cannot express a different value for these two uses: we want to request many cores per node, but then have the launcher layer launch only one process worker pool per node.
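One possible direction, sketched under the assumption that PBSProProvider's cpus_per_node parameter (which shapes the resource request without changing the launcher fan-out) fits the IPMU setup; whether it actually does is part of what this issue is asking. The queue name is hypothetical.

```python
from parsl.providers import PBSProProvider
from parsl.launchers import SingleNodeLauncher

# cpus_per_node only affects the "#PBS -l select=...:ncpus=..." request;
# the launcher still starts one worker pool per node, since htex
# hard-codes tasks_per_node=1 in provider.submit().
provider = PBSProProvider(
    queue="normal",        # hypothetical queue name
    cpus_per_node=16,      # request 16 cores of a (possibly shared) node
    nodes_per_block=1,
    launcher=SingleNodeLauncher(),
)
```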

cc @ryanchard who knows the most about parsl+pbs
