pbs provider configurability vs partial-allocated nodes #3616

Open
benclifford opened this issue Sep 13, 2024 · 0 comments
Describe the bug
See LSST PR lsst/ctrl_bps_parsl#36.

PBS at IPMU requires a user to specify how many cores they want on a node, and allocates nodes fractionally based on this.

This LSST PR tries to do that by setting tasks_per_node on the PBS provider, which is not the right thing to do: setting it does indeed request more cores, but it also causes the launcher layer to run that many copies of the worker pool.
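For context, here is a minimal sketch (not Parsl's actual implementation) of how a single-node launcher typically consumes tasks_per_node: it fans the worker-pool command out that many times, which is exactly the unwanted behaviour here.

```python
def wrap_command(command: str, tasks_per_node: int, nodes_per_block: int) -> str:
    """Hypothetical launcher wrapper: replicate the worker-pool command
    tasks_per_node times per node, so tasks_per_node=8 starts 8 pools."""
    copies = tasks_per_node * nodes_per_block
    background = "\n".join(f"{command} &" for _ in range(copies))
    return background + "\nwait"
```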

This value is hard-coded to 1 in htex:

# the second positional argument to provider.submit() is tasks_per_node
job_id = self.provider.submit(launch_cmd, 1, job_name)

and the htex code mostly assumes that the job will be allocated an entire node (cf. the Slurm provider, which has a default-on exclusive flag to request an entire node).
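For comparison, a sketch of the Slurm analogue (the exclusive parameter exists on Parsl's SlurmProvider; the partition name here is made up):

```python
from parsl.providers import SlurmProvider

# exclusive=True (the default) adds an "--exclusive" directive to the
# batch script, so the job gets a whole node regardless of core count.
provider = SlurmProvider(
    partition="debug",  # hypothetical partition name
    exclusive=True,
)
```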

The current provider/launcher abstraction cannot express a different value for these two uses: we want to request many cores per node, but then have the launcher layer launch only one process worker pool per node.
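One possible direction, sketched under the assumption that PBSProProvider's cpus_per_node parameter (which shapes the resource request without changing the launcher fan-out) fits the IPMU setup; whether it actually does is part of what this issue is asking. The queue name is hypothetical.

```python
from parsl.providers import PBSProProvider
from parsl.launchers import SingleNodeLauncher

# cpus_per_node only affects the "#PBS -l select=...:ncpus=..." request;
# the launcher still starts one worker pool per node, since htex
# hard-codes tasks_per_node=1 in provider.submit().
provider = PBSProProvider(
    queue="normal",        # hypothetical queue name
    cpus_per_node=16,      # request 16 cores of a (possibly shared) node
    nodes_per_block=1,
    launcher=SingleNodeLauncher(),
)
```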

cc @ryanchard who knows the most about parsl+pbs
