Support for Placement Groups #32
I agree that placement groups will need to be considered alongside high-performance networking. You have identified the main problem with implementing them, which is the difference in work-mode between how Slurm is usually configured (a fixed list of nodes with names and properties) and being able to create nodes for a particular job (we fake it by using …).

Are you imagining creating a set of nodes within a placement group for every "HPC" job that is submitted? Would they then be available for reuse by another job, or would a new set be created? If we submit two jobs (each wanting 10 nodes) using the same instance type, would that go to two different placement groups, or would it put the new nodes into the existing placement group?

I wonder if Slurm's job-submit plugins could help here? I've played with them in the past and have written a plugin which allows you to write them in Python (rather than C or Lua) at slurm-job-submit-python. These plugins allow you to add any information you want into a job's definition, so they could be used to dynamically add reservations, node lists, constraints etc. to a job (see the sketch below).

I haven't started any work on this, so I welcome you to start looking into it. I imagine that answers to some of the questions I have raised above will only become clear as the work progresses.
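For illustration, here is a minimal sketch of the kind of hook such a plugin could install. The entry-point name, the shape of `job_desc`, and the `hpc`/`efa` names are assumptions for the example, not the actual slurm-job-submit-python API:

```python
# Hypothetical job-submit hook, written in the style of Slurm's Lua
# job_submit interface; the real entry-point name and job_desc fields
# exposed by slurm-job-submit-python may differ.

HPC_PARTITION = "hpc"        # assumed partition name for tightly-coupled jobs
PLACEMENT_FEATURE = "efa"    # assumed node feature advertised on EFA-capable shapes


def job_submit(job_desc, submit_uid):
    """Steer multi-node jobs onto placement-group-backed nodes.

    job_desc is assumed to be a mutable mapping of the fields Slurm passes
    to job-submit plugins (partition, features, min_nodes, ...).
    """
    # Only touch jobs that ask for more than one node in the HPC partition.
    if job_desc.get("min_nodes", 1) > 1 and job_desc.get("partition") == HPC_PARTITION:
        # Constrain the job to nodes carrying the placement-group feature,
        # preserving any constraint the user already supplied.
        existing = job_desc.get("features") or ""
        job_desc["features"] = f"{existing}&{PLACEMENT_FEATURE}" if existing else PLACEMENT_FEATURE
    return 0  # 0 corresponds to SLURM_SUCCESS in the C/Lua plugin API
```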
In the "Cloud-Ideal" world, for each "HPC" job, we would create a new placement group, and start nodes in that group. This does require then that an instantiation of an instance is not then re-used between jobs. If you've lots of smaller jobs, this overhead could become problematic and expensive. The tradeoff option I am suggesting is that we create a placement group per instance type. All nodes of type 'X' will be added to the placement group for type 'X'. As jobs start and stop, Slurm creates and destroys nodes as it normally does. To specifically answer your questions:
As part of Terraform/#63 (AWS EFA support), support for AWS placement groups is required. I've been contemplating this a bit recently, as placement groups (AWS, Azure) and GCP Group Placement Policies are somewhat important for good performance with certain HPC jobs.
Placement groups are a great match to a single HPC job, or a static set of nodes. They're not really conducive to very elastic environments, or environments where you may mix & match instance types. While they can work there, you're just more likely to get capacity issues and instances failing to launch.
There are also some restrictions that are challenging to support:
Thus, placement groups need to be a somewhat optional feature, and it would be nice to treat both AWS and GCP similarly, even though they have different restrictions.
I don't believe that we can create the placement groups as part of the Terraform process, as at that point `limits.yaml` doesn't exist, and we don't know how big the cluster could be (which affects GCP).

I don't believe that we can create the placement groups as part of the SLURM `ResumeProgram` call to `startnode.py` either, as this isn't directly linked to a single job. Creating a group for every `startnode` call will get messy, as the nodes do not all terminate at a set time, so cleanup becomes a challenge. That said, I do believe that `startnode` ought to change so that all the nodes which SLURM wishes to start at once are launched in a single API call - it's more likely that the cloud scheduler will be able to find space for the set of nodes, placed compactly (in the placement group), if they are all started in a single call.
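To illustrate the single-call idea, a minimal boto3 sketch of what a batched launch from `startnode` could look like on the AWS side; the function and the launch-template name are placeholders for this example, not existing code:

```python
import boto3

ec2 = boto3.client("ec2")


def start_nodes(node_names, instance_type, placement_group=None):
    """Launch all requested nodes of one shape in a single run_instances call.

    Asking for MinCount == MaxCount == len(node_names) in one request gives
    EC2 the whole capacity requirement up front, so it can place the instances
    compactly (and fail fast if the placement group cannot fit them).
    Per-node host naming/tagging is omitted from this sketch.
    """
    params = {
        "MinCount": len(node_names),
        "MaxCount": len(node_names),
        "InstanceType": instance_type,
        # "citc-compute" is a placeholder launch template name.
        "LaunchTemplate": {"LaunchTemplateName": "citc-compute"},
    }
    if placement_group is not None:
        # Only attach a placement group when one exists for this shape.
        params["Placement"] = {"GroupName": placement_group}
    return ec2.run_instances(**params)
```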
Suggested course

I'm currently thinking that making changes to `update_config.py` is our best spot for creating placement groups. Each call to `update_config` could clean up/terminate existing placement groups that are part of our `${cluster_id}`, and create new placement group(s).

I feel like creating a placement group per shape defined in `limits.yaml` would make the most sense. This way we would, for example, group C5n instances together and group C6gn instances together, without trying to get AWS to find a way to compactly mix ARM and x86 instances.
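As a sketch of what that could look like in `update_config.py`; the `<cluster_id>-<shape>` naming convention and the assumption that shapes are top-level keys of `limits.yaml` are inventions for this example:

```python
import boto3
import yaml

ec2 = boto3.client("ec2")


def refresh_placement_groups(cluster_id, limits_path="limits.yaml"):
    """Recreate one compact placement group per shape listed in limits.yaml."""
    with open(limits_path) as f:
        shapes = yaml.safe_load(f).keys()  # assumes shapes are top-level keys

    # Remove any groups left over from a previous update_config run for this cluster.
    existing = ec2.describe_placement_groups(
        Filters=[{"Name": "group-name", "Values": [f"{cluster_id}-*"]}]
    )["PlacementGroups"]
    for group in existing:
        # delete_placement_group fails if instances are still in the group,
        # so real code would need to handle that case.
        ec2.delete_placement_group(GroupName=group["GroupName"])

    # One "cluster" (compact) placement group per shape, e.g. one for c5n and
    # one for c6gn, so ARM and x86 shapes are never mixed in a group.
    for shape in shapes:
        ec2.create_placement_group(
            GroupName=f"{cluster_id}-{shape}", Strategy="cluster"
        )
```

Tagging the groups with the cluster ID and filtering on the tag instead of the name would work just as well; the point is only that `update_config` owns both the cleanup and the creation.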
We would also want to update `startnode` to add the placement policy to the instance starts, in the case where we have a placement group created (i.e. we wouldn't create them for AWS t3a instances, as they're burstable, or for n1 instances on GCP).
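A small sketch of that conditional, with a hypothetical `shape_wants_placement_group` helper and the `<cluster_id>-<shape>` naming convention from the previous sketch; the exclusion list is illustrative only:

```python
# Shape families where a compact placement group adds nothing useful
# (burstable AWS t3a, GCP n1); illustrative list, not project config.
NO_PLACEMENT_SHAPES = {"t3a", "n1"}


def shape_wants_placement_group(shape: str) -> bool:
    """Hypothetical helper: decide whether this shape gets a placement group."""
    family = shape.split(".", 1)[0]  # e.g. "c5n.18xlarge" -> "c5n"
    return family not in NO_PLACEMENT_SHAPES


def placement_group_for(cluster_id: str, shape: str):
    """Return the group name startnode should pass, or None to skip it."""
    if shape_wants_placement_group(shape):
        return f"{cluster_id}-{shape.split('.', 1)[0]}"
    return None
```

The return value would feed straight into the `placement_group` argument of the batched launch sketched earlier.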
Is there already work in progress to support placement groups? If not, does my suggested course of action seem reasonable? I can work on this and offer patches, but I wanted to make sure that the plan seems reasonable to the core team first.