
Support for Placement Groups #32

Open · ghost opened this issue Jan 7, 2021 · 2 comments
Labels: AWS, enhancement, Google

ghost commented Jan 7, 2021

As part of Terraform/#63 (AWS EFA support), support for AWS placement groups is required. I've been contemplating this a bit recently, as placement groups (AWS, Azure) and GCP Group Placement Policies are somewhat important for good performance with certain HPC jobs.

Placement groups are a great match for a single HPC job, or a static set of nodes. They're not really conducive to very elastic environments, or environments where you mix & match instance types. While they can work there, you're just more likely to hit capacity issues and instances failing to launch.

There are also some restrictions that are challenging to support:

  • On GCP, Group Placement Policies are limited to C2 machine types (which aren't really supported by CitC yet), and to at most 22 VM instances. The number of instances that will be in the Group Placement Policy must be fixed when creating the policy (see the sketch after this list).
  • On AWS, Cluster placement groups don't support all VM types (e.g. burstable T-series and Mac instances).
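
For illustration, a minimal sketch of creating such a policy with the google-api-python-client (the project, region, policy name and vm_count here are placeholder assumptions, not existing CitC code):

```python
# Sketch only: create a collocated Group Placement Policy on GCP.
# Project, region and policy name are hypothetical placeholders.
import googleapiclient.discovery

compute = googleapiclient.discovery.build("compute", "v1")

policy_body = {
    "name": "mycluster-c2-placement",
    "groupPlacementPolicy": {
        # vmCount is fixed at creation time and, per the restriction
        # above, can be at most 22 for a collocated policy.
        "vmCount": 22,
        "collocation": "COLLOCATED",
    },
}

compute.resourcePolicies().insert(
    project="my-project", region="us-central1", body=policy_body
).execute()
```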

Thus, placement groups need to be a somewhat optional feature, and it would be nice to treat both AWS and GCP similarly, even though they have different restrictions.

I don't believe that we can create the placement groups as part of the Terraform process: at that point, limits.yaml doesn't exist, and we don't know how big the cluster could be (which matters for GCP, where the policy size is fixed up front).

I don't believe that we can create the placement groups as part of the Slurm ResumeProgram call to startnode.py, as this isn't directly linked to a single job. Creating a group for every startnode call would get messy, as the nodes don't all terminate at a set time, so cleanup becomes a challenge. That said, I do believe that startnode ought to change so that all the nodes Slurm wishes to start at once are launched in a single API call - the cloud scheduler is more likely to find space for the set of nodes, placed compactly (in the placement group), if they are all requested together (see the sketch below).
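
A hedged sketch of what a batched launch could look like with boto3 (the AMI and shape values are placeholders; the real startnode.py plumbing is omitted):

```python
# Sketch: launch all requested nodes of one shape in a single API call,
# optionally pinned to a placement group. All values are placeholders.
import boto3

ec2 = boto3.client("ec2")

def start_nodes(count, placement_group=None):
    kwargs = {
        "ImageId": "ami-0123456789abcdef0",  # placeholder AMI
        "InstanceType": "c5n.18xlarge",      # placeholder shape
        "MinCount": count,  # all-or-nothing: place the whole set, or fail
        "MaxCount": count,
    }
    if placement_group:
        kwargs["Placement"] = {"GroupName": placement_group}
    return ec2.run_instances(**kwargs)
```

Setting MinCount equal to MaxCount makes the request all-or-nothing, which matches the goal of having the cloud scheduler find compact space for the whole set up front.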

Suggested course

I'm currently thinking that update_config.py is our best spot for creating placement groups. Each call to update_config could clean up/terminate existing placement groups that are part of our ${cluster_id}, and create new placement group(s).

I feel like creating a placement group per shape defined in limits.yaml would make the most sense (a rough sketch follows). This way we would, for example, group C5n instances together and C6gn instances together, without asking AWS to find a way to compactly mix ARM and x86 instances.
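
A rough sketch of that logic in update_config.py, assuming AWS and boto3 (the "cluster" tag key, the naming scheme, and the shapes list are stand-ins for whatever limits.yaml actually provides):

```python
# Sketch: recreate one cluster placement group per shape from limits.yaml.
# The tag key and naming scheme here are hypothetical choices.
import boto3

ec2 = boto3.client("ec2")

def recreate_placement_groups(cluster_id, shapes):
    existing = ec2.describe_placement_groups(
        Filters=[{"Name": "tag:cluster", "Values": [cluster_id]}]
    )["PlacementGroups"]
    for group in existing:
        # Deletion fails while instances in the group are still running.
        ec2.delete_placement_group(GroupName=group["GroupName"])
    for shape in shapes:  # e.g. ["c5n.18xlarge", "c6gn.16xlarge"]
        ec2.create_placement_group(
            GroupName=f"{cluster_id}-{shape}",
            Strategy="cluster",
            TagSpecifications=[{
                "ResourceType": "placement-group",
                "Tags": [{"Key": "cluster", "Value": cluster_id}],
            }],
        )
```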

We would also want to update startnode to add the placement group to the instance launches, in cases where one has been created (i.e. we wouldn't create them for AWS t3a instances, as they're burstable, or for n1 instances on GCP). Something like the sketch below.
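
A minimal sketch of that check (the family list is illustrative, not exhaustive):

```python
# Sketch: only attach a placement group for shapes that support cluster
# placement. The families listed here are illustrative, not exhaustive.
FAMILIES_WITHOUT_PLACEMENT = {"t2", "t3", "t3a", "t4g", "mac1"}

def placement_kwargs(cluster_id, instance_type):
    family = instance_type.split(".")[0]
    if family in FAMILIES_WITHOUT_PLACEMENT:
        return {}  # burstable/Mac shapes: launch without a placement group
    return {"Placement": {"GroupName": f"{cluster_id}-{instance_type}"}}

# Usage: ec2.run_instances(..., **placement_kwargs(cluster_id, "c5n.18xlarge"))
```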

Is there already work in progress to support Placement Groups? If not, does my suggested course of action seem reasonable? I can work on this, and offer patches, but I wanted to make sure that the plan seems reasonable to the core team first.

milliams added the AWS, enhancement, Google labels on Jan 13, 2021
milliams (Member) commented

I agree that placement groups will need to be considered alongside high-performance networking.

You have identified the main problem with implementing them, which is the mismatch between how Slurm is usually configured (a fixed list of nodes with names and properties) and creating nodes for a particular job (we fake it by using CLOUD nodes, but they must all be defined in advance). Until now this has not been a problem, since all nodes are independent and can work just as well for one job as another.

Are you imagining creating a set of nodes within a placement group for every "HPC" job that is submitted? Would they then be available for reuse by another job or would a new set be created?

If we submit two jobs (each wanting 10 nodes) using the same instance type, would that go to two different placement groups or would it put the new nodes into the existing placement group?

I wonder if Slurm's job-submit plugins could help here? I've played with them in the past and have written a plugin, slurm-job-submit-python, which allows you to write them in Python (rather than C or Lua). These plugins let you add any information you want to a job's definition, so they could be used to dynamically add reservations, node lists, constraints etc. to a job. A rough sketch of the idea follows.
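
Purely illustrative - the exact hook signature and job_desc representation in slurm-job-submit-python may differ, so treat this as a sketch of the idea rather than the plugin's real API:

```python
# Hypothetical job-submit hook: tag jobs on an (assumed) "hpc" partition
# with a feature that the node-start path could later map to a placement
# group. The signature loosely mirrors Slurm's C/Lua job_submit plugins.
def job_submit(job_desc, submit_uid):
    if job_desc.get("partition") == "hpc":
        existing = job_desc.get("features") or ""
        job_desc["features"] = f"{existing}&placement".lstrip("&")
    return job_desc
```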

I haven't started any work on this so I welcome you to start looking into it. I imagine that answers to some of the questions I have raised above will only become clear as the work progresses.

ghost (Author) commented Jan 13, 2021

In the "Cloud-Ideal" world, for each "HPC" job, we would create a new placement group, and start nodes in that group. This does require then that an instantiation of an instance is not then re-used between jobs. If you've lots of smaller jobs, this overhead could become problematic and expensive.

The tradeoff option I am suggesting is that we create a placement group per instance type. All nodes of type 'X' will be added to the placement group for type 'X'. As jobs start and stop, Slurm creates and destroys nodes as it normally does.

To specifically answer your questions:

  • "If we submit two jobs (each wanting 10 nodes) using the same instance type, would that go to two different placement groups or would it put the new nodes into the existing placement group?"
    • They would go into a single placement group.
  • "Are you imagining creating a set of nodes within a placement group for every "HPC" job that is submitted? Would they then be available for reuse by another job or would a new set be created?"
    • The suggested course attempts to work within the way Slurm currently handles node creation. The placement group would not be tied to jobs, but rather to instance types. Nodes will be started when Slurm asks for them, and re-used as Slurm normally does. This is definitely a tradeoff.
