Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow mutating schedulingGates when the Jobset is suspended #623

Merged
merged 1 commit into from
Jul 25, 2024

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Jul 24, 2024

SchedulingGates can be mutated for suspeded batch Jobs, so we would like to have similar capabilities for JobSet from Kueue perspective: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/apis/batch/validation/validation.go#L662

We are considering to use schedulingGates in Kueue for fine-grained scheduling where a subset of pods may get an additional set of node labels, so we don't update the entire PodTemplate, but specific pods. In that case Kueue would have a controller which ungates the pods once adjusted.

This is an extension to #580.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 24, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 24, 2024
Copy link

netlify bot commented Jul 24, 2024

Deploy Preview for kubernetes-sigs-jobset canceled.

Name Link
🔨 Latest commit dd2dd59
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-jobset/deploys/66a28ebb1d49b70009fab951

@mimowo
Copy link
Contributor Author

mimowo commented Jul 24, 2024

/cc @danielvegamyhre @kannon92

Comment on lines +322 to +323
// Pod Scheduling Gates can be updated for batch/v1 Job: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/apis/batch/validation/validation.go#L662
mungedSpec.ReplicatedJobs[index].Template.Spec.Template.Spec.SchedulingGates = oldJS.Spec.ReplicatedJobs[index].Template.Spec.Template.Spec.SchedulingGates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the batch/job, we can mutate the schedulingGates only when the Job didn't start: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/registry/batch/job/strategy.go#L194

So, shouldn't we introduce the same criteria for the mutable scheduling directive here, right?

Copy link
Contributor Author

@mimowo mimowo Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For JobSet we just check if the Job is suspended. This is done in the entry if above if ptr.Deref(oldJS.Spec.Suspend, false) {.

The other part of the check is based on the status.startTime, but this field is not in the JobSet API:

type JobSetStatus struct {
// +optional
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions,omitempty"`
// Restarts tracks the number of times the JobSet has restarted (i.e. recreated in case of RecreateAll policy).
Restarts int32 `json:"restarts,omitempty"`
// RestartsCountTowardsMax tracks the number of times the JobSet has restarted that counts towards the maximum allowed number of restarts.
RestartsCountTowardsMax int32 `json:"restartsCountTowardsMax,omitempty"`
// TerminalState the state of the JobSet when it finishes execution.
// It can be either Complete or Failed. Otherwise, it is empty by default.
TerminalState string `json:"terminalState,omitempty"`
// ReplicatedJobsStatus track the number of JobsReady for each replicatedJob.
// +optional
// +listType=map
// +listMapKey=name
ReplicatedJobsStatus []ReplicatedJobStatus `json:"replicatedJobsStatus,omitempty"`
}
.

The difference between Job and JobSet is related to all of the fields (Annotations, Labels, NodeSelector, Tolerations too),.

Copy link
Contributor Author

@mimowo mimowo Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the check above only verifies that oldJS is suspended. Which means that, IIUC, there is a bug that we cannot modify the template on suspending (oldJS unsuspended, but js suspended). I think this could render failures in Kueue for the suspend, I will double-check and open a separate PR / Issue for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other part of the check is based on the status.startTime, but this field is not in the JobSet API:

Oh, I didn't know that! Thank you for the explanation!

I think this could render failures in Kueue for the suspend, I will double-check and open a separate PR / Issue for it.

I also feel that the behavior is a bug/regression. But, I agree with treating the regression as a separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I have confirmed that this currently makes the suspend fail for JobSet in Kueue: kubernetes-sigs/kueue#2691. I have opened an issue for JobSet #624, and the PR: #625. PTAL.

Copy link
Contributor Author

@mimowo mimowo Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is rather straightforward (and follows the patterns of other fields) so I wasn't planning an e2e test here.

I think once this PR lands, we could have an extended e2e test scenario added for the other PR, where we:

  1. unsuspend a JobSet adding scheduling gates, and other PodTemplate fields
  2. suspend the JobSet reverting the scheduling gates and other PodTemplate fields (as Kueue would do)
  3. unsuspend again the JobSet with different values of the PodTemplate fields and check that the pods executes (JobSet completes)

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit + integration tests should be sufficient here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mimowo that sounds good. Maybe we create an issue documenting the goal of the e2e test so we can reference it.

Copy link
Contributor Author

@mimowo mimowo Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to just reference the proposal from the issue here: #624? I have added a comment there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit + integration tests should be sufficient here.

Ok, I have added the unit test dedicated for scheduling gates, while the integration test I extended the existing one for schedulingGates, because integration tests are heavier than unit, and the test case verifies that the mutation can happen, so any field that is immutable would fail the test (I confirmed it was failing before the change).

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
I think that this would be useful feature!

Comment on lines +322 to +323
// Pod Scheduling Gates can be updated for batch/v1 Job: https://github.com/kubernetes/kubernetes/blob/ceb58a4dbc671b9d0a2de6d73a1616bc0c299863/pkg/apis/batch/validation/validation.go#L662
mungedSpec.ReplicatedJobs[index].Template.Spec.Template.Spec.SchedulingGates = oldJS.Spec.ReplicatedJobs[index].Template.Spec.Template.Spec.SchedulingGates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other part of the check is based on the status.startTime, but this field is not in the JobSet API:

Oh, I didn't know that! Thank you for the explanation!

I think this could render failures in Kueue for the suspend, I will double-check and open a separate PR / Issue for it.

I also feel that the behavior is a bug/regression. But, I agree with treating the regression as a separate issue.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 25, 2024
@danielvegamyhre danielvegamyhre self-assigned this Jul 25, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 25, 2024
@danielvegamyhre
Copy link
Contributor

/lgtm

Will approve once test finish running and I confirm they all are passing.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 25, 2024
@danielvegamyhre
Copy link
Contributor

/approve

Thanks!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielvegamyhre, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2024
@k8s-ci-robot k8s-ci-robot merged commit dd5bc69 into kubernetes-sigs:main Jul 25, 2024
13 checks passed
@danielvegamyhre danielvegamyhre mentioned this pull request Aug 19, 2024
20 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants