[YUNIKORN-2083] Shim: Handle missing applicationID cleanly in standard mode #708

Closed
wants to merge 1 commit into from

Conversation

craigcondit

What is this PR for?

If a Pod with schedulerName: yunikorn is encountered with missing applicationID metadata, generate an applicationID using the same algorithm the admission controller would have used. This allows these Pods to be scheduled successfully.
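
For readers unfamiliar with the fallback, here is a minimal sketch of how such an ID could be derived. It is not the shim's actual code: the function name, the "yunikorn" prefix, the "autogen" suffix, and the exact ID layout are illustrative assumptions, since the real algorithm is not shown in this conversation. The only hard fact relied on is that Kubernetes label values are limited to 63 characters.

```go
package main

import "fmt"

// Hypothetical prefix and suffix for generated IDs. The real values live in
// the shim's constants and are not shown in this PR conversation.
const (
	autoGenPrefix = "yunikorn"
	autoGenSuffix = "autogen"
)

// generateAppID is an illustrative helper, not the shim's actual function.
// With unique=true every Pod gets its own application (namespace + Pod UID).
// With unique=false all such Pods in a namespace share one application.
func generateAppID(namespace string, unique bool, podUID string) string {
	var id string
	if unique {
		id = fmt.Sprintf("%s-%s", namespace, podUID)
	} else {
		id = fmt.Sprintf("%s-%s-%s", autoGenPrefix, namespace, autoGenSuffix)
	}
	// Kubernetes label values may not exceed 63 characters, so truncate.
	if len(id) > 63 {
		id = id[:63]
	}
	return id
}

func main() {
	fmt.Println(generateAppID("default", false, "1234-abcd"))
	fmt.Println(generateAppID("default", true, "1234-abcd"))
}
```

The point of the sketch is that the result depends only on the Pod's namespace, its UID, and one boolean setting, so the shim and the admission controller can arrive at the same ID independently as long as they agree on that setting.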

What type of PR is it?

  • - Bug Fix
  • - Improvement
  • - Feature
  • - Documentation
  • - Hot Fix
  • - Refactoring

Todos

  • - Task

What is the Jira issue?

https://issues.apache.org/jira/browse/YUNIKORN-2083

How should this be tested?

Updated the unit tests to cover the new logic in both plugin and standard mode. Also manually created a Pod that met the necessary conditions and verified that it was scheduled properly and was assigned the appropriate applicationID by YuniKorn.

Screenshots (if appropriate)

Questions:

  • - The license files need an update.
  • - There are breaking changes for older versions.
  • - It needs documentation.

@craigcondit craigcondit self-assigned this Oct 27, 2023
@@ -73,19 +74,24 @@ const (
	CMKubeQPS = PrefixKubernetes + "qps"
	CMKubeBurst = PrefixKubernetes + "burst"

	// admissioncontroller
	PrefixAMFiltering = PrefixAdmissionController + "filtering."
	AMFilteringGenerateUniqueAppIds = PrefixAMFiltering + "generateUniqueAppId"

NOTE: This is needed because the specific algorithm used to generate the appID is controlled by an admission-controller-specific property. We use that property here as well to maintain backwards compatibility. The value is duplicated from am_conf.go, as we would otherwise create a circular package reference.

@@ -208,6 +216,7 @@ func handleNonReloadableConfig(old *SchedulerConf, new *SchedulerConf) {
	checkNonReloadableBool(CMSvcDisableGangScheduling, &old.DisableGangScheduling, &new.DisableGangScheduling)
	checkNonReloadableString(CMSvcPlaceholderImage, &old.PlaceHolderImage, &new.PlaceHolderImage)
	checkNonReloadableString(CMSvcNodeInstanceTypeNodeLabelKey, &old.InstanceTypeNodeLabelKey, &new.InstanceTypeNodeLabelKey)
	checkNonReloadableBool(AMFilteringGenerateUniqueAppIds, &old.GenerateUniqueAppIds, &new.GenerateUniqueAppIds)

NOTE: We mark the generateUniqueAppIds property as non-reloadable since changing this value would cause the computed applicationIDs of existing Pods to change. This would completely mess up internal state tracking.
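
To illustrate what "non-reloadable" means in practice, here is a rough sketch of such a check. It is an assumption about the behaviour, not a copy of the helper in schedulerconf.go: the struct is a stand-in for SchedulerConf, the key omits its prefix, and the boolean return value exists only to make the sketch easy to test. The idea is that a changed value is reported and then reverted, so the running scheduler keeps the setting it started with.

```go
package main

import "fmt"

// schedulerConf is a stand-in for the shim's SchedulerConf; only the field
// discussed in this comment is modeled.
type schedulerConf struct {
	GenerateUniqueAppIds bool
}

// checkNonReloadableBool is a hypothetical version of the helper seen in the
// diff. If the reloaded value differs from the running one, it prints a
// warning, restores the running value, and reports that a change was ignored.
func checkNonReloadableBool(name string, old, updated *bool) bool {
	if *old == *updated {
		return false
	}
	fmt.Printf("ignoring change to non-reloadable setting %s (restart required): %v -> %v\n",
		name, *old, *updated)
	*updated = *old
	return true
}

func main() {
	running := schedulerConf{GenerateUniqueAppIds: false}
	reloaded := schedulerConf{GenerateUniqueAppIds: true}

	// The key prefix is omitted here because its value is not shown above.
	checkNonReloadableBool("filtering.generateUniqueAppId",
		&running.GenerateUniqueAppIds, &reloaded.GenerateUniqueAppIds)

	// The reloaded config now carries the running value again.
	fmt.Println(reloaded.GenerateUniqueAppIds)
}
```

Under this scheme, Pods already tracked under a generated applicationID keep that ID until the scheduler is restarted with the new setting.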

codecov bot commented Oct 27, 2023

Codecov Report

Merging #708 (f83dcb1) into master (1ccc1b1) will decrease coverage by 0.12%.
Report is 1 commits behind head on master.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master     #708      +/-   ##
==========================================
- Coverage   71.98%   71.87%   -0.12%     
==========================================
  Files          49       49              
  Lines        7949     7956       +7     
==========================================
- Hits         5722     5718       -4     
- Misses       2031     2041      +10     
- Partials      196      197       +1     
Files Coverage Δ
pkg/cache/context.go 49.94% <ø> (-0.33%) ⬇️
pkg/conf/schedulerconf.go 74.69% <100.00%> (+0.47%) ⬆️
pkg/shim/scheduler.go 73.05% <0.00%> (ø)
pkg/cache/task.go 69.84% <0.00%> (-0.39%) ⬇️
pkg/common/utils/utils.go 86.90% <87.50%> (+0.01%) ⬆️

... and 2 files with indirect coverage changes


@pbacsko pbacsko left a comment

+1 LGTM

…d mode

If a Pod with schedulerName: yunikorn is encountered with missing applicationID
metadata, generate an applicationID using the same algorithm as the admission
controller would have. This allows these Pods to be scheduled successfully.

@pbacsko pbacsko left a comment

+1

@craigcondit craigcondit deleted the YUNIKORN-2083 branch November 2, 2023 16:45