Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add default_ineligible configuration option #13082

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

wjordan
Copy link
Contributor

@wjordan wjordan commented May 19, 2022

Specifies if scheduling eligibility should be disabled by default for new client nodes.

Fixes #13081, see this issue for discussion of the use-case.

Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @wjordan and thanks for taking an initial pass at this. I have added some inline comments around the functionality.

I am also curious of how this approach would work. If the value is set via a config file value, would the operator be expected to modify this value and then restart the agent?

The Nomad API also includes a method for updating node eligibility which operators could use. If this was the case, and the agent restarted (crash or otherwise) would the config value override what the API call did?

client/client.go Outdated
Comment on lines 1386 to 1390
if node == nil {
node = &structs.Node{}
if c.config.DefaultIneligible {
node.SchedulingEligibility = api.NodeSchedulingIneligible
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nil check is a safety check to avoid a panic in the event the node config object is not correctly set. If we chase the function calls from this client.setupNode call we see:

Therefore during normal and hopefully every client setup this condition is never hit and therefore the eligibility toggle is not used.

@@ -1795,3 +1796,18 @@ func TestClient_ReconnectAllocs(t *testing.T) {
require.False(t, invalid, "expected alloc to not be marked invalid")
require.Equal(t, unknownAlloc.AllocModifyIndex, finalAlloc.AllocModifyIndex)
}

func TestClient_DefaultIneligible(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems like it is attempting to match the exact conditions seen within the code which as previously noted is a bit of a red herring.

@wjordan wjordan force-pushed the default_ineligible branch from e649855 to cb2484b Compare May 20, 2022 17:23
@wjordan
Copy link
Contributor Author

wjordan commented May 20, 2022

Thanks @jrasell for quick feedback and pointing me in the right direction with the implementation. I've updated the code to pass DefaultIneligible as a NodeRegisterRequest argument to Node.Register RPC, so that the server can apply the configurable default appropriately: https://github.com/hashicorp/nomad/blob/cb2484b3f88585974fbd993e2976b2e96440ed53/nomad/node_endpoint.go#L124-L131

I've updated the test as well. Please take a look and let me know if this is headed in the right direction!

@wjordan
Copy link
Contributor Author

wjordan commented May 20, 2022

To answer your questions:

I am also curious of how this approach would work. If the value is set via a config file value, would the operator be expected to modify this value and then restart the agent?

The intended behavior is to only apply the config value as an initial default for newly-registered nodes, so the operator would be expected to set this via a config file value before starting the agent the first time.

The Nomad API also includes a method for updating node eligibility which operators could use. If this was the case, and the agent restarted (crash or otherwise) would the config value override what the API call did?

No, the config value is only intended to allow configuring the default for newly-registered nodes, so it wouldn't override anything once the node has been registered.

Specifies if scheduling eligibility should be
disabled by default for new client nodes.
@wjordan wjordan force-pushed the default_ineligible branch from 31e6a62 to abc5188 Compare May 20, 2022 21:29
@wjordan
Copy link
Contributor Author

wjordan commented May 20, 2022

Third attempt at this in abc5188, just set the client node's SchedulingEligibility value based on DefaultIneligible in setupNode: https://github.com/hashicorp/nomad/blob/31e6a62381ba8e9b89f831bc570799c03affd118/client/client.go#L1475-L1479

This value will be sent to the server on the initial registration. Re-registering an existing node (e.g., on agent restart) should already correctly preserve the existing value, thanks to existing logic in upsertNodeTxn that retains SchedulingEligibility:

node.SchedulingEligibility = exist.SchedulingEligibility // Retain the eligibility

@Tyrael
Copy link

Tyrael commented Oct 27, 2023

any update on this? was surprised that this isn't supported yet.

@miguelwhite
Copy link

Are there any updates on this?

@tgross tgross added the stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows label May 17, 2024
@luizrojo
Copy link

Any updates on this change?

@josegonzalez
Copy link
Contributor

@wjordan would you be able to rebase the PR? If not, mind if I reissue it to get this feature merged by the hashicorp folks?

@4W5ZeZPMc
Copy link

@tgross @jrasell is it possible to have a look again at this PR, this feature would be awesome

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stage/needs-rebase This PR needs to be rebased on main before it can be backported to pick up new BPA workflows type/enhancement
Projects
8 participants