feat(planstate): make plan propagation safer #474
if err != nil {
	return err
}
if err := d.SetServiceArgs(mappedArgs); err != nil {
`SetServiceArgs` requires the service manager plan, so we have to move it down slightly. Care must be taken not to make direct Plan Manager calls before the state engine is running, as the lock-less load happens during this time. We could add the lock for the initial load / propagate if really needed.
internals/overlord/overlord.go
@@ -379,15 +371,34 @@ func (o *Overlord) Loop() {
}
}
})
o.ensureWaitRun()
I am also discussing this proposal with @pedronis in a separate thread, to make sure this is compatible with snapd.
@@ -89,7 +88,7 @@ type Overlord struct {
ensureLock sync.Mutex
ensureTimer *time.Timer
ensureNext time.Time
ensureRun int32
ensureRun chan struct{}
I am extending an existing feature here to also allow for waiting, not only checking if ensure ran already.
2f30648 to b7e28c8
@@ -38,8 +37,7 @@ const (

// CheckManager starts and manages the health checks.
type CheckManager struct {
state *state.State
ensureDone atomic.Bool
This logic, which would also have been needed in derived project managers, can now safely be removed. The race is prevented in the overlord.
This looks good to me. Just a couple of style comments.
d29c117 to b0df183
The original Pebble plan was part of the service manager and was only concerned with services. The plan was lazily loaded, so the first service manager public method called (internally or over the HTTP API) would result in plan layers getting loaded, combined and propagated. This scheme was later extended to support additional managers, which also added the `PlanChanged` subscription mechanism. Subscribed managers would receive updates of the plan on the initial load and on subsequent runtime changes caused by layers `add`ed using the HTTP API.

The plan manager decoupled the Pebble plan from the service manager. The first version of the plan manager simply loaded the plan from disk as early as possible, before the HTTP API endpoints were activated. However, since the load also triggered a `PlanChanged` update to all subscribed managers, these managers received an update of the plan before the State Engine was running. This resulted in extra code being required in `checkstate` to defer calling `Ensure()` inside the `PlanChanged` callback.

The following changes are proposed:

- Adapt the planstate load and plan changed notifications to only use the standard State Engine events `StartUp()` and `Ensure()` during startup. Note that because we now guarantee a call to `Ensure()` before the HTTP API is enabled, it is safe to do the initial load and propagate lock-less (care has to be taken that external callers do not call us before the State Engine is running).

- Make a change to the overlord to only enable the HTTP API endpoints once both `StartUp` and at least one `Ensure()` pass have completed. Note that this does not impact the startup performance of managers, but ensures that external HTTP API calls and self-calls (i.e. autostart) only happen after `StartUp` and at least one `Ensure()` have completed, giving managers a guarantee on startup behaviour.

Performing a simple startup duration test, measured from the application entrypoint until the HTTP API is enabled, showed no measurable difference.
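
The startup ordering described in this PR can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in this PR: `Plan`, `PlanManager`, `AddChangeListener`, the `load` field and `startEngine` are hypothetical names, and the split "load in `StartUp`, propagate in the first `Ensure`" is only one possible reading of the description.

```go
package main

import "fmt"

// Plan is a stand-in for the combined plan (hypothetical, simplified).
type Plan struct {
	Layers []string
}

// PlanManager is a hypothetical, simplified plan manager. It takes no locks
// during startup, relying on nothing else calling it before the engine runs.
type PlanManager struct {
	load       func() (*Plan, error) // assumed loader, e.g. reading layers from disk
	plan       *Plan
	listeners  []func(*Plan)
	propagated bool
}

// AddChangeListener subscribes a manager to plan updates.
func (m *PlanManager) AddChangeListener(f func(*Plan)) {
	m.listeners = append(m.listeners, f)
}

// StartUp loads the plan but does not notify anyone yet.
func (m *PlanManager) StartUp() error {
	plan, err := m.load()
	if err != nil {
		return err
	}
	m.plan = plan
	return nil
}

// Ensure propagates the initial plan exactly once, on the first pass.
func (m *PlanManager) Ensure() error {
	if m.propagated {
		return nil
	}
	m.propagated = true
	for _, f := range m.listeners {
		f(m.plan)
	}
	return nil
}

// startEngine models the overlord ordering: StartUp, then one Ensure pass,
// and only then the callback that would enable the HTTP API.
func startEngine(m *PlanManager, enableAPI func()) error {
	if err := m.StartUp(); err != nil {
		return err
	}
	if err := m.Ensure(); err != nil {
		return err
	}
	enableAPI()
	return nil
}

func main() {
	m := &PlanManager{load: func() (*Plan, error) {
		return &Plan{Layers: []string{"base"}}, nil
	}}
	m.AddChangeListener(func(p *Plan) {
		fmt.Println("plan changed:", p.Layers)
	})
	if err := startEngine(m, func() { fmt.Println("HTTP API enabled") }); err != nil {
		fmt.Println("startup failed:", err)
	}
}
```

In this model a subscribed manager never sees a plan update before the engine has started its first ensure pass, which is the property that makes the deferred `Ensure()` workaround in `checkstate` unnecessary.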