Skip to content

Commit

Permalink
status: don't treat kvstore as ready during initialization
Browse files Browse the repository at this point in the history
Currently, the kvstore status is considered ready even during the
initialization phase, when the connection has not been established
yet. This behavior is potentially confusing, as the agents/operator
eventually turn ready, breaking out of the startup probe phase, even
if an essential subsystem is not ready at all, only to start crashing
a few minutes later due to other timeouts kicking in.

Let's modify this behavior so that the kvstore subsystem, and in turn
the Cilium agents are marked ready only after successfully establishing
the connection to etcd. Yet, we enable this behavior only if the support
for running etcd in pod network is disabled. Differently, we need to
preserve the previous behavior, to break the chicken-and-egg dependency
during startup.

Signed-off-by: Marco Iorio <[email protected]>
  • Loading branch information
giorio94 authored and julianwiedmann committed Nov 25, 2024
1 parent bd7fcd0 commit 6d5fd6d
Show file tree
Hide file tree
Showing 3 changed files with 16 additions and 2 deletions.
8 changes: 8 additions & 0 deletions daemon/cmd/status.go
Original file line number Diff line number Diff line change
Expand Up @@ -596,6 +596,14 @@ func (d *Daemon) startStatusCollector(cleaner *daemonCleanup) {
}

if kvstore, ok := status.Data.(*models.Status); ok {
if kvstore.State == models.StatusStateWarning && option.Config.KVstorePodNetworkSupport {
// Don't treat warnings as errors when the support for running
// etcd in pod network is enabled. This is necessary to allow
// Cilium turning ready even before connecting to the kvstore,
// and break the chicken-and-egg dependency during startup.
kvstore.State = models.StatusStateOk
}

d.statusResponse.Kvstore = kvstore
}
},
Expand Down
8 changes: 7 additions & 1 deletion operator/api/health.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ import (
k8sClient "github.com/cilium/cilium/pkg/k8s/client"
"github.com/cilium/cilium/pkg/kvstore"
"github.com/cilium/cilium/pkg/logging/logfields"
"github.com/cilium/cilium/pkg/option"
)

type kvstoreEnabledFunc func() bool
Expand Down Expand Up @@ -86,7 +87,12 @@ func (h *healthHandler) checkStatus() error {
}

status := client.Status()
if status.State != models.StatusStateOk {
if status.State != models.StatusStateOk &&
// Don't treat warnings as errors when the support for running
// etcd in pod network is enabled. This is necessary to allow
// Cilium turning ready even before connecting to the kvstore,
// and break the chicken-and-egg dependency during startup.
!(status.State == models.StatusStateWarning && option.Config.KVstorePodNetworkSupport) {
return errors.New(status.Msg)
}
}
Expand Down
2 changes: 1 addition & 1 deletion pkg/kvstore/etcd.go
Original file line number Diff line number Diff line change
Expand Up @@ -544,7 +544,7 @@ func connectEtcdClient(ctx context.Context, config *client.Config, cfgPath strin
configPath: cfgPath,
firstSession: make(chan struct{}),
status: models.Status{
State: models.StatusStateOk,
State: models.StatusStateWarning,
Msg: "Waiting for initial connection to be established",
},
stopStatusChecker: make(chan struct{}),
Expand Down

0 comments on commit 6d5fd6d

Please sign in to comment.