config-manager: add kludge/workaround for CRI-O.
It looks like in the case of CRI-O we need to give it some time
after we have been started up but before we kick it in the head
to restart it over D-Bus. Otherwise it will always report a 255
(-1) exit status for us. Since we run as an init-container, a
non-zero exit status would prevent other containers in our pod
from ever starting up.
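
The "255 (-1)" notation above can be read as one 8-bit value seen two ways: an exit status occupies a single byte, so -1 and 255 share the same bit pattern. The following is a minimal sketch of that arithmetic only; the helper names `asExitStatus` and `asSigned` are hypothetical and do not appear in the commit, and the sketch says nothing about why CRI-O reports this value.

```go
package main

import "fmt"

// asExitStatus returns the 8-bit status that results from truncating
// an integer exit code to one byte (hypothetical helper for illustration).
func asExitStatus(code int) uint8 {
	return uint8(code)
}

// asSigned reinterprets an 8-bit exit status as a signed value
// (hypothetical helper for illustration).
func asSigned(status uint8) int8 {
	return int8(status)
}

func main() {
	status := asExitStatus(-1)
	fmt.Printf("unsigned: %d, signed: %d\n", status, asSigned(status))
}
```

Either way the status is non-zero, which is what matters for an init-container.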

It would be good to try and find out what is the exact reason
why this behavior is exhibited by CRI-O but not containerd, and
if it could be fixed in CRI-O. Until then... this.

Signed-off-by: Krisztian Litkey <[email protected]>
(cherry picked from commit 8146d2f)
klihub authored and marquiz committed Oct 27, 2023
1 parent a43cb59 commit f7efa40
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions cmd/config-manager/main.go
@@ -22,6 +22,7 @@ import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/coreos/go-systemd/v22/dbus"
	tomlv2 "github.com/pelletier/go-toml/v2"
@@ -62,6 +63,19 @@ func main() {
		log.Fatalf("error enabling NRI: %v", err)
	}

	//
	// TODO(klihub): Kludge warning...
	// If the runtime is CRI-O, it looks like we need to cut it some
	// slack, after we've been started up by it but before we restart
	// it. Otherwise it always reports our exit status as -1 (255).
	// We are an init-container so a non-zero exit status would prevent
	// other containers in our pod from ever starting...
	//

	if unit == crioUnit {
		time.Sleep(3 * time.Second)
	}

	if err = restartSystemdUnit(conn, unit); err != nil {
		log.Fatalf("failed to restart %q unit: %v", unit, err)
	}