config-manager: add kludge/workaround for CRI-O.

It looks like, in the case of CRI-O, we need to give it some time after we have been started up but before we kick it in the head to restart it over D-Bus. Otherwise it always reports a 255 (-1) exit status for us. Since we run as an init-container, a non-zero exit status would prevent the other containers in our pod from ever starting up.

It would be good to find out exactly why CRI-O exhibits this behavior while containerd does not, and whether it could be fixed in CRI-O. Until then... this.

Signed-off-by: Krisztian Litkey <[email protected]>
(cherry picked from commit 8146d2f)
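The "255 (-1)" pairing follows from how exit statuses are reported on POSIX systems: only the low 8 bits of the exit code reach the parent, so -1 is observed as 255. The sketch below only illustrates that truncation with a Go integer conversion. It is not taken from CRI-O and does not explain why CRI-O ends up reporting -1 here; `exitStatusByte` is a hypothetical helper.

```go
package main

import "fmt"

// exitStatusByte (hypothetical helper) mimics how an exit code is seen by a
// parent process: only the low 8 bits survive.
func exitStatusByte(code int) uint8 {
	// A non-constant int-to-uint8 conversion keeps the low 8 bits of the
	// two's-complement value, so -1 becomes 255 and 256 becomes 0.
	return uint8(code)
}

func main() {
	fmt.Println(exitStatusByte(-1)) // prints 255
	fmt.Println(exitStatusByte(0))  // prints 0
}
```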
marquiz · Oct 27, 2023 · commit f7efa40 (parent a43cb59)
1 changed file with 14 additions and 0 deletions.
diff --git a/cmd/config-manager/main.go b/cmd/config-manager/main.go
@@ -22,6 +22,7 @@ import (
 	"context"
 	"fmt"
 	"os"
+	"time"
 
 	"github.com/coreos/go-systemd/v22/dbus"
 	tomlv2 "github.com/pelletier/go-toml/v2"
@@ -62,6 +63,19 @@ func main() {
 		log.Fatalf("error enabling NRI: %v", err)
 	}
 
+	//
+	// TODO(klihub): Kludge warning...
+	//   If the runtime is CRI-O, it looks like we need to cut it some
+	//   slack, after we've been started up by it but before we restart
+	//   it. Otherwise it always reports our exit status as -1 (255).
+	//   We are an init-container so a non-zero exit status would prevent
+	//   other containers in our pod from ever starting...
+	//
+
+	if unit == crioUnit {
+		time.Sleep(3 * time.Second)
+	}
+
 	if err = restartSystemdUnit(conn, unit); err != nil {
 		log.Fatalf("failed to restart %q unit: %v", unit, err)
 	}