
Rework Initializer container and handling its verdict #450

Draft · wants to merge 3 commits into base: main

Conversation

@frittentheke (Contributor) commented Aug 22, 2024

Use exit codes (of k6 archive + inspect) as the sole indication of initialization success

  1. Introduce an EmptyDir volume for temporary data.
    This removes the need for any mkdir and avoids writing to the container filesystem,
    allowing for e.g. readOnlyRootFilesystem security contexts ([1]).

  2. Remove all output redirection and grepping for error-indicating strings in favor of
    using the exit codes of the commands to indicate success or failure, reestablishing the
    clear contract of the inspect command to judge the provided config.
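Point 1 could look like the following pod-spec fragment. This is only a sketch: the container name, volume name, and mount path are hypothetical illustrations, not necessarily the names the operator uses.

```yaml
# Sketch only: an EmptyDir for temporary data lets the root filesystem stay read-only.
containers:
  - name: k6-initializer          # hypothetical container name
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: tmpdir              # hypothetical volume name
        mountPath: /tmp/k6        # hypothetical mount path
volumes:
  - name: tmpdir
    emptyDir: {}
```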

This approach is much cleaner with regard to any changes in k6's output and the vast
space of possible errors.

k6 inspect --execution-requirements ([2,3]), while also printing warnings or the JSON,
should clearly indicate via err if there are issues with the provided tests; the error
is then converted to a non-zero return code ([4]).

  2. Set TerminationMessagePolicy on Initializer to use logs as fallback

While not explicitly used, with the removal of the log fetching from the
Initializer (Job->Pod->Container), any potential source of verbose information
about the error might be lost. The Kubernetes approach to providing more details
than the exit code about the reason a container terminated is the termination
message (see [5]).

By default, this message needs to be explicitly written to /dev/termination-log,
but there is also the option to use the last bit of log output as a fallback source.
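A sketch of the corresponding container setting, assuming the default termination-message path (container name hypothetical):

```yaml
# Sketch only: fall back to the last portion of the container log when
# nothing was written to /dev/termination-log.
containers:
  - name: k6-initializer                           # hypothetical name
    terminationMessagePath: /dev/termination-log   # the Kubernetes default
    terminationMessagePolicy: FallbackToLogsOnError
```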

  3. Remove fetching of container logs from initializer

Following the focus on the exit code of the command used in the initializer job (Pod),
this commit removes the fetching of the container log.

There was only some basic JSON unmarshalling applied, with no interpretation of what it contained.
This is either covered by k6 inspect exiting with rc=0 or should be added to the initializer
job.

If further details about the failure reason of the initializer container were required,
the source should be the termination message. It could be used to enrich the TestRun CR to provide
a higher level of detail about the failure to the user.

[1] https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
[2] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L34
[3] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L64
[4] https://github.com/grafana/k6/blob/master/cmd/root.go#L90-L118
[5] https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/

Fixes: #435

@yorugac (Collaborator) commented Nov 5, 2024

Hi @frittentheke, thanks for the suggestions. I'm afraid I don't see how to make these work right now 😔

Let's go point-by-point:

  1. EmptyDir: I have yet to hear anyone bring up readOnlyRootFilesystem as a requirement (unless you meant this requirement is present in your setup?). Nevertheless, it's a valid point. But from your PR, it is unclear which part is supposed to read that volume. For example, right now the logs are read by the app directly, but I don't think there's a guarantee that the EmptyDir of another pod (the initializer) would continue to exist long enough, nor that the app and the initializer will be scheduled on the same node in the first place.

    Going by the rest of the description, esp. point 4, I assume you suggest actually removing the log parsing completely. That cannot be done ATM, at all. Why we're reading the logs is mentioned in this comment.

  2. Remove all output redirection and grepping for error-indicating strings in favor of
    using the exit codes of the commands to indicate success or failure, reestablishing the
    clear contract of the inspect command to judge the provided config.

    Sadly, k6 is not consistent enough around both logging and exit codes to risk such big adjustments here. If you're curious, you can see issues like this or this. The current logic of a complex command in the initializer did not appear out of thin air, so to say: it is meant to cover all possible cases.

    Also, I suppose your PR 453 is kind of a continuation of the same idea: there is a more specific discussion over there.

  3. TerminationMessagePolicy on initializer: sounds interesting, but also a bit redundant, given our current reading of initializer logs.

  4. Remove fetching of container logs from initializer

    Already covered above.

Successfully merging this pull request may close these issues: "Issues within initalizer error handling if script is incorrect"

2 participants