
Rework Initializer container and handling its verdict #450

Draft · wants to merge 3 commits into base: main

Conversation

@frittentheke (Contributor) commented Aug 22, 2024

Use exit codes (of k6 archive + inspect) as the sole indication of initialization success

  1. Introduce an EmptyDir volume for temporary data.
    This removes the need for any mkdir and avoids writing to the container filesystem,
    allowing for e.g. readOnlyRootFilesystem security contexts ([1]).

  2. Remove all output redirection and grepping for error-indicating strings in favor of
    using the exit codes of the commands to indicate success or failure, reestablishing the
    clear contract of the inspect command to judge the provided config.
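Point 1 could look like the following pod-spec fragment. This is only a sketch: the container name, volume name, and mount path are hypothetical illustrations, not necessarily the names the operator uses.

```yaml
# Sketch only: an EmptyDir for temporary data lets the root filesystem stay read-only.
containers:
  - name: k6-initializer          # hypothetical container name
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
      - name: tmpdir              # hypothetical volume name
        mountPath: /tmp/k6        # hypothetical mount path
volumes:
  - name: tmpdir
    emptyDir: {}
```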

This approach is much cleaner with regard to any changes in k6's output and the vast
space of possible errors.

k6 inspect --execution-requirements ([2,3]), while also printing warnings or the JSON,
should clearly indicate via err if there are issues with the provided tests; the error
is then converted to a non-zero return code ([4]).

  2. Set TerminationMessagePolicy on Initializer to use logs as fallback

While not explicitly used, with the removal of the log fetching from the
Initializer (Job->Pod->Container), any potential source of verbose information
about the error might be lost. The Kubernetes approach to providing more details
than the exit code about the reason a container terminated is the termination
message (see [5]).

By default, this message needs to be explicitly written to /dev/termination-log,
but there is also the option to use the last bit of log output as a fallback source.
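A sketch of the corresponding container setting, assuming the default termination-message path (container name hypothetical):

```yaml
# Sketch only: fall back to the last portion of the container log when
# nothing was written to /dev/termination-log.
containers:
  - name: k6-initializer                           # hypothetical name
    terminationMessagePath: /dev/termination-log   # the Kubernetes default
    terminationMessagePolicy: FallbackToLogsOnError
```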

  3. Remove fetching of container logs from initializer

Following the focus on the exit code of the command used in the initializer job (Pod),
this commit removes the fetching of the container log.

There was only some basic JSON unmarshalling applied, with no interpretation of what it contained.
This is either covered by k6 inspect exiting with rc=0 or should be added to the initializer
job.

If further details about the failure reason of the initializer container were required,
the source should be the termination message. It could be used to enrich the TestRun CR to provide
a higher level of detail about the failure to the user.

[1] https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
[2] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L34
[3] https://github.com/grafana/k6/blob/c212bd277aeedbf8395b917df11bd9be4da490a4/cmd/inspect.go#L64
[4] https://github.com/grafana/k6/blob/master/cmd/root.go#L90-L118
[5] https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/

Fixes: #435

@yorugac (Collaborator) commented Nov 5, 2024

Hi @frittentheke, thanks for the suggestions. I'm afraid I don't see how to make these work right now 😔

Let's go point-by-point:

  1. EmptyDir: I have yet to hear anyone bring up readOnlyRootFilesystem as a requirement (unless you meant this requirement is present in your setup?). Nevertheless, it's a valid point. But from your PR, it is unclear which part is supposed to read that volume. For example, right now the logs are read by the app directly, but I don't think there's a guarantee that the EmptyDir of another pod (the initializer) would continue to exist long enough, nor that the app and the initializer will be scheduled on the same node in the first place.

    Going by the rest of the description, esp. point 4, I assume you suggest actually removing the log parsing completely. That cannot be done ATM, at all. Why we're reading the logs is mentioned in this comment.

  2. Remove all output redirection and grepping for error-indicating strings in favor of
    using the exit codes of the commands to indicate success or failure, reestablishing the
    clear contract of the inspect command to judge the provided config.

    Sadly, k6 is not consistent enough around both logging and exit codes to risk such big adjustments here. If you're curious, you can see issues like this or this. The current logic of a complex command in the initializer did not appear out of thin air, so to say: it is meant to cover all possible cases.

    Also, I suppose your PR 453 is kind of a continuation of the same idea: there is a more specific discussion over there.

  3. TerminationMessagePolicy on initializer: sounds interesting, but also a bit redundant, given our current reading of initializer logs.

  4. Remove fetching of container logs from initializer

    Already covered above.

Successfully merging this pull request may close these issues: "Issues within initalizer error handling if script is incorrect"

2 participants