[k8s] Update waiting logic for init containers #3762
Changes from all commits: 7048b0d, d1b275e, 727a3cb, 8221c35, fd66d03, 0d45a11
@@ -222,6 +222,32 @@ def _wait_for_pods_to_run(namespace, new_nodes):
     Pods may be pulling images or may be in the process of container
     creation.
     """
+
+    def _check_init_containers(pod):
+        # Check if any of the init containers failed
+        # to start. Could be because the init container
+        # command failed or failed to pull image etc.
+        for init_status in pod.status.init_container_statuses:
+            init_terminated = init_status.state.terminated
+            if init_terminated:
+                if init_terminated.exit_code != 0:
+                    msg = init_terminated.message if (
+                        init_terminated.message) else str(init_terminated)
+                    raise config_lib.KubernetesError(
+                        'Failed to run init container for pod '
+                        f'{pod.metadata.name}. Error details: {msg}.')
+                continue
+            init_waiting = init_status.state.waiting
+            if (init_waiting is not None and init_waiting.reason
+                    not in ['ContainerCreating', 'PodInitializing']):
Comment on lines +241 to +242

Curious, if … It would be great if we could have a reference link to those states in the comment.

Unfortunately there's no listing of `state.reason` values available (it's typed as a `str` without a list of enum values). I asked an AI assistant and it listed these states: …

I don't fully trust this list, so I don't want to include it in comments, but given this looks like the only successful states are …

Should we add a TODO here noting that there might be other states we need to include if they occur during usage?

Good idea, added now.
+                # TODO(romilb): There may be more states to check for. Add
+                # them as needed.
+                msg = init_waiting.message if (
+                    init_waiting.message) else str(init_waiting)
+                raise config_lib.KubernetesError(
+                    'Failed to create init container for pod '
+                    f'{pod.metadata.name}. Error details: {msg}.')
+
     while True:
         all_pods_running = True
         # Iterate over each pod to check their status
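The init-container check added in the first hunk can be exercised without a cluster. The following is a minimal sketch that mirrors the logic of `_check_init_containers`, assuming the status objects expose the same attribute names as the Kubernetes Python client models (`status.init_container_statuses`, `state.terminated`, `state.waiting`, `exit_code`, `reason`, `message`). `KubernetesError`, `check_init_containers` and the `*_state`/`make_pod` helpers are hypothetical stand-ins, and `SimpleNamespace` objects replace real client models.

```python
from types import SimpleNamespace


class KubernetesError(Exception):
    """Hypothetical stand-in for config_lib.KubernetesError."""


# Waiting reasons treated as normal progress, as in the diff. Other benign
# reasons may exist (the diff leaves a TODO for that).
_OK_INIT_WAITING_REASONS = ('ContainerCreating', 'PodInitializing')


def _describe(state):
    # Prefer the state's message; fall back to the whole state object.
    return state.message if state.message else str(state)


def check_init_containers(pod):
    """Raise KubernetesError if an init container failed or is stuck."""
    for init_status in pod.status.init_container_statuses:
        terminated = init_status.state.terminated
        if terminated:
            if terminated.exit_code != 0:
                raise KubernetesError(
                    'Failed to run init container for pod '
                    f'{pod.metadata.name}. Error details: '
                    f'{_describe(terminated)}.')
            # Finished successfully: go on to the next init container.
            continue
        waiting = init_status.state.waiting
        if (waiting is not None and
                waiting.reason not in _OK_INIT_WAITING_REASONS):
            raise KubernetesError(
                'Failed to create init container for pod '
                f'{pod.metadata.name}. Error details: '
                f'{_describe(waiting)}.')


# Fake objects shaped like the Kubernetes client models (assumption).
def terminated_state(exit_code, message=None):
    return SimpleNamespace(
        terminated=SimpleNamespace(exit_code=exit_code, message=message),
        waiting=None)


def waiting_state(reason, message=None):
    return SimpleNamespace(
        terminated=None,
        waiting=SimpleNamespace(reason=reason, message=message))


def make_pod(name, *states):
    statuses = [SimpleNamespace(state=s) for s in states]
    return SimpleNamespace(
        metadata=SimpleNamespace(name=name),
        status=SimpleNamespace(init_container_statuses=statuses))


# First init container succeeded, second waits with an allowed reason.
check_init_containers(
    make_pod('ok', terminated_state(0), waiting_state('PodInitializing')))

# Second init container cannot pull its image: reported as a failure.
try:
    check_init_containers(
        make_pod('bad', terminated_state(0),
                 waiting_state('ErrImagePull', 'pull failed')))
except KubernetesError as e:
    print(e)  # Failed to create init container for pod bad. Error details: pull failed.
```

Factoring the `message`-or-`str(state)` fallback into one helper is a stylistic choice for the sketch; the diff inlines it at each raise site.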
@@ -246,12 +272,15 @@ def _wait_for_pods_to_run(namespace, new_nodes):
                     # See list of possible reasons for waiting here:
                     # https://stackoverflow.com/a/57886025
                     waiting = container_status.state.waiting
-                    if (waiting is not None and
-                            waiting.reason != 'ContainerCreating'):
-                        raise config_lib.KubernetesError(
-                            'Failed to create container while launching '
-                            'the node. Error details: '
-                            f'{container_status.state.waiting.message}.')
+                    if waiting is not None:
+                        if waiting.reason == 'PodInitializing':
+                            _check_init_containers(pod)
+                        elif waiting.reason != 'ContainerCreating':
+                            msg = waiting.message if waiting.message else str(
+                                waiting)
+                            raise config_lib.KubernetesError(
+                                'Failed to create container while launching '
+                                f'the node. Error details: {msg}.')
             # Reaching this point means that one of the pods had an issue,
             # so break out of the loop, and wait until next second.
             break
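The second hunk changes how a main container's waiting state is interpreted. A hypothetical sketch that isolates that decision as a pure function; `classify_waiting`, `failure_message` and the returned labels are illustrative names, not part of the PR, and `SimpleNamespace` stands in for the client's waiting-state model.

```python
from types import SimpleNamespace


def classify_waiting(waiting):
    """Decide what the wait loop should do with a main container's state.

    Mirrors the new branch in the diff: 'PodInitializing' defers to the
    init-container check, 'ContainerCreating' keeps waiting, and any other
    waiting reason is treated as a launch failure.
    """
    if waiting is None:
        return 'no-op'  # Not waiting, so this branch does nothing.
    if waiting.reason == 'PodInitializing':
        return 'check-init-containers'
    if waiting.reason != 'ContainerCreating':
        return 'fail'
    return 'keep-waiting'


def failure_message(waiting):
    # Same fallback as the diff: use the message if present, else str(state).
    msg = waiting.message if waiting.message else str(waiting)
    return ('Failed to create container while launching '
            f'the node. Error details: {msg}.')


for reason in ('PodInitializing', 'ContainerCreating', 'CrashLoopBackOff'):
    state = SimpleNamespace(reason=reason, message=None)
    print(reason, '->', classify_waiting(state))
# PodInitializing -> check-init-containers
# ContainerCreating -> keep-waiting
# CrashLoopBackOff -> fail
```

Under the pre-change condition (`waiting.reason != 'ContainerCreating'`), a main container waiting with `PodInitializing` would have been reported as a failure even when its init containers were healthy.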
nit: do we still need to check `init_waiting` if `init_terminated` is true?

Good point, checking `init_waiting` is not required when `init_terminated` is not None. Updated.

Oh, should this be `return` or `continue`? Will there be multiple init containers?

Oops, good catch. It should be `continue`.
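The `return` versus `continue` question matters because a pod can declare several init containers, which Kubernetes runs one after another. A minimal sketch of the difference, using `(exit_code, waiting_reason)` tuples as a hypothetical simplification of the status objects (`first_problem` and `stop_after_success` are illustrative names): with `return`, the first successfully terminated init container ends the scan, so a later init container stuck in an error state goes unreported.

```python
# Each init container status is simplified to (exit_code, waiting_reason):
# exit_code is set when the container terminated, waiting_reason when it is
# still waiting. This tuple encoding is an illustrative assumption.
OK_REASONS = {'ContainerCreating', 'PodInitializing'}


def first_problem(statuses, stop_after_success):
    """Return the first problematic status, or None if none was found.

    stop_after_success=True mimics `return` after a successful init
    container; False mimics `continue`.
    """
    for exit_code, waiting_reason in statuses:
        if exit_code is not None:
            if exit_code != 0:
                return (exit_code, waiting_reason)
            if stop_after_success:
                return None  # Like `return`: later containers are skipped.
            continue  # Like `continue`: keep scanning.
        if waiting_reason is not None and waiting_reason not in OK_REASONS:
            return (exit_code, waiting_reason)
    return None


statuses = [(0, None), (None, 'ImagePullBackOff')]
print(first_problem(statuses, stop_after_success=True))   # None
print(first_problem(statuses, stop_after_success=False))  # (None, 'ImagePullBackOff')
```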