Support for Non Reportable Errors #1925

Merged


Chris-Peterson444
Contributor

The original attempt in #1897 to create a new class of errors that are not reported via apport didn't quite cover everything needed, especially the API changes required to tell the client about the nuances of the error state. These changes include:

  • A refactor to make the apport generation suppression generic and tested
  • Changes to the /meta/status API response object (ApplicationStatus) to include a new field nonreportable_error
    • Reportable errors are still available under the error field, so telling the two kinds apart requires checking which of error and nonreportable_error is not None
  • The endpoints /meta/restart (POST) and /meta/ssh_info (GET) are now available before the server has finished loading all of its controllers.
  • The server now sets the interactive state as the first step in loading the autoinstall data, not the last
  • The subiquity TUI can discern between reportable and non-reportable errors and will show a different overlay accordingly
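A minimal sketch of the client-side disambiguation described above, assuming a trimmed-down ApplicationStatus that keeps only the fields involved. The field names error and nonreportable_error come from this PR; the contents of ErrorReportRef and NonReportableError, the ApplicationState members, and the choose_error_overlay helper are hypothetical stand-ins, not subiquity's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ApplicationState(Enum):
    # Hypothetical subset of states, for illustration only.
    RUNNING = "RUNNING"
    ERROR = "ERROR"


@dataclass
class ErrorReportRef:
    # Hypothetical stand-in for the reference to an apport report.
    base: str


@dataclass
class NonReportableError:
    # Hypothetical fields; the real type may carry different data.
    cause: str
    message: str


@dataclass
class ApplicationStatus:
    state: ApplicationState
    error: Optional[ErrorReportRef] = None
    nonreportable_error: Optional[NonReportableError] = None


def choose_error_overlay(status: ApplicationStatus) -> Optional[str]:
    """Return which kind of error overlay to show, or None if no error."""
    if status.state != ApplicationState.ERROR:
        return None
    if status.error is not None:
        return "reportable"  # an apport report exists and can be offered
    if status.nonreportable_error is not None:
        return "nonreportable"  # no report: show a warning overlay instead
    return "unknown"  # defensive: ERROR state with neither field set
```

The final "unknown" branch reflects the concern raised later in this thread: a consumer should not assume error is always populated when the state is ERROR.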

@Chris-Peterson444
Contributor Author

The endpoints /meta/restart (POST) and /meta/ssh_info (GET) are now available before the server has finished loading all of its controllers.

I wanted to expand on this a bit without cluttering up the summary. This is motivated by the fact that an exception thrown during the autoinstall loading/validation phase necessarily means the controllers won't be fully loaded. For the restart endpoint I think this is probably fine (you should be able to restart whenever you want, I suppose), but the ssh_info endpoint is a little hairy. As long as the client waits until after the CLOUD_INIT_WAIT state, this change should be fine. For autoinstall errors this holds, since we don't start loading the autoinstall data until after we've got the password information from cloud-init.
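One way to picture "available before the controllers are loaded" is an allowlist check in front of request handling. This is a hypothetical sketch: EARLY_ENDPOINTS and may_serve are illustrative names, only the two endpoints named in this PR are listed, and the actual server may gate requests in a different way.

```python
from typing import FrozenSet, Tuple

# Hypothetical allowlist of (method, path) pairs that may be served
# before the server has finished loading all of its controllers.
EARLY_ENDPOINTS: FrozenSet[Tuple[str, str]] = frozenset({
    ("POST", "/meta/restart"),
    ("GET", "/meta/ssh_info"),
})


def may_serve(method: str, path: str, controllers_loaded: bool) -> bool:
    """Decide whether a request can be handled at this point."""
    if controllers_loaded:
        return True
    return (method.upper(), path) in EARLY_ENDPOINTS
```

Matching on the method as well as the path keeps, for example, a GET to /meta/restart blocked while still allowing the POST.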

I suppose one way to ensure this behavior would be to add some state variable that makes the ssh_info function block until cloud-init has finished?
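The blocking idea could be sketched with an asyncio.Event that is set once cloud-init has reported. SSHInfoGate, cloud_init_finished, and the password-only return value are hypothetical simplifications; the real ssh_info response carries more than a password.

```python
import asyncio
from typing import Optional


class SSHInfoGate:
    """Hypothetical helper that makes ssh_info wait for cloud-init."""

    def __init__(self) -> None:
        self._cloud_init_done = asyncio.Event()
        self._password: Optional[str] = None

    def cloud_init_finished(self, password: Optional[str]) -> None:
        # Record the cloud-init result and release any waiting callers.
        self._password = password
        self._cloud_init_done.set()

    async def ssh_info(self, timeout: Optional[float] = None) -> Optional[str]:
        # Block until cloud-init has finished; raises asyncio.TimeoutError
        # if a timeout is given and it expires first.
        await asyncio.wait_for(self._cloud_init_done.wait(), timeout)
        return self._password
```

Create the gate inside the running event loop: on older Python versions (before 3.10) an asyncio.Event binds to a loop when it is constructed.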

Comment on lines 102 to 109
state: ApplicationState
confirming_tty: str
error: Optional[ErrorReportRef]
nonreportable_error: Optional[NonReportableError]
cloud_init_ok: Optional[bool]
interactive: Optional[bool]
echo_syslog_id: str
Contributor Author


@d-loose I am proposing to change the response object for /meta/status to include a new field, nonreportable_error, to represent errors that subiquity doesn't automatically generate apport reports for. When state is ApplicationState.ERROR and the error results in an apport report, error will be populated with the report ref as before; otherwise, error will be null/None and nonreportable_error will be filled in with a NonReportableError. Does this seem okay from the u-d-b side of things? The one thing I'm concerned about is whether there are any spots that always expect error to be non-empty when state == ApplicationState.ERROR.
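On the server side, the rule above ("exactly one of error and nonreportable_error is set") can be sketched by classifying exceptions with a marker base class, which also makes the apport suppression generic. NonReportableException, make_crash_report, and error_fields are hypothetical names; AutoinstallValidationError is used only to illustrate the first non-reportable class mentioned in this PR, and the report reference is a placeholder rather than real apport output.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class NonReportableException(Exception):
    """Hypothetical marker base: subclasses never trigger an apport report."""


class AutoinstallValidationError(NonReportableException):
    """Illustrative first class of non-reportable error."""


@dataclass
class ErrorReportRef:
    base: str  # hypothetical stand-in for the report reference


@dataclass
class NonReportableError:
    cause: str
    message: str


def make_crash_report(exc: BaseException) -> ErrorReportRef:
    # Placeholder for real apport report generation.
    return ErrorReportRef(base=f"report-{type(exc).__name__}")


def error_fields(
    exc: BaseException,
) -> Tuple[Optional[ErrorReportRef], Optional[NonReportableError]]:
    """Return (error, nonreportable_error); exactly one of them is set."""
    if isinstance(exc, NonReportableException):
        return None, NonReportableError(type(exc).__name__, str(exc))
    return make_crash_report(exc), None
```

With a marker base class, adding a new non-reportable error type only requires subclassing; the suppression check itself does not change.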

Member


Thanks for the heads-up! I don't see any problems with this on our side. Currently any ApplicationState.ERROR is treated as fatal to the installation process. The error isn't being used anywhere so far.

@spydon FYI

Member

ogayot left a comment


Good work!

subiquity/ui/views/error.py (outdated, resolved)
subiquity/server/server.py (outdated, resolved)
subiquity/server/server.py (outdated, resolved)
subiquity/server/server.py (outdated, resolved)
subiquity/server/server.py (resolved)
subiquity/ui/views/error.py (outdated, resolved)
subiquity/client/controllers/progress.py (outdated, resolved)
subiquity/common/apidef.py (resolved)
subiquity/server/server.py (outdated, resolved)
examples/autoinstall/bad-early-command.yaml (outdated, resolved)
subiquity/ui/views/error.py (outdated, resolved)

Adds a field to the ApplicationStatus struct, nonreportable_error,
to be filled when the server enters an error state due to a
non-reportable error/exception type.

Change when the server discovers whether the install is interactive.
This allows clients to display autoinstall errors in an interactive
way, if applicable. This also enables accessing the ssh_info endpoint
before all of the controllers are loaded. Autoinstall loading happens
after the cloud-init loading stage, so the endpoint should be accessible
by then. If a failure happens during/before cloud-init is finished,
`interactive` will still be set to `None` and clients should default to
the non-interactive case.
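A minimal sketch of the reordering, assuming a hypothetical AutoinstallLoader: interactivity is decided first, so a later validation failure still leaves it set. The interactivity rule here (no autoinstall data means interactive, otherwise interactive only if interactive-sections is non-empty) is a simplifying assumption, as is the toy validate check.

```python
from typing import Any, Dict, Optional


class AutoinstallLoader:
    """Hypothetical sketch of the reordered autoinstall loading step."""

    def __init__(self) -> None:
        # Stays None if loading never starts (e.g. cloud-init failed first).
        self.interactive: Optional[bool] = None

    def load_autoinstall(self, config: Optional[Dict[str, Any]]) -> None:
        # Step 1: decide interactivity before anything else can fail.
        # Simplifying assumption: no autoinstall data means interactive;
        # otherwise interactive only if some sections are marked so.
        self.interactive = config is None or bool(
            config.get("interactive-sections"))
        if config is None:
            return
        # Step 2: validation may raise, but interactive is already set.
        self.validate(config)

    def validate(self, config: Dict[str, Any]) -> None:
        # Toy check standing in for real schema validation.
        if "version" not in config:
            raise ValueError("autoinstall config is missing 'version'")
```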
@Chris-Peterson444
Contributor Author

Updated to address the feedback so far. The only thing left to do is to decide what to do about ssh_info during an early failure. Does it make sense to manually start up the network controller during an early failure? Or should the client detect ssh=True together with ApplicationState=ERROR and print that the installer stopped before the network could be configured?

@Chris-Peterson444
Contributor Author

Chris-Peterson444 commented Mar 5, 2024

Updated to remove the special handling of the ssh_info endpoint. The subiquity client now just skips getting the info if the server is in an error state.

Now, if the install is interactive, the error dialog is displayed on the terminal. If it's not interactive, the client reports the error and drops into a shell session.
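The client behavior described above can be summarized as a small decision function. This is a hypothetical sketch: client_error_action and its string results are illustrative, and the state is passed as a plain string rather than the real enum. A None interactive value (failure before cloud-init finished) is treated as non-interactive, as the commit message above recommends.

```python
from typing import Optional


def client_error_action(state: str, interactive: Optional[bool]) -> str:
    """Hypothetical client decision after polling /meta/status."""
    if state != "ERROR":
        return "continue"  # normal path; ssh_info may be fetched here
    # In the error state the client skips fetching ssh_info entirely.
    if interactive:  # None (unknown) falls through to non-interactive
        return "show-error-dialog"
    return "print-error-and-drop-to-shell"
```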

Collaborator

dbungert left a comment


LGTM, but please apply "s/interative-sessions/interactive-sections/", a typo that came in with a previous PR.

@Chris-Peterson444
Contributor Author

Done, thanks!

Member

ogayot left a comment


I'm happy with the changes now. Please fix the assignment of error_is_not_reportable and I will approve.

subiquity/client/controllers/progress.py (outdated, resolved)
Adds support for AutoinstallValidation errors, the first class
of non-reportable errors. Includes a separate error overlay to
display a warning to the user about the issue.

Changes to the server to allow restarting the installer before all
of the controllers are loaded, since the error means the controllers
won't ever be loaded. Adds special handling to the ProgressView to
change the Reboot (the machine) button to a Restart (the installer) button
for this case.
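The ProgressView special case can be sketched as a choice of button label and action. progress_button is a hypothetical helper and the labels are illustrative; the point is that after a non-reportable error the controllers never finish loading, so the button restarts the installer (via the early POST /meta/restart endpoint) instead of rebooting the machine.

```python
from typing import Optional, Tuple


def progress_button(nonreportable_error: Optional[object]) -> Tuple[str, str]:
    """Return (label, action) for the ProgressView's final button.

    Hypothetical sketch: a non-reportable error swaps rebooting the
    machine for restarting the installer.
    """
    if nonreportable_error is not None:
        return ("Restart", "POST /meta/restart")
    return ("Reboot", "reboot the machine")
```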
Member

ogayot left a comment


LGTM!

Chris-Peterson444 merged commit 8721395 into canonical:main Mar 7, 2024
11 checks passed
4 participants