Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash on continue in missing package dialog #5937

Merged

Conversation

M4rtinK
Copy link
Contributor

@M4rtinK M4rtinK commented Oct 15, 2024

Due to the way how exceptions propagate from the Anaconda installation tasks we ended up with a non-critical error exception interrupting the iteration of an important task iteration loop installing the payload.

Due to not being handled "deep enough" but instead "bubbling up" to a top level error handler the loop apparently gets interrupted & remaining tasks skipped.

This resulted in an unrelated crash ("kernel version list no available") due to all the important installation tasks being skipped, packages not being installed and installation related data not being populated.

The end result was that a non-critical error (such as a missing package) would trigger a dialog asking the user to quit or continue - but clicking "continue" would result in a weird crash.

So move the error handler check closer to the task execution, to prevent the loop from being interrupted. That way the loop will resume its iteration if "continue" is clicked in the UI.

Also convert the non-critical error if it gets raised (user deciding not to continue after the non-critical error) to a fatal one. This is necessary, as otherwise the top level error handler would get triggered, asking the user again to quit or continue.

NOTE: Longer term we really should clean this up & have all installation tasks gathered, ordered and executed from a single place. Then all the error handling could be in a single place, making things much simpler.

Resolves: INSTALLER-4045
Related: rhbz#2238045
Related: RHEL-57699

Due to the way how exceptions propagate from the Anaconda installation tasks we ended
up with a non-critical error exception interrupting the iteration of an important task
iteration loop installing the payload.

Due to not being handled "deep enough" but instead "bubbling up" to a top level
error handler the loop apparently gets interrupted & remaining tasks
skipped.

This resulted in an unrelated crash ("kernel version list no available")
due to all the important installation tasks being skipped, packages not
being installed and installation related data not being populated.

The end result was that a non-critical error (such as a missing package)
would trigger a dialog asking the user to quit or continue - but clicking
"continue" would result in a weird crash.

So move the error handler check closer to the task execution, to
prevent the loop from being interrupted. That way the loop will resume
its iteration if "continue" is clicked in the UI.

Also convert the non-critical error if it gets raised (user deciding
*not* to continue after the non-critical error) to a fatal one. This is
necessary, as otherwise the top level error handler would get triggered,
asking the user again to quit or continue.

NOTE: Longer term we really should clean this up & have all installation
tasks gathered, ordered and executed from a single place. Then all the
error handling could be in a single place, making things much simpler.

Resolves: INSTALLER-4045
Related: rhbz#2238045
Related: RHEL-57699
@M4rtinK
Copy link
Contributor Author

M4rtinK commented Oct 15, 2024

This took quite a bit of time to track down what the heck is actually happening, why & what to do about it - ideally without having to change some of the tricky Anaconda innards :P Thankfully I seems to have arrive on a fairly minimal solution that seems to fix the issue as far without visibly breaking something else, as far as I can tell.

This is how it looks like - first the user will see a continue yes/no dialog - same as before, nothing new here:
missing_package_question

Clicking "Yes" should no longer crash. Clicking "No" will now show this:
missing_package_final_error

While this is a side effect of making sure the user is not asked again to continue after pressing "No" (as the top level error handler is still active) it actually IMHO looks a lot better than the previous messy exception dialog that was shown after pressing the "No" button:
missing_package_error

Feedback welcome - if you have an idea to do this better or managed to hit some edge case this has introduced, let me know! :)

@M4rtinK
Copy link
Contributor Author

M4rtinK commented Oct 15, 2024

NOTE, to easily reproduce the original issue, just create a kickstart like this:

rootpw anaconda
keyboard us
lang en_US.UTF-8
timezone America/New_York

%packages
no-such-package
%end

zerombr
clearpart --all --initlabel
autopart

And run the installation with it (by for example injecting it to an updated boot.iso via the local development scripts).

@M4rtinK
Copy link
Contributor Author

M4rtinK commented Oct 15, 2024

I'll open a RHEL 10 version of this PR once we are sure this is what we want to do to fix this issue (which is IMHO quite likely :) ).

@M4rtinK M4rtinK changed the title Fix crash on continue after a missing package non-critical error Fix crash on continue in missing package dialog Oct 15, 2024
@M4rtinK
Copy link
Contributor Author

M4rtinK commented Oct 15, 2024

/kickstart-test --skip-testtypes whatever

@M4rtinK
Copy link
Contributor Author

M4rtinK commented Oct 15, 2024

/build-image --boot.iso

Copy link

Images built based on commit 73b412f:

  • boot.iso: success

Download the images from the bottom of the job status page.

Copy link
Contributor

@rvykydal rvykydal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
I think the dialog showed on "No" is fine. Esp given how simple is the patch. (And how hard was to find it :) )

@rvykydal
Copy link
Contributor

rvykydal commented Oct 16, 2024

/kickstart-test --skip-testtypes whatever

The failed tests seem all like known flakes. When I run scripts/classify-failures (its latest version from rhinstaller/kickstart-tests#1315) on a folder containing only the failures I am getting:

[1261] https://github.com/rhinstaller/kickstart-tests/issues/1261
#: 1
./kstest-bindtomac-onboot-activate-httpks.2024_10_15-10_51_48.o69heokj/virt-install.log
11:04:11,746 WARNING anaconda:anaconda: display: Wayland startup failed: systemd exited with status 1
-------------------------------------------------------------------------------
[1296] https://github.com/rhinstaller/kickstart-tests/issues/1296
#: 1
./kstest-initial-setup-gui.2024_10_15-11_17_59.mbzbb1ch/virt-install.log
11:30:57,815 ERR anaconda:Exception ignored in atexit callback <function shutdown at 0x7f41b34c3740>:
-------------------------------------------------------------------------------
[1312] https://github.com/rhinstaller/kickstart-tests/issues/1312
#: 1
./kstest-bond-vlan-pre.2024_10_15-12_33_37.5jsg0i5g/virt-install.log
12:36:02,462 WARNING org.fedoraproject.Anaconda.Modules.Storage:gi.overrides.BlockDev.MpathError: Process reported exit code 1: Job for multipathd.service failed.

ie known issues, I guess ftp will is a flake as well.

I think we need to support the --retry option in the workflow (or run it by default). ... #5941

Copy link
Contributor

@jstodola jstodola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me from the user point of view.

@M4rtinK
Copy link
Contributor Author

M4rtinK commented Oct 16, 2024

/kickstart-test --testtype smoke

@M4rtinK M4rtinK merged commit fea233c into rhinstaller:master Oct 16, 2024
22 of 24 checks passed
@opoplawski
Copy link

Thank you very much for fixing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
f42 Fedora 42
Development

Successfully merging this pull request may close these issues.

4 participants