Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the fallback mechanism when menu entries fail to boot #195

Open
wants to merge 281 commits into
base: fedora-42
Choose a base branch
from

Conversation

chewi
Copy link

@chewi chewi commented Oct 24, 2024

@martinezjavier's c3fcfe5 commit accidentally broke the fallback mechanism, although it also exposed another bug in upstream GRUB.

We cannot rely on grub_errno in grub_menu_execute_entry() because grub_script_execute_new_scope() calls grub_print_error(), which always resets grub_errno back to GRUB_ERR_NONE. It may also get reset by grub_wait_after_message().

grub_script_execute_sourcecode() parses and executes code one line at a time, updating the return code each time because only the last line determines the final status. However, trailing new lines were also executed, masking any failure on the previous line. Fix this by only trying to execute the command when there is actually one present. A new test case is included.

See the commit messages for further details.

I can send the first change upstream, although it's unlikely to cause issues without your change, hence why it went unnoticed for so long. I gather you are starting to upstream some of your changes, so perhaps I can leave it with you?

vathpela and others added 30 commits September 25, 2024 10:59
This patch adds grub-get-kernel-settings, which reads the system kernel
installation configuration from /etc/sysconfig/kernel, and outputs
${GRUB_...} variables suitable for evaluation by grub-mkconfig.  Those
variables are then used by 10_linux to choose whether or not to create
debug stanzas.

Resolves: rhbz#1226325
[rharwood: migrate man page to h2m]
This needs to be hooked up to --program-transform=, but I haven't had
time.

Signed-off-by: Peter Jones <[email protected]>
Since our bugs tell us that the xnu boot entries really just don't work
most of the time, and they create piles of extra boot entries, because
they can't quite figure out 32-vs-64 and other stuff like that.

It's rediculous, and we should just boot their bootloader through the
chainloader instead.

So this patch does that.

Resolves: rhbz#893179

Signed-off-by: Peter Jones <[email protected]>
This patch adds the ability to specify a different root on a btrfs
filesystem too boot from other than the default one.

btrfs-list-snapshots <dev> will list the subvolumes available on the
filesystem.

set btrfs_subvol=<path> and set btrfs_subvolid=<subvolid> will specify
which subvolume to use and any pathnames provided with either of those
variables set will start using that root. If the subvolume or subvolume id
doesn't exist, then an error case will result.

It is possible to boot into a separate GRUB instance by exporting the
variable and loading the config file from the subvolume.

Signed-off-by: Jeff Mahoney <[email protected]>
current gcc complains:

  grub-core/fs/btrfs.c: In function ‘grub_cmd_btrfs_info’:
  grub-core/fs/btrfs.c:2745:7: error: the comparison will always evaluate as ‘true’ for the address of ‘label’ will never be NULL [-Werror=address]
   2745 |   if (data->sblock.label)
        |       ^~~~
  grub-core/fs/btrfs.c:92:8: note: ‘label’ declared here
     92 |   char label[0x100];
        |        ^~~~~
  cc1: all warnings being treated as errors

Obviously this check should be on the first data byte instead of the
symbol itself.

Signed-off-by: Peter Jones <[email protected]>
We should export btrfs_subvol and btrfs_subvolid to have both visible
to subsidiary configuration files loaded using configfile.

Signed-off-by: Michael Chang <[email protected]>
Signed-off-by: Michael Chang <[email protected]>
Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: Michael Chang <[email protected]>
Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: Michael Chang <[email protected]>
Signed-off-by: Robbie Harwood <[email protected]>
This uses grub_efi_allocate_pool(), grub_efi_free_pool(), and
grub_efi_free_pages() instead of open-coded efi_call_N() calls, so we
get more reasonable type checking.

Signed-off-by: Peter Jones <[email protected]>
This avoids syntax checkers getting confused about if it's llx or lx.

Signed-off-by: Peter Jones <[email protected]>
This should never be trying this, and since we've consolidated the
grubenv to always be on /boot/efi/EFI/fedora/, this code causes it to
always make the wrong decision.

Resolves: rhbz#1484474

Signed-off-by: Peter Jones <[email protected]>
GRUB now has BootLoaderSpec support, the user can choose to use this by
setting GRUB_ENABLE_BLSCFG to true in /etc/default/grub. On this setup,
the boot menu entries are not added to the grub.cfg, instead BLS config
files are parsed by blscfg command and the entries created dynamically.

A 10_linux_bls grub.d snippet to generate menu entries from BLS files
is also added that can be used on platforms where the bootloader doesn't
have BLS support and only can parse a normal grub configuration file.

Portions of the 10_linux_bls were taken from the ostree-grub-generator
script that's included in the OSTree project.

Fixes to support multi-devices and generate a BLS section even if no
kernels are found in the boot directory were proposed by Yclept Nemo
and Tom Gundersen respectively.

Signed-off-by: Peter Jones <[email protected]>
[javierm: remove outdated URL for BLS document]
Signed-off-by: Javier Martinez Canillas <[email protected]>
[[email protected]: skip machine ID check when updating entries]
Signed-off-by: Ian Wienand <[email protected]>
[rharwood: use sort(1), commit message composits, drop man pages]
Signed-off-by: Robbie Harwood <[email protected]>
The emu platform doesn't have a grub_backtrace() implementation, so this
causes a build error. Don't attempt to call this when building grub-emu.

Signed-off-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Peter Jones <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
[jhlavac: Use ${etcdefaultgrub} instead of /etc/default/grub]
Signed-off-by: Jan Hlavac <[email protected]>
[rharwood: skip on ostree installations, migrate man to h2m]
Signed-off-by: Robbie Harwood <[email protected]>
Signed-off-by: Peter Jones <[email protected]>
Commit 318ee04 ("make better backtraces") reworked the backtrace logic
but the changes lead to the following build error on the grub-emu platform:

grub_emu_lite-symlist.o:(.data+0xf08): undefined reference to `start'
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile:25959: grub-emu-lite] Error 1
make[3]: *** Waiting for unfinished jobs....
cat kernel_syms.input | grep -v '^#' | sed -n \
  -e '/EXPORT_FUNC *([a-zA-Z0-9_]*)/{s/.*EXPORT_FUNC *(\([a-zA-Z0-9_]*\)).*/defined kernel '""'\1/;p;}' \
  -e '/EXPORT_VAR *([a-zA-Z0-9_]*)/{s/.*EXPORT_VAR *(\([a-zA-Z0-9_]*\)).*/defined kernel '""'\1/;p;}' \
  | sort -u >kernel_syms.lst

The problem is that start and _start symbols are exported unconditionally,
but these aren't defined for grub-emu since is an emultaed platform so it
doesn't have a startup logic. Don't attempt to export those for grub-emu.

Signed-off-by: Javier Martinez Canillas <[email protected]>
vathpela and others added 28 commits September 25, 2024 10:59
On mustang, our memory map looks like:

Type      Physical start  - end             #Pages        Size Attributes
reserved  0000004000000000-00000040001fffff 00000200      2MiB UC WC WT WB
conv-mem  0000004000200000-0000004393ffffff 00393e00  14654MiB UC WC WT WB
ldr-code  0000004394000000-00000043f7ffffff 00064000   1600MiB UC WC WT WB
BS-data   00000043f8000000-00000043f801ffff 00000020    128KiB UC WC WT WB
conv-mem  00000043f8020000-00000043fa15bfff 0000213c  34032KiB UC WC WT WB
ldr-code  00000043fa15c000-00000043fa2a1fff 00000146   1304KiB UC WC WT WB
ldr-data  00000043fa2a2000-00000043fa3e8fff 00000147   1308KiB UC WC WT WB
conv-mem  00000043fa3e9000-00000043fa3e9fff 00000001      4KiB UC WC WT WB
ldr-data  00000043fa3ea000-00000043fa3eafff 00000001      4KiB UC WC WT WB
ldr-code  00000043fa3eb000-00000043fa4affff 000000c5    788KiB UC WC WT WB
BS-code   00000043fa4b0000-00000043fa59ffff 000000f0    960KiB UC WC WT WB
RT-code   00000043fa5a0000-00000043fa5affff 00000010     64KiB RT UC WC WT WB
RT-data   00000043fa5b0000-00000043fa5bffff 00000010     64KiB RT UC WC WT WB
RT-code   00000043fa5c0000-00000043fa5cffff 00000010     64KiB RT UC WC WT WB
ldr-data  00000043fa5d0000-00000043fa5d0fff 00000001      4KiB UC WC WT WB
BS-code   00000043fa5d1000-00000043fa5ddfff 0000000d     52KiB UC WC WT WB
reserved  00000043fa5de000-00000043fa60ffff 00000032    200KiB UC WC WT WB
ACPI-rec  00000043fa610000-00000043fa6affff 000000a0    640KiB UC WC WT WB
ACPI-nvs  00000043fa6b0000-00000043fa6bffff 00000010     64KiB UC WC WT WB
ACPI-rec  00000043fa6c0000-00000043fa70ffff 00000050    320KiB UC WC WT WB
RT-code   00000043fa710000-00000043fa72ffff 00000020    128KiB RT UC WC WT WB
RT-data   00000043fa730000-00000043fa78ffff 00000060    384KiB RT UC WC WT WB
RT-code   00000043fa790000-00000043fa79ffff 00000010     64KiB RT UC WC WT WB
RT-data   00000043fa7a0000-00000043fa99ffff 00000200      2MiB RT UC WC WT WB
RT-code   00000043fa9a0000-00000043fa9affff 00000010     64KiB RT UC WC WT WB
RT-data   00000043fa9b0000-00000043fa9cffff 00000020    128KiB RT UC WC WT WB
BS-code   00000043fa9d0000-00000043fa9d9fff 0000000a     40KiB UC WC WT WB
reserved  00000043fa9da000-00000043fa9dbfff 00000002      8KiB UC WC WT WB
conv-mem  00000043fa9dc000-00000043fc29dfff 000018c2  25352KiB UC WC WT WB
BS-data   00000043fc29e000-00000043fc78afff 000004ed   5044KiB UC WC WT WB
conv-mem  00000043fc78b000-00000043fca01fff 00000277   2524KiB UC WC WT WB
BS-data   00000043fca02000-00000043fcea3fff 000004a2   4744KiB UC WC WT WB
conv-mem  00000043fcea4000-00000043fcea4fff 00000001      4KiB UC WC WT WB
BS-data   00000043fcea5000-00000043fd192fff 000002ee   3000KiB UC WC WT WB
conv-mem  00000043fd193000-00000043fd2b0fff 0000011e   1144KiB UC WC WT WB
BS-data   00000043fd2b1000-00000043ff80ffff 0000255f  38268KiB UC WC WT WB
BS-code   00000043ff810000-00000043ff99ffff 00000190   1600KiB UC WC WT WB
RT-code   00000043ff9a0000-00000043ff9affff 00000010     64KiB RT UC WC WT WB
conv-mem  00000043ff9b0000-00000043ff9bffff 00000010     64KiB UC WC WT WB
RT-data   00000043ff9c0000-00000043ff9effff 00000030    192KiB RT UC WC WT WB
conv-mem  00000043ff9f0000-00000043ffa05fff 00000016     88KiB UC WC WT WB
BS-data   00000043ffa06000-00000043ffffffff 000005fa   6120KiB UC WC WT WB
MMIO      0000000010510000-0000000010510fff 00000001      4KiB RT
MMIO      0000000010548000-0000000010549fff 00000002      8KiB RT
MMIO      0000000017000000-0000000017001fff 00000002      8KiB RT
MMIO      000000001c025000-000000001c025fff 00000001      4KiB RT

This patch adds a requirement when we're trying to find the base of ram, that
the memory we choose is actually /allocatable/ conventional memory, not merely
write-combining.  On this machine that means we wind up with an allocation
around 0x4392XXXXXX, which is a reasonable address.

This also changes grub_efi_allocate_pages_real() so that if 0 is allocated, it
tries to allocate again starting with the same max address it did the first
time, rather than interposing GRUB_EFI_MAX_USABLE_ADDRESS there, so that any
per-platform constraints on its given address are maintained.

Signed-off-by: Peter Jones <[email protected]>
We are currently allocating just enough memory for the file size,
which means that the kernel BSS is in limbo (and not even zeroed).

We are also not honoring the alignment specified in the image
PE header.

This makes us use the PE optional header in which the kernel puts the
actual size it needs, including BSS, and make sure we clear it, and
honors the specified alignment for the image.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
[pjones: arm: check for the PE magic for the compiled arch]
Signed-off-by: Peter Jones <[email protected]>
Signed-off-by: Robbie Harwood <[email protected]>
In order to properly validate a loaded kernel's support for being loaded
without a writable stack or executable, we need to be able to properly
parse arbitrary PE headers.

Currently, pe32.h is written in such a way that the MS-DOS header that
tells us where to find the PE header in the binary can't be accessed.
Further, for some reason it calls the DOS MZ magic "GRUB_PE32_MAGIC".

This patch adds the structure for the DOS header, renames the DOS magic
define, and adds defines for the actual PE magic.

Signed-off-by: Peter Jones <[email protected]>
The aarch64 loader doesn't use efi bootservices, and
therefor it has a very minimal loader which makes a lot
of assumptions about the kernel layout. With the ZBOOT
changes, the layout has changed a bit and we not should
really be parsing the PE sections to determine how much
data to copy, otherwise the BSS won't be setup properly.

This code still makes a lot of assumptions about the
the kernel layout, so its far from ideal, but it works.

Resolves: rhbz#2125069

Signed-off-by: Jeremy Linton <[email protected]>
Currently, the kernel pages are allocated with type EFI_LOADER_DATA.
While the vast majority of systems will happily execute code from those
pages (i.e. don't care about memory protection), the Microsoft Surface
Pro X stalls, as this memory is not designated as "executable".

Therefore, allocate the kernel pages as EFI_LOADER_CODE to request
memory that is actually executable.

Signed-off-by: Maximilian Luz <[email protected]>
Commit de735a4 added a grub_env_set where the prefix contains
the arch name in the pathname. This create issues when trying to
load modules using this prefix as the pathname contains a "doubled"
arch name in it (ie .../powerpc-ieee1275/powerpc-ieee1275/).

Signed-off-by: Nicolas Frayer <[email protected]>
The directory extent list does not have to be a continuous list of data
blocks. When GRUB tries to read a non-existant member of the list,
grub_xfs_read_file() will return a block of zero'ed memory. Checking for
a zero'ed magic number is sufficient to skip this non-existant data block.

Prior to commit 07318ee (fs/xfs: Fix XFS directory extent parsing)
this was handled as a subtle side effect of reading the (non-existant)
tail data structure. Since the block was zero'ed the computation of the
number of directory entries in the block would return 0 as well.

Fixes: 07318ee (fs/xfs: Fix XFS directory extent parsing)
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2254370

Signed-off-by: Jon DeVree <[email protected]>
Reviewed-By: Vladimir Serbinenko <[email protected]>
Reviewed-by: Daniel Kiper <[email protected]>
Add -Wno-incompatible-pointer-types to ignore warnings for incompatible
types

Signed-off-by: Nicolas Frayer <[email protected]>
The initial fix implemented a new flag that forces the grub cfg
stub to be located on the same disk as grub. This created several
issues such as RAID machines not being able to boot as their
partition names under grub were different from the partition where
grub is located. It also simply means that any machines with the
/boot partition located on a disk other than the one containing grub
won't boot.
This commit denies booting if the grub cfg stub is located on a USB
drive with a duplicated UUID (UUID being the same as the partition
containing the actual grub cfg stub)

Signed-off-by: Nicolas Frayer <[email protected]>
For NX, our kernel loaders need to set write and execute page
permissions on allocated pages and the stack.

This patch adds those calls.

Signed-off-by: Peter Jones <[email protected]>
[rharwood: fix stack_attrs undefined, fix aarch64 callsites]
Signed-off-by: Robbie Harwood <[email protected]>
These can be used to register a different implementation later,
for example, when shim provides a protocol with those functions.

Signed-off-by: Mate Kukri <[email protected]>
For those manual assembly files created where no '.note.GNU-stack'
section is explicitly added, linker defaults it as executable and this
is the reason that RHEL CI rpminspect & annocheck tests are
failing. The proposed change sets the corresponding GNU-stack
sections otherwise CI detects the following security vulnerability

   $ annocheck annocheck --ignore-unknown --verbose --profile=el9 *.rpm 2>&1 | grep FAIL | grep stack
   (standard input):(standard input):Hardened: ./usr/lib/grub/x86_64-efi/kernel.exec: FAIL: gnu-stack test because .note.GNU-stack section has execute permission
   (standard input):(standard input):Hardened: ./usr/lib/grub/x86_64-efi/kernel.img: FAIL: gnu-stack test because .note.GNU-stack section has execute permission

Signed-off-by: Leo Sandoval <[email protected]>
Stricker permissions are required on the grub.cfg file, resulting in
at most 0600 owner's file permissions. This resolves conflicting
requirement permissions on grub2-pc package's grub2.cfg file.

Signed-off-by: Leo Sandoval <[email protected]>
This patch adds support for Radix, Xive and Radix_gtse in Options
vector5 which is required for KVM LPARs. KVM LPARs ONLY support
Radix and not the Hash. Not enabling Radix on any PowerVM KVM LPARs
will result in boot failure.

Signed-off-by: Avnish Chouhan <[email protected]>
Reviewed-by: Daniel Kiper <[email protected]>
/boot/efi/EFI/$os_name/grub.cfg contains a grub cfg stub
that should not be overwritten by grub2-mkconfig.
Ensure that we prevent this from happening.

Signed-off-by: Marta Lewandowska <[email protected]>
Signed-off-by: Nicolas Frayer <[email protected]>
Remove mountpoint when checking whether or not the grub cfg stub
exists and add -s to the test. This should cover scenarios where
the ESP doesn't have a seperate partition but still uses a grub
cfg stub

Signed-off-by: Nicolas Frayer <[email protected]>
These two commands may fail interrumping the normal boot process,
so placing these inside test expressions is a safer approach.

Resolves: #2305291
Suggested-by: Kan-Ru Chen: <[email protected]>
Signed-off-by: Leo Sandoval <[email protected]>
Call grub_efi_check_nx_required() passing it nx_required to
assign the correct value

Signed-off-by: Nicolas Frayer <[email protected]>
… boot

Commit 972aa68 ("Make a "gdb" dprintf that tells us load addresses.")
added some debug prints to help with running gdb against grub.

Besides adding a new grub_dl_print_gdb_info () function which uses
`grub_qdprintf ("gdb", ...);` it also adds a new grub_efi_print_gdb_info ()
call to grub_efi_init ().

grub_efi_print_gdb_info () is intended for the gdbinfo command and uses
a non debug grub_printf () call leading to grub now always printing this
message during boot breaking flicker-free boot.

Add a new "debug" parameter to grub_efi_print_gdb_info () and use
`grub_qdprintf ("gdb", ...);` when this is set to silence the printing
done from grub_efi_init () when debugging is not enabled.

Fixes: 972aa68 ("Make a "gdb" dprintf that tells us load addresses.")
Signed-off-by: Hans de Goede <[email protected]>
The calculation of the size of the table was incorrect (copy/pasta from
grub_acpi_rsdt_find_table() I assume...). The entries are 64-bit long.

This causes us to access beyond the end of the table which is causing
crashes during boot on some systems. Typically this is causing a crash
on VMWare when using UEFI and enabling serial autodetection, as

grub_acpi_find_table (GRUB_ACPI_SPCR_SIGNATURE);

Will goes past the end of the table (the SPCR table doesn't exits)

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
When querying about a partition UUID, we're not checking
for get_device_uuid() return value, which can possibly
result in dereferencing a NULL pointer.

Signed-off-by: Nicolas Frayer <[email protected]>
Co-authored-by: Chuong Tran <[email protected]>
grub_script_execute_sourcecode() parses and executes code one line at a
time, updating the return code each time because only the last line
determines the final status. However, trailing new lines were also
executed, masking any failure on the previous line. Fix this by only
trying to execute the command when there is actually one present.

This has presumably never been noticed because this code is not used by
regular functions, only in special cases like eval and menu entries. The
latter generally don't return at all, having booted an OS. When failing
to boot, upstream GRUB triggers the fallback mechanism regardless of the
return code.

We noticed the problem while using Red Hat's patches, which change this
behaviour to take account of the return code. In that case, a failure
takes you back to the menu rather than triggering a fallback.

Signed-off-by: James Le Cuirot <[email protected]>
Don't rely on grub_errno here because grub_script_execute_new_scope()
calls grub_print_error(), which always resets grub_errno back to
GRUB_ERR_NONE. It may also get reset by grub_wait_after_message().

This problem was observed when a "bad signature" error resulted in the
menu being redisplayed rather than the fallback mechanism being
triggered, although another change was also needed to fix it. This only
happens with Red Hat's patches because upstream GRUB triggers the
fallback mechanism regardless of the return code.

Signed-off-by: James Le Cuirot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.