kernel panics when single-stepping [SOLVED: KPTI #PF for kernel IRQ] #45
Comments
Thanks for the report. I'm not sure what exactly goes wrong here. It seems the kernel says the RIP at

In principle, things can sometimes go wrong when the kernel and libsgxstep both want to access or program the APIC timer and the kernel interrupts the libsgxstep interrupt handler. This used to be a frequent cause of kernel crashes, but it has improved a lot since then, see #23.

It could be related to this (but I do not see any #GP), or it could be something completely different. Maybe some page-table entries are corrupted somehow(?) I'd have to investigate more closely to reproduce and pinpoint this, but I'm afraid I won't have time for that any time soon. I hope the crashes are not too frequent and it is still usable for you!
Hi, thanks so much for the suggestions! I now follow all the system configurations in the README (same kernel version and microcode version). I still see crashes, but it is more stable now (it steps over 100000 instructions, compared to 1000 before).

Here is part of the kernel log (e.g., from running $ NUM=100000 STRLEN=1 make run):

[ 4381.638366] BUG: unable to handle page fault for address: 0000561106405000

I observe a #PF here (not the #GP you mentioned). Do you have any idea whether it is caused by the same issue as in #23? I am not sure, because I use the latest commit, which should already include the fix for #23. Thanks!
Hi tonitick,

Thanks for the additional information. This indeed seems like a bug. (I am aware that SGX-Step can sometimes cause unpredictable crashes :/ In my experience, the best thing to do at these points is to hard reboot the system and apply all the recommended config options for stabilization. While it can certainly be very annoying, hopefully crashes are not too frequent and it remains usable.)

I'm still not sure what exactly goes on here. Abstractly speaking, I think the root of the problem is that SGX-Step performs kernel tasks and handles kernel resources like page tables and timers in user space. Linux does not expect that at all and panics when this happens to interfere at the wrong time.

That being said, the log you provided may help pinpoint this issue and hopefully find a fix. Especially the first line seems interesting:
This may indicate a misconfiguration of page-table or IDT entries set up by
FWIW: some further pointers to hopefully help narrowing this down:
So it seems to be a user-space PTE that the kernel wants to execute. My first thought: did you make sure to disable SMEP with
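A minimal sketch (not part of the original comment) of how the oops line can be read to reach this conclusion: the selector printed before the address in the `RIP:` field is CS, and its two low bits give the privilege level, while the address itself lies in the lower (user) half of the address space. The helper names `parse_rip`, `privilege_level` and `is_user_address` are made up for illustration, and the user/kernel split assumes 4-level paging.

```python
import re

# Highest address of the lower canonical half, assuming 4-level paging (48-bit VAs).
USER_HALF_MAX = 0x0000_7FFF_FFFF_FFFF


def parse_rip(line):
    """Extract (CS selector, instruction pointer) from an oops 'RIP: cs:0xaddr' field."""
    m = re.search(r"RIP:\s*([0-9a-fA-F]{4}):0x([0-9a-fA-F]+)", line)
    if m is None:
        raise ValueError("no 'RIP: <cs>:0x<addr>' field found")
    return int(m.group(1), 16), int(m.group(2), 16)


def privilege_level(cs):
    """The two low bits of the CS selector hold the current privilege level."""
    return cs & 0x3


def is_user_address(addr):
    """True if the address lies in the lower (user-space) half."""
    return addr <= USER_HALF_MAX


# Faulting RIP line taken from the kernel log in the original report.
cs, rip = parse_rip("[ 132.182701] RIP: 0010:0x55bb86c8b000")
print(f"CPL={privilege_level(cs)} user_address={is_user_address(rip)}")
```

For the faulting `RIP: 0010:0x55bb86c8b000` this gives ring 0 together with a user-space address, i.e. the kernel was fetching instructions from user memory, whereas the interrupted context further down in the log (`RIP: 0033:0x55bb86c8a2fd`) is ordinary ring-3 code.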
Hi, thanks so much for the reply!
Yes, here is my GRUB parameter:

The binaries can be found using the links. The corresponding kernel logs:

Some other information:

Please let me know if there is any other information that may help. Thanks!
Thanks for following up with additional info. My first thought: this could be an exception on the

Page fault error code

Afaics, it cannot be a non-present exception. From Figure 4-12 (Page-Fault Error Code) in the Intel SDM and page-fault code 0x11:
From dmesg, CR4=00000000000606e0:
--> so it seems that the address being fetched somehow does not have execute rights?!

Instruction-fetch fault

--> so then XD=1 on one of the page-table levels somehow?

Page table walk

From dmesg above: PGD 8000000837dbb067 P4D 8000000837dbb067 PUD 85a1fa067 PMD 81b3f5067 PTE 7fbe55025; which corresponds to (using libsgxstep

--> so it seems that somehow XD=1 on the top-level PGD entry for the faulting address?!

Conclusion

I have no idea why PGD.XD=1, but this is clearly a violation, so something is going wrong somewhere. I'd be interested to better understand which address is causing this: can you check if this is the
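The error-code and CR4 reasoning in this comment can be reproduced with a few bit tests. The following is only a sketch: the bit positions follow the Intel SDM layout of the page-fault error code and of CR4, and the function names `decode_pf_error` and `smep_smap` are made up for illustration.

```python
# Bit positions in the x86 page-fault error code (Intel SDM, page-fault error code figure).
PF_BITS = {0: "P", 1: "W/R", 2: "U/S", 3: "RSVD", 4: "I/D", 5: "PK", 15: "SGX"}

# Bit positions of the relevant CR4 feature flags.
CR4_SMEP = 1 << 20
CR4_SMAP = 1 << 21


def decode_pf_error(code):
    """Return a dict mapping each error-code flag name to True/False."""
    return {name: bool((code >> bit) & 1) for bit, name in PF_BITS.items()}


def smep_smap(cr4):
    """Return (SMEP enabled, SMAP enabled) for a CR4 value."""
    return bool(cr4 & CR4_SMEP), bool(cr4 & CR4_SMAP)


err = decode_pf_error(0x11)         # error code from the 'Oops: 0011' line
smep, smap = smep_smap(0x606E0)     # CR4 value printed in dmesg

# P=1: the page was present, so this is a protection violation.
# I/D=1: the fault was raised by an instruction fetch.
# U/S=0: the access happened in supervisor mode.
print("set flags:", [name for name, is_set in err.items() if is_set])
print("SMEP:", smep, "SMAP:", smap)
```

With P and I/D set, U/S clear, and SMEP disabled in CR4, the remaining explanation for an instruction-fetch protection fault is an XD bit somewhere along the page-table walk, which is what the comment goes on to check.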
So trying to further understand the dmesg output, it seems the faulting address indeed corresponds to the user-space
in the
Also, not sure why there is also the user-space segment being printed further on (I assume this is where
PGD/P4D on user memory range seems to set XD bit: this does _not_ cause a page fault for user mode dereferences, but somehow faults in kernel mode, even when CR4.SMEP/SMAP is cleared.
Update: I could reproduce this issue minimally on a separate branch in the commit linked above. The problem seems to be that the PGD/P4D entries covering the user memory range have the XD bit set: this does not cause a page fault for user-mode dereferences, but it somehow faults in kernel mode, even when CR4.SMEP/SMAP are cleared. I am trying to find out whether this x86 behavior is clearly documented, so that I can write a proper patch. Interestingly, the Intel SDM only includes the sentence for supervisor-mode accesses:
On my machine the MWE gives:
I confirmed that this MWE works perfectly when rebooting with the Linux kernel options

@tonitick: you could try the above kernel options as a quick patch that hopefully solves your problem. Do let me know whether it improves or solves the crashes you observed.

It should also be possible to write a patch that clears the XD bits in the PUD/PGD user-space entries at runtime, but I'd first like to properly understand and confirm this x86+Linux behavior, to hopefully write a proper patch :)

For what it's worth, the reason why you only sometimes see this crash is that, in my understanding, it only gets triggered when the kernel is executing at the moment the APIC timer handler fires. That is not the intention and normally doesn't happen, but it can of course sometimes happen that the kernel interrupts the sgx-step application just before the APIC timer fires (cf. the tricky bug in #23 that has since been fixed).

Reference output on my machine:
Interestingly, digging further with
This leads to an interesting SGX-Step bug! I think I now get what's going on here:
The MWE can be explained because

So, I should look into properly disabling this XD "poison" bit from

In conclusion, before a proper patch is available, I'm quite confident these panics should go away by rebooting the kernel with
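To make the XD "poison" bit concrete, here is a rough sketch that parses the page-walk line which the kernel prints in the oops and reports the flags at each level. The bit positions are the architectural x86-64 ones (present = bit 0, writable = bit 1, user = bit 2, XD = bit 63); the helper names `parse_walk`, `flags` and `first_xd_level` are invented for this example and are not part of libsgxstep.

```python
import re

PTE_PRESENT = 1 << 0
PTE_WRITE = 1 << 1
PTE_USER = 1 << 2
PTE_XD = 1 << 63  # execute-disable; set at any level, it forbids instruction fetches


def parse_walk(line):
    """Map each paging level named in a dmesg page-walk line to its raw entry value."""
    pairs = re.findall(r"\b(PGD|P4D|PUD|PMD|PTE) ([0-9a-f]+)", line)
    return {level: int(value, 16) for level, value in pairs}


def flags(entry):
    """Decode the permission bits of one paging-structure entry."""
    return {
        "present": bool(entry & PTE_PRESENT),
        "writable": bool(entry & PTE_WRITE),
        "user": bool(entry & PTE_USER),
        "xd": bool(entry & PTE_XD),
    }


def first_xd_level(walk):
    """Return the first level (top-down) whose entry has XD set, or None."""
    for level, entry in walk.items():
        if entry & PTE_XD:
            return level
    return None


# Page-walk line from the kernel log in the original report.
walk = parse_walk(
    "[ 132.182658] PGD 80000007b65d0067 P4D 80000007b65d0067 "
    "PUD 7ad9f5067 PMD 7f8e31067 PTE 7bed47025"
)
for level, entry in walk.items():
    print(level, hex(entry), flags(entry))
print("first level with XD set:", first_xd_level(walk))
```

In this log the leaf PTE is present, user-accessible and executable, and only the top-level PGD/P4D entry carries XD. That matches the KPTI explanation in this thread: the user-space ranges are marked non-executable in the kernel's copy of the top-level page table, so a kernel-mode instruction fetch from a user address faults even with SMEP disabled.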
Closing this for now, as the bug has been identified and can be prevented by passing the

I might still consider implementing a kernel fix later on that clears the XD "poison" bits in the PGD/P4D kernel KPTI page table, but I think for now the

@tonitick Thanks again for reporting! This was an interesting bug, and I'm glad I could pinpoint and fix it, which hopefully makes single-stepping much more stable now :)
Hi @jovanbulck, thanks so much for the help. It works perfectly with my code. Sorry for the late reply: I was dealing with other issues in my code and just figured out that they are not related to this issue. Thanks again, I learned a lot from your post!
Use `kpti=0` instead of `noexec=off` since several example programs make use of MARK_EXECUTE_DISABLE functionality to cause enclave page faults.
Hi, I am trying to run the single-step bench and sometimes encounter a kernel bug, especially when stepping thousands of times. Here is an example from the kernel log:
[ 132.182650] BUG: unable to handle kernel paging request at 000055bb86c8b000
[ 132.182657] IP: 0x55bb86c8b000
[ 132.182658] PGD 80000007b65d0067 P4D 80000007b65d0067 PUD 7ad9f5067 PMD 7f8e31067 PTE 7bed47025
[ 132.182661] Oops: 0011 [#1] SMP PTI
[ 132.182663] Modules linked in: sgx_step(OE) msr thunderbolt rfcomm cmac snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic bnep intel_wmi_thunderbolt wmi_bmof arc4 intel_rapl iwlmvm x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mac80211 pcbc aesni_intel rtsx_pci_ms aes_x86_64 crypto_simd iwlwifi glue_helper memstick cryptd intel_cstate intel_rapl_perf btusb btrtl cfg80211 btbcm btintel joydev input_leds bluetooth ecdh_generic snd_hda_intel ir_rc6_decoder snd_hda_codec snd_hda_core snd_hwdep rc_rc6_mce snd_pcm ir_lirc_codec snd_seq_midi lirc_dev snd_seq_midi_event i915 snd_rawmidi ite_cir rc_core drm_kms_helper snd_seq video drm snd_seq_device snd_timer i2c_algo_bit fb_sys_fops syscopyarea acpi_pad sysfillrect mei_me snd sysimgblt wmi
[ 132.182690] mei mac_hid soundcore intel_pch_thermal sch_fq_codel binfmt_misc kvm_intel kvm isgx(OE) parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid rtsx_pci_sdmmc ahci e1000e rtsx_pci libahci
[ 132.182699] CPU: 1 PID: 3739 Comm: app Tainted: G OE 4.15.18+ #3
[ 132.182700] Hardware name: Intel Corporation NUC7i7BNH/NUC7i7BNB, BIOS BNKBL357.86A.0062.2018.0222.1644 02/22/2018
[ 132.182701] RIP: 0010:0x55bb86c8b000
[ 132.182702] RSP: 0000:ffffaac644e87ee8 EFLAGS: 00010002
[ 132.182703] RAX: 0000000000000008 RBX: 0000000000000008 RCX: 0000000000000000
[ 132.182704] RDX: ffff932c01c80000 RSI: 0000000000000008 RDI: ffffaac644e87f58
[ 132.182704] RBP: ffffaac644e87f28 R08: 0000000000000000 R09: 0000000000000000
[ 132.182705] R10: 0000000000000000 R11: 0000000000000000 R12: ffffaac644e87f58
[ 132.182706] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 132.182707] FS: 00007f34f50e4b80(0000) GS:ffff932c01c80000(0000) knlGS:0000000000000000
[ 132.182708] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 132.182709] CR2: 000055bb86c8b000 CR3: 00000007acf9a001 CR4: 00000000000606e0
[ 132.182709] Call Trace:
[ 132.182713] ? exit_to_usermode_loop+0x4f/0xd0
[ 132.182715] prepare_exit_to_usermode+0x83/0x90
[ 132.182718] retint_user+0x8/0x8
[ 132.182719] RIP: 0033:0x55bb86c8a2fd
[ 132.182720] RSP: 002b:00007ffd60331b60 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff02
[ 132.182721] RAX: 0000000000000003 RBX: 00007f34f3a76000 RCX: 000055bb86c8a2fd
[ 132.182721] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 132.182722] RBP: 00007ffd60332050 R08: 0000000000000000 R09: 0000000000000000
[ 132.182723] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 132.182723] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 132.182724] Code: Bad RIP value.
[ 132.182726] RIP: 0x55bb86c8b000 RSP: ffffaac644e87ee8
[ 132.182726] CR2: 000055bb86c8b000
[ 132.182728] ---[ end trace cad0a7670dc9a000 ]---
[ 132.182829] mm/pgtable-generic.c:40: bad pmd 00000000b3c05ac0(00000007b2884047)
Some info that may help to reproduce the bug:
commands: cd app/bench && NUM=10000 STRLEN=1 make run
kernel version: Ubuntu-4.15.0-135.139 (git://kernel.ubuntu.com/ubuntu/ubuntu-bionic.git)
cpu model: Intel(R) Core(TM) i7-7567U
kernel parameters: GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nox2apic iomem=relaxed no_timer_check nosmep nosmap clearcpuid=514 isolcpus=1 nmi_watchdog=0"
Could you help check and advise on the potential causes of this? Thanks so much.