[PCI/ASPM] Revert commit 456d8aa to avoid kernel panics in 6.1.94 #448

Open
wants to merge 1 commit into
base: master


@arista-nwolfe arista-nwolfe commented Dec 11, 2024

Fixing: sonic-net/sonic-buildimage#20901
Reverting: torvalds/linux@456d8aa

The 6.1.94 kernel bump pulled in torvalds/linux@456d8aa.
That change can cause kernel panics during reboot.
The correct fix is still under discussion, so for now this PR works around the problem by reverting the commit.

Here is a snippet of the kernel panic (full logs can be found in the issue referenced above):

2024 Nov 14 23:21:46.345708 str2-7804-sup-1 WARNING kernel: [ 1286.839869] general protection fault, probably for non-canonical address 0x32b727d667b7999a: 0000 [#1] PREEMPT SMP PTI
2024 Nov 14 23:21:51.054518 str2-7804-sup-1 WARNING kernel: [ 1286.968107] CPU: 11 PID: 151 Comm: irq/46-pciehp Tainted: G           OE      6.1.0-22-2-amd64 #1  Debian 6.1.94-1
2024 Nov 14 23:21:51.054538 str2-7804-sup-1 WARNING kernel: [ 1287.092181] Hardware name: Intel Camelback Mountain CRB/Camelback Mountain CRB, BIOS Aboot-norcal7-7.1.4-14169220 11/09/2019
2024 Nov 14 23:21:51.054540 str2-7804-sup-1 WARNING kernel: [ 1287.226668] RIP: 0010:pcie_config_aspm_link+0x48/0x330
2024 Nov 14 23:21:51.054541 str2-7804-sup-1 WARNING kernel: [ 1287.288242] Code: 48 8b 04 25 28 00 00 00 48 89 44 24 30 31 c0 8b 47 30 4c 8b 47 08 83 e3 7f c1 e8 0e f7 d3 89 c2 83 e0 7f 21 c3 83 e2 7f 21 f3 <41> 8b b6 a0 00 00 00 89 d8 83 e0 87 f6 c3 04 0f 44 d8 0f b7 47 30
2024 Nov 14 23:21:51.054543 str2-7804-sup-1 WARNING kernel: [ 1287.513355] RSP: 0000:ffffa81a0053bcb8 EFLAGS: 00010246
2024 Nov 14 23:21:51.054544 str2-7804-sup-1 WARNING kernel: [ 1287.575967] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
2024 Nov 14 23:21:51.054545 str2-7804-sup-1 WARNING kernel: [ 1287.661493] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9a41c6c35480
2024 Nov 14 23:21:51.054546 str2-7804-sup-1 WARNING kernel: [ 1287.747022] RBP: ffff9a41c6c35480 R08: ffff9a424d08bf49 R09: ffffa81a0053bc6c
2024 Nov 14 23:21:51.054547 str2-7804-sup-1 WARNING kernel: [ 1287.832549] R10: 0000000000000000 R11: 0000000000000004 R12: ffff9a41c1016000
2024 Nov 14 23:21:51.054548 str2-7804-sup-1 WARNING kernel: [ 1287.918078] R13: ffff9a41c5435028 R14: 32b727d667b798fa R15: ffff9a41c0ec3920
2024 Nov 14 23:21:51.054549 str2-7804-sup-1 WARNING kernel: [ 1288.003606] FS:  0000000000000000(0000) GS:ffff9a50ffcc0000(0000) knlGS:0000000000000000
2024 Nov 14 23:21:51.054550 str2-7804-sup-1 WARNING kernel: [ 1288.100593] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024 Nov 14 23:21:51.054550 str2-7804-sup-1 WARNING kernel: [ 1288.169454] CR2: 00007fb55fdf5030 CR3: 0000000101044001 CR4: 00000000003706e0
2024 Nov 14 23:21:51.054551 str2-7804-sup-1 WARNING kernel: [ 1288.254982] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2024 Nov 14 23:21:51.054552 str2-7804-sup-1 WARNING kernel: [ 1288.340509] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2024 Nov 14 23:21:51.054553 str2-7804-sup-1 WARNING kernel: [ 1288.426039] Call Trace:
2024 Nov 14 23:21:51.054554 str2-7804-sup-1 WARNING kernel: [ 1288.455317]  <TASK>
2024 Nov 14 23:21:51.054555 str2-7804-sup-1 WARNING kernel: [ 1288.480430]  ? __die_body.cold+0x1a/0x1f
2024 Nov 14 23:21:51.054555 str2-7804-sup-1 WARNING kernel: [ 1288.527428]  ? die_addr+0x38/0x60
2024 Nov 14 23:21:51.054556 str2-7804-sup-1 WARNING kernel: [ 1288.567128]  ? exc_general_protection+0x221/0x4a0
2024 Nov 14 23:21:51.054557 str2-7804-sup-1 WARNING kernel: [ 1288.623496]  ? asm_exc_general_protection+0x22/0x30
2024 Nov 14 23:21:51.054558 str2-7804-sup-1 WARNING kernel: [ 1288.681954]  ? pcie_config_aspm_link+0x48/0x330
2024 Nov 14 23:21:51.054559 str2-7804-sup-1 WARNING kernel: [ 1288.736243]  pcie_aspm_exit_link_state+0xb9/0x120
2024 Nov 14 23:21:51.054559 str2-7804-sup-1 WARNING kernel: [ 1288.792612]  pci_remove_bus_device+0xc8/0x110
2024 Nov 14 23:21:51.054560 str2-7804-sup-1 WARNING kernel: [ 1288.844818]  pci_remove_bus_device+0x2e/0x110
2024 Nov 14 23:21:51.054561 str2-7804-sup-1 WARNING kernel: [ 1288.897026]  pci_remove_bus_device+0x3e/0x110
2024 Nov 14 23:21:51.054562 str2-7804-sup-1 WARNING kernel: [ 1288.949234]  pciehp_unconfigure_device+0x94/0x160
2024 Nov 14 23:21:51.054563 str2-7804-sup-1 WARNING kernel: [ 1289.005609]  pciehp_disable_slot+0x69/0x100
2024 Nov 14 23:21:51.054564 str2-7804-sup-1 WARNING kernel: [ 1289.055731]  pciehp_handle_presence_or_link_change+0x241/0x350
2024 Nov 14 23:21:51.054564 str2-7804-sup-1 WARNING kernel: [ 1289.125642]  pciehp_ist+0x164/0x170
2024 Nov 14 23:21:51.054575 str2-7804-sup-1 WARNING kernel: [ 1289.167433]  ? disable_irq_nosync+0x10/0x10
2024 Nov 14 23:21:51.054577 str2-7804-sup-1 WARNING kernel: [ 1289.217548]  irq_thread_fn+0x1f/0x60
2024 Nov 14 23:21:51.054578 str2-7804-sup-1 WARNING kernel: [ 1289.260374]  irq_thread+0xfa/0x1c0
2024 Nov 14 23:21:51.054578 str2-7804-sup-1 WARNING kernel: [ 1289.301116]  ? irq_thread_fn+0x60/0x60
2024 Nov 14 23:21:51.054579 str2-7804-sup-1 WARNING kernel: [ 1289.346024]  ? irq_thread_check_affinity+0xf0/0xf0
2024 Nov 14 23:21:51.054580 str2-7804-sup-1 WARNING kernel: [ 1289.403432]  kthread+0xda/0x100
2024 Nov 14 23:21:51.054584 str2-7804-sup-1 WARNING kernel: [ 1289.441043]  ? kthread_complete_and_exit+0x20/0x20
2024 Nov 14 23:21:51.054585 str2-7804-sup-1 WARNING kernel: [ 1289.498448]  ret_from_fork+0x22/0x30
2024 Nov 14 23:21:51.054585 str2-7804-sup-1 WARNING kernel: [ 1289.541273]  </TASK>
2024 Nov 14 23:21:51.054586 str2-7804-sup-1 WARNING kernel: [ 1289.567422] Modules linked in: nft_meta_bridge(E) 8021q(E) garp(E) mrp(E) lm75(E) linux_ngbde(OE) linux_knet_cb(OE) linux_bcm_knet(OE) psample(E) linux_user_bde(OE) linux_kernel_bde(OE) xt_hl(E) xt_tcpudp(E) ip6_tables(E) xt_conntrack(E) ebt_vlan(E) nft_compat(E) nf_tables(E) tmp468(OE) amax31790(OE) veth(E) pmbus(E) pmbus_core(E) nf_conntrack_netlink(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xfrm_user(E) i2c_mux_pca9541(E) i2c_mux(E) optoe(E) lm90(E) at24(E) regmap_i2c(E) scd_hwmon(OE) i2c_dev(E) eeprom(E) bridge(E) stp(E) llc(E) nvme_fabrics(E) binfmt_misc(E) intel_rapl_msr(E) intel_rapl_common(E) intel_uncore_frequency(E) intel_uncore_frequency_common(E) sb_edac(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) bonding(E) tls(E) irqbypass(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha512_generic(E) sha256_ssse3(E) sha1_ssse3(E) aesni_intel(E) crypto_simd(E) cryptd(E) rapl(E) intel_cstate(E) intel_uncore(E) iTCO_wdt(E) evdev(E)
2024 Nov 14 23:21:51.054588 str2-7804-sup-1 WARNING kernel: [ 1289.567494]  ofpart(E) intel_pmc_bxt(E) scd(OE) spi_nor(E) iTCO_vendor_support(E) pcspkr(E) mtd(E) intel_pch_thermal(E) uio(E) watchdog(E) sg(E) ioatdma(E) button(E) nfnetlink(E) fuse(E) efi_pstore(E) dm_mod(E) drm(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) loop(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) zstd(E) zstd_compress(E) nvme(E) nvme_core(E) nls_utf8(E) nls_cp437(E) nls_ascii(E) vfat(E) fat(E) overlay(E) squashfs(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) ahci(E) libahci(E) ixgbe(E) xhci_pci(E) crct10dif_pclmul(E) spi_intel_platform(E) xfrm_algo(E) crct10dif_common(E) spi_intel(E) gpio_ich(E) libata(E) ehci_pci(E) dca(E) crc32_pclmul(E) xhci_hcd(E) ehci_hcd(E) mdio_devres(E) of_mdio(E) crc32c_intel(E) i2c_i801(E) scsi_mod(E) lpc_ich(E) fixed_phy(E) i2c_smbus(E) scsi_common(E) usbcore(E) tg3(E) fwnode_mdio(E) usb_common(E) libphy(E) mdio(E)
2024 Nov 14 23:21:51.054592 str2-7804-sup-1 WARNING kernel: [ 1291.578230] sched: RT throttling activated
2024 Nov 14 23:21:51.103876 str2-7804-sup-1 WARNING kernel: [ 1291.578551] ---[ end trace 0000000000000000 ]---
2024 Nov 14 23:21:51.220783 str2-7804-sup-1 WARNING kernel: [ 1291.682963] RIP: 0010:pcie_config_aspm_link+0x48/0x330
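
As a rough cross-check of the excerpt above, the faulting address can be reproduced from the register dump. The sketch below assumes the marked bytes in the Code: line (`<41> 8b b6 a0 00 00 00`) decode to a 32-bit load from `R14 + 0xa0`, and that the machine uses 4-level paging with 48-bit virtual addresses. Both are readings of the log, not something the PR states. The constant and function names are illustrative only.

```python
# Values copied from the panic excerpt above.
FAULT_ADDR = 0x32B727D667B7999A  # "general protection fault ... non-canonical address"
R14 = 0x32B727D667B798FA         # R14 from the register dump
DISPLACEMENT = 0xA0              # assumed from the bytes "<41> 8b b6 a0 00 00 00"


def is_canonical(addr: int) -> bool:
    """Check whether a 64-bit address is canonical for 48-bit virtual addressing.

    Assumption: 4-level paging, so bits 63..47 must be all zeros or all ones.
    """
    top_bits = addr >> 47
    return top_bits == 0 or top_bits == 0x1FFFF


# The load target computed from R14 matches the address reported by the kernel.
print(hex(R14 + DISPLACEMENT), R14 + DISPLACEMENT == FAULT_ADDR)

# R14 does not look like a kernel pointer, unlike RDI from the same dump.
print(is_canonical(R14), is_canonical(0xFFFF9A41C6C35480))
```

If that decoding is right, `pcie_config_aspm_link` dereferenced a value in R14 that is not a valid kernel pointer. That would be consistent with a stale or corrupted pointer, though the excerpt alone does not prove the cause.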

There is an upstream discussion about whether commit torvalds/linux@456d8aa is complete: https://lore.kernel.org/linux-pci/20240801171103.GA107989@bhelgaas/T/#t

@arista-nwolfe arista-nwolfe requested a review from a team as a code owner December 11, 2024 02:28

@paulmenzel paulmenzel left a comment


Thank you for your patch.

  1. Please use imperative mood: Revert
  2. It’d be great if you mentioned the subsystem in the summary/title.
  3. Please add an excerpt of the panic to the commit message.

From: Nathan Wolfe <[email protected]>
Date: Tue, 8 Oct 2024 11:57:26 -0700
Subject: [PATCH] revert 456d8aa to fix pcie_aspm_exit_link_status

Please add a summary, and also a reference to the upstream discussion.

@arista-nwolfe arista-nwolfe changed the title from "Reverting commit 456d8aa to avoid kernel panics in 6.1.94" to "Revert commit 456d8aa to avoid kernel panics in 6.1.94" Dec 11, 2024
@arista-nwolfe arista-nwolfe changed the title from "Revert commit 456d8aa to avoid kernel panics in 6.1.94" to "[PCIE_ASPM] Revert commit 456d8aa to avoid kernel panics in 6.1.94" Dec 11, 2024
@arista-nwolfe arista-nwolfe changed the title from "[PCIE_ASPM] Revert commit 456d8aa to avoid kernel panics in 6.1.94" to "[PCI/ASPM] Revert commit 456d8aa to avoid kernel panics in 6.1.94" Dec 11, 2024
@rlhui rlhui requested a review from saiarcot895 December 11, 2024 18:09
@rlhui rlhui added the P0 label Dec 12, 2024
3 participants