Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kick start to use Aurora Driver #1

Open
wanting1993 opened this issue Oct 22, 2019 · 0 comments
Open

Kick start to use Aurora Driver #1

wanting1993 opened this issue Oct 22, 2019 · 0 comments

Comments

@wanting1993
Copy link

wanting1993 commented Oct 22, 2019

Hi, i am a newbie who kick start to use aurora driver in working project. i already included the NET_Vendor_AURORA and AURORA_NB8800 in menuconfig. Bitbake a new Image and .dtb
and flashed into the V3M. May i know there is any user application to test on with the Aurora driver? or any method to check the driver is working on V3M?
Thanks.

sys-oak pushed a commit that referenced this issue Oct 22, 2019
This patch fixes the lock inversion complaint:

============================================
WARNING: possible recursive locking detected
5.3.0-rc7-dbg+ #1 Not tainted
--------------------------------------------
kworker/u16:6/171 is trying to acquire lock:
00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]

but task is already holding lock:
00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&id_priv->handler_mutex);
  lock(&id_priv->handler_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u16:6/171:
 #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0
 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0
 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]

stack backtrace:
CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: iw_cm_wq cm_work_handler [iw_cm]
Call Trace:
 dump_stack+0x8a/0xd6
 __lock_acquire.cold+0xe1/0x24d
 lock_acquire+0x106/0x240
 __mutex_lock+0x12e/0xcb0
 mutex_lock_nested+0x1f/0x30
 rdma_destroy_id+0x78/0x4a0 [rdma_cm]
 iw_conn_req_handler+0x5c9/0x680 [rdma_cm]
 cm_work_handler+0xe62/0x1100 [iw_cm]
 process_one_work+0x56d/0xac0
 worker_thread+0x7a/0x5d0
 kthread+0x1bc/0x210
 ret_from_fork+0x24/0x30

This is not a bug as there are actually two lock classes here.

Link: https://lore.kernel.org/r/[email protected]
Fixes: de910bd ("RDMA/cma: Simplify locking needed for serialization of callbacks")
Signed-off-by: Bart Van Assche <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
A user reported a lockdep splat

 ======================================================
 WARNING: possible circular locking dependency detected
 5.2.11-gentoo #2 Not tainted
 ------------------------------------------------------
 kswapd0/711 is trying to acquire lock:
 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500

but task is already holding lock:
 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}:
 kmem_cache_alloc+0x1f/0x1c0
 btrfs_alloc_inode+0x1f/0x260
 alloc_inode+0x16/0xa0
 new_inode+0xe/0xb0
 btrfs_new_inode+0x70/0x610
 btrfs_symlink+0xd0/0x420
 vfs_symlink+0x9c/0x100
 do_symlinkat+0x66/0xe0
 do_syscall_64+0x55/0x1c0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (sb_internal){.+.+}:
 __sb_start_write+0xf6/0x150
 start_transaction+0x3a8/0x500
 btrfs_commit_inode_delayed_inode+0x59/0x110
 btrfs_evict_inode+0x19e/0x4c0
 evict+0xbc/0x1f0
 inode_lru_isolate+0x113/0x190
 __list_lru_walk_one.isra.4+0x5c/0x100
 list_lru_walk_one+0x32/0x50
 prune_icache_sb+0x36/0x80
 super_cache_scan+0x14a/0x1d0
 do_shrink_slab+0x131/0x320
 shrink_node+0xf7/0x380
 balance_pgdat+0x2d5/0x640
 kswapd+0x2ba/0x5e0
 kthread+0x147/0x160
 ret_from_fork+0x24/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

 CPU0 CPU1
 ---- ----
 lock(fs_reclaim);
 lock(sb_internal);
 lock(fs_reclaim);
 lock(sb_internal);
*** DEADLOCK ***

 3 locks held by kswapd0/711:
 #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
 #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
 #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0

stack backtrace:
 CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2
 Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
 Call Trace:
 dump_stack+0x85/0xc7
 print_circular_bug.cold.40+0x1d9/0x235
 __lock_acquire+0x18b1/0x1f00
 lock_acquire+0xa6/0x170
 ? start_transaction+0x3a8/0x500
 __sb_start_write+0xf6/0x150
 ? start_transaction+0x3a8/0x500
 start_transaction+0x3a8/0x500
 btrfs_commit_inode_delayed_inode+0x59/0x110
 btrfs_evict_inode+0x19e/0x4c0
 ? var_wake_function+0x20/0x20
 evict+0xbc/0x1f0
 inode_lru_isolate+0x113/0x190
 ? discard_new_inode+0xc0/0xc0
 __list_lru_walk_one.isra.4+0x5c/0x100
 ? discard_new_inode+0xc0/0xc0
 list_lru_walk_one+0x32/0x50
 prune_icache_sb+0x36/0x80
 super_cache_scan+0x14a/0x1d0
 do_shrink_slab+0x131/0x320
 shrink_node+0xf7/0x380
 balance_pgdat+0x2d5/0x640
 kswapd+0x2ba/0x5e0
 ? __wake_up_common_lock+0x90/0x90
 kthread+0x147/0x160
 ? balance_pgdat+0x640/0x640
 ? __kthread_create_on_node+0x160/0x160
 ret_from_fork+0x24/0x30

This is because btrfs_new_inode() calls new_inode() under the
transaction.  We could probably move the new_inode() outside of this but
for now just wrap it in memalloc_nofs_save().

Reported-by: Zdenek Sojka <[email protected]>
Fixes: 712e36c ("btrfs: use GFP_KERNEL in btrfs_alloc_inode")
CC: [email protected] # 4.16+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Call uart_unregister_driver() conditionally instead of
unconditionally, only if it has been previously registered.

This uses driver.state, just as the sh-sci.c driver does.

Fixes this null pointer dereference in tty_unregister_driver(),
since the 'driver' argument is null:

  general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
  RIP: 0010:tty_unregister_driver+0x25/0x1d0

Fixes: 238b872 ("[PATCH] serial uartlite driver")
Signed-off-by: Randy Dunlap <[email protected]>
Cc: stable <[email protected]>
Cc: Peter Korsgaard <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
When using virtual engines, the rq->engine is not stable until we hold
the engine->active.lock (as the virtual engine may be exchanged with the
sibling). Since commit 22b7a42 ("drm/i915/execlists: Preempt-to-busy")
we may retire a request concurrently with resubmitting it to HW, we need
to be extra careful to verify we are holding the correct lock for the
request's active list. This is similar to the issue we saw with
rescheduling the virtual requests, see sched_lock_engine().

Or else:

<4> [876.736126] list_add corruption. prev->next should be next (ffff8883f931a1f8), but was dead000000000100. (prev=ffff888361ffa610).
<4> [876.736136] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736137] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736154] CPU: 2 PID: 21 Comm: ksoftirqd/2 Tainted: G     U            5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736156] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736157] RIP: 0010:__list_add_valid+0x4d/0x70
<4> [876.736159] Code: c3 48 89 d1 48 c7 c7 20 33 0e 82 48 89 c2 e8 4a 4a bc ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 70 33 0e 82 e8 33 4a bc ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 c0 33 0e 82 e8
<4> [876.736160] RSP: 0018:ffffc9000018bd30 EFLAGS: 00010082
<4> [876.736162] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000104
<4> [876.736163] RDX: 0000000080000104 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736164] RBP: ffffc9000018bd68 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736165] R10: 00000000aed95de3 R11: 000000007fe927eb R12: ffff888361ffca10
<4> [876.736166] R13: ffff888361ffa610 R14: ffff888361ffc880 R15: ffff8883f931a1f8
<4> [876.736168] FS:  0000000000000000(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736170] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736171] PKRU: 55555554
<4> [876.736172] Call Trace:
<4> [876.736226]  __i915_request_submit+0x152/0x370 [i915]
<4> [876.736263]  __execlists_submission_tasklet+0x6da/0x1f50 [i915]
<4> [876.736293]  ? execlists_submission_tasklet+0x29/0x50 [i915]
<4> [876.736321]  execlists_submission_tasklet+0x34/0x50 [i915]
<4> [876.736325]  tasklet_action_common.isra.5+0x47/0xb0
<4> [876.736328]  __do_softirq+0xd8/0x4ae
<4> [876.736332]  ? smpboot_thread_fn+0x23/0x280
<4> [876.736334]  ? smpboot_thread_fn+0x6b/0x280
<4> [876.736336]  run_ksoftirqd+0x2b/0x50
<4> [876.736338]  smpboot_thread_fn+0x1d3/0x280
<4> [876.736341]  ? sort_range+0x20/0x20
<4> [876.736343]  kthread+0x119/0x130
<4> [876.736345]  ? kthread_park+0xa0/0xa0
<4> [876.736347]  ret_from_fork+0x24/0x50
<4> [876.736353] irq event stamp: 2290145
<4> [876.736356] hardirqs last  enabled at (2290144): [<ffffffff8123cde8>] __slab_free+0x3e8/0x500
<4> [876.736358] hardirqs last disabled at (2290145): [<ffffffff819cfb4d>] _raw_spin_lock_irqsave+0xd/0x50
<4> [876.736360] softirqs last  enabled at (2290114): [<ffffffff81c0033e>] __do_softirq+0x33e/0x4ae
<4> [876.736361] softirqs last disabled at (2290119): [<ffffffff810b815b>] run_ksoftirqd+0x2b/0x50
<4> [876.736363] WARNING: CPU: 2 PID: 21 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [876.736364] ---[ end trace 3e58d6c7356c65bf ]---
<4> [876.736406] ------------[ cut here ]------------
<4> [876.736415] list_del corruption. prev->next should be ffff888361ffca10, but was ffff88840ac2c730
<4> [876.736421] WARNING: CPU: 2 PID: 5490 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [876.736422] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e cdc_ether usbnet mii snd_pcm ptp pps_core mei_me mei prime_numbers btusb btrtl btbcm btintel bluetooth ecdh_generic ecc [last unloaded: i915]
<4> [876.736433] CPU: 2 PID: 5490 Comm: i915_selftest Tainted: G     U  W         5.3.0-CI-CI_DRM_6898+ #1
<4> [876.736435] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [876.736436] RIP: 0010:__list_del_entry_valid+0x79/0x90
<4> [876.736438] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 30 34 0e 82 e8 ae 49 bc ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 68 34 0e 82 e8 97 49 bc ff <0f> 0b 31 c0 c3 48 c7 c7 a8 34 0e 82 e8 86 49 bc ff 0f 0b 31 c0 c3
<4> [876.736439] RSP: 0018:ffffc900003ef758 EFLAGS: 00010086
<4> [876.736440] RAX: 0000000000000000 RBX: ffff888361ffc840 RCX: 0000000000000002
<4> [876.736442] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [876.736443] RBP: ffffc900003ef780 R08: 0000000000000000 R09: 0000000000000001
<4> [876.736444] R10: 000000001418e4b7 R11: 000000007f0ea93b R12: ffff888361ffcab8
<4> [876.736445] R13: ffff88843b6d0000 R14: 000000000000217c R15: 0000000000000001
<4> [876.736447] FS:  00007f4e6f255240(0000) GS:ffff88849fd00000(0000) knlGS:0000000000000000
<4> [876.736448] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [876.736449] CR2: 00007f093a9173c0 CR3: 00000003bba08005 CR4: 0000000000760ee0
<4> [876.736450] PKRU: 55555554
<4> [876.736451] Call Trace:
<4> [876.736488]  i915_request_retire+0x224/0x8e0 [i915]
<4> [876.736521]  i915_request_create+0x4b/0x1b0 [i915]
<4> [876.736550]  nop_virtual_engine+0x230/0x4d0 [i915]

Fixes: 22b7a42 ("drm/i915/execlists: Preempt-to-busy")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111695
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Matthew Auld <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 37fa0de)
Signed-off-by: Rodrigo Vivi <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
calling CPU in an infinite loop, but with interrupts and preemption
enabled.  From this state, userspace can continue to be scheduled,
despite the system being "dead" as far as the kernel is concerned.

This is easily reproducible on arm64 when booting with "nosmp" on the
command line; a couple of shell scripts print out a periodic "Ping"
message whilst another triggers a crash by writing to
/proc/sysrq-trigger:

  | sysrq: Trigger a crash
  | Kernel panic - not syncing: sysrq triggered crash
  | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
  | Hardware name: linux,dummy-virt (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x148
  |  show_stack+0x14/0x20
  |  dump_stack+0xa0/0xc4
  |  panic+0x140/0x32c
  |  sysrq_handle_reboot+0x0/0x20
  |  __handle_sysrq+0x124/0x190
  |  write_sysrq_trigger+0x64/0x88
  |  proc_reg_write+0x60/0xa8
  |  __vfs_write+0x18/0x40
  |  vfs_write+0xa4/0x1b8
  |  ksys_write+0x64/0xf0
  |  __arm64_sys_write+0x14/0x20
  |  el0_svc_common.constprop.0+0xb0/0x168
  |  el0_svc_handler+0x28/0x78
  |  el0_svc+0x8/0xc
  | Kernel Offset: disabled
  | CPU features: 0x0002,24002004
  | Memory Limit: none
  | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
  |  Ping 2!
  |  Ping 1!
  |  Ping 1!
  |  Ping 2!

The issue can also be triggered on x86 kernels if CONFIG_SMP=n,
otherwise local interrupts are disabled in 'smp_send_stop()'.

Disable preemption in 'panic()' before re-enabling interrupts.

Link: http://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
Signed-off-by: Will Deacon <[email protected]>
Reported-by: Xogium <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Russell King <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
finish_writeback_work() reads @done->waitq after decrementing
@done->cnt.  However, once @done->cnt reaches zero, @done may be freed
(from stack) at any moment and @done->waitq can contain something
unrelated by the time finish_writeback_work() tries to read it.  This
led to the following crash.

  "BUG: kernel NULL pointer dereference, address: 0000000000000002"
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
  CPU: 40 PID: 555153 Comm: kworker/u98:50 Kdump: loaded Not tainted
  ...
  Workqueue: writeback wb_workfn (flush-btrfs-1)
  RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
  Code: 48 89 d8 5b c3 e8 50 db 6b ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 fe ca 6b ff eb f2 66 90
  RSP: 0018:ffffc90049b27d98 EFLAGS: 00010046
  RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000002
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff889fff407600 R11: ffff88ba9395d740 R12: 000000000000e300
  R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff88bfdfa00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000002 CR3: 0000000002409005 CR4: 00000000001606e0
  Call Trace:
   __wake_up_common_lock+0x63/0xc0
   wb_workfn+0xd2/0x3e0
   process_one_work+0x1f5/0x3f0
   worker_thread+0x2d/0x3d0
   kthread+0x111/0x130
   ret_from_fork+0x1f/0x30

Fix it by reading and caching @done->waitq before decrementing
@done->cnt.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 5b9cce4 ("writeback: Generalize and expose wb_completion")
Signed-off-by: Tejun Heo <[email protected]>
Debugged-by: Chris Mason <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: <[email protected]>	[5.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
…not disabled

In kdump kernel, memcg usually is disabled with 'cgroup_disable=memory'
for saving memory.  Now kdump kernel will always panic when dump vmcore
to local disk:

  BUG: kernel NULL pointer dereference, address: 0000000000000ab8
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 598 Comm: makedumpfile Not tainted 5.3.0+ #26
  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018
  RIP: 0010:mem_cgroup_track_foreign_dirty_slowpath+0x38/0x140
  Call Trace:
   __set_page_dirty+0x52/0xc0
   iomap_set_page_dirty+0x50/0x90
   iomap_write_end+0x6e/0x270
   iomap_write_actor+0xce/0x170
   iomap_apply+0xba/0x11e
   iomap_file_buffered_write+0x62/0x90
   xfs_file_buffered_aio_write+0xca/0x320 [xfs]
   new_sync_write+0x12d/0x1d0
   vfs_write+0xa5/0x1a0
   ksys_write+0x59/0xd0
   do_syscall_64+0x59/0x1e0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

And this will corrupt the 1st kernel too with 'cgroup_disable=memory'.

Via the trace and with debugging, it is pointing to commit 97b2782
("writeback, memcg: Implement foreign dirty flushing") which introduced
this regression.  Disabling memcg causes the null pointer dereference at
uninitialized data in function mem_cgroup_track_foreign_dirty_slowpath().

Fix it by returning directly if memcg is disabled, but not trying to
record the foreign writebacks with dirty pages.

Link: http://lkml.kernel.org/r/20190924141928.GD31919@MiWiFi-R3L-srv
Fixes: 97b2782 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Baoquan He <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.

Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.

The culprit seems to be in the code:

	/* reset all but tr, trace, and overruns */
	memset(&iter->seq, 0,
	       sizeof(struct trace_iterator) -
	       offsetof(struct trace_iterator, seq));

It was added by the commit 53d0aa7 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:

     struct trace_seq {
	unsigned char		buffer[PAGE_SIZE];
	unsigned int		len;
     };

There was no "size" variable and zeroing was perfectly fine.

The solution is to reinitialize the structure after or without
zeroing.

Link: http://lkml.kernel.org/r/[email protected]

Signed-off-by: Petr Mladek <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Ensure that the drm_i915_file_private continues to exist as we attempt
to remove a request from its list, which may race with the destruction
of the file.

<6> [38.380714] [IGT] gem_ctx_create: starting subtest basic-files
<0> [42.201329] BUG: spinlock bad magic on CPU#0, kworker/u16:0/7
<4> [42.201356] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [42.201371] CPU: 0 PID: 7 Comm: kworker/u16:0 Tainted: G     U            5.3.0-rc5-CI-Patchwork_14169+ #1
<4> [42.201391] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [42.201594] Workqueue: i915 retire_work_handler [i915]
<4> [42.201614] RIP: 0010:spin_dump+0x5a/0x90
<4> [42.201625] Code: 00 48 8d 88 c0 06 00 00 48 c7 c7 00 71 09 82 e8 35 ef 00 00 48 85 db 44 8b 4d 08 41 b8 ff ff ff ff 48 c7 c1 0b cd 0f 82 74 0e <44> 8b 83 e0 04 00 00 48 8d 8b c0 06 00 00 8b 55 04 48 89 ee 48 c7
<4> [42.201660] RSP: 0018:ffffc9000004bd80 EFLAGS: 00010202
<4> [42.201673] RAX: 0000000000000031 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff820fcd0b
<4> [42.201688] RDX: 0000000000000000 RSI: ffff88803de266f8 RDI: 00000000ffffffff
<4> [42.201703] RBP: ffff888038381ff8 R08: 00000000ffffffff R09: 000000006b6b6b6b
<4> [42.201718] R10: 0000000041cb0b89 R11: 646162206b636f6c R12: ffff88802a618500
<4> [42.201733] R13: ffff88802b32c288 R14: ffff888038381ff8 R15: ffff88802b32c250
<4> [42.201748] FS:  0000000000000000(0000) GS:ffff88803de00000(0000) knlGS:0000000000000000
<4> [42.201765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [42.201778] CR2: 00007f2cefc6d180 CR3: 00000000381ee000 CR4: 00000000000006f0
<4> [42.201793] Call Trace:
<4> [42.201805]  do_raw_spin_lock+0x66/0xb0
<4> [42.201898]  i915_request_retire+0x548/0x7c0 [i915]
<4> [42.201989]  retire_requests+0x4d/0x60 [i915]
<4> [42.202078]  i915_retire_requests+0x144/0x2e0 [i915]
<4> [42.202169]  retire_work_handler+0x10/0x40 [i915]

Recently, in commit 44c22f3 ("drm/i915: Serialize insertion into the
file->mm.request_list"), we fixed a race on insertion. Now, it appears
we also have a race with destruction!

Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld 
Reviewed-by: Matthew Auld 
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
If we make sure we grab a strong reference to each object as we dump it,
we can reduce the locks outside of our iterators to an rcu_read_lock.

This should prevent errors like:
[ 2138.371911] BUG: KASAN: use-after-free in per_file_stats+0x43/0x380 [i915]
[ 2138.371924] Read of size 8 at addr ffff888223651000 by task cat/8293

[ 2138.371947] CPU: 0 PID: 8293 Comm: cat Not tainted 5.3.0-rc6-CI-Custom_4352+ #1
[ 2138.371953] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.40 07/14/2017
[ 2138.371959] Call Trace:
[ 2138.371974]  dump_stack+0x7c/0xbb
[ 2138.372099]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372108]  print_address_description+0x73/0x3a0
[ 2138.372231]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372352]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372362]  __kasan_report+0x14e/0x192
[ 2138.372489]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372502]  kasan_report+0xe/0x20
[ 2138.372625]  per_file_stats+0x43/0x380 [i915]
[ 2138.372751]  ? i915_panel_show+0x110/0x110 [i915]
[ 2138.372761]  idr_for_each+0xa7/0x160
[ 2138.372773]  ? idr_get_next_ul+0x110/0x110
[ 2138.372782]  ? do_raw_spin_lock+0x10a/0x1d0
[ 2138.372923]  print_context_stats+0x264/0x510 [i915]

Signed-off-by: Chris Wilson <[email protected]>
Tested-by: David Weinehall <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Being a "low-level" test, we opt to bypass the normal bind/unbind hooks
for the lower level insert_entries/clear_range. For ggtt, the
bind/unbind hooks provide the runtime wakeref and so we must also handle
this in exercising the low level hooks.

<4> [538.151672] RPM raw-wakeref not held
<4> [538.151825] WARNING: CPU: 0 PID: 11 at ./drivers/gpu/drm/i915/intel_runtime_pm.h:107 fwtable_read32+0x1be/0x300 [i915]
<4> [538.151830] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp btusb btrtl btbcm x86_pkg_temp_thermal coretemp btintel crct10dif_pclmul bluetooth crc32_pclmul snd_intel_nhlt snd_hda_codec ecdh_generic ghash_clmulni_intel ecc snd_hwdep snd_hda_core lpc_ich r8169 realtek snd_pcm mei_me mei prime_numbers pinctrl_broxton pinctrl_intel [last unloaded: i915]
<4> [538.151861] CPU: 0 PID: 11 Comm: migration/0 Tainted: G     U            5.3.0-rc7-CI-Trybot_4938+ #1
<4> [538.151864] Hardware name: Intel corporation NUC6CAYS/NUC6CAYB, BIOS AYAPLCEL.86A.0056.2018.0926.1100 09/26/2018
<4> [538.151960] RIP: 0010:fwtable_read32+0x1be/0x300 [i915]
<4> [538.151965] Code: e8 e7 f9 5f e0 e9 0b ff ff ff 80 3d d5 8d 26 00 00 0f 85 81 fe ff ff 48 c7 c7 ef 01 bd a0 c6 05 c1 8d 26 00 01 e8 b2 e4 6a e0 <0f> 0b e9 67 fe ff ff 80 3d ad 8d 26 00 00 0f 85 65 fe ff ff 48 c7
<4> [538.151969] RSP: 0018:ffffc9000007be10 EFLAGS: 00010086
<4> [538.151972] RAX: 0000000000000000 RBX: ffff88826be10d50 RCX: 0000000000000002
<4> [538.151975] RDX: 0000000080000002 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [538.151978] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
<4> [538.151981] R10: 0000000000000000 R11: ffffc9000007bcb0 R12: 0000000000101008
<4> [538.151984] R13: 0000000000000000 R14: ffffc9000036f638 R15: 0000000000000002
<4> [538.151987] FS:  0000000000000000(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
<4> [538.151990] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [538.151993] CR2: 00007fd48e7052f8 CR3: 0000000005210000 CR4: 00000000003406f0
<4> [538.151995] Call Trace:
<4> [538.152106]  bxt_vtd_ggtt_clear_range__cb+0x38/0x40 [i915]

Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Matthew Auld 
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Explosions during early driver init on the error path. Make sure we fail
gracefully.

[ 9547.672258] BUG: kernel NULL pointer dereference, address: 000000000000007c
[ 9547.672288] #PF: supervisor read access in kernel mode
[ 9547.672292] #PF: error_code(0x0000) - not-present page
[ 9547.672296] PGD 8000000846b41067 P4D 8000000846b41067 PUD 797034067 PMD 0
[ 9547.672303] Oops: 0000 [#1] SMP PTI
[ 9547.672307] CPU: 1 PID: 25634 Comm: i915_selftest Tainted: G     U            5.3.0-rc8+ #73
[ 9547.672313] Hardware name:  /NUC6i7KYB, BIOS KYSKLi70.86A.0050.2017.0831.1924 08/31/2017
[ 9547.672395] RIP: 0010:intel_context_unpin+0x9/0x100 [i915]
[ 9547.672400] Code: 6b 60 00 e9 17 ff ff ff bd fc ff ff ff e9 7c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00
 00 0f 1f 40 00 0f 1f 44 00 00 41 54 55 53 <8b> 47 7c 83 f8 01 74 26 8d 48 ff f0 0f b1 4f 7c 48 8d 57 7c
 75 05
[ 9547.672413] RSP: 0018:ffffae8ac24ff878 EFLAGS: 00010246
[ 9547.672417] RAX: ffff944a1b7842d0 RBX: ffff944a1b784000 RCX: ffff944a12dd6fa8
[ 9547.672422] RDX: ffff944a1b7842c0 RSI: ffff944a12dd5328 RDI: 0000000000000000
[ 9547.672428] RBP: 0000000000000000 R08: ffff944a11e5d840 R09: 0000000000000000
[ 9547.672433] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 9547.672438] R13: ffffffffc11aaf00 R14: 00000000ffffffe4 R15: ffff944a0e29bf38
[ 9547.672443] FS:  00007fc259b88ac0(0000) GS:ffff944a1f880000(0000) knlGS:0000000000000000
[ 9547.672449] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9547.672454] CR2: 000000000000007c CR3: 0000000853346003 CR4: 00000000003606e0
[ 9547.672459] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9547.672464] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9547.672469] Call Trace:
[ 9547.672518]  intel_engine_cleanup_common+0xe3/0x270 [i915]
[ 9547.672567]  execlists_destroy+0xe/0x30 [i915]
[ 9547.672669]  intel_engines_init+0x94/0xf0 [i915]
[ 9547.672749]  i915_gem_init+0x191/0x950 [i915]

Signed-off-by: Matthew Auld 
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link:
sys-oak pushed a commit that referenced this issue Oct 22, 2019
With deferring the breadcrumb signalling to the virtual engine (thanks
preempt-to-busy) we need to make sure the lists and irq-worker are ready
to send a signal.

[41958.710544] BUG: kernel NULL pointer dereference, address: 0000000000000000
[41958.710553] #PF: supervisor write access in kernel mode
[41958.710556] #PF: error_code(0x0002) - not-present page
[41958.710558] PGD 0 P4D 0
[41958.710562] Oops: 0002 [#1] SMP
[41958.710565] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U            5.3.0+ #207
[41958.710568] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[41958.710602] RIP: 0010:i915_request_enable_breadcrumb+0xe1/0x130 [i915]
[41958.710605] Code: 8b 44 24 30 48 89 41 08 48 89 08 48 8b 85 98 01 00 00 48 8d 8d 90 01 00 00 48 89 95 98 01 00 00 49 89 4c 24 28 49 89 44 24 30 <48> 89 10 f0 80 4b 30 10 c6 85 88 01 00 00 00 e9 1a ff ff ff 48 83
[41958.710609] RSP: 0018:ffffc90000003de0 EFLAGS: 00010046
[41958.710612] RAX: 0000000000000000 RBX: ffff888735424480 RCX: ffff8887cddb2190
[41958.710614] RDX: ffff8887cddb3570 RSI: ffff888850362190 RDI: ffff8887cddb2188
[41958.710617] RBP: ffff8887cddb2000 R08: ffff8888503624a8 R09: 0000000000000100
[41958.710619] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8887cddb3548
[41958.710622] R13: 0000000000000000 R14: 0000000000000046 R15: ffff888850362070
[41958.710625] FS:  0000000000000000(0000) GS:ffff88885ea00000(0000) knlGS:0000000000000000
[41958.710628] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[41958.710630] CR2: 0000000000000000 CR3: 0000000002c09002 CR4: 00000000001606f0
[41958.710633] Call Trace:
[41958.710636]  <IRQ>
[41958.710668]  __i915_request_submit+0x12b/0x160 [i915]
[41958.710693]  virtual_submit_request+0x67/0x120 [i915]
[41958.710720]  __unwind_incomplete_requests+0x131/0x170 [i915]
[41958.710744]  execlists_dequeue+0xb40/0xe00 [i915]
[41958.710771]  execlists_submission_tasklet+0x10f/0x150 [i915]
[41958.710776]  tasklet_action_common.isra.17+0x41/0xa0
[41958.710781]  __do_softirq+0xc8/0x221
[41958.710785]  irq_exit+0xa6/0xb0
[41958.710788]  smp_apic_timer_interrupt+0x4d/0x80
[41958.710791]  apic_timer_interrupt+0xf/0x20
[41958.710794]  </IRQ>

Fixes: cb2377a ("drm/i915: Fixup preempt-to-busy vs reset of a virtual request")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
invalidate userptr objects which also happen to be pulled into GGTT
mmaps. That is when we unbind the userptr object (on mmu invalidation),
we revoke all CPU mmaps, which may then recurse into mmu invalidation.

We looked for ways of breaking the cycle, but the revocation on
invalidation is required and cannot be avoided. The only solution we
could see was to not allow such GGTT bindings of userptr objects in the
first place. In practice, no one really wants to use a GGTT mmapping of
a CPU pointer...

Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got
a genuine blip from CI:

<4>[  246.793958] ======================================================
<4>[  246.793972] WARNING: possible circular locking dependency detected
<4>[  246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G     U
<4>[  246.794003] ------------------------------------------------------
<4>[  246.794017] kswapd0/145 is trying to acquire lock:
<4>[  246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794250]
                  but task is already holding lock:
<4>[  246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4>[  246.794291]
                  which lock already depends on the new lock.

<4>[  246.794307]
                  the existing dependency chain (in reverse order) is:
<4>[  246.794322]
                  -> #3 (&anon_vma->rwsem){++++}:
<4>[  246.794344]        down_write+0x33/0x70
<4>[  246.794357]        __vma_adjust+0x3d9/0x7b0
<4>[  246.794370]        __split_vma+0x16a/0x180
<4>[  246.794385]        mprotect_fixup+0x2a5/0x320
<4>[  246.794399]        do_mprotect_pkey+0x208/0x2e0
<4>[  246.794413]        __x64_sys_mprotect+0x16/0x20
<4>[  246.794429]        do_syscall_64+0x55/0x1c0
<4>[  246.794443]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794456]
                  -> #2 (&mapping->i_mmap_rwsem){++++}:
<4>[  246.794478]        down_write+0x33/0x70
<4>[  246.794493]        unmap_mapping_pages+0x48/0x130
<4>[  246.794519]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4>[  246.794519]        i915_vma_unbind+0x11d/0x4a0 [i915]
<4>[  246.794519]        i915_vma_destroy+0x31/0x300 [i915]
<4>[  246.794519]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4>[  246.794519]        drm_file_free.part.0+0x1e6/0x290
<4>[  246.794519]        drm_release+0xa6/0xe0
<4>[  246.794519]        __fput+0xc2/0x250
<4>[  246.794519]        task_work_run+0x82/0xb0
<4>[  246.794519]        do_exit+0x35b/0xdb0
<4>[  246.794519]        do_group_exit+0x34/0xb0
<4>[  246.794519]        __x64_sys_exit_group+0xf/0x10
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #1 (&vm->mutex){+.+.}:
<4>[  246.794519]        i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4>[  246.794519]        i915_address_space_init+0x9f/0x160 [i915]
<4>[  246.794519]        i915_ggtt_init_hw+0x55/0x170 [i915]
<4>[  246.794519]        i915_driver_probe+0xc9f/0x1620 [i915]
<4>[  246.794519]        i915_pci_probe+0x43/0x1b0 [i915]
<4>[  246.794519]        pci_device_probe+0x9e/0x120
<4>[  246.794519]        really_probe+0xea/0x3d0
<4>[  246.794519]        driver_probe_device+0x10b/0x120
<4>[  246.794519]        device_driver_attach+0x4a/0x50
<4>[  246.794519]        __driver_attach+0x97/0x130
<4>[  246.794519]        bus_for_each_dev+0x74/0xc0
<4>[  246.794519]        bus_add_driver+0x13f/0x210
<4>[  246.794519]        driver_register+0x56/0xe0
<4>[  246.794519]        do_one_initcall+0x58/0x300
<4>[  246.794519]        do_init_module+0x56/0x1f6
<4>[  246.794519]        load_module+0x25bd/0x2a40
<4>[  246.794519]        __se_sys_finit_module+0xd3/0xf0
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #0 (&dev->struct_mutex/1){+.+.}:
<4>[  246.794519]        __lock_acquire+0x15d8/0x1e90
<4>[  246.794519]        lock_acquire+0xa6/0x1c0
<4>[  246.794519]        __mutex_lock+0x9d/0x9b0
<4>[  246.794519]        userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794519]        __mmu_notifier_invalidate_range_start+0x85/0x110
<4>[  246.794519]        try_to_unmap_one+0x76b/0x860
<4>[  246.794519]        rmap_walk_anon+0x104/0x280
<4>[  246.794519]        try_to_unmap+0xc0/0xf0
<4>[  246.794519]        shrink_page_list+0x561/0xc10
<4>[  246.794519]        shrink_inactive_list+0x220/0x440
<4>[  246.794519]        shrink_node_memcg+0x36e/0x740
<4>[  246.794519]        shrink_node+0xcb/0x490
<4>[  246.794519]        balance_pgdat+0x241/0x580
<4>[  246.794519]        kswapd+0x16c/0x530
<4>[  246.794519]        kthread+0x119/0x130
<4>[  246.794519]        ret_from_fork+0x24/0x50
<4>[  246.794519]
                  other info that might help us debug this:

<4>[  246.794519] Chain exists of:
                    &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4>[  246.794519]  Possible unsafe locking scenario:

<4>[  246.794519]        CPU0                    CPU1
<4>[  246.794519]        ----                    ----
<4>[  246.794519]   lock(&anon_vma->rwsem);
<4>[  246.794519]                                lock(&mapping->i_mmap_rwsem);
<4>[  246.794519]                                lock(&anon_vma->rwsem);
<4>[  246.794519]   lock(&dev->struct_mutex/1);
<4>[  246.794519]
                   *** DEADLOCK ***

v2: Say no to mmap_ioctl

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
As we may signal a request and take the engine->active.lock within the
signaler, the engine submission paths have to use a nested annotation on
their requests -- but we guarantee that we can never submit on the same
engine as the signaling fence.

<4>[  723.763281] WARNING: possible circular locking dependency detected
<4>[  723.763285] 5.3.0-g80fa0e042cdb-drmtip_379+ #1 Tainted: G     U
<4>[  723.763288] ------------------------------------------------------
<4>[  723.763291] gem_exec_await/1388 is trying to acquire lock:
<4>[  723.763294] ffff93a7b53221d8 (&engine->active.lock){..-.}, at: execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.763378]
                  but task is already holding lock:
<4>[  723.763381] ffff93a7c25f6d20 (&i915_request_get(rq)->submit/1){-.-.}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763420]
                  which lock already depends on the new lock.

<4>[  723.763423]
                  the existing dependency chain (in reverse order) is:
<4>[  723.763427]
                  -> #2 (&i915_request_get(rq)->submit/1){-.-.}:
<4>[  723.763434]        _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  723.763478]        __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  723.763513]        intel_engine_breadcrumbs_irq+0x3aa/0x5e0 [i915]
<4>[  723.763600]        cs_irq_handler+0x49/0x50 [i915]
<4>[  723.763659]        gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  723.763690]        gen11_irq_handler+0x54/0xf0 [i915]
<4>[  723.763695]        __handle_irq_event_percpu+0x41/0x2d0
<4>[  723.763699]        handle_irq_event_percpu+0x2b/0x70
<4>[  723.763702]        handle_irq_event+0x2f/0x50
<4>[  723.763706]        handle_edge_irq+0xee/0x1a0
<4>[  723.763709]        do_IRQ+0x7e/0x160
<4>[  723.763712]        ret_from_intr+0x0/0x1d
<4>[  723.763717]        __slab_alloc.isra.28.constprop.33+0x4f/0x70
<4>[  723.763720]        kmem_cache_alloc+0x28d/0x2f0
<4>[  723.763724]        vm_area_dup+0x15/0x40
<4>[  723.763727]        dup_mm+0x2dd/0x550
<4>[  723.763730]        copy_process+0xf21/0x1ef0
<4>[  723.763734]        _do_fork+0x71/0x670
<4>[  723.763737]        __se_sys_clone+0x6e/0xa0
<4>[  723.763741]        do_syscall_64+0x4f/0x210
<4>[  723.763744]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.763747]
                  -> #1 (&(&rq->lock)->rlock#2){-.-.}:
<4>[  723.763752]        _raw_spin_lock+0x2a/0x40
<4>[  723.763789]        __unwind_incomplete_requests+0x3eb/0x450 [i915]
<4>[  723.763825]        __execlists_submission_tasklet+0x9ec/0x1d60 [i915]
<4>[  723.763864]        execlists_submission_tasklet+0x34/0x50 [i915]
<4>[  723.763874]        tasklet_action_common.isra.5+0x47/0xb0
<4>[  723.763878]        __do_softirq+0xd8/0x4ae
<4>[  723.763881]        irq_exit+0xa9/0xc0
<4>[  723.763883]        smp_apic_timer_interrupt+0xb7/0x280
<4>[  723.763887]        apic_timer_interrupt+0xf/0x20
<4>[  723.763892]        cpuidle_enter_state+0xae/0x450
<4>[  723.763895]        cpuidle_enter+0x24/0x40
<4>[  723.763899]        do_idle+0x1e7/0x250
<4>[  723.763902]        cpu_startup_entry+0x14/0x20
<4>[  723.763905]        start_secondary+0x15f/0x1b0
<4>[  723.763908]        secondary_startup_64+0xa4/0xb0
<4>[  723.763911]
                  -> #0 (&engine->active.lock){..-.}:
<4>[  723.763916]        __lock_acquire+0x15d8/0x1ea0
<4>[  723.763919]        lock_acquire+0xa6/0x1c0
<4>[  723.763922]        _raw_spin_lock_irqsave+0x33/0x50
<4>[  723.763956]        execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  723.764002]        submit_notify+0xa8/0x13c [i915]
<4>[  723.764035]        __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[  723.764054]        i915_sw_fence_wake+0x51/0x64 [i915]
<4>[  723.764054]        __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[  723.764054]        dma_i915_sw_fence_wake_timer+0x14/0x20 [i915]
<4>[  723.764054]        dma_fence_signal_locked+0x9e/0x1c0
<4>[  723.764054]        dma_fence_signal+0x1f/0x40
<4>[  723.764054]        vgem_fence_signal_ioctl+0x67/0xc0 [vgem]
<4>[  723.764054]        drm_ioctl_kernel+0x83/0xf0
<4>[  723.764054]        drm_ioctl+0x2f3/0x3b0
<4>[  723.764054]        do_vfs_ioctl+0xa0/0x6f0
<4>[  723.764054]        ksys_ioctl+0x35/0x60
<4>[  723.764054]        __x64_sys_ioctl+0x11/0x20
<4>[  723.764054]        do_syscall_64+0x4f/0x210
<4>[  723.764054]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  723.764054]
                  other info that might help us debug this:

<4>[  723.764054] Chain exists of:
                    &engine->active.lock --> &(&rq->lock)->rlock#2 --> &i915_request_get(rq)->submit/1

<4>[  723.764054]  Possible unsafe locking scenario:

<4>[  723.764054]        CPU0                    CPU1
<4>[  723.764054]        ----                    ----
<4>[  723.764054]   lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]                                lock(&(&rq->lock)->rlock#2);
<4>[  723.764054]                                lock(&i915_request_get(rq)->submit/1);
<4>[  723.764054]   lock(&engine->active.lock);
<4>[  723.764054]
                   *** DEADLOCK ***

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111862
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
Disable irqs around updating the context image to keep lockdep happy:

<4>[  673.483340] WARNING: possible irq lock inversion dependency detected
<4>[  673.483342] 5.4.0-rc1-CI-Trybot_5118+ #1 Tainted: G     U
<4>[  673.483342] --------------------------------------------------------
<4>[  673.483343] swapper/2/0 just changed the state of lock:
<4>[  673.483344] ffff88845db885a0 (&i915_request_get(rq)->submit/1){-...}, at: __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.483387] but this lock took another, HARDIRQ-unsafe lock in the past:
<4>[  673.483388]  (&ce->pin_mutex/2){+...}
<4>[  673.483389]

                  and interrupts could create inverse lock ordering between them.

<4>[  673.483390]
                  other info that might help us debug this:
<4>[  673.483390] Chain exists of:
                    &i915_request_get(rq)->submit/1 --> &engine->active.lock --> &ce->pin_mutex/2

<4>[  673.483392]  Possible interrupt unsafe locking scenario:

<4>[  673.483392]        CPU0                    CPU1
<4>[  673.483393]        ----                    ----
<4>[  673.483393]   lock(&ce->pin_mutex/2);
<4>[  673.483394]                                local_irq_disable();
<4>[  673.483395]                                lock(&i915_request_get(rq)->submit/1);
<4>[  673.483396]                                lock(&engine->active.lock);
<4>[  673.483396]   <Interrupt>
<4>[  673.483397]     lock(&i915_request_get(rq)->submit/1);
<4>[  673.483398]
                   *** DEADLOCK ***

<4>[  673.483398] 2 locks held by swapper/2/0:
<4>[  673.483399]  #0: ffff8883f61ac9b0 (&(&gt->irq_lock)->rlock){-.-.}, at: gen11_gt_irq_handler+0x42/0x280 [i915]
<4>[  673.483433]  #1: ffff88845db8c418 (&(&rq->lock)->rlock){-.-.}, at: intel_engine_breadcrumbs_irq+0x34a/0x5a0 [i915]
<4>[  673.483463]
                  the shortest dependencies between 2nd lock and 1st lock:
<4>[  673.483466]   -> (&ce->pin_mutex/2){+...} ops: 614520 {
<4>[  673.483468]      HARDIRQ-ON-W at:
<4>[  673.483471]                         lock_acquire+0xa7/0x1c0
<4>[  673.483501]                         live_unlite_restore+0x1d8/0x6c0 [i915]
<4>[  673.483543]                         __i915_subtests+0xb8/0x210 [i915]
<4>[  673.483581]                         __run_selftests+0x112/0x170 [i915]
<4>[  673.483615]                         i915_live_selftests+0x2c/0x60 [i915]
<4>[  673.483644]                         i915_pci_probe+0x93/0x1b0 [i915]
<4>[  673.483646]                         pci_device_probe+0x9e/0x120
<4>[  673.483648]                         really_probe+0xea/0x420
<4>[  673.483649]                         driver_probe_device+0x10b/0x120
<4>[  673.483651]                         device_driver_attach+0x4a/0x50
<4>[  673.483652]                         __driver_attach+0x97/0x130
<4>[  673.483653]                         bus_for_each_dev+0x74/0xc0
<4>[  673.483654]                         bus_add_driver+0x142/0x220
<4>[  673.483655]                         driver_register+0x56/0xf0
<4>[  673.483657]                         do_one_initcall+0x58/0x2ff
<4>[  673.483659]                         do_init_module+0x56/0x1f8
<4>[  673.483660]                         load_module+0x243e/0x29f0
<4>[  673.483661]                         __do_sys_finit_module+0xe9/0x110
<4>[  673.483662]                         do_syscall_64+0x4f/0x210
<4>[  673.483665]                         entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  673.483665]      INITIAL USE at:
<4>[  673.483667]                        lock_acquire+0xa7/0x1c0
<4>[  673.483698]                        live_unlite_restore+0x1d8/0x6c0 [i915]
<4>[  673.483733]                        __i915_subtests+0xb8/0x210 [i915]
<4>[  673.483764]                        __run_selftests+0x112/0x170 [i915]
<4>[  673.483793]                        i915_live_selftests+0x2c/0x60 [i915]
<4>[  673.483821]                        i915_pci_probe+0x93/0x1b0 [i915]
<4>[  673.483822]                        pci_device_probe+0x9e/0x120
<4>[  673.483824]                        really_probe+0xea/0x420
<4>[  673.483825]                        driver_probe_device+0x10b/0x120
<4>[  673.483826]                        device_driver_attach+0x4a/0x50
<4>[  673.483827]                        __driver_attach+0x97/0x130
<4>[  673.483828]                        bus_for_each_dev+0x74/0xc0
<4>[  673.483829]                        bus_add_driver+0x142/0x220
<4>[  673.483830]                        driver_register+0x56/0xf0
<4>[  673.483831]                        do_one_initcall+0x58/0x2ff
<4>[  673.483833]                        do_init_module+0x56/0x1f8
<4>[  673.483834]                        load_module+0x243e/0x29f0
<4>[  673.483835]                        __do_sys_finit_module+0xe9/0x110
<4>[  673.483836]                        do_syscall_64+0x4f/0x210
<4>[  673.483837]                        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  673.483838]    }
<4>[  673.483868]    ... key      at: [<ffffffffa0a8f132>] __key.70113+0x2/0xffffffffffef2ed0 [i915]
<4>[  673.483869]    ... acquired at:
<4>[  673.483935]    __execlists_reset+0xfb/0xc20 [i915]
<4>[  673.483965]    execlists_reset+0x3d/0x50 [i915]
<4>[  673.483995]    intel_engine_reset+0xdf/0x230 [i915]
<4>[  673.484022]    live_preempt_hang+0x1d7/0x2e0 [i915]
<4>[  673.484064]    __i915_subtests+0xb8/0x210 [i915]
<4>[  673.484130]    __run_selftests+0x112/0x170 [i915]
<4>[  673.484163]    i915_live_selftests+0x2c/0x60 [i915]
<4>[  673.484193]    i915_pci_probe+0x93/0x1b0 [i915]
<4>[  673.484194]    pci_device_probe+0x9e/0x120
<4>[  673.484195]    really_probe+0xea/0x420
<4>[  673.484196]    driver_probe_device+0x10b/0x120
<4>[  673.484197]    device_driver_attach+0x4a/0x50
<4>[  673.484198]    __driver_attach+0x97/0x130
<4>[  673.484199]    bus_for_each_dev+0x74/0xc0
<4>[  673.484200]    bus_add_driver+0x142/0x220
<4>[  673.484202]    driver_register+0x56/0xf0
<4>[  673.484203]    do_one_initcall+0x58/0x2ff
<4>[  673.484204]    do_init_module+0x56/0x1f8
<4>[  673.484205]    load_module+0x243e/0x29f0
<4>[  673.484206]    __do_sys_finit_module+0xe9/0x110
<4>[  673.484207]    do_syscall_64+0x4f/0x210
<4>[  673.484208]    entry_SYSCALL_64_after_hwframe+0x49/0xbe

<4>[  673.484209]  -> (&engine->active.lock){..-.} ops: 972791 {
<4>[  673.484211]     IN-SOFTIRQ-W at:
<4>[  673.484213]                       lock_acquire+0xa7/0x1c0
<4>[  673.484214]                       _raw_spin_lock_irqsave+0x33/0x50
<4>[  673.484244]                       execlists_submission_tasklet+0xaf/0x100 [i915]
<4>[  673.484246]                       tasklet_action_common.isra.18+0x6c/0x1c0
<4>[  673.484247]                       __do_softirq+0xdf/0x47f
<4>[  673.484248]                       irq_exit+0xba/0xc0
<4>[  673.484249]                       do_IRQ+0x83/0x160
<4>[  673.484250]                       ret_from_intr+0x0/0x1d
<4>[  673.484252]                       cpuidle_enter_state+0xb2/0x450
<4>[  673.484253]                       cpuidle_enter+0x24/0x40
<4>[  673.484254]                       do_idle+0x1e7/0x250
<4>[  673.484256]                       cpu_startup_entry+0x14/0x20
<4>[  673.484257]                       start_secondary+0x15f/0x1b0
<4>[  673.484258]                       secondary_startup_64+0xa4/0xb0
<4>[  673.484259]     INITIAL USE at:
<4>[  673.484261]                      lock_acquire+0xa7/0x1c0
<4>[  673.484290]                      intel_engine_init_active+0x7e/0xb0 [i915]
<4>[  673.484305]                      intel_engines_setup+0x1cd/0x3b0 [i915]
<4>[  673.484305]                      i915_gem_init+0x12d/0x900 [i915]
<4>[  673.484305]                      i915_driver_probe+0xb70/0x15d0 [i915]
<4>[  673.484305]                      i915_pci_probe+0x43/0x1b0 [i915]
<4>[  673.484305]                      pci_device_probe+0x9e/0x120
<4>[  673.484305]                      really_probe+0xea/0x420
<4>[  673.484305]                      driver_probe_device+0x10b/0x120
<4>[  673.484305]                      device_driver_attach+0x4a/0x50
<4>[  673.484305]                      __driver_attach+0x97/0x130
<4>[  673.484305]                      bus_for_each_dev+0x74/0xc0
<4>[  673.484305]                      bus_add_driver+0x142/0x220
<4>[  673.484305]                      driver_register+0x56/0xf0
<4>[  673.484305]                      do_one_initcall+0x58/0x2ff
<4>[  673.484305]                      do_init_module+0x56/0x1f8
<4>[  673.484305]                      load_module+0x243e/0x29f0
<4>[  673.484305]                      __do_sys_finit_module+0xe9/0x110
<4>[  673.484305]                      do_syscall_64+0x4f/0x210
<4>[  673.484305]                      entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  673.484305]   }
<4>[  673.484305]   ... key      at: [<ffffffffa0a8f160>] __key.70307+0x0/0xffffffffffef2ea0 [i915]
<4>[  673.484305]   ... acquired at:
<4>[  673.484305]    _raw_spin_lock_irqsave+0x33/0x50
<4>[  673.484305]    execlists_submit_request+0x2b/0x1e0 [i915]
<4>[  673.484305]    submit_notify+0xa8/0x13c [i915]
<4>[  673.484305]    __i915_sw_fence_complete+0x81/0x250 [i915]
<4>[  673.484305]    i915_sw_fence_wake+0x51/0x70 [i915]
<4>[  673.484305]    __i915_sw_fence_complete+0x1ee/0x250 [i915]
<4>[  673.484305]    dma_i915_sw_fence_wake+0x1b/0x30 [i915]
<4>[  673.484305]    dma_fence_signal_locked+0x9e/0x1b0
<4>[  673.484305]    dma_fence_signal+0x1f/0x40
<4>[  673.484305]    fence_work+0x28/0x80 [i915]
<4>[  673.484305]    process_one_work+0x26a/0x620
<4>[  673.484305]    worker_thread+0x37/0x380
<4>[  673.484305]    kthread+0x119/0x130
<4>[  673.484305]    ret_from_fork+0x24/0x50

<4>[  673.484305] -> (&i915_request_get(rq)->submit/1){-...} ops: 857694 {
<4>[  673.484305]    IN-HARDIRQ-W at:
<4>[  673.484305]                     lock_acquire+0xa7/0x1c0
<4>[  673.484305]                     _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  673.484305]                     __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.484305]                     intel_engine_breadcrumbs_irq+0x3d0/0x5a0 [i915]
<4>[  673.484305]                     cs_irq_handler+0x39/0x50 [i915]
<4>[  673.484305]                     gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  673.484305]                     gen11_irq_handler+0x54/0xf0 [i915]
<4>[  673.484305]                     __handle_irq_event_percpu+0x41/0x2c0
<4>[  673.484305]                     handle_irq_event_percpu+0x2b/0x70
<4>[  673.484305]                     handle_irq_event+0x2f/0x50
<4>[  673.484305]                     handle_edge_irq+0x99/0x1b0
<4>[  673.484305]                     do_IRQ+0x7e/0x160
<4>[  673.484305]                     ret_from_intr+0x0/0x1d
<4>[  673.484305]                     cpuidle_enter_state+0xb2/0x450
<4>[  673.484305]                     cpuidle_enter+0x24/0x40
<4>[  673.484305]                     do_idle+0x1e7/0x250
<4>[  673.484305]                     cpu_startup_entry+0x14/0x20
<4>[  673.484305]                     start_secondary+0x15f/0x1b0
<4>[  673.484305]                     secondary_startup_64+0xa4/0xb0
<4>[  673.484305]    INITIAL USE at:
<4>[  673.484305]                    lock_acquire+0xa7/0x1c0
<4>[  673.484305]                    _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  673.484305]                    __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.484305]                    __engine_park+0x233/0x420 [i915]
<4>[  673.484305]                    ____intel_wakeref_put_last+0x1c/0x70 [i915]
<4>[  673.484305]                    intel_gt_resume+0x202/0x2c0 [i915]
<4>[  673.484305]                    i915_gem_init+0x36e/0x900 [i915]
<4>[  673.484305]                    i915_driver_probe+0xb70/0x15d0 [i915]
<4>[  673.484305]                    i915_pci_probe+0x43/0x1b0 [i915]
<4>[  673.484305]                    pci_device_probe+0x9e/0x120
<4>[  673.484305]                    really_probe+0xea/0x420
<4>[  673.484305]                    driver_probe_device+0x10b/0x120
<4>[  673.484305]                    device_driver_attach+0x4a/0x50
<4>[  673.484305]                    __driver_attach+0x97/0x130
<4>[  673.484305]                    bus_for_each_dev+0x74/0xc0
<4>[  673.484305]                    bus_add_driver+0x142/0x220
<4>[  673.484305]                    driver_register+0x56/0xf0
<4>[  673.484305]                    do_one_initcall+0x58/0x2ff
<4>[  673.484305]                    do_init_module+0x56/0x1f8
<4>[  673.484305]                    load_module+0x243e/0x29f0
<4>[  673.484305]                    __do_sys_finit_module+0xe9/0x110
<4>[  673.484305]                    do_syscall_64+0x4f/0x210
<4>[  673.484305]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  673.484305]  }
<4>[  673.484305]  ... key      at: [<ffffffffa0a8f6a1>] __key.80173+0x1/0xffffffffffef2960 [i915]
<4>[  673.484305]  ... acquired at:
<4>[  673.484305]    mark_lock+0x382/0x500
<4>[  673.484305]    __lock_acquire+0x7e1/0x15d0
<4>[  673.484305]    lock_acquire+0xa7/0x1c0
<4>[  673.484305]    _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  673.484305]    __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.484305]    intel_engine_breadcrumbs_irq+0x3d0/0x5a0 [i915]
<4>[  673.484305]    cs_irq_handler+0x39/0x50 [i915]
<4>[  673.484305]    gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  673.484305]    gen11_irq_handler+0x54/0xf0 [i915]
<4>[  673.484305]    __handle_irq_event_percpu+0x41/0x2c0
<4>[  673.484305]    handle_irq_event_percpu+0x2b/0x70
<4>[  673.484305]    handle_irq_event+0x2f/0x50
<4>[  673.484305]    handle_edge_irq+0x99/0x1b0
<4>[  673.484305]    do_IRQ+0x7e/0x160
<4>[  673.484305]    ret_from_intr+0x0/0x1d
<4>[  673.484305]    cpuidle_enter_state+0xb2/0x450
<4>[  673.484305]    cpuidle_enter+0x24/0x40
<4>[  673.484305]    do_idle+0x1e7/0x250
<4>[  673.484305]    cpu_startup_entry+0x14/0x20
<4>[  673.484305]    start_secondary+0x15f/0x1b0
<4>[  673.484305]    secondary_startup_64+0xa4/0xb0

<4>[  673.484305]
                  stack backtrace:
<4>[  673.484305] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G     U            5.4.0-rc1-CI-Trybot_5118+ #1
<4>[  673.484305] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4>[  673.484305] Call Trace:
<4>[  673.484305]  <IRQ>
<4>[  673.484305]  dump_stack+0x67/0x9b
<4>[  673.484305]  check_usage_forwards+0x13c/0x150
<4>[  673.484305]  ? mark_lock+0x382/0x500
<4>[  673.484305]  mark_lock+0x382/0x500
<4>[  673.484305]  ? check_usage_backwards+0x140/0x140
<4>[  673.484305]  __lock_acquire+0x7e1/0x15d0
<4>[  673.484305]  ? debug_object_deactivate+0x17e/0x190
<4>[  673.484305]  lock_acquire+0xa7/0x1c0
<4>[  673.484305]  ? __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.484305]  _raw_spin_lock_irqsave_nested+0x39/0x50
<4>[  673.484305]  ? __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.484305]  __i915_sw_fence_complete+0x1b2/0x250 [i915]
<4>[  673.484305]  intel_engine_breadcrumbs_irq+0x3d0/0x5a0 [i915]
<4>[  673.484305]  cs_irq_handler+0x39/0x50 [i915]
<4>[  673.484305]  gen11_gt_irq_handler+0x17b/0x280 [i915]
<4>[  673.484305]  gen11_irq_handler+0x54/0xf0 [i915]
<4>[  673.484305]  __handle_irq_event_percpu+0x41/0x2c0
<4>[  673.484305]  handle_irq_event_percpu+0x2b/0x70
<4>[  673.484305]  handle_irq_event+0x2f/0x50
<4>[  673.484305]  handle_edge_irq+0x99/0x1b0
<4>[  673.484305]  do_IRQ+0x7e/0x160
<4>[  673.484305]  common_interrupt+0xf/0xf
<4>[  673.484305]  </IRQ>

Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 22, 2019
…tion

fi->f is set in f_midi's alloc_func, need to clean this to
NULL in free_func, otherwise on ConfigFS's function switch,
midi->usb_function it self is freed, fi->f will be a wild
pointer and run into below kernel panic:
---------------
[   58.950628] Unable to handle kernel paging request at virtual address 63697664
[   58.957869] pgd = c0004000
[   58.960583] [63697664] *pgd=00000000
[   58.964185] Internal error: Oops: 80000005 [#1] PREEMPT SMP ARM
[   58.970111] Modules linked in:
[   58.973191] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.15-03504-g34c857c-dirty #89
[   58.981024] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   58.987557] task: c110bd70 ti: c1100000 task.ti: c1100000
[   58.992962] PC is at 0x63697664
[   58.996120] LR is at android_setup+0x78/0x138
<..snip..>
[   60.044980] 1fc0: ffffffff ffffffff c1000684 00000000 00000000 c108ecd0 c11f7294 c11039c0
[   60.053181] 1fe0: c108eccc c110d148 1000406a 412fc09a 00000000 1000807c 00000000 00000000
[   60.061420] [<c073b1fc>] (android_setup) from [<c0730490>] (udc_irq+0x758/0x1034)
[   60.068951] [<c0730490>] (udc_irq) from [<c017c650>] (handle_irq_event_percpu+0x50/0x254)
[   60.077165] [<c017c650>] (handle_irq_event_percpu) from [<c017c890>] (handle_irq_event+0x3c/0x5c)
[   60.086072] [<c017c890>] (handle_irq_event) from [<c017f3ec>] (handle_fasteoi_irq+0xe0/0x198)
[   60.094630] [<c017f3ec>] (handle_fasteoi_irq) from [<c017bcfc>] (generic_handle_irq+0x2c/0x3c)
[   60.103271] [<c017bcfc>] (generic_handle_irq) from [<c017bfb8>] (__handle_domain_irq+0x7c/0xec)
[   60.112000] [<c017bfb8>] (__handle_domain_irq) from [<c0101450>] (gic_handle_irq+0x24/0x5c)
--------------

Bug: 111003288
Bug: 120441124
Change-Id: Iaf7cc809379c184048a2d9ab1f59e71ebaa3416e
Signed-off-by: Winter Wang <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 22, 2019
* Set reboot target in NVRAM
	- register to reboot event
	- create CDATA ABL code following reboot target

* Set capsule parameters in NVRAM
	- accessible through /sys/kernel/capsule/capsule_name
	- same syntax as "nvram" ABL command
	- e.g. m1:capsule.bin
		- m => emmc
		- 1 => partition #1
		- capsule.bin => file name

* Notify on capsule request
	- accessible through /sys/kernel/capsule/capsule_requested
		- 0 : no capsule requested
		- 1 : capsule requested (triggered through
			/sys/kernel/capsule/capsule_name access)

Signed-off-by: Guillaume Betous <[email protected]>
Signed-off-by: Tian, Baofeng
sys-oak pushed a commit that referenced this issue Oct 22, 2019
* Set reboot target in NVRAM
    - register to reboot event
    - create CDATA SBL code following reboot target

* Set capsule parameters in NVRAM
    - accessible through /sys/kernel/capsule/capsule_name
    - same syntax as "nvram" SBL command
    - e.g. m1:capsule.bin
            - m => emmc
            - 1 => partition #1
            - capsule.bin => file name

* Notify on capsule request
    - accessible through /sys/kernel/capsule/capsule_requested
            - 0 : no capsule requested
            - 1 : capsule requested (triggered through
                    /sys/kernel/capsule/capsule_name access)

Signed-off-by: Zhou, JianfengX <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
syzbot got a NULL dereference in bond_update_slave_arr() [1],
happening after a failure to allocate bond->slave_arr

A workqueue (bond_slave_arr_handler) is supposed to retry
the allocation later, but if the slave is removed before
the workqueue had a chance to complete, bond->slave_arr
can still be NULL.

[1]

Failed to build slave-array.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
Modules linked in:
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:bond_update_slave_arr.cold+0xc6/0x198 drivers/net/bonding/bond_main.c:4039
RSP: 0018:ffff88018fe33678 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc9000290b000
RDX: 0000000000000000 RSI: ffffffff82b63037 RDI: ffff88019745ea20
RBP: ffff88018fe33760 R08: ffff880170754280 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88019745ea00 R14: 0000000000000000 R15: ffff88018fe338b0
FS:  00007febd837d700(0000) GS:ffff8801dad00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004540a0 CR3: 00000001c242e005 CR4: 00000000001626f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 [<ffffffff82b5b45e>] __bond_release_one+0x43e/0x500 drivers/net/bonding/bond_main.c:1923
 [<ffffffff82b5b966>] bond_release drivers/net/bonding/bond_main.c:2039 [inline]
 [<ffffffff82b5b966>] bond_do_ioctl+0x416/0x870 drivers/net/bonding/bond_main.c:3562
 [<ffffffff83ae25f4>] dev_ifsioc+0x6f4/0x940 net/core/dev_ioctl.c:328
 [<ffffffff83ae2e58>] dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:495
 [<ffffffff83995ffd>] sock_do_ioctl+0x1bd/0x300 net/socket.c:1088
 [<ffffffff83996a80>] sock_ioctl+0x300/0x5d0 net/socket.c:1196
 [<ffffffff81b124db>] vfs_ioctl fs/ioctl.c:47 [inline]
 [<ffffffff81b124db>] file_ioctl fs/ioctl.c:501 [inline]
 [<ffffffff81b124db>] do_vfs_ioctl+0xacb/0x1300 fs/ioctl.c:688
 [<ffffffff81b12dc6>] SYSC_ioctl fs/ioctl.c:705 [inline]
 [<ffffffff81b12dc6>] SyS_ioctl+0xb6/0xe0 fs/ioctl.c:696
 [<ffffffff8101ccc8>] do_syscall_64+0x528/0x770 arch/x86/entry/common.c:305
 [<ffffffff84400091>] entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: ee63771 ("bonding: Simplify the xmit function for modes that use xmit_hash")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Cc: Mahesh Bandewar <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
A long time ago we fixed a similar deadlock in show_slab_objects() [1].
However, it is apparently due to the commits like 01fb58b ("slab:
remove synchronous synchronize_sched() from memcg cache deactivation
path") and 03afc0e ("slab: get_online_mems for
kmem_cache_{create,destroy,shrink}"), this kind of deadlock is back by
just reading files in /sys/kernel/slab which will generate a lockdep
splat below.

Since the "mem_hotplug_lock" here is only to obtain a stable online node
mask while racing with NUMA node hotplug, in the worst case, the results
may me miscalculated while doing NUMA node hotplug, but they shall be
corrected by later reads of the same files.

  WARNING: possible circular locking dependency detected
  ------------------------------------------------------
  cat/5224 is trying to acquire lock:
  ffff900012ac3120 (mem_hotplug_lock.rw_sem){++++}, at:
  show_slab_objects+0x94/0x3a8

  but task is already holding lock:
  b8ff009693eee398 (kn->count#45){++++}, at: kernfs_seq_start+0x44/0xf0

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (kn->count#45){++++}:
         lock_acquire+0x31c/0x360
         __kernfs_remove+0x290/0x490
         kernfs_remove+0x30/0x44
         sysfs_remove_dir+0x70/0x88
         kobject_del+0x50/0xb0
         sysfs_slab_unlink+0x2c/0x38
         shutdown_cache+0xa0/0xf0
         kmemcg_cache_shutdown_fn+0x1c/0x34
         kmemcg_workfn+0x44/0x64
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #1 (slab_mutex){+.+.}:
         lock_acquire+0x31c/0x360
         __mutex_lock_common+0x16c/0xf78
         mutex_lock_nested+0x40/0x50
         memcg_create_kmem_cache+0x38/0x16c
         memcg_kmem_cache_create_func+0x3c/0x70
         process_one_work+0x4f4/0x950
         worker_thread+0x390/0x4bc
         kthread+0x1cc/0x1e8
         ret_from_fork+0x10/0x18

  -> #0 (mem_hotplug_lock.rw_sem){++++}:
         validate_chain+0xd10/0x2bcc
         __lock_acquire+0x7f4/0xb8c
         lock_acquire+0x31c/0x360
         get_online_mems+0x54/0x150
         show_slab_objects+0x94/0x3a8
         total_objects_show+0x28/0x34
         slab_attr_show+0x38/0x54
         sysfs_kf_seq_show+0x198/0x2d4
         kernfs_seq_show+0xa4/0xcc
         seq_read+0x30c/0x8a8
         kernfs_fop_read+0xa8/0x314
         __vfs_read+0x88/0x20c
         vfs_read+0xd8/0x10c
         ksys_read+0xb0/0x120
         __arm64_sys_read+0x54/0x88
         el0_svc_handler+0x170/0x240
         el0_svc+0x8/0xc

  other info that might help us debug this:

  Chain exists of:
    mem_hotplug_lock.rw_sem --> slab_mutex --> kn->count#45

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(kn->count#45);
                                 lock(slab_mutex);
                                 lock(kn->count#45);
    lock(mem_hotplug_lock.rw_sem);

   *** DEADLOCK ***

  3 locks held by cat/5224:
   #0: 9eff00095b14b2a0 (&p->lock){+.+.}, at: seq_read+0x4c/0x8a8
   #1: 0eff008997041480 (&of->mutex){+.+.}, at: kernfs_seq_start+0x34/0xf0
   #2: b8ff009693eee398 (kn->count#45){++++}, at:
  kernfs_seq_start+0x44/0xf0

  stack backtrace:
  Call trace:
   dump_backtrace+0x0/0x248
   show_stack+0x20/0x2c
   dump_stack+0xd0/0x140
   print_circular_bug+0x368/0x380
   check_noncircular+0x248/0x250
   validate_chain+0xd10/0x2bcc
   __lock_acquire+0x7f4/0xb8c
   lock_acquire+0x31c/0x360
   get_online_mems+0x54/0x150
   show_slab_objects+0x94/0x3a8
   total_objects_show+0x28/0x34
   slab_attr_show+0x38/0x54
   sysfs_kf_seq_show+0x198/0x2d4
   kernfs_seq_show+0xa4/0xcc
   seq_read+0x30c/0x8a8
   kernfs_fop_read+0xa8/0x314
   __vfs_read+0x88/0x20c
   vfs_read+0xd8/0x10c
   ksys_read+0xb0/0x120
   __arm64_sys_read+0x54/0x88
   el0_svc_handler+0x170/0x240
   el0_svc+0x8/0xc

I think it is important to mention that this doesn't expose the
show_slab_objects to use-after-free.  There is only a single path that
might really race here and that is the slab hotplug notifier callback
__kmem_cache_shrink (via slab_mem_going_offline_callback) but that path
doesn't really destroy kmem_cache_node data structures.

[1] http://lkml.iu.edu/hypermail/linux/kernel/1101.0/02850.html

[[email protected]: add comment explaining why we don't need mem_hotplug_lock]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 01fb58b ("slab: remove synchronous synchronize_sched() from memcg cache deactivation path")
Fixes: 03afc0e ("slab: get_online_mems for kmem_cache_{create,destroy,shrink}")
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
Daniel Vetter uncovered a nasty cycle in using the mmu-notifiers to
invalidate userptr objects which also happen to be pulled into GGTT
mmaps. That is when we unbind the userptr object (on mmu invalidation),
we revoke all CPU mmaps, which may then recurse into mmu invalidation.

We looked for ways of breaking the cycle, but the revocation on
invalidation is required and cannot be avoided. The only solution we
could see was to not allow such GGTT bindings of userptr objects in the
first place. In practice, no one really wants to use a GGTT mmapping of
a CPU pointer...

Just before Daniel's explosive lockdep patches land in v5.4-rc1, we got
a genuine blip from CI:

<4>[  246.793958] ======================================================
<4>[  246.793972] WARNING: possible circular locking dependency detected
<4>[  246.793989] 5.3.0-gbd6c56f50d15-drmtip_372+ #1 Tainted: G     U
<4>[  246.794003] ------------------------------------------------------
<4>[  246.794017] kswapd0/145 is trying to acquire lock:
<4>[  246.794030] 000000003f565be6 (&dev->struct_mutex/1){+.+.}, at: userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794250]
                  but task is already holding lock:
<4>[  246.794263] 000000001799cef9 (&anon_vma->rwsem){++++}, at: page_lock_anon_vma_read+0xe6/0x2a0
<4>[  246.794291]
                  which lock already depends on the new lock.

<4>[  246.794307]
                  the existing dependency chain (in reverse order) is:
<4>[  246.794322]
                  -> #3 (&anon_vma->rwsem){++++}:
<4>[  246.794344]        down_write+0x33/0x70
<4>[  246.794357]        __vma_adjust+0x3d9/0x7b0
<4>[  246.794370]        __split_vma+0x16a/0x180
<4>[  246.794385]        mprotect_fixup+0x2a5/0x320
<4>[  246.794399]        do_mprotect_pkey+0x208/0x2e0
<4>[  246.794413]        __x64_sys_mprotect+0x16/0x20
<4>[  246.794429]        do_syscall_64+0x55/0x1c0
<4>[  246.794443]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794456]
                  -> #2 (&mapping->i_mmap_rwsem){++++}:
<4>[  246.794478]        down_write+0x33/0x70
<4>[  246.794493]        unmap_mapping_pages+0x48/0x130
<4>[  246.794519]        i915_vma_revoke_mmap+0x81/0x1b0 [i915]
<4>[  246.794519]        i915_vma_unbind+0x11d/0x4a0 [i915]
<4>[  246.794519]        i915_vma_destroy+0x31/0x300 [i915]
<4>[  246.794519]        __i915_gem_free_objects+0xb8/0x4b0 [i915]
<4>[  246.794519]        drm_file_free.part.0+0x1e6/0x290
<4>[  246.794519]        drm_release+0xa6/0xe0
<4>[  246.794519]        __fput+0xc2/0x250
<4>[  246.794519]        task_work_run+0x82/0xb0
<4>[  246.794519]        do_exit+0x35b/0xdb0
<4>[  246.794519]        do_group_exit+0x34/0xb0
<4>[  246.794519]        __x64_sys_exit_group+0xf/0x10
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #1 (&vm->mutex){+.+.}:
<4>[  246.794519]        i915_gem_shrinker_taints_mutex+0x6d/0xe0 [i915]
<4>[  246.794519]        i915_address_space_init+0x9f/0x160 [i915]
<4>[  246.794519]        i915_ggtt_init_hw+0x55/0x170 [i915]
<4>[  246.794519]        i915_driver_probe+0xc9f/0x1620 [i915]
<4>[  246.794519]        i915_pci_probe+0x43/0x1b0 [i915]
<4>[  246.794519]        pci_device_probe+0x9e/0x120
<4>[  246.794519]        really_probe+0xea/0x3d0
<4>[  246.794519]        driver_probe_device+0x10b/0x120
<4>[  246.794519]        device_driver_attach+0x4a/0x50
<4>[  246.794519]        __driver_attach+0x97/0x130
<4>[  246.794519]        bus_for_each_dev+0x74/0xc0
<4>[  246.794519]        bus_add_driver+0x13f/0x210
<4>[  246.794519]        driver_register+0x56/0xe0
<4>[  246.794519]        do_one_initcall+0x58/0x300
<4>[  246.794519]        do_init_module+0x56/0x1f6
<4>[  246.794519]        load_module+0x25bd/0x2a40
<4>[  246.794519]        __se_sys_finit_module+0xd3/0xf0
<4>[  246.794519]        do_syscall_64+0x55/0x1c0
<4>[  246.794519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  246.794519]
                  -> #0 (&dev->struct_mutex/1){+.+.}:
<4>[  246.794519]        __lock_acquire+0x15d8/0x1e90
<4>[  246.794519]        lock_acquire+0xa6/0x1c0
<4>[  246.794519]        __mutex_lock+0x9d/0x9b0
<4>[  246.794519]        userptr_mn_invalidate_range_start+0x18f/0x220 [i915]
<4>[  246.794519]        __mmu_notifier_invalidate_range_start+0x85/0x110
<4>[  246.794519]        try_to_unmap_one+0x76b/0x860
<4>[  246.794519]        rmap_walk_anon+0x104/0x280
<4>[  246.794519]        try_to_unmap+0xc0/0xf0
<4>[  246.794519]        shrink_page_list+0x561/0xc10
<4>[  246.794519]        shrink_inactive_list+0x220/0x440
<4>[  246.794519]        shrink_node_memcg+0x36e/0x740
<4>[  246.794519]        shrink_node+0xcb/0x490
<4>[  246.794519]        balance_pgdat+0x241/0x580
<4>[  246.794519]        kswapd+0x16c/0x530
<4>[  246.794519]        kthread+0x119/0x130
<4>[  246.794519]        ret_from_fork+0x24/0x50
<4>[  246.794519]
                  other info that might help us debug this:

<4>[  246.794519] Chain exists of:
                    &dev->struct_mutex/1 --> &mapping->i_mmap_rwsem --> &anon_vma->rwsem

<4>[  246.794519]  Possible unsafe locking scenario:

<4>[  246.794519]        CPU0                    CPU1
<4>[  246.794519]        ----                    ----
<4>[  246.794519]   lock(&anon_vma->rwsem);
<4>[  246.794519]                                lock(&mapping->i_mmap_rwsem);
<4>[  246.794519]                                lock(&anon_vma->rwsem);
<4>[  246.794519]   lock(&dev->struct_mutex/1);
<4>[  246.794519]
                   *** DEADLOCK ***

v2: Say no to mmap_ioctl

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111744
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111870
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit a431174)
Signed-off-by: Rodrigo Vivi <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
Both multi_cpu_stop() and set_state() access multi_stop_data::state
racily using plain accesses. These are subject to compiler
transformations which could break the intended behaviour of the code,
and this situation is detected by KCSAN on both arm64 and x86 (splats
below).

Improve matters by using READ_ONCE() and WRITE_ONCE() to ensure that the
compiler cannot elide, replay, or tear loads and stores.

In multi_cpu_stop() the two loads of multi_stop_data::state are expected to
be a consistent value, so snapshot the value into a temporary variable to
ensure this.

The state transitions are serialized by atomic manipulation of
multi_stop_data::num_threads, and other fields in multi_stop_data are not
modified while subject to concurrent reads.

KCSAN splat on arm64:

| BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
|
| write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
|  set_state+0x80/0xb0
|  multi_cpu_stop+0x16c/0x198
|  cpu_stopper_thread+0x170/0x298
|  smpboot_thread_fn+0x40c/0x560
|  kthread+0x1a8/0x1b0
|  ret_from_fork+0x10/0x18
|
| read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
|  multi_cpu_stop+0xa8/0x198
|  cpu_stopper_thread+0x170/0x298
|  smpboot_thread_fn+0x40c/0x560
|  kthread+0x1a8/0x1b0
|  ret_from_fork+0x10/0x18
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
| Hardware name: linux,dummy-virt (DT)

KCSAN splat on x86:

| write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2:
|  set_state kernel/stop_machine.c:170 [inline]
|  ack_state kernel/stop_machine.c:177 [inline]
|  multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227
|  cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
|  smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
|  kthread+0x1b5/0x200 kernel/kthread.c:255
|  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7:
|  multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213
|  cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
|  smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
|  kthread+0x1b5/0x200 kernel/kthread.c:255
|  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014

Signed-off-by: Mark Rutland <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Marco Elver <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
sys-oak pushed a commit that referenced this issue Oct 28, 2019
There are three places where we access uninitialized memmaps, namely:
- /proc/kpagecount
- /proc/kpageflags
- /proc/kpagecgroup

We have initialized memmaps either when the section is online or when the
page was initialized to the ZONE_DEVICE.  Uninitialized memmaps contain
garbage and in the worst case trigger kernel BUGs, especially with
CONFIG_PAGE_POISONING.

For example, not onlining a DIMM during boot and calling /proc/kpagecount
with CONFIG_PAGE_POISONING:

  :/# cat /proc/kpagecount > tmp.test
  BUG: unable to handle page fault for address: fffffffffffffffe
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 114616067 P4D 114616067 PUD 114618067 PMD 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 0 PID: 469 Comm: cat Not tainted 5.4.0-rc1-next-20191004+ #11
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
  RIP: 0010:kpagecount_read+0xce/0x1e0
  Code: e8 09 83 e0 3f 48 0f a3 02 73 2d 4c 89 e7 48 c1 e7 06 48 03 3d ab 51 01 01 74 1d 48 8b 57 08 480
  RSP: 0018:ffffa14e409b7e78 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 0000000000020000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 00007f76b5595000 RDI: fffff35645000000
  RBP: 00007f76b5595000 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
  R13: 0000000000020000 R14: 00007f76b5595000 R15: ffffa14e409b7f08
  FS:  00007f76b577d580(0000) GS:ffff8f41bd400000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 0000000078960000 CR4: 00000000000006f0
  Call Trace:
   proc_reg_read+0x3c/0x60
   vfs_read+0xc5/0x180
   ksys_read+0x68/0xe0
   do_syscall_64+0x5c/0xa0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

For now, let's drop support for ZONE_DEVICE from the three pseudo files
in order to fix this.  To distinguish offline memory (with garbage
memmap) from ZONE_DEVICE memory with properly initialized memmaps, we
would have to check get_dev_pagemap() and pfn_zone_device_reserved()
right now.  The usage of both (especially, special casing devmem) is
frowned upon and needs to be reworked.

The fundamental issue we have is:

	if (pfn_to_online_page(pfn)) {
		/* memmap initialized */
	} else if (pfn_valid(pfn)) {
		/*
		 * ???
		 * a) offline memory. memmap garbage.
		 * b) devmem: memmap initialized to ZONE_DEVICE.
		 * c) devmem: reserved for driver. memmap garbage.
		 * (d) devmem: memmap currently initializing - garbage)
		 */
	}

We'll leave the pfn_zone_device_reserved() check in stable_page_flags()
in place as that function is also used from memory failure.  We now no
longer dump information about pages that are not in use anymore -
offline.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: f1dd2cd ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e]
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Toshiki Fukasawa <[email protected]>
Cc: Pankaj gupta <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Anthony Yznaga <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: <[email protected]>	[4.13+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
…pagetypeinfo

Uninitialized memmaps contain garbage and in the worst case trigger
kernel BUGs, especially with CONFIG_PAGE_POISONING.  They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone
and reading /proc/pagetypeinfo with CONFIG_DEBUG_VM_PGFLAGS and
CONFIG_PAGE_POISONING, we can trigger a kernel BUG:

  :/# echo 1 > /sys/devices/system/memory/memory40/online
  :/# echo 1 > /sys/devices/system/memory/memory42/online
  :/# cat /proc/pagetypeinfo > test.file
   page:fffff2c585200000 is uninitialized and poisoned
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
   page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
   There is not page extension available.
   ------------[ cut here ]------------
   kernel BUG at include/linux/mm.h:1107!
   invalid opcode: 0000 [#1] SMP NOPTI

Please note that this change does not affect ZONE_DEVICE, because
pagetypeinfo_showmixedcount_print() is called from
mm/vmstat.c:pagetypeinfo_showmixedcount() only for populated zones, and
ZONE_DEVICE is never populated (zone->present_pages always 0).

[[email protected]: move check to outer loop, add comment, rephrase description]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f1dd2cd ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e
Signed-off-by: Qian Cai <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: Miles Chen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: <[email protected]>	[4.13+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
…pointer release

Karsten reported the following panic in __free_slab() happening on a s390x
machine:

  Unable to handle kernel pointer dereference in virtual kernel address space
  Failing address: 0000000000000000 TEID: 0000000000000483
  Fault in home space mode while using kernel ASCE.
  AS:00000000017d4007 R3:000000007fbd0007 S:000000007fbff000 P:000000000000003d
  Oops: 0004 ilc:3 Ý#1¨ PREEMPT SMP
  Modules linked in: tcp_diag inet_diag xt_tcpudp ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_at nf_nat
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-05872-g6133e3e4bada-dirty #14
  Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
  Krnl PSW : 0704d00180000000 00000000003cadb6 (__free_slab+0x686/0x6b0)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
  Krnl GPRS: 00000000f3a32928 0000000000000000 000000007fbf5d00 000000000117c4b8
             0000000000000000 000000009e3291c1 0000000000000000 0000000000000000
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             0000000000000003 0000000000000008 000000002b478b00 000003d080a97600
             000000000117ba00 000003e000057db0 00000000003cabcc 000003e000057c78
  Krnl Code: 00000000003cada6: e310a1400004        lg      %r1,320(%r10)
             00000000003cadac: c0e50046c286        brasl   %r14,ca32b8
            #00000000003cadb2: a7f4fe36            brc     15,3caa1e
            >00000000003cadb6: e32060800024        stg     %r2,128(%r6)
             00000000003cadbc: a7f4fd9e            brc     15,3ca8f8
             00000000003cadc0: c0e50046790c        brasl   %r14,c99fd8
             00000000003cadc6: a7f4fe2c            brc     15,3caa
             00000000003cadc6: a7f4fe2c            brc     15,3caa1e
             00000000003cadca: ecb1ffff00d9        aghik   %r11,%r1,-1
  Call Trace:
  (<00000000003cabcc> __free_slab+0x49c/0x6b0)
   <00000000001f5886> rcu_core+0x5a6/0x7e0
   <0000000000ca2dea> __do_softirq+0xf2/0x5c0
   <0000000000152644> irq_exit+0x104/0x130
   <000000000010d222> do_IRQ+0x9a/0xf0
   <0000000000ca2344> ext_int_handler+0x130/0x134
   <0000000000103648> enabled_wait+0x58/0x128
  (<0000000000103634> enabled_wait+0x44/0x128)
   <0000000000103b00> arch_cpu_idle+0x40/0x58
   <0000000000ca0544> default_idle_call+0x3c/0x68
   <000000000018eaa4> do_idle+0xec/0x1c0
   <000000000018ee0e> cpu_startup_entry+0x36/0x40
   <000000000122df34> arch_call_rest_init+0x5c/0x88
   <0000000000000000> 0x0
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   <00000000003ca8f4> __free_slab+0x1c4/0x6b0
  Kernel panic - not syncing: Fatal exception in interrupt

The kernel panics on an attempt to dereference the NULL memcg pointer.
When shutdown_cache() is called from the kmem_cache_destroy() context, a
memcg kmem_cache might have empty slab pages in a partial list, which are
still charged to the memory cgroup.

These pages are released by free_partial() at the beginning of
shutdown_cache(): either directly or by scheduling a RCU-delayed work
(if the kmem_cache has the SLAB_TYPESAFE_BY_RCU flag).  The latter case
is when the reported panic can happen: memcg_unlink_cache() is called
immediately after shrinking partial lists, without waiting for scheduled
RCU works.  It sets the kmem_cache->memcg_params.memcg pointer to NULL,
and the following attempt to dereference it by __free_slab() from the
RCU work context causes the panic.

To fix the issue, let's postpone the release of the memcg pointer to
destroy_memcg_params().  It's called from a separate work context by
slab_caches_to_rcu_destroy_workfn(), which contains a full RCU barrier.
This guarantees that all scheduled page release RCU works will complete
before the memcg pointer will be zeroed.

Big thanks for Karsten for the perfect report containing all necessary
information, his help with the analysis of the problem and testing of the
fix.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: fb2f2b0 ("mm: memcg/slab: reparent memcg kmem_caches on cgroup removal")
Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Karsten Graul <[email protected]>
Tested-by: Karsten Graul <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Karsten Graul <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
mount.ocfs2 failed when reading ocfs2 filesystem superblock encounters
an error.  ocfs2_initialize_super() returns before allocating ocfs2_wq.
ocfs2_dismount_volume() triggers the following panic.

  Oct 15 16:09:27 cnwarekv-205120 kernel: On-disk corruption discovered.Please run fsck.ocfs2 once the filesystem is unmounted.
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_read_locked_inode:537 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:458 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_init_global_system_inodes:491 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_initialize_super:2313 ERROR: status = -30
  Oct 15 16:09:27 cnwarekv-205120 kernel: (mount.ocfs2,22804,44): ocfs2_fill_super:1033 ERROR: status = -30
  ------------[ cut here ]------------
  Oops: 0002 [#1] SMP NOPTI
  CPU: 1 PID: 11753 Comm: mount.ocfs2 Tainted: G  E
        4.14.148-200.ckv.x86_64 #1
  Hardware name: Sugon H320-G30/35N16-US, BIOS 0SSDX017 12/21/2018
  task: ffff967af0520000 task.stack: ffffa5f05484000
  RIP: 0010:mutex_lock+0x19/0x20
  Call Trace:
    flush_workqueue+0x81/0x460
    ocfs2_shutdown_local_alloc+0x47/0x440 [ocfs2]
    ocfs2_dismount_volume+0x84/0x400 [ocfs2]
    ocfs2_fill_super+0xa4/0x1270 [ocfs2]
    ? ocfs2_initialize_super.isa.211+0xf20/0xf20 [ocfs2]
    mount_bdev+0x17f/0x1c0
    mount_fs+0x3a/0x160

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yi Li <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
Definitions for idisp snd_soc_dai_links within skl_hda_dsp_common are
missing platform component. Add it to address following bug reported by
KASAN:

[   10.538502] BUG: KASAN: global-out-of-bounds in skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538509] Write of size 8 at addr ffffffffc0606840 by task systemd-udevd/299

[   10.538516] CPU: 3 PID: 299 Comm: systemd-udevd Not tainted 5.3.0-rc2chromedev1.0.115+ #1
[   10.538518] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake U DDR4 RVP, BIOS CNLSFWR1.R00.X113.B00.1711090348 11/09/2017
[   10.538519] Call Trace:
[   10.538524]  dump_stack+0x62/0x95
[   10.538528]  print_address_description+0x2f5/0x3b0
[   10.538532]  ? skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538535]  __kasan_report+0x134/0x191
[   10.538538]  ? skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538542]  ? skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538544]  kasan_report+0x12/0x20
[   10.538546]  __asan_store8+0x57/0x90
[   10.538550]  skl_hda_audio_probe+0x13a/0x2b0 [snd_soc_skl_hda_dsp]
[   10.538553]  platform_drv_probe+0x51/0xb0
[   10.538556]  really_probe+0x311/0x600
[   10.538559]  driver_probe_device+0x87/0x1b0
[   10.538562]  device_driver_attach+0x8f/0xa0
[   10.538565]  ? device_driver_attach+0xa0/0xa0
[   10.538567]  __driver_attach+0x102/0x1a0
[   10.538569]  ? device_driver_attach+0xa0/0xa0
[   10.538572]  bus_for_each_dev+0xe8/0x160
[   10.538574]  ? subsys_dev_iter_exit+0x10/0x10
[   10.538577]  ? preempt_count_sub+0x18/0xc0
[   10.538580]  ? _raw_write_unlock+0x1f/0x40
[   10.538582]  driver_attach+0x2b/0x30
[   10.538585]  bus_add_driver+0x251/0x340
[   10.538588]  driver_register+0xd3/0x1c0
[   10.538590]  __platform_driver_register+0x6c/0x80
[   10.538592]  ? 0xffffffffc03e8000
[   10.538595]  skl_hda_audio_init+0x1c/0x1000 [snd_soc_skl_hda_dsp]
[   10.538598]  do_one_initcall+0xd0/0x36a
[   10.538600]  ? trace_event_raw_event_initcall_finish+0x160/0x160
[   10.538602]  ? kasan_unpoison_shadow+0x36/0x50
[   10.538605]  ? __kasan_kmalloc+0xcc/0xe0
[   10.538607]  ? kasan_unpoison_shadow+0x36/0x50
[   10.538609]  ? kasan_poison_shadow+0x2f/0x40
[   10.538612]  ? __asan_register_globals+0x65/0x80
[   10.538615]  do_init_module+0xf9/0x36f
[   10.538619]  load_module+0x398e/0x4590
[   10.538625]  ? module_frob_arch_sections+0x20/0x20
[   10.538628]  ? __kasan_check_write+0x14/0x20
[   10.538630]  ? kernel_read+0x9a/0xc0
[   10.538632]  ? __kasan_check_write+0x14/0x20
[   10.538634]  ? kernel_read_file+0x1d3/0x3c0
[   10.538638]  ? cap_capable+0xca/0x110
[   10.538642]  __do_sys_finit_module+0x190/0x1d0
[   10.538644]  ? __do_sys_finit_module+0x190/0x1d0
[   10.538646]  ? __x64_sys_init_module+0x50/0x50
[   10.538649]  ? expand_files+0x380/0x380
[   10.538652]  ? __kasan_check_write+0x14/0x20
[   10.538654]  ? fput_many+0x20/0xc0
[   10.538658]  __x64_sys_finit_module+0x43/0x50
[   10.538660]  do_syscall_64+0xce/0x700
[   10.538662]  ? syscall_return_slowpath+0x230/0x230
[   10.538665]  ? __do_page_fault+0x51e/0x640
[   10.538668]  ? __kasan_check_read+0x11/0x20
[   10.538670]  ? prepare_exit_to_usermode+0xc7/0x200
[   10.538673]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Change-Id: Ia21db5aba2c8329ba0a5b25097e037306b200422
Fixes: a78959f ("ASoC: Intel: skl_hda_dsp_common: use modern dai_link style")
Signed-off-by: Cezary Rojewski <[email protected]>
Reviewed-on: 
Tested-by: gkblditp <[email protected]>
sys-oak pushed a commit that referenced this issue Oct 28, 2019
Ensure that the drm_i915_file_private continues to exist as we attempt
to remove a request from its list, which may race with the destruction
of the file.

<6> [38.380714] [IGT] gem_ctx_create: starting subtest basic-files
<0> [42.201329] BUG: spinlock bad magic on CPU#0, kworker/u16:0/7
<4> [42.201356] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [42.201371] CPU: 0 PID: 7 Comm: kworker/u16:0 Tainted: G     U            5.3.0-rc5-CI-Patchwork_14169+ #1
<4> [42.201391] Hardware name: Dell Inc.                 OptiPlex 745                 /0GW726, BIOS 2.3.1  05/21/2007
<4> [42.201594] Workqueue: i915 retire_work_handler [i915]
<4> [42.201614] RIP: 0010:spin_dump+0x5a/0x90
<4> [42.201625] Code: 00 48 8d 88 c0 06 00 00 48 c7 c7 00 71 09 82 e8 35 ef 00 00 48 85 db 44 8b 4d 08 41 b8 ff ff ff ff 48 c7 c1 0b cd 0f 82 74 0e <44> 8b 83 e0 04 00 00 48 8d 8b c0 06 00 00 8b 55 04 48 89 ee 48 c7
<4> [42.201660] RSP: 0018:ffffc9000004bd80 EFLAGS: 00010202
<4> [42.201673] RAX: 0000000000000031 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffff820fcd0b
<4> [42.201688] RDX: 0000000000000000 RSI: ffff88803de266f8 RDI: 00000000ffffffff
<4> [42.201703] RBP: ffff888038381ff8 R08: 00000000ffffffff R09: 000000006b6b6b6b
<4> [42.201718] R10: 0000000041cb0b89 R11: 646162206b636f6c R12: ffff88802a618500
<4> [42.201733] R13: ffff88802b32c288 R14: ffff888038381ff8 R15: ffff88802b32c250
<4> [42.201748] FS:  0000000000000000(0000) GS:ffff88803de00000(0000) knlGS:0000000000000000
<4> [42.201765] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [42.201778] CR2: 00007f2cefc6d180 CR3: 00000000381ee000 CR4: 00000000000006f0
<4> [42.201793] Call Trace:
<4> [42.201805]  do_raw_spin_lock+0x66/0xb0
<4> [42.201898]  i915_request_retire+0x548/0x7c0 [i915]
<4> [42.201989]  retire_requests+0x4d/0x60 [i915]
<4> [42.202078]  i915_retire_requests+0x144/0x2e0 [i915]
<4> [42.202169]  retire_work_handler+0x10/0x40 [i915]

Recently, in commit 44c22f3 ("drm/i915: Serialize insertion into the
file->mm.request_list"), we fixed a race on insertion. Now, it appears
we also have a race with destruction!

Signed-off-by: Chris Wilson <[email protected]>
Cc: Matthew Auld 
Reviewed-by: Matthew Auld 
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Oct 28, 2019
If we make sure we grab a strong reference to each object as we dump it,
we can reduce the locks outside of our iterators to an rcu_read_lock.

This should prevent errors like:
[ 2138.371911] BUG: KASAN: use-after-free in per_file_stats+0x43/0x380 [i915]
[ 2138.371924] Read of size 8 at addr ffff888223651000 by task cat/8293

[ 2138.371947] CPU: 0 PID: 8293 Comm: cat Not tainted 5.3.0-rc6-CI-Custom_4352+ #1
[ 2138.371953] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./J4205-ITX, BIOS P1.40 07/14/2017
[ 2138.371959] Call Trace:
[ 2138.371974]  dump_stack+0x7c/0xbb
[ 2138.372099]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372108]  print_address_description+0x73/0x3a0
[ 2138.372231]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372352]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372362]  __kasan_report+0x14e/0x192
[ 2138.372489]  ? per_file_stats+0x43/0x380 [i915]
[ 2138.372502]  kasan_report+0xe/0x20
[ 2138.372625]  per_file_stats+0x43/0x380 [i915]
[ 2138.372751]  ? i915_panel_show+0x110/0x110 [i915]
[ 2138.372761]  idr_for_each+0xa7/0x160
[ 2138.372773]  ? idr_get_next_ul+0x110/0x110
[ 2138.372782]  ? do_raw_spin_lock+0x10a/0x1d0
[ 2138.372923]  print_context_stats+0x264/0x510 [i915]

Signed-off-by: Chris Wilson <[email protected]>
Tested-by: David Weinehall <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
sys-oak pushed a commit that referenced this issue Nov 22, 2024
…ing to satisfy some BPF verifiers

In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise
kernels go, have backports from modern kernels, the verifier complains
about lack of bounds check for the index into the array of syscall
arguments, on a BPF bytecode generated by clang 17, with:

  ; } else if (size < 0 && size >= -6) { /* buffer */
  116: (b7) r1 = -6
  117: (2d) if r1 > r6 goto pc-30
   R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
  ; index = -(size + 1);
  118: (a7) r6 ^= -1
  119: (67) r6 <<= 32
  120: (77) r6 >>= 32
  ; aug_size = args->args[index];
  121: (67) r6 <<= 3
  122: (79) r1 = *(u64 *)(r10 -24)
  123: (0f) r1 += r6
  last_idx 123 first_idx 116
  regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24)
  regs=40 stack=0 before 121: (67) r6 <<= 3
  regs=40 stack=0 before 120: (77) r6 >>= 32
  regs=40 stack=0 before 119: (67) r6 <<= 32
  regs=40 stack=0 before 118: (a7) r6 ^= -1
  regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30
  regs=42 stack=0 before 116: (b7) r1 = -6
   R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value
  parent didn't have regs=40 stack=0 marks
  last_idx 110 first_idx 98
  regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5
  regs=42 stack=0 before 109: (b7) r1 = 1
  regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7
  regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9
   R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value
  parent already had regs=40 stack=0 marks
  124: (79) r6 = *(u64 *)(r1 +16)
   R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value
  R1 unbounded memory access, make sure to bounds check any such access
  processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3

If we add this line, as used in other BPF programs, to cap that index:

   index &= 7;

The generated BPF program is considered safe by that version of the BPF
verifier, allowing perf to collect the syscall args in one more kernel
using the BPF based pointer contents collector.

With the above one-liner it works with that kernel:

  [root@dell-per740-01 ~]# uname -a
  Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux
  [root@dell-per740-01 ~]# ~acme/bin/perf trace -e *sleep* sleep 1.234567890
       0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 })                  = 0
  [root@dell-per740-01 ~]#

As well as with the one in Fedora 40:

  root@number:~# uname -a
  Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux
  root@number:~# perf trace -e *sleep* sleep 1.234567890
       0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0
  root@number:~#

Song Liu reported that this one-liner was being optimized out by clang
18, so I suggested and he tested that adding a compiler barrier before
it made clang v18 to keep it and the verifier in the kernel in Song's
case (Meta's 5.12 based kernel) also was happy with the resulting
bytecode.

I'll investigate using virtme-ng[1] to have all the perf BPF based
functionality thoroughly tested over multiple kernels and clang
versions.

[1] https://kernel-recipes.org/en/2024/virtme-ng/

Cc: Adrian Hunter <[email protected]>
Cc: Alan Maguire <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andrea Righi <[email protected]>
Cc: Howard Chu <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Song Liu <[email protected]>
Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
The purpose of btrfs_bbio_propagate_error() shall be propagating an error
of split bio to its original btrfs_bio, and tell the error to the upper
layer. However, it's not working well on some cases.

* Case 1. Immediate (or quick) end_bio with an error

When btrfs sends btrfs_bio to mirrored devices, btrfs calls
btrfs_bio_end_io() when all the mirroring bios are completed. If that
btrfs_bio was split, it is from btrfs_clone_bioset and its end_io function
is btrfs_orig_write_end_io. For this case, btrfs_bbio_propagate_error()
accesses the orig_bbio's bio context to increase the error count.

That works well in most cases. However, if the end_io is called enough
fast, orig_bbio's (remaining part after split) bio context may not be
properly set at that time. Since the bio context is set when the orig_bbio
(the last btrfs_bio) is sent to devices, that might be too late for earlier
split btrfs_bio's completion.  That will result in NULL pointer
dereference.

That bug is easily reproducible by running btrfs/146 on zoned devices [1]
and it shows the following trace.

[1] You need raid-stripe-tree feature as it create "-d raid0 -m raid1" FS.

  BUG: kernel NULL pointer dereference, address: 0000000000000020
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
  CPU: 1 UID: 0 PID: 13 Comm: kworker/u32:1 Not tainted 6.11.0-rc7-BTRFS-ZNS+ #474
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Workqueue: writeback wb_workfn (flush-btrfs-5)
  RIP: 0010:btrfs_bio_end_io+0xae/0xc0 [btrfs]
  BTRFS error (device dm-0): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
  RSP: 0018:ffffc9000006f248 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff888005a7f080 RCX: ffffc9000006f1dc
  RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff888005a7f080
  RBP: ffff888011dfc540 R08: 0000000000000000 R09: 0000000000000001
  R10: ffffffff82e508e0 R11: 0000000000000005 R12: ffff88800ddfbe58
  R13: ffff888005a7f080 R14: ffff888005a7f158 R15: ffff888005a7f158
  FS:  0000000000000000(0000) GS:ffff88803ea80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000020 CR3: 0000000002e22006 CR4: 0000000000370ef0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   ? __die_body.cold+0x19/0x26
   ? page_fault_oops+0x13e/0x2b0
   ? _printk+0x58/0x73
   ? do_user_addr_fault+0x5f/0x750
   ? exc_page_fault+0x76/0x240
   ? asm_exc_page_fault+0x22/0x30
   ? btrfs_bio_end_io+0xae/0xc0 [btrfs]
   ? btrfs_log_dev_io_error+0x7f/0x90 [btrfs]
   btrfs_orig_write_end_io+0x51/0x90 [btrfs]
   dm_submit_bio+0x5c2/0xa50 [dm_mod]
   ? find_held_lock+0x2b/0x80
   ? blk_try_enter_queue+0x90/0x1e0
   __submit_bio+0xe0/0x130
   ? ktime_get+0x10a/0x160
   ? lockdep_hardirqs_on+0x74/0x100
   submit_bio_noacct_nocheck+0x199/0x410
   btrfs_submit_bio+0x7d/0x150 [btrfs]
   btrfs_submit_chunk+0x1a1/0x6d0 [btrfs]
   ? lockdep_hardirqs_on+0x74/0x100
   ? __folio_start_writeback+0x10/0x2c0
   btrfs_submit_bbio+0x1c/0x40 [btrfs]
   submit_one_bio+0x44/0x60 [btrfs]
   submit_extent_folio+0x13f/0x330 [btrfs]
   ? btrfs_set_range_writeback+0xa3/0xd0 [btrfs]
   extent_writepage_io+0x18b/0x360 [btrfs]
   extent_write_locked_range+0x17c/0x340 [btrfs]
   ? __pfx_end_bbio_data_write+0x10/0x10 [btrfs]
   run_delalloc_cow+0x71/0xd0 [btrfs]
   btrfs_run_delalloc_range+0x176/0x500 [btrfs]
   ? find_lock_delalloc_range+0x119/0x260 [btrfs]
   writepage_delalloc+0x2ab/0x480 [btrfs]
   extent_write_cache_pages+0x236/0x7d0 [btrfs]
   btrfs_writepages+0x72/0x130 [btrfs]
   do_writepages+0xd4/0x240
   ? find_held_lock+0x2b/0x80
   ? wbc_attach_and_unlock_inode+0x12c/0x290
   ? wbc_attach_and_unlock_inode+0x12c/0x290
   __writeback_single_inode+0x5c/0x4c0
   ? do_raw_spin_unlock+0x49/0xb0
   writeback_sb_inodes+0x22c/0x560
   __writeback_inodes_wb+0x4c/0xe0
   wb_writeback+0x1d6/0x3f0
   wb_workfn+0x334/0x520
   process_one_work+0x1ee/0x570
   ? lock_is_held_type+0xc6/0x130
   worker_thread+0x1d1/0x3b0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xee/0x120
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0x30/0x50
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>
  Modules linked in: dm_mod btrfs blake2b_generic xor raid6_pq rapl
  CR2: 0000000000000020

* Case 2. Earlier completion of orig_bbio for mirrored btrfs_bios

btrfs_bbio_propagate_error() assumes the end_io function for orig_bbio is
called last among split bios. In that case, btrfs_orig_write_end_io() sets
the bio->bi_status to BLK_STS_IOERR by seeing the bioc->error [2].
Otherwise, the increased orig_bio's bioc->error is not checked by anyone
and return BLK_STS_OK to the upper layer.

[2] Actually, this is not true. Because we only increases orig_bioc->errors
by max_errors, the condition "atomic_read(&bioc->error) > bioc->max_errors"
is still not met if only one split btrfs_bio fails.

* Case 3. Later completion of orig_bbio for un-mirrored btrfs_bios

In contrast to the above case, btrfs_bbio_propagate_error() is not working
well if un-mirrored orig_bbio is completed last. It sets
orig_bbio->bio.bi_status to the btrfs_bio's error. But, that is easily
over-written by orig_bbio's completion status. If the status is BLK_STS_OK,
the upper layer would not know the failure.

* Solution

Considering the above cases, we can only save the error status in the
orig_bbio (remaining part after split) itself as it is always
available. Also, the saved error status should be propagated when all the
split btrfs_bios are finished (i.e, bbio->pending_ios == 0).

This commit introduces "status" to btrfs_bbio and saves the first error of
split bios to original btrfs_bio's "status" variable. When all the split
bios are finished, the saved status is loaded into original btrfs_bio's
status.

With this commit, btrfs/146 on zoned devices does not hit the NULL pointer
dereference anymore.

Fixes: 852eee6 ("btrfs: allow btrfs_submit_bio to split bios")
CC: [email protected] # 6.6+
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Signed-off-by: Naohiro Aota <[email protected]>
Signed-off-by: David Sterba <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Running rcutorture scenario TREE05, the below warning is triggered.

[   32.604594] WARNING: suspicious RCU usage
[   32.605928] 6.11.0-rc5-00040-g4ba4f1afb6a9 #55238 Not tainted
[   32.607812] -----------------------------
[   32.609140] kernel/events/core.c:13946 RCU-list traversed in non-reader section!!
[   32.611595] other info that might help us debug this:
[   32.614247] rcu_scheduler_active = 2, debug_locks = 1
[   32.616392] 3 locks held by cpuhp/4/35:
[   32.617687]  #0: ffffffffb666a650 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x4e/0x200
[   32.620563]  #1: ffffffffb666cd20 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x4e/0x200
[   32.623412]  #2: ffffffffb677c288 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context+0x32/0x2f0

In perf_event_clear_cpumask(), uses list_for_each_entry_rcu() without an
obvious RCU read-side critical section.

Either pmus_srcu or pmus_lock is good enough to protect the pmus list.
In the current context, pmus_lock is already held. The
list_for_each_entry_rcu() is not required.

Fixes: 4ba4f1a ("perf: Generic hotplug support for a PMU with a scope")
Closes: https://lore.kernel.org/lkml/2b66dff8-b827-494b-b151-1ad8d56f13e6@paulmck-laptop/
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Reported-by: "Paul E. McKenney" <[email protected]>
Reported-by: kernel test robot <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kan Liang <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: "Paul E. McKenney" <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Add check for the return value of spi_get_csgpiod() to avoid passing a NULL
pointer to gpiod_direction_output(), preventing a crash when GPIO chip
select is not used.

Fix below crash:
[    4.251960] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[    4.260762] Mem abort info:
[    4.263556]   ESR = 0x0000000096000004
[    4.267308]   EC = 0x25: DABT (current EL), IL = 32 bits
[    4.272624]   SET = 0, FnV = 0
[    4.275681]   EA = 0, S1PTW = 0
[    4.278822]   FSC = 0x04: level 0 translation fault
[    4.283704] Data abort info:
[    4.286583]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    4.292074]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    4.297130]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    4.302445] [0000000000000000] user address but active_mm is swapper
[    4.308805] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[    4.315072] Modules linked in:
[    4.318124] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc4-next-20241023-00008-ga20ec42c5fc1 #359
[    4.328130] Hardware name: LS1046A QDS Board (DT)
[    4.332832] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    4.339794] pc : gpiod_direction_output+0x34/0x5c
[    4.344505] lr : gpiod_direction_output+0x18/0x5c
[    4.349208] sp : ffff80008003b8f0
[    4.352517] x29: ffff80008003b8f0 x28: 0000000000000000 x27: ffffc96bcc7e9068
[    4.359659] x26: ffffc96bcc6e00b0 x25: ffffc96bcc598398 x24: ffff447400132810
[    4.366800] x23: 0000000000000000 x22: 0000000011e1a300 x21: 0000000000020002
[    4.373940] x20: 0000000000000000 x19: 0000000000000000 x18: ffffffffffffffff
[    4.381081] x17: ffff44740016e600 x16: 0000000500000003 x15: 0000000000000007
[    4.388221] x14: 0000000000989680 x13: 0000000000020000 x12: 000000000000001e
[    4.395362] x11: 0044b82fa09b5a53 x10: 0000000000000019 x9 : 0000000000000008
[    4.402502] x8 : 0000000000000002 x7 : 0000000000000007 x6 : 0000000000000000
[    4.409641] x5 : 0000000000000200 x4 : 0000000002000000 x3 : 0000000000000000
[    4.416781] x2 : 0000000000022202 x1 : 0000000000000000 x0 : 0000000000000000
[    4.423921] Call trace:
[    4.426362]  gpiod_direction_output+0x34/0x5c (P)
[    4.431067]  gpiod_direction_output+0x18/0x5c (L)
[    4.435771]  dspi_setup+0x220/0x334

Fixes: 9e264f3 ("spi: Replace all spi->chip_select and spi->cs_gpiod references with function call")
Cc: [email protected]
Signed-off-by: Frank Li <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Mark Brown <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
… non-PCI device

The function cxl_endpoint_gather_bandwidth() invokes
pci_bus_read/write_XXX(), however, not all CXL devices are presently
implemented via PCI. It is recognized that the cxl_test has realized a CXL
device using a platform device.

Calling pci_bus_read/write_XXX() in cxl_test will cause kernel panic:
 platform cxl_host_bridge.3: host supports CXL (restricted)
 Oops: general protection fault, probably for non-canonical address 0x3ef17856fcae4fbd: 0000 [#1] PREEMPT SMP PTI
 Call Trace:
  <TASK>
  ? __die_body.cold+0x19/0x27
  ? die_addr+0x38/0x60
  ? exc_general_protection+0x1f5/0x4b0
  ? asm_exc_general_protection+0x22/0x30
  ? pci_bus_read_config_word+0x1c/0x60
  pcie_capability_read_word+0x93/0xb0
  pcie_link_speed_mbps+0x18/0x50
  cxl_pci_get_bandwidth+0x18/0x60 [cxl_core]
  cxl_endpoint_gather_bandwidth.constprop.0+0xf4/0x230 [cxl_core]
  ? xas_store+0x54/0x660
  ? preempt_count_add+0x69/0xa0
  ? _raw_spin_lock+0x13/0x40
  ? __kmalloc_cache_noprof+0xe7/0x270
  cxl_region_shared_upstream_bandwidth_update+0x9c/0x790 [cxl_core]
  cxl_region_attach+0x520/0x7e0 [cxl_core]
  store_targetN+0xf2/0x120 [cxl_core]
  kernfs_fop_write_iter+0x13a/0x1f0
  vfs_write+0x23b/0x410
  ksys_write+0x53/0xd0
  do_syscall_64+0x62/0x180
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

And Ying also reported a KASAN error with similar calltrace.

Reported-by: Huang, Ying <[email protected]>
Closes: http://lore.kernel.org/[email protected]
Fixes: a5ab0de ("cxl: Calculate region bandwidth of targets with shared upstream link")
Signed-off-by: Li Zhijian <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Tested-by: Huang, Ying <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Ira Weiny <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
In support of investigating an initialization failure report [1],
cxl_test was updated to register mock memory-devices after the mock
root-port/bus device had been registered. That led to cxl_test crashing
with a use-after-free bug with the following signature:

    cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
    cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
    cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
1)  cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
    [..]
    cxld_unregister: cxl decoder14.0:
    cxl_region_decode_reset: cxl_region region3:
    mock_decoder_reset: cxl_port port3: decoder3.0 reset
2)  mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
    cxl_endpoint_decoder_release: cxl decoder14.0:
    [..]
    cxld_unregister: cxl decoder7.0:
3)  cxl_region_decode_reset: cxl_region region3:
    Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
    [..]
    RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
    [..]
    Call Trace:
     <TASK>
     cxl_region_decode_reset+0x69/0x190 [cxl_core]
     cxl_region_detach+0xe8/0x210 [cxl_core]
     cxl_decoder_kill_region+0x27/0x40 [cxl_core]
     cxld_unregister+0x5d/0x60 [cxl_core]

At 1) a region has been established with 2 endpoint decoders (7.0 and
14.0). Those endpoints share a common switch-decoder in the topology
(3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
the "out of order reset case" in the switch decoder. The effect though
is that region3 cleanup is aborted leaving it in-tact and
referencing decoder14.0. At 3) the second attempt to teardown region3
trips over the stale decoder14.0 object which has long since been
deleted.

The fix here is to recognize that the CXL specification places no
mandate on in-order shutdown of switch-decoders, the driver enforces
in-order allocation, and hardware enforces in-order commit. So, rather
than fail and leave objects dangling, always remove them.

In support of making cxl_region_decode_reset() always succeed,
cxl_region_invalidate_memregion() failures are turned into warnings.
Crashing the kernel is ok there since system integrity is at risk if
caches cannot be managed around physical address mutation events like
CXL region destruction.

A new device_for_each_child_reverse_from() is added to cleanup
port->commit_end after all dependent decoders have been disabled. In
other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
port->commit_end only decrements from 2 after 2 has been disabled, and
it decrements all the way to zero since 1 was disabled previously.

Link: http://lore.kernel.org/[email protected] [1]
Cc: [email protected]
Fixes: 176baef ("cxl/hdm: Commit decoder state to hardware")
Reviewed-by: Jonathan Cameron <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Alison Schofield <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Zijun Hu <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Ira Weiny <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Under memory pressure it's possible for GFP_ATOMIC order-0 allocations to
fail even though free pages are available in the highatomic reserves. 
GFP_ATOMIC allocations cannot trigger unreserve_highatomic_pageblock()
since it's only run from reclaim.

Given that such allocations will pass the watermarks in
__zone_watermark_unusable_free(), it makes sense to fallback to highatomic
reserves the same way that ALLOC_OOM can.

This fixes order-0 page allocation failures observed on Cloudflare's fleet
when handling network packets:

  kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC),
  nodemask=(null),cpuset=/,mems_allowed=0-7
  CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G           O 6.6.43-CUSTOM #1
  Hardware name: MACHINE
  Call Trace:
   <IRQ>
   dump_stack_lvl+0x3c/0x50
   warn_alloc+0x13a/0x1c0
   __alloc_pages_slowpath.constprop.0+0xc9d/0xd10
   __alloc_pages+0x327/0x340
   __napi_alloc_skb+0x16d/0x1f0
   bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en]
   bnxt_rx_pkt+0x201/0x15e0 [bnxt_en]
   __bnxt_poll_work+0x156/0x2b0 [bnxt_en]
   bnxt_poll+0xd9/0x1c0 [bnxt_en]
   __napi_poll+0x2b/0x1b0
   bpf_trampoline_6442524138+0x7d/0x1000
   __napi_poll+0x5/0x1b0
   net_rx_action+0x342/0x740
   handle_softirqs+0xcf/0x2b0
   irq_exit_rcu+0x6c/0x90
   sysvec_apic_timer_interrupt+0x72/0x90
   </IRQ>

[[email protected]: update comment]
  Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/CAGis_TWzSu=P7QJmjD58WWiu3zjMTVKSzdOwWE8ORaGytzWJwQ@mail.gmail.com/
Fixes: 1d91df8 ("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs")
Signed-off-by: Matt Fleming <[email protected]>
Suggested-by: Vlastimil Babka <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
walk_system_ram_res_rev() erroneously discards resource flags when passing
the information to the callback.

This causes systems with IORESOURCE_SYSRAM_DRIVER_MANAGED memory to have
these resources selected during kexec to store kexec buffers if that
memory happens to be at placed above normal system ram.

This leads to undefined behavior after reboot.  If the kexec buffer is
never touched, nothing happens.  If the kexec buffer is touched, it could
lead to a crash (like below) or undefined behavior.

Tested on a system with CXL memory expanders with driver managed memory,
TPM enabled, and CONFIG_IMA_KEXEC=y.  Adding printk's showed the flags
were being discarded and as a result the check for
IORESOURCE_SYSRAM_DRIVER_MANAGED passes.

find_next_iomem_res: name(System RAM (kmem))
		     start(10000000000)
		     end(1034fffffff)
		     flags(83000200)

locate_mem_hole_top_down: start(10000000000) end(1034fffffff) flags(0)

[.] BUG: unable to handle page fault for address: ffff89834ffff000
[.] #PF: supervisor read access in kernel mode
[.] #PF: error_code(0x0000) - not-present page
[.] PGD c04c8bf067 P4D c04c8bf067 PUD c04c8be067 PMD 0
[.] Oops: 0000 [#1] SMP
[.] RIP: 0010:ima_restore_measurement_list+0x95/0x4b0
[.] RSP: 0018:ffffc900000d3a80 EFLAGS: 00010286
[.] RAX: 0000000000001000 RBX: 0000000000000000 RCX: ffff89834ffff000
[.] RDX: 0000000000000018 RSI: ffff89834ffff000 RDI: ffff89834ffff018
[.] RBP: ffffc900000d3ba0 R08: 0000000000000020 R09: ffff888132b8a900
[.] R10: 4000000000000000 R11: 000000003a616d69 R12: 0000000000000000
[.] R13: ffffffff8404ac28 R14: 0000000000000000 R15: ffff89834ffff000
[.] FS:  0000000000000000(0000) GS:ffff893d44640000(0000) knlGS:0000000000000000
[.] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[.] ata5: SATA link down (SStatus 0 SControl 300)
[.] CR2: ffff89834ffff000 CR3: 000001034d00f001 CR4: 0000000000770ef0
[.] PKRU: 55555554
[.] Call Trace:
[.]  <TASK>
[.]  ? __die+0x78/0xc0
[.]  ? page_fault_oops+0x2a8/0x3a0
[.]  ? exc_page_fault+0x84/0x130
[.]  ? asm_exc_page_fault+0x22/0x30
[.]  ? ima_restore_measurement_list+0x95/0x4b0
[.]  ? template_desc_init_fields+0x317/0x410
[.]  ? crypto_alloc_tfm_node+0x9c/0xc0
[.]  ? init_ima_lsm+0x30/0x30
[.]  ima_load_kexec_buffer+0x72/0xa0
[.]  ima_init+0x44/0xa0
[.]  __initstub__kmod_ima__373_1201_init_ima7+0x1e/0xb0
[.]  ? init_ima_lsm+0x30/0x30
[.]  do_one_initcall+0xad/0x200
[.]  ? idr_alloc_cyclic+0xaa/0x110
[.]  ? new_slab+0x12c/0x420
[.]  ? new_slab+0x12c/0x420
[.]  ? number+0x12a/0x430
[.]  ? sysvec_apic_timer_interrupt+0xa/0x80
[.]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[.]  ? parse_args+0xd4/0x380
[.]  ? parse_args+0x14b/0x380
[.]  kernel_init_freeable+0x1c1/0x2b0
[.]  ? rest_init+0xb0/0xb0
[.]  kernel_init+0x16/0x1a0
[.]  ret_from_fork+0x2f/0x40
[.]  ? rest_init+0xb0/0xb0
[.]  ret_from_fork_asm+0x11/0x20
[.]  </TASK>

Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7acf164 ("resource: add walk_system_ram_res_rev()")
Signed-off-by: Gregory Price <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: AKASHI Takahiro <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Cc: Mika Westerberg <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
The following BUG was triggered:

=============================
[ BUG: Invalid wait context ]
6.12.0-rc2-XXX #406 Not tainted
-----------------------------
kworker/1:1/62 is trying to lock:
ffffff8801593030 (&cpc_ptr->rmw_lock){+.+.}-{3:3}, at: cpc_write+0xcc/0x370
other info that might help us debug this:
context-{5:5}
2 locks held by kworker/1:1/62:
  #0: ffffff897ef5ec98 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2c/0x50
  #1: ffffff880154e238 (&sg_policy->update_lock){....}-{2:2}, at: sugov_update_shared+0x3c/0x280
stack backtrace:
CPU: 1 UID: 0 PID: 62 Comm: kworker/1:1 Not tainted 6.12.0-rc2-g9654bd3e8806 #406
Workqueue:  0x0 (events)
Call trace:
  dump_backtrace+0xa4/0x130
  show_stack+0x20/0x38
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x28
  __lock_acquire+0x480/0x1ad8
  lock_acquire+0x114/0x310
  _raw_spin_lock+0x50/0x70
  cpc_write+0xcc/0x370
  cppc_set_perf+0xa0/0x3a8
  cppc_cpufreq_fast_switch+0x40/0xc0
  cpufreq_driver_fast_switch+0x4c/0x218
  sugov_update_shared+0x234/0x280
  update_load_avg+0x6ec/0x7b8
  dequeue_entities+0x108/0x830
  dequeue_task_fair+0x58/0x408
  __schedule+0x4f0/0x1070
  schedule+0x54/0x130
  worker_thread+0xc0/0x2e8
  kthread+0x130/0x148
  ret_from_fork+0x10/0x20

sugov_update_shared() locks a raw_spinlock while cpc_write() locks a
spinlock.

To have a correct wait-type order, update rmw_lock to a raw spinlock and
ensure that interrupts will be disabled on the CPU holding it.

Fixes: 60949b7 ("ACPI: CPPC: Fix MASK_VAL() usage")
Signed-off-by: Pierre Gondois <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
I got a syzbot report without a repro [1] crashing in nf_send_reset6()

I think the issue is that dev->hard_header_len is zero, and we attempt
later to push an Ethernet header.

Use LL_MAX_HEADER, as other functions in net/ipv6/netfilter/nf_reject_ipv6.c.

[1]

skbuff: skb_under_panic: text:ffffffff89b1d008 len:74 put:14 head:ffff88803123aa00 data:ffff88803123a9f2 tail:0x3c end:0x140 dev:syz_tun
 kernel BUG at net/core/skbuff.c:206 !
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 7373 Comm: syz.1.568 Not tainted 6.12.0-rc2-syzkaller-00631-g6d858708d465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
 RIP: 0010:skb_panic net/core/skbuff.c:206 [inline]
 RIP: 0010:skb_under_panic+0x14b/0x150 net/core/skbuff.c:216
Code: 0d 8d 48 c7 c6 60 a6 29 8e 48 8b 54 24 08 8b 0c 24 44 8b 44 24 04 4d 89 e9 50 41 54 41 57 41 56 e8 ba 30 38 02 48 83 c4 20 90 <0f> 0b 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
RSP: 0018:ffffc900045269b0 EFLAGS: 00010282
RAX: 0000000000000088 RBX: dffffc0000000000 RCX: cd66dacdc5d8e800
RDX: 0000000000000000 RSI: 0000000000000200 RDI: 0000000000000000
RBP: ffff88802d39a3d0 R08: ffffffff8174afec R09: 1ffff920008a4ccc
R10: dffffc0000000000 R11: fffff520008a4ccd R12: 0000000000000140
R13: ffff88803123aa00 R14: ffff88803123a9f2 R15: 000000000000003c
FS:  00007fdbee5ff6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000005d322000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  skb_push+0xe5/0x100 net/core/skbuff.c:2636
  eth_header+0x38/0x1f0 net/ethernet/eth.c:83
  dev_hard_header include/linux/netdevice.h:3208 [inline]
  nf_send_reset6+0xce6/0x1270 net/ipv6/netfilter/nf_reject_ipv6.c:358
  nft_reject_inet_eval+0x3b9/0x690 net/netfilter/nft_reject_inet.c:48
  expr_call_ops_eval net/netfilter/nf_tables_core.c:240 [inline]
  nft_do_chain+0x4ad/0x1da0 net/netfilter/nf_tables_core.c:288
  nft_do_chain_inet+0x418/0x6b0 net/netfilter/nft_chain_filter.c:161
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626
  nf_hook include/linux/netfilter.h:269 [inline]
  NF_HOOK include/linux/netfilter.h:312 [inline]
  br_nf_pre_routing_ipv6+0x63e/0x770 net/bridge/br_netfilter_ipv6.c:184
  nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline]
  nf_hook_bridge_pre net/bridge/br_input.c:277 [inline]
  br_handle_frame+0x9fd/0x1530 net/bridge/br_input.c:424
  __netif_receive_skb_core+0x13e8/0x4570 net/core/dev.c:5562
  __netif_receive_skb_one_core net/core/dev.c:5666 [inline]
  __netif_receive_skb+0x12f/0x650 net/core/dev.c:5781
  netif_receive_skb_internal net/core/dev.c:5867 [inline]
  netif_receive_skb+0x1e8/0x890 net/core/dev.c:5926
  tun_rx_batched+0x1b7/0x8f0 drivers/net/tun.c:1550
  tun_get_user+0x3056/0x47e0 drivers/net/tun.c:2007
  tun_chr_write_iter+0x10d/0x1f0 drivers/net/tun.c:2053
  new_sync_write fs/read_write.c:590 [inline]
  vfs_write+0xa6d/0xc90 fs/read_write.c:683
  ksys_write+0x183/0x2b0 fs/read_write.c:736
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdbeeb7d1ff
Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 c9 8d 02 00 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 1c 8e 02 00 48
RSP: 002b:00007fdbee5ff000 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fdbeed36058 RCX: 00007fdbeeb7d1ff
RDX: 000000000000008e RSI: 0000000020000040 RDI: 00000000000000c8
RBP: 00007fdbeebf12be R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000008e R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fdbeed36058 R15: 00007ffc38de06e8
 </TASK>

Fixes: c8d7b98 ("netfilter: move nf_send_resetX() code to nf_reject_ipvX modules")
Reported-by: syzbot <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Hou Tao says:

====================
The patch set fixes several issues in bits iterator. Patch #1 fixes the
kmemleak problem of bits iterator. Patch #2~#3 fix the overflow problem
of nr_bits. Patch #4 fixes the potential stack corruption when bits
iterator is used on 32-bit host. Patch #5 adds more test cases for bits
iterator.

Please see the individual patches for more details. And comments are
always welcome.
---
v4:
 * patch #1: add ack from Yafang
 * patch #3: revert code-churn like changes:
   (1) compute nr_bytes and nr_bits before the check of nr_words.
   (2) use nr_bits == 64 to check for single u64, preventing build
       warning on 32-bit hosts.
 * patch #4: use "BITS_PER_LONG == 32" instead of "!defined(CONFIG_64BIT)"

v3: https://lore.kernel.org/bpf/[email protected]/T/#t
  * split the bits-iterator related patches from "Misc fixes for bpf"
    patch set
  * patch #1: use "!nr_bits || bits >= nr_bits" to stop the iteration
  * patch #2: add a new helper for the overflow problem
  * patch #3: decrease the limitation from 512 to 511 and check whether
    nr_bytes is too large for bpf memory allocator explicitly
  * patch #5: add two more test cases for bit iterator

v2: http://lore.kernel.org/bpf/[email protected]
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Petr Machata says:

====================
mlxsw: Fixes

In this patchset:

- Tx header should be pushed for each packet which is transmitted via
  Spectrum ASICs. Patch #1 adds a missing call to skb_cow_head() to make
  sure that there is both enough room to push the Tx header and that the
  SKB header is not cloned and can be modified.

- Commit b5b60bb ("mlxsw: pci: Use page pool for Rx buffers
  allocation") converted mlxsw to use page pool for Rx buffers allocation.
  Sync for CPU and for device should be done for Rx pages. In patches #2
  and #3, add the missing calls to sync pages for, respectively, CPU and
  the device.

- Patch #4 then fixes a bug to IPv6 GRE forwarding offload. Patch #5 adds
  a generic forwarding test that fails with mlxsw ports prior to the fix.
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
When we compile and load lib/slub_kunit.c,it will cause a panic.

The root cause is that __kmalloc_cache_noprof was directly called instead
of kmem_cache_alloc,which resulted in no alloc_tag being allocated.This
caused current->alloc_tag to be null,leading to a null pointer dereference
in alloc_tag_ref_set.

Despite the fact that my colleague Pei Xiao will later fix the code in
slub_kunit.c,we still need fix null pointer check logic for ref and tag to
avoid panic caused by a null pointer dereference.

Here is the log for the panic:

[   74.779373][ T2158] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[   74.780130][ T2158] Mem abort info:
[   74.780406][ T2158]   ESR = 0x0000000096000004
[   74.780756][ T2158]   EC = 0x25: DABT (current EL), IL = 32 bits
[   74.781225][ T2158]   SET = 0, FnV = 0
[   74.781529][ T2158]   EA = 0, S1PTW = 0
[   74.781836][ T2158]   FSC = 0x04: level 0 translation fault
[   74.782288][ T2158] Data abort info:
[   74.782577][ T2158]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[   74.783068][ T2158]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[   74.783533][ T2158]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[   74.784010][ T2158] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000105f34000
[   74.784586][ T2158] [0000000000000020] pgd=0000000000000000, p4d=0000000000000000
[   74.785293][ T2158] Internal error: Oops: 0000000096000004 [#1] SMP
[   74.785805][ T2158] Modules linked in: slub_kunit kunit ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle 4
[   74.790661][ T2158] CPU: 0 UID: 0 PID: 2158 Comm: kunit_try_catch Kdump: loaded Tainted: G        W        N 6.12.0-rc3+ #2
[   74.791535][ T2158] Tainted: [W]=WARN, [N]=TEST
[   74.791889][ T2158] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   74.792479][ T2158] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   74.793101][ T2158] pc : alloc_tagging_slab_alloc_hook+0x120/0x270
[   74.793607][ T2158] lr : alloc_tagging_slab_alloc_hook+0x120/0x270
[   74.794095][ T2158] sp : ffff800084d33cd0
[   74.794418][ T2158] x29: ffff800084d33cd0 x28: 0000000000000000 x27: 0000000000000000
[   74.795095][ T2158] x26: 0000000000000000 x25: 0000000000000012 x24: ffff80007b30e314
[   74.795822][ T2158] x23: ffff000390ff6f10 x22: 0000000000000000 x21: 0000000000000088
[   74.796555][ T2158] x20: ffff000390285840 x19: fffffd7fc3ef7830 x18: ffffffffffffffff
[   74.797283][ T2158] x17: ffff8000800e63b4 x16: ffff80007b33afc4 x15: ffff800081654c00
[   74.798011][ T2158] x14: 0000000000000000 x13: 205d383531325420 x12: 5b5d383734363537
[   74.798744][ T2158] x11: ffff800084d337e0 x10: 000000000000005d x9 : 00000000ffffffd0
[   74.799476][ T2158] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008219d188 x6 : c0000000ffff7fff
[   74.800206][ T2158] x5 : ffff0003fdbc9208 x4 : ffff800081edd188 x3 : 0000000000000001
[   74.800932][ T2158] x2 : 0beaa6dee1ac5a00 x1 : 0beaa6dee1ac5a00 x0 : ffff80037c2cb000
[   74.801656][ T2158] Call trace:
[   74.801954][ T2158]  alloc_tagging_slab_alloc_hook+0x120/0x270
[   74.802494][ T2158]  __kmalloc_cache_noprof+0x148/0x33c
[   74.802976][ T2158]  test_kmalloc_redzone_access+0x4c/0x104 [slub_kunit]
[   74.803607][ T2158]  kunit_try_run_case+0x70/0x17c [kunit]
[   74.804124][ T2158]  kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[   74.804768][ T2158]  kthread+0x10c/0x118
[   74.805141][ T2158]  ret_from_fork+0x10/0x20
[   74.805540][ T2158] Code: b9400a80 11000400 b9000a80 97ffd858 (f94012d3)
[   74.806176][ T2158] SMP: stopping secondary CPUs
[   74.808130][ T2158] Starting crashdump kernel...

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e0a955b ("mm/codetag: add pgalloc_tag_copy()")
Signed-off-by: Hao Ge <[email protected]>
Acked-by: Suren Baghdasaryan <[email protected]>
Suggested-by: Suren Baghdasaryan <[email protected]>
Acked-by: Yu Zhao <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Enqueue packets in dql after dma engine starts causes race condition.
Tx transfer starts once dma engine is started and may execute dql dequeue
in completion before it gets queued. It results in following kernel crash
while running iperf stress test:

kernel BUG at lib/dynamic_queue_limits.c:99!
<snip>
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
pc : dql_completed+0x238/0x248
lr : dql_completed+0x3c/0x248

Call trace:
  dql_completed+0x238/0x248
  axienet_dma_tx_cb+0xa0/0x170
  xilinx_dma_do_tasklet+0xdc/0x290
  tasklet_action_common+0xf8/0x11c
  tasklet_action+0x30/0x3c
  handle_softirqs+0xf8/0x230
<snip>

Start dmaengine after enqueue in dql fixes the crash.

Fixes: 6a91b84 ("net: axienet: Introduce dmaengine support")
Signed-off-by: Suraj Gupta <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Recently, we got a customer report that CIFS triggers oops while
reconnecting to a server.  [0]

The workload runs on Kubernetes, and some pods mount CIFS servers
in non-root network namespaces.  The problem rarely happened, but
it was always while the pod was dying.

The root cause is wrong reference counting for network namespace.

CIFS uses kernel sockets, which do not hold refcnt of the netns that
the socket belongs to.  That means CIFS must ensure the socket is
always freed before its netns; otherwise, use-after-free happens.

The repro steps are roughly:

  1. mount CIFS in a non-root netns
  2. drop packets from the netns
  3. destroy the netns
  4. unmount CIFS

We can reproduce the issue quickly with the script [1] below and see
the splat [2] if CONFIG_NET_NS_REFCNT_TRACKER is enabled.

When the socket is TCP, it is hard to guarantee the netns lifetime
without holding refcnt due to async timers.

Let's hold netns refcnt for each socket as done for SMC in commit
9744d2b ("smc: Fix use-after-free in tcp_write_timer_handler().").

Note that we need to move put_net() from cifs_put_tcp_session() to
clean_demultiplex_info(); otherwise, __sock_create() still could touch a
freed netns while cifsd tries to reconnect from cifs_demultiplex_thread().

Also, maybe_get_net() cannot be put just before __sock_create() because
the code is not under RCU and there is a small chance that the same
address happened to be reallocated to another netns.

[0]:
CIFS: VFS: \\XXXXXXXXXXX has not responded in 15 seconds. Reconnecting...
CIFS: Serverclose failed 4 times, giving up
Unable to handle kernel paging request at virtual address 14de99e461f84a07
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004
  CM = 0, WnR = 0
[14de99e461f84a07] address between user and kernel address ranges
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: cls_bpf sch_ingress nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver tcp_diag inet_diag veth xt_state xt_connmark nf_conntrack_netlink xt_nat xt_statistic xt_MASQUERADE xt_mark xt_addrtype ipt_REJECT nf_reject_ipv4 nft_chain_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment nft_compat nf_tables nfnetlink overlay nls_ascii nls_cp437 sunrpc vfat fat aes_ce_blk aes_ce_cipher ghash_ce sm4_ce_cipher sm4 sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 sha1_ce ena button sch_fq_codel loop fuse configfs dmi_sysfs sha2_ce sha256_arm64 dm_mirror dm_region_hash dm_log dm_mod dax efivarfs
CPU: 5 PID: 2690970 Comm: cifsd Not tainted 6.1.103-109.184.amzn2023.aarch64 #1
Hardware name: Amazon EC2 r7g.4xlarge/, BIOS 1.0 11/1/2018
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : fib_rules_lookup+0x44/0x238
lr : __fib_lookup+0x64/0xbc
sp : ffff8000265db790
x29: ffff8000265db790 x28: 0000000000000000 x27: 000000000000bd01
x26: 0000000000000000 x25: ffff000b4baf8000 x24: ffff00047b5e4580
x23: ffff8000265db7e0 x22: 0000000000000000 x21: ffff00047b5e4500
x20: ffff0010e3f694f8 x19: 14de99e461f849f7 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 3f92800abd010002
x11: 0000000000000001 x10: ffff0010e3f69420 x9 : ffff800008a6f294
x8 : 0000000000000000 x7 : 0000000000000006 x6 : 0000000000000000
x5 : 0000000000000001 x4 : ffff001924354280 x3 : ffff8000265db7e0
x2 : 0000000000000000 x1 : ffff0010e3f694f8 x0 : ffff00047b5e4500
Call trace:
 fib_rules_lookup+0x44/0x238
 __fib_lookup+0x64/0xbc
 ip_route_output_key_hash_rcu+0x2c4/0x398
 ip_route_output_key_hash+0x60/0x8c
 tcp_v4_connect+0x290/0x488
 __inet_stream_connect+0x108/0x3d0
 inet_stream_connect+0x50/0x78
 kernel_connect+0x6c/0xac
 generic_ip_connect+0x10c/0x6c8 [cifs]
 __reconnect_target_unlocked+0xa0/0x214 [cifs]
 reconnect_dfs_server+0x144/0x460 [cifs]
 cifs_reconnect+0x88/0x148 [cifs]
 cifs_readv_from_socket+0x230/0x430 [cifs]
 cifs_read_from_socket+0x74/0xa8 [cifs]
 cifs_demultiplex_thread+0xf8/0x704 [cifs]
 kthread+0xd0/0xd4
Code: aa0003f8 f8480f13 eb18027f 540006c0 (b9401264)

[1]:
CIFS_CRED="/root/cred.cifs"
CIFS_USER="Administrator"
CIFS_PASS="Password"
CIFS_IP="X.X.X.X"
CIFS_PATH="//${CIFS_IP}/Users/Administrator/Desktop/CIFS_TEST"
CIFS_MNT="/mnt/smb"
DEV="enp0s3"

cat <<EOF > ${CIFS_CRED}
username=${CIFS_USER}
password=${CIFS_PASS}
domain=EXAMPLE.COM
EOF

unshare -n bash -c "
mkdir -p ${CIFS_MNT}
ip netns attach root 1
ip link add eth0 type veth peer veth0 netns root
ip link set eth0 up
ip -n root link set veth0 up
ip addr add 192.168.0.2/24 dev eth0
ip -n root addr add 192.168.0.1/24 dev veth0
ip route add default via 192.168.0.1 dev eth0
ip netns exec root sysctl net.ipv4.ip_forward=1
ip netns exec root iptables -t nat -A POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
mount -t cifs ${CIFS_PATH} ${CIFS_MNT} -o vers=3.0,sec=ntlmssp,credentials=${CIFS_CRED},rsize=65536,wsize=65536,cache=none,echo_interval=1
touch ${CIFS_MNT}/a.txt
ip netns exec root iptables -t nat -D POSTROUTING -s 192.168.0.2 -o ${DEV} -j MASQUERADE
"

umount ${CIFS_MNT}

[2]:
ref_tracker: net notrefcnt@000000004bbc008d has 1/1 users at
     sk_alloc (./include/net/net_namespace.h:339 net/core/sock.c:2227)
     inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
     __sock_create (net/socket.c:1576)
     generic_ip_connect (fs/smb/client/connect.c:3075)
     cifs_get_tcp_session.part.0 (fs/smb/client/connect.c:3160 fs/smb/client/connect.c:1798)
     cifs_mount_get_session (fs/smb/client/trace.h:959 fs/smb/client/connect.c:3366)
     dfs_mount_share (fs/smb/client/dfs.c:63 fs/smb/client/dfs.c:285)
     cifs_mount (fs/smb/client/connect.c:3622)
     cifs_smb3_do_mount (fs/smb/client/cifsfs.c:949)
     smb3_get_tree (fs/smb/client/fs_context.c:784 fs/smb/client/fs_context.c:802 fs/smb/client/fs_context.c:794)
     vfs_get_tree (fs/super.c:1800)
     path_mount (fs/namespace.c:3508 fs/namespace.c:3834)
     __x64_sys_mount (fs/namespace.c:3848 fs/namespace.c:4057 fs/namespace.c:4034 fs/namespace.c:4034)
     do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
     entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Fixes: 26abe14 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Acked-by: Tom Talpey <[email protected]>
Signed-off-by: Steve French <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Unloading the ice driver while switchdev port representors are added to
a bridge can lead to kernel panic. Reproducer:

  modprobe ice

  devlink dev eswitch set $PF1_PCI mode switchdev

  ip link add $BR type bridge
  ip link set $BR up

  echo 2 > /sys/class/net/$PF1/device/sriov_numvfs
  sleep 2

  ip link set $PF1 master $BR
  ip link set $VF1_PR master $BR
  ip link set $VF2_PR master $BR
  ip link set $PF1 up
  ip link set $VF1_PR up
  ip link set $VF2_PR up
  ip link set $VF1 up

  rmmod irdma ice

When unloading the driver, ice_eswitch_detach() is eventually called as
part of VF freeing. First, it removes a port representor from xarray,
then unregister_netdev() is called (via repr->ops.rem()), finally
representor is deallocated. The problem comes from the bridge doing its
own deinit at the same time. unregister_netdev() triggers a notifier
chain, resulting in ice_eswitch_br_port_deinit() being called. It should
set repr->br_port = NULL, but this does not happen since repr has
already been removed from xarray and is not found. Regardless, it
finishes up deallocating br_port. At this point, repr is still not freed
and an fdb event can happen, in which ice_eswitch_br_fdb_event_work()
takes repr->br_port and tries to use it, which causes a panic (use after
free).

Note that this only happens with 2 or more port representors added to
the bridge, since with only one representor port, the bridge deinit is
slightly different (ice_eswitch_br_port_deinit() is called via
ice_eswitch_br_ports_flush(), not ice_eswitch_br_port_unlink()).

Trace:
  Oops: general protection fault, probably for non-canonical address 0xf129010fd1a93284: 0000 [#1] PREEMPT SMP KASAN NOPTI
  KASAN: maybe wild-memory-access in range [0x8948287e8d499420-0x8948287e8d499427]
  (...)
  Workqueue: ice_bridge_wq ice_eswitch_br_fdb_event_work [ice]
  RIP: 0010:__rht_bucket_nested+0xb4/0x180
  (...)
  Call Trace:
   (...)
   ice_eswitch_br_fdb_find+0x3fa/0x550 [ice]
   ? __pfx_ice_eswitch_br_fdb_find+0x10/0x10 [ice]
   ice_eswitch_br_fdb_event_work+0x2de/0x1e60 [ice]
   ? __schedule+0xf60/0x5210
   ? mutex_lock+0x91/0xe0
   ? __pfx_ice_eswitch_br_fdb_event_work+0x10/0x10 [ice]
   ? ice_eswitch_br_update_work+0x1f4/0x310 [ice]
   (...)

A workaround is available: brctl setageing $BR 0, which stops the bridge
from adding fdb entries altogether.

Change the order of operations in ice_eswitch_detach(): move the call to
unregister_netdev() before removing repr from xarray. This way
repr->br_port will be correctly set to NULL in
ice_eswitch_br_port_deinit(), preventing a panic.

Fixes: fff292b ("ice: add VF representors one by one")
Reviewed-by: Michal Swiatkowski <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Signed-off-by: Marcin Szycik <[email protected]>
Tested-by: Sujai Buvaneswaran <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
The RTC update work involves runtime resuming the UFS controller. Hence,
only start the RTC update work after runtime power management in the UFS
driver has been fully initialized. This patch fixes the following kernel
crash:

Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Workqueue: events ufshcd_rtc_work
Call trace:
 _raw_spin_lock_irqsave+0x34/0x8c (P)
 pm_runtime_get_if_active+0x24/0x9c (L)
 pm_runtime_get_if_active+0x24/0x9c
 ufshcd_rtc_work+0x138/0x1b4
 process_one_work+0x148/0x288
 worker_thread+0x2cc/0x3d4
 kthread+0x110/0x114
 ret_from_fork+0x10/0x20

Reported-by: Neil Armstrong <[email protected]>
Closes: https://lore.kernel.org/linux-scsi/[email protected]/
Fixes: 6bf999e ("scsi: ufs: core: Add UFS RTC support")
Cc: Bean Huo <[email protected]>
Cc: [email protected]
Signed-off-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Peter Wang <[email protected]>
Reviewed-by: Bean Huo <[email protected]>
Tested-by: Neil Armstrong <[email protected]> # on SM8650-HDK
Signed-off-by: Martin K. Petersen <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
In unlikely event that we fail during sending the new VF GGTT
configuration to the GuC, we will free only the GGTT node data
struct but will miss to release the actual GGTT allocation.

This will later lead to list corruption, GGTT space leak and
finally risking crash when unloading the driver:

 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-EIO)
 [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT

 [ ] list_add corruption. next->prev should be prev (ffff88813cfcd628), but was 0000000000000000. (next=ffff88813cfe2028).
 [ ] RIP: 0010:__list_add_valid_or_report+0x6b/0xb0
 [ ] Call Trace:
 [ ]  drm_mm_insert_node_in_range+0x2c0/0x4e0
 [ ]  xe_ggtt_node_insert+0x46/0x70 [xe]
 [ ]  pf_provision_vf_ggtt+0x7f5/0xa70 [xe]
 [ ]  xe_gt_sriov_pf_config_set_ggtt+0x5e/0x770 [xe]
 [ ]  ggtt_set+0x4b/0x70 [xe]
 [ ]  simple_attr_write_xsigned.constprop.0.isra.0+0xb0/0x110

 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-ENOSPC)
 [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT

 [ ] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP NOPTI
 [ ] RIP: 0010:drm_mm_remove_node+0x1b7/0x390
 [ ] Call Trace:
 [ ]  <TASK>
 [ ]  ? die_addr+0x2e/0x80
 [ ]  ? exc_general_protection+0x1a1/0x3e0
 [ ]  ? asm_exc_general_protection+0x22/0x30
 [ ]  ? drm_mm_remove_node+0x1b7/0x390
 [ ]  ggtt_node_remove+0xa5/0xf0 [xe]
 [ ]  xe_ggtt_node_remove+0x35/0x70 [xe]
 [ ]  xe_ttm_bo_destroy+0x123/0x220 [xe]
 [ ]  intel_user_framebuffer_destroy+0x44/0x70 [xe]
 [ ]  intel_plane_destroy_state+0x3b/0xc0 [xe]
 [ ]  drm_atomic_state_default_clear+0x1cd/0x2f0
 [ ]  intel_atomic_state_clear+0x9/0x20 [xe]
 [ ]  __drm_atomic_state_free+0x1d/0xb0

Fix that by using pf_release_ggtt() on the error path, which now
works regardless if the node has GGTT allocation or not.

Fixes: 34e8042 ("drm/xe: Make xe_ggtt_node struct independent")
Signed-off-by: Michal Wajdeczko <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Matthew Brost <[email protected]>
Cc: Matthew Auld <[email protected]>
Reviewed-by: Matthew Brost <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 43b1dd2b550f0861ce80fbfffd5881b1b26272b1)
Signed-off-by: Lucas De Marchi <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
vp_modern_avq_cleanup() and vp_del_vqs() clean up admin vq
resources by virtio_pci_vq_info pointer. The info pointer of admin
vq is stored in vp_dev->admin_vq.info instead of vp_dev->vqs[].
Using the info pointer from vp_dev->vqs[] for admin vq causes a
kernel NULL pointer dereference bug.
In vp_modern_avq_cleanup() and vp_del_vqs(), get the info pointer
from vp_dev->admin_vq.info for admin vq to clean up the resources.
Also make info ptr as argument of vp_del_vq() to be symmetric with
vp_setup_vq().

vp_reset calls vp_modern_avq_cleanup, and causes the Call Trace:
==================================================================
BUG: kernel NULL pointer dereference, address:0000000000000000
...
CPU: 49 UID: 0 PID: 4439 Comm: modprobe Not tainted 6.11.0-rc5 #1
RIP: 0010:vp_reset+0x57/0x90 [virtio_pci]
Call Trace:
 <TASK>
...
 ? vp_reset+0x57/0x90 [virtio_pci]
 ? vp_reset+0x38/0x90 [virtio_pci]
 virtio_reset_device+0x1d/0x30
 remove_vq_common+0x1c/0x1a0 [virtio_net]
 virtnet_remove+0xa1/0xc0 [virtio_net]
 virtio_dev_remove+0x46/0xa0
...
 virtio_pci_driver_exit+0x14/0x810 [virtio_pci]
==================================================================

Fixes: 4c3b54a ("virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result")
Signed-off-by: Feng Liu <[email protected]>
Signed-off-by: Jiri Pirko <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
In the error recovery path of mlx5_vdpa_dev_add(), the cleanup is
executed and at the end put_device() is called which ends up calling
mlx5_vdpa_free(). This function will execute the same cleanup all over
again. Most resources support being cleaned up twice, but the recent
mlx5_vdpa_destroy_mr_resources() doesn't.

This change drops the explicit cleanup from within the
mlx5_vdpa_dev_add() and lets mlx5_vdpa_free() do its work.

This issue was discovered while trying to add 2 vdpa devices with the
same name:
$> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.2
$> vdpa dev add name vdpa-0 mgmtdev auxiliary/mlx5_core.sf.3

... yields the following dump:

  BUG: kernel NULL pointer dereference, address: 00000000000000b8
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: Oops: 0000 [#1] SMP
  CPU: 4 UID: 0 PID: 2811 Comm: vdpa Not tainted 6.12.0-rc6 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  RIP: 0010:destroy_workqueue+0xe/0x2a0
  Code: ...
  RSP: 0018:ffff88814920b9a8 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff888105c10000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff888100400168 RDI: 0000000000000000
  RBP: 0000000000000000 R08: ffff888100120c00 R09: ffffffff828578c0
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff888131fd99a0 R14: 0000000000000000 R15: ffff888105c10580
  FS:  00007fdfa6b4f740(0000) GS:ffff88852ca00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000000b8 CR3: 000000018db09006 CR4: 0000000000372eb0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   ? __die+0x20/0x60
   ? page_fault_oops+0x150/0x3e0
   ? exc_page_fault+0x74/0x130
   ? asm_exc_page_fault+0x22/0x30
   ? destroy_workqueue+0xe/0x2a0
   mlx5_vdpa_destroy_mr_resources+0x2b/0x40 [mlx5_vdpa]
   mlx5_vdpa_free+0x45/0x150 [mlx5_vdpa]
   vdpa_release_dev+0x1e/0x50 [vdpa]
   device_release+0x31/0x90
   kobject_put+0x8d/0x230
   mlx5_vdpa_dev_add+0x328/0x8b0 [mlx5_vdpa]
   vdpa_nl_cmd_dev_add_set_doit+0x2b8/0x4c0 [vdpa]
   genl_family_rcv_msg_doit+0xd0/0x120
   genl_rcv_msg+0x180/0x2b0
   ? __vdpa_alloc_device+0x1b0/0x1b0 [vdpa]
   ? genl_family_rcv_msg_dumpit+0xf0/0xf0
   netlink_rcv_skb+0x54/0x100
   genl_rcv+0x24/0x40
   netlink_unicast+0x1fc/0x2d0
   netlink_sendmsg+0x1e4/0x410
   __sock_sendmsg+0x38/0x60
   ? sockfd_lookup_light+0x12/0x60
   __sys_sendto+0x105/0x160
   ? __count_memcg_events+0x53/0xe0
   ? handle_mm_fault+0x100/0x220
   ? do_user_addr_fault+0x40d/0x620
   __x64_sys_sendto+0x20/0x30
   do_syscall_64+0x4c/0x100
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7fdfa6c66b57
  Code: ...
  RSP: 002b:00007ffeace22998 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
  RAX: ffffffffffffffda RBX: 000055a498608350 RCX: 00007fdfa6c66b57
  RDX: 000000000000006c RSI: 000055a498608350 RDI: 0000000000000003
  RBP: 00007ffeace229c0 R08: 00007fdfa6d35200 R09: 000000000000000c
  R10: 0000000000000000 R11: 0000000000000202 R12: 000055a4986082a0
  R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffeace233f3
   </TASK>
  Modules linked in: ...
  CR2: 00000000000000b8

Fixes: 6211165 ("vdpa/mlx5: Postpone MR deletion")
Signed-off-by: Dragos Tatulea <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Eric reported a division by zero splat in the MPTCP protocol:

Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 6094 Comm: syz-executor317 Not tainted
6.12.0-rc5-syzkaller-00291-g05b92660cdfe #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 09/13/2024
RIP: 0010:__tcp_select_window+0x5b4/0x1310 net/ipv4/tcp_output.c:3163
Code: f6 44 01 e3 89 df e8 9b 75 09 f8 44 39 f3 0f 8d 11 ff ff ff e8
0d 74 09 f8 45 89 f4 e9 04 ff ff ff e8 00 74 09 f8 44 89 f0 99 <f7> 7c
24 14 41 29 d6 45 89 f4 e9 ec fe ff ff e8 e8 73 09 f8 48 89
RSP: 0018:ffffc900041f7930 EFLAGS: 00010293
RAX: 0000000000017e67 RBX: 0000000000017e67 RCX: ffffffff8983314b
RDX: 0000000000000000 RSI: ffffffff898331b0 RDI: 0000000000000004
RBP: 00000000005d6000 R08: 0000000000000004 R09: 0000000000017e67
R10: 0000000000003e80 R11: 0000000000000000 R12: 0000000000003e80
R13: ffff888031d9b440 R14: 0000000000017e67 R15: 00000000002eb000
FS: 00007feb5d7f16c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb5d8adbb8 CR3: 0000000074e4c000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__tcp_cleanup_rbuf+0x3e7/0x4b0 net/ipv4/tcp.c:1493
mptcp_rcv_space_adjust net/mptcp/protocol.c:2085 [inline]
mptcp_recvmsg+0x2156/0x2600 net/mptcp/protocol.c:2289
inet_recvmsg+0x469/0x6a0 net/ipv4/af_inet.c:885
sock_recvmsg_nosec net/socket.c:1051 [inline]
sock_recvmsg+0x1b2/0x250 net/socket.c:1073
__sys_recvfrom+0x1a5/0x2e0 net/socket.c:2265
__do_sys_recvfrom net/socket.c:2283 [inline]
__se_sys_recvfrom net/socket.c:2279 [inline]
__x64_sys_recvfrom+0xe0/0x1c0 net/socket.c:2279
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7feb5d857559
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007feb5d7f1208 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: 00007feb5d8e1318 RCX: 00007feb5d857559
RDX: 000000800000000e RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007feb5d8e1310 R08: 0000000000000000 R09: ffffffff81000000
R10: 0000000000000100 R11: 0000000000000246 R12: 00007feb5d8e131c
R13: 00007feb5d8ae074 R14: 000000800000000e R15: 00000000fffffdef

and provided a nice reproducer.

The root cause is the current bad handling of racing disconnect.
After the blamed commit below, sk_wait_data() can return (with
error) with the underlying socket disconnected and a zero rcv_mss.

Catch the error and return without performing any additional
operations on the current socket.

Reported-by: Eric Dumazet <[email protected]>
Fixes: 419ce13 ("tcp: allow again tcp_disconnect() when threads are waiting")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://patch.msgid.link/8c82ecf71662ecbc47bf390f9905de70884c9f2d.1731060874.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
syzbot and Daan report a NULL pointer crash in the new full swap cluster
reclaim work:

> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
> KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
> CPU: 1 UID: 0 PID: 51 Comm: kworker/1:1 Not tainted 6.12.0-rc6-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> Workqueue: events swap_reclaim_work
> RIP: 0010:__list_del_entry_valid_or_report+0x20/0x1c0 lib/list_debug.c:49
> Code: 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 83 c7 08 48 83 ec 18 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 19 01 00 00 48 89 f2 48 8b 4e 08 48 b8 00 00 00
> RSP: 0018:ffffc90000bb7c30 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff88807b9ae078
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000008
> RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000001 R11: 000000000000004f R12: dffffc0000000000
> R13: ffffffffffffffb8 R14: ffff88807b9ae000 R15: ffffc90003af1000
> FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fffaca68fb8 CR3: 00000000791c8000 CR4: 00000000003526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __list_del_entry_valid include/linux/list.h:124 [inline]
>  __list_del_entry include/linux/list.h:215 [inline]
>  list_move_tail include/linux/list.h:310 [inline]
>  swap_reclaim_full_clusters+0x109/0x460 mm/swapfile.c:748
>  swap_reclaim_work+0x2e/0x40 mm/swapfile.c:779

The syzbot console output indicates a virtual environment where swapfile
is on a rotational device.  In this case, clusters aren't actually used,
and si->full_clusters is not initialized.  Daan's report is from qemu, so
likely rotational too.

Make sure to only schedule the cluster reclaim work when clusters are
actually in use.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Link: systemd/systemd#35044
Fixes: 5168a68 ("mm, swap: avoid over reclaim of full clusters")
Reported-by: [email protected]
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Daan De Meyer <[email protected]>
Cc: Kairui Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
sys-oak pushed a commit that referenced this issue Nov 22, 2024
The kdump kernel is broken on SME systems with CONFIG_IMA_KEXEC=y enabled.
Debugging traced the issue back to

  b69a2af ("x86/kexec: Carry forward IMA measurement log on kexec").

Testing was previously not conducted on SME systems with CONFIG_IMA_KEXEC
enabled, which led to the oversight, with the following incarnation:

...
  ima: No TPM chip found, activating TPM-bypass!
  Loading compiled-in module X.509 certificates
  Loaded X.509 cert 'Build time autogenerated kernel key: 18ae0bc7e79b64700122bb1d6a904b070fef2656'
  ima: Allocated hash algorithm: sha256
  Oops: general protection fault, probably for non-canonical address 0xcfacfdfe6660003e: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14
  Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023
  RIP: 0010:ima_restore_measurement_list
  Call Trace:
   <TASK>
   ? show_trace_log_lvl
   ? show_trace_log_lvl
   ? ima_load_kexec_buffer
   ? __die_body.cold
   ? die_addr
   ? exc_general_protection
   ? asm_exc_general_protection
   ? ima_restore_measurement_list
   ? vprintk_emit
   ? ima_load_kexec_buffer
   ima_load_kexec_buffer
   ima_init
   ? __pfx_init_ima
   init_ima
   ? __pfx_init_ima
   do_one_initcall
   do_initcalls
   ? __pfx_kernel_init
   kernel_init_freeable
   kernel_init
   ret_from_fork
   ? __pfx_kernel_init
   ret_from_fork_asm
   </TASK>
  Modules linked in:
  ---[ end trace 0000000000000000 ]---
  ...
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled
  Rebooting in 10 seconds..

Adding debug printks showed that the stored addr and size of ima_kexec buffer
are not decrypted correctly like:

  ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, size:0xe48066052d5df359

Three types of setup_data info

  — SETUP_EFI,
  - SETUP_IMA, and
  - SETUP_RNG_SEED

are passed to the kexec/kdump kernel. Only the ima_kexec buffer
experienced incorrect decryption. Debugging identified a bug in
early_memremap_is_setup_data(), where an incorrect range calculation
occurred due to the len variable in struct setup_data ended up only
representing the length of the data field, excluding the struct's size,
and thus leading to miscalculation.

Address a similar issue in memremap_is_setup_data() while at it.

  [ bp: Heavily massage. ]

Fixes: b3c72fc ("x86/boot: Introduce setup_indirect")
Signed-off-by: Baoquan He <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Cc: <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
sys-oak pushed a commit that referenced this issue Nov 22, 2024
Accessing `mr_table->mfc_cache_list` is protected by an RCU lock. In the
following code flow, the RCU read lock is not held, causing the
following error when `RCU_PROVE` is not held. The same problem might
show up in the IPv6 code path.

	6.12.0-rc5-kbuilder-01145-gbac17284bdcb #33 Tainted: G            E    N
	-----------------------------
	net/ipv4/ipmr_base.c:313 RCU-list traversed in non-reader section!!

	rcu_scheduler_active = 2, debug_locks = 1
		   2 locks held by RetransmitAggre/3519:
		    #0: ffff88816188c6c0 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x8a/0x290
		    #1: ffffffff83fcf7a8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_dumpit+0x6b/0x90

	stack backtrace:
		    lockdep_rcu_suspicious
		    mr_table_dump
		    ipmr_rtm_dumproute
		    rtnl_dump_all
		    rtnl_dumpit
		    netlink_dump
		    __netlink_dump_start
		    rtnetlink_rcv_msg
		    netlink_rcv_skb
		    netlink_unicast
		    netlink_sendmsg

This is not a problem per see, since the RTNL lock is held here, so, it
is safe to iterate in the list without the RCU read lock, as suggested
by Eric.

To alleviate the concern, modify the code to use
list_for_each_entry_rcu() with the RTNL-held argument.

The annotation will raise an error only if RTNL or RCU read lock are
missing during iteration, signaling a legitimate problem, otherwise it
will avoid this false positive.

This will solve the IPv6 case as well, since ip6mr_rtm_dumproute() calls
this function as well.

Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
sys-oak pushed a commit that referenced this issue Dec 13, 2024
Occasionally i915 workaround list might be empty, for example, it
might be empty when i915 is in virtualization. In this case, count
in i915 workaround list is zero, and gt pointer is NULL. Function
should not access gt pointer.

This change fix a kernel panic which tries to access gt pointer
of struct i915_wa_list when list is empty:

[    2.391840] BUG: kernel NULL pointer dereference, address: 0000000000000018
[    2.392165] #PF: supervisor read access in kernel mode
[    2.392404] #PF: error_code(0x0000) - not-present page
[    2.392643] PGD 0 P4D 0
[    2.392798] Oops: 0000 [#1] PREEMPT SMP NOPTI
[    2.393004] CPU: 3 PID: 608 Comm: systemd-modules Tainted: G U 6.1.12-intel-ese-standard-lts #5
[    2.393493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    2.393877] RIP: 0010:wa_list_apply+0x3a/0x18a [i915]
[    2.394238] Code: c0 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 18 4c 8b 27 48 c7 c7 5f 75 99
[    2.395105] RSP: 0018:ffffb74100b47a58 EFLAGS: 00010246
[    2.395349] RAX: 0000000000000000 RBX: ffff9141c89a3608 RCX: 0000000000000017
[    2.395678] RDX: ffff9141c89a3608 RSI: ffffffffc091ee28 RDI: ffffffffc099755f
[    2.396039] RBP: ffff9141c89a1f38 R08: 0000001b714c2676 R09: ffff9141c89a3668
[    2.396370] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    2.396700] R13: ffff9141c89a0000 R14: ffff9141d79d6780 R15: 0000000000000000
[    2.397058] FS:  00007fd92d4d0c40(0000) GS:ffff9141fbd80000(0000) knlGS:0000000000000000
[    2.397431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.397697] CR2: 0000000000000018 CR3: 0000000103f4c006 CR4: 0000000000770ee0
[    2.398111] PKRU: 55555554
[    2.398242] Call Trace:
[    2.398362]  <TASK>
[    2.398465]  ? kvm_clock_get_cycles+0x14/0x30
[    2.398674]  ? ktime_get+0x35/0xa0
[    2.398842]  intel_gt_init_hw+0x7f/0x220 [i915]
[    2.399158]  intel_gt_resume+0xfe/0x220 [i915]
[    2.399435]  intel_gt_init+0x14d/0x350 [i915]
[    2.399703]  i915_gem_init+0x193/0x200 [i915]
[    2.400000]  i915_driver_probe+0x6bd/0xce0 [i915]

Fixes: 0957e93 ("drm/i915: Add GT support for multiple types of multicast steering")
Signed-off-by: Junxiao Chang <[email protected]>
Signed-off-by: Kris Pan <[email protected]>
bbaa-bbaa pushed a commit to bbaa-bbaa/intel-mainline-tracking that referenced this issue Jan 3, 2025
[ Upstream commit 2b33eb8f1b3e8c2f87cfdbc8cc117f6bdfabc6ec ]

link down work may be scheduled before lgr freed but execute
after lgr freed, which may result in crash. So it is need to
hold a reference before shedule link down work, and put the
reference after work executed or canceled.

The relevant crash call stack as follows:
 list_del corruption. prev->next should be ffffb638c9c0fe20,
    but was 0000000000000000
 ------------[ cut here ]------------
 kernel BUG at lib/list_debug.c:51!
 invalid opcode: 0000 [intel#1] SMP NOPTI
 CPU: 6 PID: 978112 Comm: kworker/6:119 Kdump: loaded Tainted: G intel#1
 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 2221b89 04/01/2014
 Workqueue: events smc_link_down_work [smc]
 RIP: 0010:__list_del_entry_valid.cold+0x31/0x47
 RSP: 0018:ffffb638c9c0fdd8 EFLAGS: 00010086
 RAX: 0000000000000054 RBX: ffff942fb75e5128 RCX: 0000000000000000
 RDX: ffff943520930aa0 RSI: ffff94352091fc80 RDI: ffff94352091fc80
 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffb638c9c0fc38
 R10: ffffb638c9c0fc30 R11: ffffffffa015eb28 R12: 0000000000000002
 R13: ffffb638c9c0fe20 R14: 0000000000000001 R15: ffff942f9cd051c0
 FS:  0000000000000000(0000) GS:ffff943520900000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f4f25214000 CR3: 000000025fbae004 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  rwsem_down_write_slowpath+0x17e/0x470
  smc_link_down_work+0x3c/0x60 [smc]
  process_one_work+0x1ac/0x350
  worker_thread+0x49/0x2f0
  ? rescuer_thread+0x360/0x360
  kthread+0x118/0x140
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x1f/0x30

Fixes: 541afa1 ("net/smc: add smcr_port_err() and smcr_link_down() processing")
Signed-off-by: Guangguan Wang <[email protected]>
Reviewed-by: Tony Lu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
bbaa-bbaa pushed a commit to bbaa-bbaa/intel-mainline-tracking that referenced this issue Jan 3, 2025
commit 429fde2d81bcef0ebab002215358955704586457 upstream.

syzbot reported the following crash [1]

Issue came with the blamed commit. Instead of going through
all the iov components, we keep using the first one
and end up with a malformed skb.

[1]

kernel BUG at net/core/skbuff.c:2849 !
Oops: invalid opcode: 0000 [intel#1] PREEMPT SMP KASAN PTI
CPU: 0 UID: 0 PID: 6230 Comm: syz-executor132 Not tainted 6.13.0-rc1-syzkaller-00407-g96b6fcc0ee41 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/25/2024
 RIP: 0010:__pskb_pull_tail+0x1568/0x1570 net/core/skbuff.c:2848
Code: 38 c1 0f 8c 32 f1 ff ff 4c 89 f7 e8 92 96 74 f8 e9 25 f1 ff ff e8 e8 ae 09 f8 48 8b 5c 24 08 e9 eb fb ff ff e8 d9 ae 09 f8 90 <0f> 0b 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffc90004cbef30 EFLAGS: 00010293
RAX: ffffffff8995c347 RBX: 00000000fffffff2 RCX: ffff88802cf45a00
RDX: 0000000000000000 RSI: 00000000fffffff2 RDI: 0000000000000000
RBP: ffff88807df0c06a R08: ffffffff8995b084 R09: 1ffff1100fbe185c
R10: dffffc0000000000 R11: ffffed100fbe185d R12: ffff888076e85d50
R13: ffff888076e85c80 R14: ffff888076e85cf4 R15: ffff888076e85c80
FS:  00007f0dca6ea6c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0dca6ead58 CR3: 00000000119da000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
  skb_cow_data+0x2da/0xcb0 net/core/skbuff.c:5284
  tipc_aead_decrypt net/tipc/crypto.c:894 [inline]
  tipc_crypto_rcv+0x402/0x24e0 net/tipc/crypto.c:1844
  tipc_rcv+0x57e/0x12a0 net/tipc/node.c:2109
  tipc_l2_rcv_msg+0x2bd/0x450 net/tipc/bearer.c:668
  __netif_receive_skb_list_ptype net/core/dev.c:5720 [inline]
  __netif_receive_skb_list_core+0x8b7/0x980 net/core/dev.c:5762
  __netif_receive_skb_list net/core/dev.c:5814 [inline]
  netif_receive_skb_list_internal+0xa51/0xe30 net/core/dev.c:5905
  gro_normal_list include/net/gro.h:515 [inline]
  napi_complete_done+0x2b5/0x870 net/core/dev.c:6256
  napi_complete include/linux/netdevice.h:567 [inline]
  tun_get_user+0x2ea0/0x4890 drivers/net/tun.c:1982
  tun_chr_write_iter+0x10d/0x1f0 drivers/net/tun.c:2057
 do_iter_readv_writev+0x600/0x880
  vfs_writev+0x376/0xba0 fs/read_write.c:1050
  do_writev+0x1b6/0x360 fs/read_write.c:1096
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: de4f5fe ("iov_iter: add iter_iovec() helper")
Reported-by: [email protected]
Closes: https://lore.kernel.org/netdev/[email protected]/T/#u
Cc: [email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Joe Damato <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
bbaa-bbaa pushed a commit to bbaa-bbaa/intel-mainline-tracking that referenced this issue Jan 3, 2025
commit 07a756a49f4b4290b49ea46e089cbe6f79ff8d26 upstream.

If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is
fully initialized, we can hit the panic below:

hv_utils: Registering HyperV Utility Driver
hv_vmbus: registering driver hv_utils
...
BUG: kernel NULL pointer dereference, address: 0000000000000000
CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ intel#1
RIP: 0010:hv_pkt_iter_first+0x12/0xd0
Call Trace:
...
 vmbus_recvpacket
 hv_kvp_onchannelcallback
 vmbus_on_event
 tasklet_action_common
 tasklet_action
 handle_softirqs
 irq_exit_rcu
 sysvec_hyperv_stimer0
 </IRQ>
 <TASK>
 asm_sysvec_hyperv_stimer0
...
 kvp_register_done
 hvt_op_read
 vfs_read
 ksys_read
 __x64_sys_read

This can happen because the KVP/VSS channel callback can be invoked
even before the channel is fully opened:
1) as soon as hv_kvp_init() -> hvutil_transport_init() creates
/dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and
register itself to the driver by writing a message KVP_OP_REGISTER1 to the
file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and
reading the file for the driver's response, which is handled by
hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done().

2) the problem with kvp_register_done() is that it can cause the
channel callback to be called even before the channel is fully opened,
and when the channel callback is starting to run, util_probe()->
vmbus_open() may have not initialized the ringbuffer yet, so the
callback can hit the panic of NULL pointer dereference.

To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in
__vmbus_open(), just before the first hv_ringbuffer_init(), and then we
unload and reload the driver hv_utils, and run the daemon manually within
the 10 seconds.

Fix the panic by reordering the steps in util_probe() so the char dev
entry used by the KVP or VSS daemon is not created until after
vmbus_open() has completed. This reordering prevents the race condition
from happening.

Reported-by: Dexuan Cui <[email protected]>
Fixes: e0fa3e5 ("Drivers: hv: utils: fix a race on userspace daemons registration")
Cc: [email protected]
Signed-off-by: Michael Kelley <[email protected]>
Acked-by: Wei Liu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wei Liu <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
bbaa-bbaa pushed a commit to bbaa-bbaa/intel-mainline-tracking that referenced this issue Jan 3, 2025
commit 3e74859ee35edc33a022c3f3971df066ea0ca6b9 upstream.

When we call btrfs_read_folio() to bring a folio uptodate, we unlock the
folio. The result of that is that a different thread can modify the
mapping (like remove it with invalidate) before we call folio_lock().
This results in an invalid page and we need to try again.

In particular, if we are relocating concurrently with aborting a
transaction, this can result in a crash like the following:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  PGD 0 P4D 0
  Oops: 0000 [intel#1] SMP
  CPU: 76 PID: 1411631 Comm: kworker/u322:5
  Workqueue: events_unbound btrfs_reclaim_bgs_work
  RIP: 0010:set_page_extent_mapped+0x20/0xb0
  RSP: 0018:ffffc900516a7be8 EFLAGS: 00010246
  RAX: ffffea009e851d08 RBX: ffffea009e0b1880 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffffc900516a7b90 RDI: ffffea009e0b1880
  RBP: 0000000003573000 R08: 0000000000000001 R09: ffff88c07fd2f3f0
  R10: 0000000000000000 R11: 0000194754b575be R12: 0000000003572000
  R13: 0000000003572fff R14: 0000000000100cca R15: 0000000005582fff
  FS:  0000000000000000(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000407d00f002 CR4: 00000000007706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  PKRU: 55555554
  Call Trace:
  <TASK>
  ? __die+0x78/0xc0
  ? page_fault_oops+0x2a8/0x3a0
  ? __switch_to+0x133/0x530
  ? wq_worker_running+0xa/0x40
  ? exc_page_fault+0x63/0x130
  ? asm_exc_page_fault+0x22/0x30
  ? set_page_extent_mapped+0x20/0xb0
  relocate_file_extent_cluster+0x1a7/0x940
  relocate_data_extent+0xaf/0x120
  relocate_block_group+0x20f/0x480
  btrfs_relocate_block_group+0x152/0x320
  btrfs_relocate_chunk+0x3d/0x120
  btrfs_reclaim_bgs_work+0x2ae/0x4e0
  process_scheduled_works+0x184/0x370
  worker_thread+0xc6/0x3e0
  ? blk_add_timer+0xb0/0xb0
  kthread+0xae/0xe0
  ? flush_tlb_kernel_range+0x90/0x90
  ret_from_fork+0x2f/0x40
  ? flush_tlb_kernel_range+0x90/0x90
  ret_from_fork_asm+0x11/0x20
  </TASK>

This occurs because cleanup_one_transaction() calls
destroy_delalloc_inodes() which calls invalidate_inode_pages2() which
takes the folio_lock before setting mapping to NULL. We fail to check
this, and subsequently call set_extent_mapping(), which assumes that
mapping != NULL (in fact it asserts that in debug mode)

Note that the "fixes" patch here is not the one that introduced the
race (the very first iteration of this code from 2009) but a more recent
change that made this particular crash happen in practice.

Fixes: e7f1326 ("btrfs: set page extent mapped after read_folio in relocate_one_page")
CC: [email protected] # 6.1+
Reviewed-by: Qu Wenruo <[email protected]>
Signed-off-by: Boris Burkov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
bbaa-bbaa pushed a commit to bbaa-bbaa/intel-mainline-tracking that referenced this issue Jan 3, 2025
Occasionally i915 workaround list might be empty, for example, it
might be empty when i915 is in virtualization. In this case, count
in i915 workaround list is zero, and gt pointer is NULL. Function
should not access gt pointer.

This change fix a kernel panic which tries to access gt pointer
of struct i915_wa_list when list is empty:

[    2.391840] BUG: kernel NULL pointer dereference, address: 0000000000000018
[    2.392165] #PF: supervisor read access in kernel mode
[    2.392404] #PF: error_code(0x0000) - not-present page
[    2.392643] PGD 0 P4D 0
[    2.392798] Oops: 0000 [intel#1] PREEMPT SMP NOPTI
[    2.393004] CPU: 3 PID: 608 Comm: systemd-modules Tainted: G U 6.1.12-intel-ese-standard-lts #5
[    2.393493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    2.393877] RIP: 0010:wa_list_apply+0x3a/0x18a [i915]
[    2.394238] Code: c0 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 18 4c 8b 27 48 c7 c7 5f 75 99
[    2.395105] RSP: 0018:ffffb74100b47a58 EFLAGS: 00010246
[    2.395349] RAX: 0000000000000000 RBX: ffff9141c89a3608 RCX: 0000000000000017
[    2.395678] RDX: ffff9141c89a3608 RSI: ffffffffc091ee28 RDI: ffffffffc099755f
[    2.396039] RBP: ffff9141c89a1f38 R08: 0000001b714c2676 R09: ffff9141c89a3668
[    2.396370] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    2.396700] R13: ffff9141c89a0000 R14: ffff9141d79d6780 R15: 0000000000000000
[    2.397058] FS:  00007fd92d4d0c40(0000) GS:ffff9141fbd80000(0000) knlGS:0000000000000000
[    2.397431] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.397697] CR2: 0000000000000018 CR3: 0000000103f4c006 CR4: 0000000000770ee0
[    2.398111] PKRU: 55555554
[    2.398242] Call Trace:
[    2.398362]  <TASK>
[    2.398465]  ? kvm_clock_get_cycles+0x14/0x30
[    2.398674]  ? ktime_get+0x35/0xa0
[    2.398842]  intel_gt_init_hw+0x7f/0x220 [i915]
[    2.399158]  intel_gt_resume+0xfe/0x220 [i915]
[    2.399435]  intel_gt_init+0x14d/0x350 [i915]
[    2.399703]  i915_gem_init+0x193/0x200 [i915]
[    2.400000]  i915_driver_probe+0x6bd/0xce0 [i915]

Fixes: 0957e93 ("drm/i915: Add GT support for multiple types of multicast steering")
Signed-off-by: Junxiao Chang <[email protected]>
Signed-off-by: Kris Pan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant