cpu lockup when cpu idle #13

Open
liyi-ibm opened this issue Dec 13, 2018 · 2 comments

@liyi-ibm
Owner

@vaidy Is it a CPU hard lockup?
[Tue Dec 11 23:30:19 2018] Watchdog CPU:97 Hard LOCKUP
[Tue Dec 11 23:30:19 2018] Modules linked in: i2c_dev ses enclosure scsi_transport_sas ipmi_powernv at24 ipmi_devintf ofpart ipmi_msghandler powernv_flash i2c_opal mtd opal_prd nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc xfs libcrc32c joydev ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e i2c_core aacraid ptp pps_core
[Tue Dec 11 23:30:19 2018] CPU: 97 PID: 0 Comm: swapper/97 Not tainted 4.14.49-4.ppc64le #1
[Tue Dec 11 23:30:19 2018] task: c000201cb740c180 task.stack: c000201cbd158000
[Tue Dec 11 23:30:19 2018] NIP:  c00000000000b464 LR: c000000000016224 CTR: c000000000141910
[Tue Dec 11 23:30:19 2018] REGS: c000201cbd15b6e0 TRAP: 0e81   Not tainted  (4.14.49-4.ppc64le)
[Tue Dec 11 23:30:19 2018] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42004284  XER: 20040000
[Tue Dec 11 23:30:19 2018] CFAR: c000000000138d90 SOFTE: 1
GPR00: c000000000af9e38 c000201cbd15b960 c0000000013f3b00 0000000028000000
GPR04: 0000000000000000 0000000000263ba4 0000000000000001 0000000000000000
GPR08: 0000000000000000 0000000000000378 c0002009d766a800 c000201cc6f45910
GPR12: 0000000000000000 c00000000fd82b00
[Tue Dec 11 23:30:19 2018] NIP [c00000000000b464] replay_interrupt_return+0x0/0x4
[Tue Dec 11 23:30:19 2018] LR [c000000000016224] arch_local_irq_restore.part.12+0x84/0xb0
[Tue Dec 11 23:30:19 2018] Call Trace:
[Tue Dec 11 23:30:19 2018] [c000201cbd15b960] [c000000000140864] vtime_account_irq_enter+0x64/0x80 (unreliable)
[Tue Dec 11 23:30:19 2018] [c000201cbd15b980] [c000000000af9e38] __do_softirq+0x138/0x424
[Tue Dec 11 23:30:19 2018] [c000201cbd15ba70] [c000000000105d28] irq_exit+0x138/0x150
[Tue Dec 11 23:30:19 2018] [c000201cbd15ba90] [c0000000000245ac] timer_interrupt+0xac/0xe0
[Tue Dec 11 23:30:19 2018] [c000201cbd15bac0] [c0000000000094f0] decrementer_common+0x180/0x190
[Tue Dec 11 23:30:19 2018] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
   LR = arch_local_irq_restore.part.12+0x84/0xb0
[Tue Dec 11 23:30:19 2018] [c000201cbd15bdb0] [c000201cbd15be30] 0xc000201cbd15be30 (unreliable)
[Tue Dec 11 23:30:19 2018] [c000201cbd15bdd0] [c0000000008dd678] cpuidle_enter_state+0x128/0x410
[Tue Dec 11 23:30:19 2018] [c000201cbd15be30] [c00000000015df9c] call_cpuidle+0x4c/0x90
[Tue Dec 11 23:30:19 2018] [c000201cbd15be50] [c00000000015e3b0] do_idle+0x2c0/0x370
[Tue Dec 11 23:30:19 2018] [c000201cbd15bec0] [c00000000015e64c] cpu_startup_entry+0x3c/0x50
[Tue Dec 11 23:30:19 2018] [c000201cbd15bef0] [c00000000004938c] start_secondary+0x4ec/0x530
[Tue Dec 11 23:30:19 2018] [c000201cbd15bf90] [c00000000000b86c] start_secondary_prolog+0x10/0x14
[Tue Dec 11 23:30:19 2018] Instruction dump:
[Tue Dec 11 23:30:19 2018] 7d8000a6 e9628008 7d200026 618c8000 2c030900 4182df28 2c030500 4182ee70
[Tue Dec 11 23:30:19 2018] 2c030a00 4182ffa4 2c030e60 4182eb40 <4e800020> 7c781b78 48000419 48000431
[Tue Dec 11 23:30:19 2018] Watchdog CPU:97 became unstuck
@liyi-ibm
Owner Author

N: That's actually a known bug, somewhere in the timer subsystem; I don't think we have a root cause because it's difficult to reproduce or trace. The watchdog timer gets lost when the CPU goes idle. It seems to be pretty harmless, so we can ignore it for now.

V: This did not look to be related to cgroup cfs balancing. I did not know about the new scenario that you mentioned. I will work on this as an independent issue.

N: No, it's not; it's some bug we haven't been able to track down. It happens without cgroups at all. It doesn't appear to be too harmful though; it doesn't seem to cause real lockups.

@liyi-ibm
Owner Author

V: The cfs settings can help in solving the lockup caused by the cfs scheduler. You have not hit that yet, but you will if you run long enough.

A hard lockup only means a CPU was doing something for a very long time (10 seconds) and did not run any other workload. We are solving each scenario where the CPU spends a lot of time, so the hard lockup message is a way to observe what the CPU is doing (memory allocation work, the CPU scheduler, etc.).
The message you got above is a missed timer: the CPU was really idle, and it does not point to a problem where it was stuck doing some kernel work. Hence it can be ignored. You should continue your tests with different loads and utilization levels to be sure that we can hit the problem and also solve it with different settings.

liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
commit 23edca8 upstream.

There is a problem if we are going to unmap an rbd device while the
watch_dwork is about to queue delayed work for the watch:

unmap Thread                    watch Thread                  timer
do_rbd_remove
  cancel_tasks_sync(rbd_dev)
                                queue_delayed_work for watch
  destroy_workqueue(rbd_dev->task_wq)
    drain_workqueue(wq)
    destroy other resources in wq
                                                              call_timer_fn
                                                                __queue_work()

Then the delayed work escapes the cancel_tasks_sync() and
destroy_workqueue(), and we get a use-after-free call trace:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  Modules linked in:
  CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  RIP: 0010:__queue_work+0x6a/0x3b0
  RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
  RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
  RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
  R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
  R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
  FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
  Call Trace:
   <IRQ>
   ? __queue_work+0x3b0/0x3b0
   call_timer_fn+0x2d/0x130
   run_timer_softirq+0x16e/0x430
   ? tick_sched_timer+0x37/0x70
   __do_softirq+0xd2/0x280
   irq_exit+0xd5/0xe0
   smp_apic_timer_interrupt+0x6c/0x130
   apic_timer_interrupt+0xf/0x20

[ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch()
  either bails out early because the watch is UNREGISTERED at that point
  or just gets cancelled. ]

Cc: [email protected]
Fixes: 99d1694 ("rbd: retry watch re-registration periodically")
Signed-off-by: Dongsheng Yang <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
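
To make the race above concrete, here is a minimal, hedged sketch of the teardown pattern the commit describes; the struct and function names below are illustrative, not the actual rbd code. The key point is that the delayed work item must no longer be re-armable and must be cancelled synchronously before the workqueue it targets is destroyed, otherwise its timer can still fire later and __queue_work() touches freed memory.

  #include <linux/workqueue.h>

  /* Illustrative device, loosely modelled on rbd_dev->task_wq and
   * rbd_dev->watch_dwork; not the real rbd structures. */
  struct demo_dev {
          struct workqueue_struct *task_wq;
          struct delayed_work watch_dwork;
  };

  static void demo_dev_teardown(struct demo_dev *dev)
  {
          /* First make sure nothing can re-arm watch_dwork (e.g. the watch
           * is already unregistered), then cancel it and wait for its
           * timer, and only then tear down the workqueue. */
          cancel_delayed_work_sync(&dev->watch_dwork);
          destroy_workqueue(dev->task_wq);
  }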
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit 401c636 ]

When we get a hung task it can often be valuable to see _all_ the hung
tasks on the system before calling panic().

Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5316056503549952
----------------------------------------
INFO: task syz-executor0:6540 blocked for more than 120 seconds.
      Not tainted 4.16.0+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor0   D23560  6540   4521 0x80000004
Call Trace:
 context_switch kernel/sched/core.c:2848 [inline]
 __schedule+0x8fb/0x1ef0 kernel/sched/core.c:3490
 schedule+0xf5/0x430 kernel/sched/core.c:3549
 schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607
 __mutex_lock_common kernel/locking/mutex.c:833 [inline]
 __mutex_lock+0xb7f/0x1810 kernel/locking/mutex.c:893
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
 lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
 __blkdev_driver_ioctl block/ioctl.c:303 [inline]
 blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
 ioctl_by_bdev+0xa5/0x110 fs/block_dev.c:2060
 isofs_get_last_session fs/isofs/inode.c:567 [inline]
 isofs_fill_super+0x2ba9/0x3bc0 fs/isofs/inode.c:660
 mount_bdev+0x2b7/0x370 fs/super.c:1119
 isofs_mount+0x34/0x40 fs/isofs/inode.c:1560
 mount_fs+0x66/0x2d0 fs/super.c:1222
 vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0xea4/0x2b90 fs/namespace.c:2847
 ksys_mount+0xab/0x120 fs/namespace.c:3063
 SYSC_mount fs/namespace.c:3077 [inline]
 SyS_mount+0x39/0x50 fs/namespace.c:3074
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
(...snipped...)
Showing all locks held in the system:
(...snipped...)
2 locks held by syz-executor0/6540:
 #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: alloc_super fs/super.c:211 [inline]
 #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: sget_userns+0x3b2/0xe60 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */
 #1: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
(...snipped...)
3 locks held by syz-executor7/6541:
 #0: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
 #1: 000000007bf3d3f9 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
 #2: 00000000566d4c39 (&type->s_umount_key#50){.+.+}, at: __get_super.part.10+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */
----------------------------------------

When reporting an AB-BA deadlock like the one shown above, it would be
nice if the trace of PID=6541 were printed as well as the trace of
PID=6540 before calling panic().

Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay
calling panic() but normally there should not be so many hung tasks.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Mandeep Singh Baines <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Linus Torvalds <[email protected]>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
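
As a rough, hedged sketch of the idea in this commit (not the exact patch): when a hung task is found and hung_task_panic is set, remember that a panic is due instead of panicking immediately, finish reporting the remaining hung tasks, and only then call panic(). The helper names below are illustrative.

  #include <linux/sched.h>

  static bool hung_task_call_panic;   /* set during the scan, acted on after it */

  static void report_hung_task(struct task_struct *t)
  {
          /* ... print the "blocked for more than N seconds" report ... */
          if (sysctl_hung_task_panic)
                  hung_task_call_panic = true;   /* defer the panic */
  }

  static void check_hung_tasks(unsigned long timeout)
  {
          /* ... walk every task, calling report_hung_task() for hung ones ... */
          if (hung_task_call_panic)
                  panic("hung_task: blocked tasks");
  }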
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit ecd6053 ]

Booting a ColdFire m68k core with MMU enabled causes a "bad page state"
oops since commit 1d40a5e ("mm: mark pages in use for page tables"):

 BUG: Bad page state in process sh  pfn:01ce2
 page:004fefc8 count:0 mapcount:-1024 mapping:00000000 index:0x0
 flags: 0x0()
 raw: 00000000 00000000 00000000 fffffbff 00000000 00000100 00000200 00000000
 raw: 039c4000
 page dumped because: nonzero mapcount
 Modules linked in:
 CPU: 0 PID: 22 Comm: sh Not tainted 4.17.0-07461-g1d40a5ea01d5 #13

Fix by calling pgtable_page_dtor() in our __pte_free_tlb() code path,
so that the PG_table flag is cleared before we free the pte page.

Note that I had to change pte_free() from extern to static.
Otherwise you get a lot of warnings like this:

./arch/m68k/include/asm/mcf_pgalloc.h:80:2: warning: ‘pgtable_page_dtor’ is static but used in inline function ‘pte_free’ which is not static
  pgtable_page_dtor(page);
  ^

And making it static is consistent with our use of this in the other
m68k pgalloc definitions of pte_free().

Signed-off-by: Greg Ungerer <[email protected]>
CC: Matthew Wilcox <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
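
A hedged sketch of the fix pattern described above for arch/m68k/include/asm/mcf_pgalloc.h, assuming the 4.17-era pgtable_page_dtor()/tlb_remove_page() helpers (the exact diff may differ): clear the PG_table state before the pte page is freed, and keep pte_free() static inline.

  /* Sketch only; argument types simplified to struct page *. */
  static inline void pte_free(struct mm_struct *mm, struct page *page)
  {
          pgtable_page_dtor(page);        /* clear PG_table before freeing */
          __free_page(page);
  }

  #define __pte_free_tlb(tlb, page, addr)                 \
          do {                                            \
                  pgtable_page_dtor(page);                \
                  tlb_remove_page((tlb), (page));         \
          } while (0)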