cpu lockup when cpu idle #13

Open
liyi-ibm opened this issue Dec 13, 2018 · 2 comments

@liyi-ibm
Owner

@vaidy Is it a CPU hard lockup?
[Tue Dec 11 23:30:19 2018] Watchdog CPU:97 Hard LOCKUP
[Tue Dec 11 23:30:19 2018] Modules linked in: i2c_dev ses enclosure scsi_transport_sas ipmi_powernv at24 ipmi_devintf ofpart ipmi_msghandler powernv_flash i2c_opal mtd opal_prd nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc xfs libcrc32c joydev ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e i2c_core aacraid ptp pps_core
[Tue Dec 11 23:30:19 2018] CPU: 97 PID: 0 Comm: swapper/97 Not tainted 4.14.49-4.ppc64le #1
[Tue Dec 11 23:30:19 2018] task: c000201cb740c180 task.stack: c000201cbd158000
[Tue Dec 11 23:30:19 2018] NIP:  c00000000000b464 LR: c000000000016224 CTR: c000000000141910
[Tue Dec 11 23:30:19 2018] REGS: c000201cbd15b6e0 TRAP: 0e81   Not tainted  (4.14.49-4.ppc64le)
[Tue Dec 11 23:30:19 2018] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 42004284  XER: 20040000
[Tue Dec 11 23:30:19 2018] CFAR: c000000000138d90 SOFTE: 1
GPR00: c000000000af9e38 c000201cbd15b960 c0000000013f3b00 0000000028000000
GPR04: 0000000000000000 0000000000263ba4 0000000000000001 0000000000000000
GPR08: 0000000000000000 0000000000000378 c0002009d766a800 c000201cc6f45910
GPR12: 0000000000000000 c00000000fd82b00
[Tue Dec 11 23:30:19 2018] NIP [c00000000000b464] replay_interrupt_return+0x0/0x4
[Tue Dec 11 23:30:19 2018] LR [c000000000016224] arch_local_irq_restore.part.12+0x84/0xb0
[Tue Dec 11 23:30:19 2018] Call Trace:
[Tue Dec 11 23:30:19 2018] [c000201cbd15b960] [c000000000140864] vtime_account_irq_enter+0x64/0x80 (unreliable)
[Tue Dec 11 23:30:19 2018] [c000201cbd15b980] [c000000000af9e38] __do_softirq+0x138/0x424
[Tue Dec 11 23:30:19 2018] [c000201cbd15ba70] [c000000000105d28] irq_exit+0x138/0x150
[Tue Dec 11 23:30:19 2018] [c000201cbd15ba90] [c0000000000245ac] timer_interrupt+0xac/0xe0
[Tue Dec 11 23:30:19 2018] [c000201cbd15bac0] [c0000000000094f0] decrementer_common+0x180/0x190
[Tue Dec 11 23:30:19 2018] --- interrupt: 901 at replay_interrupt_return+0x0/0x4
   LR = arch_local_irq_restore.part.12+0x84/0xb0
[Tue Dec 11 23:30:19 2018] [c000201cbd15bdb0] [c000201cbd15be30] 0xc000201cbd15be30 (unreliable)
[Tue Dec 11 23:30:19 2018] [c000201cbd15bdd0] [c0000000008dd678] cpuidle_enter_state+0x128/0x410
[Tue Dec 11 23:30:19 2018] [c000201cbd15be30] [c00000000015df9c] call_cpuidle+0x4c/0x90
[Tue Dec 11 23:30:19 2018] [c000201cbd15be50] [c00000000015e3b0] do_idle+0x2c0/0x370
[Tue Dec 11 23:30:19 2018] [c000201cbd15bec0] [c00000000015e64c] cpu_startup_entry+0x3c/0x50
[Tue Dec 11 23:30:19 2018] [c000201cbd15bef0] [c00000000004938c] start_secondary+0x4ec/0x530
[Tue Dec 11 23:30:19 2018] [c000201cbd15bf90] [c00000000000b86c] start_secondary_prolog+0x10/0x14
[Tue Dec 11 23:30:19 2018] Instruction dump:
[Tue Dec 11 23:30:19 2018] 7d8000a6 e9628008 7d200026 618c8000 2c030900 4182df28 2c030500 4182ee70
[Tue Dec 11 23:30:19 2018] 2c030a00 4182ffa4 2c030e60 4182eb40 <4e800020> 7c781b78 48000419 48000431
[Tue Dec 11 23:30:19 2018] Watchdog CPU:97 became unstuck
@liyi-ibm
Owner Author

N: That's actually a known bug, somewhere in the timer subsystem; I don't think we have a root cause because it's difficult to reproduce or trace. The watchdog timer gets lost when the CPU goes idle. It seems to be pretty harmless, so we can ignore it for now.

V: This did not look to be related to cgroup cfs balancing. I did not know about the new scenario that you mentioned. I will work on this as an independent issue.

N: No, it's not; it's some bug we haven't been able to track down. It happens without cgroups at all. It doesn't appear to be too harmful though; it doesn't seem to cause real lockups.

@liyi-ibm
Owner Author

V: The cfs settings can help in solving the lockup caused by the cfs scheduler. You have not hit that yet, but you will if you run long enough.

A hard lockup only means a CPU was doing something for a very long time (10 seconds) and did not run any other workload. We are solving each scenario where the CPU spends a lot of time, so the hard lockup message is a way to observe what the CPU is doing (memory allocation work, the CPU scheduler, etc.).
The message you got above is a missed timer: the CPU was really idle, and it does not point to a problem where it was stuck doing some kernel work. Hence it can be ignored. You should continue your tests with different loads and utilization levels to be sure that we can hit the problem and also solve it with different settings.

liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
commit 23edca8 upstream.

There is a problem if we are going to unmap an rbd device while the
watch_dwork is about to queue delayed work for the watch:

unmap Thread                    watch Thread                  timer
do_rbd_remove
  cancel_tasks_sync(rbd_dev)
                                queue_delayed_work for watch
  destroy_workqueue(rbd_dev->task_wq)
    drain_workqueue(wq)
    destroy other resources in wq
                                                              call_timer_fn
                                                                __queue_work()

Then the delayed work escapes the cancel_tasks_sync() and
destroy_workqueue(), and we get a use-after-free call trace:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP PTI
  Modules linked in:
  CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     4.17.0-rc6+ #13
  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
  RIP: 0010:__queue_work+0x6a/0x3b0
  RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086
  RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000
  RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00
  R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970
  R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007
  FS:  0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0
  Call Trace:
   <IRQ>
   ? __queue_work+0x3b0/0x3b0
   call_timer_fn+0x2d/0x130
   run_timer_softirq+0x16e/0x430
   ? tick_sched_timer+0x37/0x70
   __do_softirq+0xd2/0x280
   irq_exit+0xd5/0xe0
   smp_apic_timer_interrupt+0x6c/0x130
   apic_timer_interrupt+0xf/0x20

[ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch()
  either bails out early because the watch is UNREGISTERED at that point
  or just gets cancelled. ]

Cc: [email protected]
Fixes: 99d1694 ("rbd: retry watch re-registration periodically")
Signed-off-by: Dongsheng Yang <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
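
To make the race above concrete, here is a minimal, hedged sketch of the teardown pattern the commit describes; the struct and function names below are illustrative, not the actual rbd code. The key point is that the delayed work item must no longer be re-armable and must be cancelled synchronously before the workqueue it targets is destroyed, otherwise its timer can still fire later and __queue_work() touches freed memory.

  #include <linux/workqueue.h>

  /* Illustrative device, loosely modelled on rbd_dev->task_wq and
   * rbd_dev->watch_dwork; not the real rbd structures. */
  struct demo_dev {
          struct workqueue_struct *task_wq;
          struct delayed_work watch_dwork;
  };

  static void demo_dev_teardown(struct demo_dev *dev)
  {
          /* First make sure nothing can re-arm watch_dwork (e.g. the watch
           * is already unregistered), then cancel it and wait for its
           * timer, and only then tear down the workqueue. */
          cancel_delayed_work_sync(&dev->watch_dwork);
          destroy_workqueue(dev->task_wq);
  }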
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit 401c636 ]

When we get a hung task it can often be valuable to see _all_ the hung
tasks on the system before calling panic().

Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&id=5316056503549952
----------------------------------------
INFO: task syz-executor0:6540 blocked for more than 120 seconds.
      Not tainted 4.16.0+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor0   D23560  6540   4521 0x80000004
Call Trace:
 context_switch kernel/sched/core.c:2848 [inline]
 __schedule+0x8fb/0x1ef0 kernel/sched/core.c:3490
 schedule+0xf5/0x430 kernel/sched/core.c:3549
 schedule_preempt_disabled+0x10/0x20 kernel/sched/core.c:3607
 __mutex_lock_common kernel/locking/mutex.c:833 [inline]
 __mutex_lock+0xb7f/0x1810 kernel/locking/mutex.c:893
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
 lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
 __blkdev_driver_ioctl block/ioctl.c:303 [inline]
 blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
 ioctl_by_bdev+0xa5/0x110 fs/block_dev.c:2060
 isofs_get_last_session fs/isofs/inode.c:567 [inline]
 isofs_fill_super+0x2ba9/0x3bc0 fs/isofs/inode.c:660
 mount_bdev+0x2b7/0x370 fs/super.c:1119
 isofs_mount+0x34/0x40 fs/isofs/inode.c:1560
 mount_fs+0x66/0x2d0 fs/super.c:1222
 vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:2514 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0xea4/0x2b90 fs/namespace.c:2847
 ksys_mount+0xab/0x120 fs/namespace.c:3063
 SYSC_mount fs/namespace.c:3077 [inline]
 SyS_mount+0x39/0x50 fs/namespace.c:3074
 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
(...snipped...)
Showing all locks held in the system:
(...snipped...)
2 locks held by syz-executor0/6540:
 #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: alloc_super fs/super.c:211 [inline]
 #0: 00000000566d4c39 (&type->s_umount_key#49/1){+.+.}, at: sget_userns+0x3b2/0xe60 fs/super.c:502 /* down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); */
 #1: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
(...snipped...)
3 locks held by syz-executor7/6541:
 #0: 0000000043ca8836 (&lo->lo_ctl_mutex/1){+.+.}, at: lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355 /* mutex_lock_nested(&lo->lo_ctl_mutex, 1); */
 #1: 000000007bf3d3f9 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
 #2: 00000000566d4c39 (&type->s_umount_key#50){.+.+}, at: __get_super.part.10+0x1d3/0x280 fs/super.c:663 /* down_read(&sb->s_umount); */
----------------------------------------

When reporting an AB-BA deadlock like the one shown above, it would be
nice if the trace of PID=6541 were printed as well as the trace of
PID=6540 before calling panic().

Showing hung tasks up to /proc/sys/kernel/hung_task_warnings could delay
calling panic() but normally there should not be so many hung tasks.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tetsuo Handa <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Mandeep Singh Baines <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Signed-off-by: Linus Torvalds <[email protected]>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
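
As a rough, hedged sketch of the idea in this commit (not the exact patch): when a hung task is found and hung_task_panic is set, remember that a panic is due instead of panicking immediately, finish reporting the remaining hung tasks, and only then call panic(). The helper names below are illustrative.

  #include <linux/sched.h>

  static bool hung_task_call_panic;   /* set during the scan, acted on after it */

  static void report_hung_task(struct task_struct *t)
  {
          /* ... print the "blocked for more than N seconds" report ... */
          if (sysctl_hung_task_panic)
                  hung_task_call_panic = true;   /* defer the panic */
  }

  static void check_hung_tasks(unsigned long timeout)
  {
          /* ... walk every task, calling report_hung_task() for hung ones ... */
          if (hung_task_call_panic)
                  panic("hung_task: blocked tasks");
  }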
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit ecd6053 ]

Booting a ColdFire m68k core with MMU enabled causes a "bad page state"
oops since commit 1d40a5e ("mm: mark pages in use for page tables"):

 BUG: Bad page state in process sh  pfn:01ce2
 page:004fefc8 count:0 mapcount:-1024 mapping:00000000 index:0x0
 flags: 0x0()
 raw: 00000000 00000000 00000000 fffffbff 00000000 00000100 00000200 00000000
 raw: 039c4000
 page dumped because: nonzero mapcount
 Modules linked in:
 CPU: 0 PID: 22 Comm: sh Not tainted 4.17.0-07461-g1d40a5ea01d5 #13

Fix by calling pgtable_page_dtor() in our __pte_free_tlb() code path,
so that the PG_table flag is cleared before we free the pte page.

Note that I had to change pte_free() from extern to static.
Otherwise you get a lot of warnings like this:

./arch/m68k/include/asm/mcf_pgalloc.h:80:2: warning: ‘pgtable_page_dtor’ is static but used in inline function ‘pte_free’ which is not static
  pgtable_page_dtor(page);
  ^

And making it static is consistent with our use of this in the other
m68k pgalloc definitions of pte_free().

Signed-off-by: Greg Ungerer <[email protected]>
CC: Matthew Wilcox <[email protected]>
Reviewed-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
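
A hedged sketch of the fix pattern described above for arch/m68k/include/asm/mcf_pgalloc.h, assuming the 4.17-era pgtable_page_dtor()/tlb_remove_page() helpers (the exact diff may differ): clear the PG_table state before the pte page is freed, and keep pte_free() static inline.

  /* Sketch only; argument types simplified to struct page *. */
  static inline void pte_free(struct mm_struct *mm, struct page *page)
  {
          pgtable_page_dtor(page);        /* clear PG_table before freeing */
          __free_page(page);
  }

  #define __pte_free_tlb(tlb, page, addr)                 \
          do {                                            \
                  pgtable_page_dtor(page);                \
                  tlb_remove_page((tlb), (page));         \
          } while (0)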