rcu stalls #10

Open
liyi-ibm opened this issue Dec 12, 2018 · 2 comments

Comments

@liyi-ibm
Owner

There are RCU stalls on P9, which cause the system to reboot.

Dec  6 14:47:07 tdw-9-10-25-239 kernel: NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 3 timed out
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ------------[ cut here ]------------
Dec  6 14:47:07 tdw-9-10-25-239 kernel: WARNING: CPU: 136 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x35c/0x370
Dec  6 14:47:07 tdw-9-10-25-239 kernel: Modules linked in: dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag i2c_dev joydev ixgbe ptp at24 pps_core mdio ofpart opal_prd powernv_flash ipmi_powernv ipmi_devintf ipmi_msghandler mtd i2c_opal nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc binfmt_misc usb_storage ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mpt3sas drm raid_class scsi_transport_sas i2c_core
Dec  6 14:47:07 tdw-9-10-25-239 kernel: CPU: 136 PID: 0 Comm: swapper/136 Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:47:07 tdw-9-10-25-239 kernel: task: c000201cb7440000 task.stack: c000201cb74cc000
Dec  6 14:47:07 tdw-9-10-25-239 kernel: NIP:  c000000000989d8c LR: c000000000989d88 CTR: 0000000000000000
Dec  6 14:47:07 tdw-9-10-25-239 kernel: REGS: c000201cb74cf4d0 TRAP: 0700   Tainted: G        W        (4.14.49-3.ppc64le)
Dec  6 14:47:07 tdw-9-10-25-239 kernel: MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004822  XER: 20040000
Dec  6 14:47:07 tdw-9-10-25-239 kernel: CFAR: c000000000172f18 SOFTE: 1 #012GPR00: c000000000989d88 c000201cb74cf750 c0000000013d3000 0000000000000039 #012GPR04: c000201cc755abd0 c000201cc7571410 0000000000000001 c000201cc6b50000 #012GPR08: 0000000000000000 c000000000f3126c 0000201cc6630000 0000000000000006 #012GPR12: 0000000000004000 c000000007d9d800 c000201cb74cff90 0000000000000000 #012GPR16: 0000000000200042 0000000112ea5201 c000201cb74cc000 0000000000000000 #012GPR20: c000000000f44f80 c000000001403b00 c000000000f44f80 000000000000000a #012GPR24: 0000000000000000 ffffffffffffffff 0000000000000000 0000000000000088 #012GPR28: 0000000000000004 c000000001403b00 c000201ca04c0000 0000000000000003 
Dec  6 14:47:07 tdw-9-10-25-239 kernel: NIP [c000000000989d8c] dev_watchdog+0x35c/0x370
Dec  6 14:47:07 tdw-9-10-25-239 kernel: LR [c000000000989d88] dev_watchdog+0x358/0x370
Dec  6 14:47:07 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf750] [c000000000989d88] dev_watchdog+0x358/0x370 (unreliable)
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf7f0] [c000000000193bc0] call_timer_fn+0x60/0x1d0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf880] [c000000000193eb0] expire_timers+0x140/0x1e0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf8f0] [c000000000194028] run_timer_softirq+0xd8/0x230
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cf980] [c000000000aec96c] __do_softirq+0x15c/0x3a4
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfa70] [c000000000104288] irq_exit+0x118/0x130
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfa90] [c000000000023d6c] timer_interrupt+0xac/0xe0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfac0] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:47:07 tdw-9-10-25-239 kernel: --- interrupt: 901 at replay_interrupt_return+0x0/0x4#012    LR = arch_local_irq_restore+0x74/0x90
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfdb0] [c000201cb74cfe30] 0xc000201cb74cfe30 (unreliable)
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfdd0] [c0000000008d0e10] cpuidle_enter_state+0x110/0x3f0
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfe30] [c00000000015bd3c] call_cpuidle+0x4c/0x80
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfe50] [c00000000015c130] do_idle+0x2b0/0x350
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfec0] [c00000000015c3b8] cpu_startup_entry+0x38/0x40
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cfef0] [c000000000048894] start_secondary+0x4e4/0x530
Dec  6 14:47:07 tdw-9-10-25-239 kernel: [c000201cb74cff90] [c00000000000b26c] start_secondary_prolog+0x10/0x14
Dec  6 14:47:07 tdw-9-10-25-239 kernel: Instruction dump:
Dec  6 14:47:07 tdw-9-10-25-239 kernel: 3d02fff3 7fc3f378 99282650 4bfc6171 60000000 7fc4f378 7fe6fb78 7c651b78 
Dec  6 14:47:07 tdw-9-10-25-239 kernel: 3c62ff9e 3863f838 4b7e914d 60000000 <0fe00000> 4bffff84 60000000 60000000 
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ---[ end trace 823c30e96f862b4a ]---
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ixgbe 0034:01:00.1 eth1: initiating reset due to tx timeout
Dec  6 14:47:07 tdw-9-10-25-239 kernel: ixgbe 0034:01:00.1 eth1: Reset adapter
Dec  6 14:47:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:47:50 tdw-9-10-25-239 kernel: #01188-...: (6001 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=2973 
Dec  6 14:47:50 tdw-9-10-25-239 kernel: #011 (t=6001 jiffies g=68660684 c=68660683 q=26406)
Dec  6 14:47:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:47:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:47:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:47:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x40/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:47:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 14:50:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:50:50 tdw-9-10-25-239 kernel: #01188-...: (24004 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=11973 
Dec  6 14:50:50 tdw-9-10-25-239 kernel: #011 (t=24004 jiffies g=68660684 c=68660683 q=29221)
Dec  6 14:50:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:50:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:50:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:50:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x48/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:50:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 14:53:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:53:50 tdw-9-10-25-239 kernel: #01188-...: (42008 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=20924 
Dec  6 14:53:50 tdw-9-10-25-239 kernel: #011 (t=42008 jiffies g=68660684 c=68660683 q=31790)
Dec  6 14:53:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:53:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:53:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:53:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x40/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:53:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 14:56:50 tdw-9-10-25-239 kernel: INFO: rcu_sched self-detected stall on CPU
Dec  6 14:56:50 tdw-9-10-25-239 kernel: #01188-...: (60012 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=29914 
Dec  6 14:56:50 tdw-9-10-25-239 kernel: #011 (t=60012 jiffies g=68660684 c=68660683 q=39139)
Dec  6 14:56:50 tdw-9-10-25-239 kernel: NMI backtrace for cpu 88
Dec  6 14:56:50 tdw-9-10-25-239 kernel: CPU: 88 PID: 95334 Comm: drop_cache.sh Tainted: G        W       4.14.49-3.ppc64le #1
Dec  6 14:56:50 tdw-9-10-25-239 kernel: Call Trace:
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff2d0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff310] [c000000000ad4ac4] nmi_cpu_backtrace+0x1a4/0x210
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff3a0] [c000000000ad4d0c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff440] [c00000000002e5b8] arch_trigger_cpumask_backtrace+0x28/0x40
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff460] [c00000000018a9b4] rcu_dump_cpu_stacks+0xfc/0x158
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff4b0] [c000000000189df8] rcu_check_callbacks+0x898/0xaa0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff5e0] [c000000000195334] update_process_times+0x44/0x90
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff610] [c0000000001abf4c] tick_sched_handle.isra.13+0x4c/0x80
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff630] [c0000000001abfe0] tick_sched_timer+0x60/0xc0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff670] [c000000000195f38] __hrtimer_run_queues+0xf8/0x330
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff6f0] [c000000000196cfc] hrtimer_interrupt+0xec/0x290
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff7b0] [c000000000023668] __timer_interrupt+0x98/0x280
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff800] [c000000000023d68] timer_interrupt+0xa8/0xe0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ff830] [c0000000000092e8] decrementer_common+0x158/0x160
Dec  6 14:56:50 tdw-9-10-25-239 kernel: --- interrupt: 901 at _raw_spin_lock+0x30/0xc0#012    LR = drop_pagecache_sb+0xac/0x1d0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffb20] [0000000000000000]           (null) (unreliable)
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffb50] [c0000000003e96bc] drop_pagecache_sb+0xac/0x1d0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffbb0] [c000000000363898] iterate_supers+0x1b8/0x1f0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffc20] [c0000000003e9890] drop_caches_sysctl_handler+0xb0/0x170
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffc90] [c00000000040e608] proc_sys_call_handler+0x108/0x130
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffd00] [c00000000035ec98] __vfs_write+0x48/0x1f0
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffd90] [c00000000035f070] vfs_write+0xd0/0x240
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffde0] [c00000000035f3b8] SyS_write+0x68/0x110
Dec  6 14:56:50 tdw-9-10-25-239 kernel: [c000200bca6ffe30] [c00000000000b9e0] system_call+0x58/0x6c
Dec  6 15:11:31 tdw-9-10-25-239 rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="3731" x-info="http://www.rsyslog.com"] start
Dec  6 23:11:02 tdw-9-10-25-239 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 127.3G available → current limit 4.0G).
Dec  6 23:11:02 tdw-9-10-25-239 journal: Runtime journal is using 8.0M (max allowed 4.0G, trying to leave 4.0G free of 127.3G available → current limit 4.0G).
Dec  6 23:11:02 tdw-9-10-25-239 kernel: opal: OPAL detected !
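The four `rcu_sched self-detected stall` reports above arrive three minutes apart for the same CPU (88) and task (`drop_cache.sh`, interrupted each time in `_raw_spin_lock` called from `drop_pagecache_sb`). The sketch below is a hypothetical triage helper, not part of the original report: it decodes rsyslog's `#ooo` octal escapes (`#011` is a TAB, `#012` a newline), extracts the `ticks this GP` counter, and compares it with the syslog timestamps. The counter grows by about 18,000 ticks every 180 s, which suggests (an inference, not stated in the log) that this kernel runs at HZ=100, so CPU 88 had been stuck in the same grace period for roughly ten minutes by 14:56:50.

```python
import re

# Four stall-report lines copied from the log above (trailing spaces dropped).
LOG = """\
Dec  6 14:47:50 tdw-9-10-25-239 kernel: #01188-...: (6001 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=2973
Dec  6 14:50:50 tdw-9-10-25-239 kernel: #01188-...: (24004 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=11973
Dec  6 14:53:50 tdw-9-10-25-239 kernel: #01188-...: (42008 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=20924
Dec  6 14:56:50 tdw-9-10-25-239 kernel: #01188-...: (60012 ticks this GP) idle=dde/140000000000001/0 softirq=66975497/66975497 fqs=29914
"""

# rsyslog writes control characters as '#' plus three octal digits (#011 = TAB).
ESCAPE = re.compile(r"#([0-7]{3})")
# After unescaping: "HH:MM:SS host kernel: \t<cpu>-<flags>: (<ticks> ticks this GP)".
STALL = re.compile(r"(\d{2}):(\d{2}):(\d{2}) \S+ kernel: \t(\d+)-\S*: \((\d+) ticks this GP\)")


def unescape(line: str) -> str:
    """Turn rsyslog octal escapes such as '#011' back into the original characters."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


def parse_stalls(text: str) -> list[tuple[int, int, int]]:
    """Return (seconds since midnight, cpu, ticks this GP) for each stall line."""
    stalls = []
    for raw in text.splitlines():
        m = STALL.search(unescape(raw))
        if m:
            hh, mm, ss, cpu, ticks = (int(g) for g in m.groups())
            stalls.append((hh * 3600 + mm * 60 + ss, cpu, ticks))
    return stalls


def tick_rates(stalls: list[tuple[int, int, int]]) -> list[float]:
    """Ticks elapsed per wall-clock second between consecutive reports."""
    return [(k1 - k0) / (t1 - t0) for (t0, _, k0), (t1, _, k1) in zip(stalls, stalls[1:])]


if __name__ == "__main__":
    stalls = parse_stalls(LOG)
    rates = tick_rates(stalls)
    hz = sum(rates) / len(rates)  # estimated tick rate, assumed to equal HZ
    print(f"estimated tick rate: {hz:.1f} ticks/s")
    for secs, cpu, ticks in stalls:
        print(f"cpu {cpu}: {ticks} ticks, about {ticks / hz:.0f} s into the grace period")
```

Run against the excerpt, the estimated durations come out at about 60, 240, 420 and 600 seconds. Syslog timestamps only have one-second resolution, so the rate is an approximation.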

@liyi-ibm
Owner Author

The RCU stall also happens on P8:
kern.log.txt

@liyi-ibm
Owner Author

This patch might fix it:
https://patchwork.kernel.org/patch/10716303/

liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
commit a4f843b upstream.

syzbot hit the following crash on upstream commit
83beed7 (Fri Apr 20 17:56:32 2018 +0000)
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=d154ec99402c6f628887

C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5414336294027264
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5471683234234368
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5436660795834368
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
F2FS-fs (loop0): invalid crc value
------------[ cut here ]------------
kernel BUG at fs/f2fs/node.c:1185!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4549 Comm: syzkaller704305 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185
RSP: 0018:ffff8801d960e820 EFLAGS: 00010293
RAX: ffff8801d88205c0 RBX: 0000000000000003 RCX: ffffffff82f6cc06
RDX: 0000000000000000 RSI: ffffffff82f6d5e8 RDI: 0000000000000004
RBP: ffff8801d960ec30 R08: ffff8801d88205c0 R09: ffffed003b5e46c2
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8801a86e00c0
R13: 0000000000000001 R14: ffff8801a86e0530 R15: ffff8801d9745240
FS:  000000000072c880(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3d403209b8 CR3: 00000001d8f3f000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 get_node_page fs/f2fs/node.c:1237 [inline]
 truncate_xattr_node+0x152/0x2e0 fs/f2fs/node.c:1014
 remove_inode_page+0x200/0xaf0 fs/f2fs/node.c:1039
 f2fs_evict_inode+0xe86/0x1710 fs/f2fs/inode.c:547
 evict+0x4a6/0x960 fs/inode.c:557
 iput_final fs/inode.c:1519 [inline]
 iput+0x62d/0xa80 fs/inode.c:1545
 f2fs_fill_super+0x5f4e/0x7bf0 fs/f2fs/super.c:2849
 mount_bdev+0x30c/0x3e0 fs/super.c:1164
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020
 mount_fs+0xae/0x328 fs/super.c:1267
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2518 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2848
 ksys_mount+0x12d/0x140 fs/namespace.c:3064
 __do_sys_mount fs/namespace.c:3078 [inline]
 __se_sys_mount fs/namespace.c:3075 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x443dea
RSP: 002b:00007ffcc7882368 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443dea
RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffcc7882370
RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a
R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004
R13: 0000000000402ce0 R14: 0000000000000000 R15: 0000000000000000
RIP: __get_node_page+0xb68/0x16e0 fs/f2fs/node.c:1185 RSP: ffff8801d960e820
---[ end trace 4edbeb71f002bb76 ]---

Reported-and-tested-by: [email protected]
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
commit 8a29c12 upstream.

This patch enhances sanity check for SIT entries.

syzbot hit the following crash on upstream commit
83beed7 (Fri Apr 20 17:56:32 2018 +0000)
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=bf9253040425feb155ad

syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=5692130282438656
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=5095924598571008
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): Try to recover 1th superblock, ret: 0
F2FS-fs (loop0): Mounted with checkpoint version = d
F2FS-fs (loop0): Bitmap was wrongly cleared, blk:9740
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:1884!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4508 Comm: syz-executor0 Not tainted 4.17.0-rc1+ #10
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882
RSP: 0018:ffff8801af526708 EFLAGS: 00010282
RAX: ffffed0035ea4cc0 RBX: ffff8801ad454f90 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff82eeb87e RDI: ffffed0035ea4cb6
RBP: ffff8801af526760 R08: ffff8801ad4a2480 R09: ffffed003b5e4f90
R10: ffffed003b5e4f90 R11: ffff8801daf27c87 R12: ffff8801adb8d380
R13: 0000000000000001 R14: 0000000000000008 R15: 00000000ffffffff
FS:  00000000014af940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f06bc223000 CR3: 00000001adb02000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 allocate_data_block+0x66f/0x2050 fs/f2fs/segment.c:2663
 do_write_page+0x105/0x1b0 fs/f2fs/segment.c:2727
 write_node_page+0x129/0x350 fs/f2fs/segment.c:2770
 __write_node_page+0x7da/0x1370 fs/f2fs/node.c:1398
 sync_node_pages+0x18cf/0x1eb0 fs/f2fs/node.c:1652
 block_operations+0x429/0xa60 fs/f2fs/checkpoint.c:1088
 write_checkpoint+0x3ba/0x5380 fs/f2fs/checkpoint.c:1405
 f2fs_sync_fs+0x2fb/0x6a0 fs/f2fs/super.c:1077
 __sync_filesystem fs/sync.c:39 [inline]
 sync_filesystem+0x265/0x310 fs/sync.c:67
 generic_shutdown_super+0xd7/0x520 fs/super.c:429
 kill_block_super+0xa4/0x100 fs/super.c:1191
 kill_f2fs_super+0x9f/0xd0 fs/f2fs/super.c:3030
 deactivate_locked_super+0x97/0x100 fs/super.c:316
 deactivate_super+0x188/0x1b0 fs/super.c:347
 cleanup_mnt+0xbf/0x160 fs/namespace.c:1174
 __cleanup_mnt+0x16/0x20 fs/namespace.c:1181
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457d97
RSP: 002b:00007ffd46f9c8e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000457d97
RDX: 00000000014b09a3 RSI: 0000000000000002 RDI: 00007ffd46f9da50
RBP: 00007ffd46f9da50 R08: 0000000000000000 R09: 0000000000000009
R10: 0000000000000005 R11: 0000000000000246 R12: 00000000014b0940
R13: 0000000000000000 R14: 0000000000000002 R15: 000000000000658e
RIP: update_sit_entry+0x1215/0x1590 fs/f2fs/segment.c:1882 RSP: ffff8801af526708
---[ end trace f498328bb02610a2 ]---

Reported-and-tested-by: [email protected]
Reported-and-tested-by: [email protected]
Reported-and-tested-by: [email protected]
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Sudip Mukherjee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>