
page allocation stalls #9

Open · liyi-ibm opened this issue Dec 11, 2018 · 4 comments

@liyi-ibm (Owner)

When free memory is low, applications that allocate memory have to wait until some of it is reclaimed. If the delay exceeds 10 seconds, the kernel prints a page allocation stall message.
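The stall lines quoted later in this thread follow a fixed format, so they can be mined from dmesg to see which tasks stall and for how long. This is just an illustrative userspace sketch, not kernel code; the regex is an assumption based on the 4.14 log format shown in this issue:

```python
import re

# Matches 4.14-era stall warnings, e.g.:
# "java: page allocation stalls for 16180ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE)"
STALL_RE = re.compile(
    r"(?P<comm>\S+): page allocation stalls for (?P<ms>\d+)ms, "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-f]+)"
)

def parse_stall(line):
    """Return (comm, stall_ms, order, gfp_mode), or None for non-stall lines."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return m["comm"], int(m["ms"]), int(m["order"]), m["mode"]

line = ("kernel: java: page allocation stalls for 16180ms, "
        "order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)")
print(parse_stall(line))  # → ('java', 16180, 0, '0x14200ca')
```

Feeding `dmesg` output through this makes it easy to histogram stall durations before and after applying the backported patch.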

An upstream patch has been identified that should help reduce overhead and latencies. This patch is currently being backported and tested.
commit a983b5e
Author: Johannes Weiner [email protected]
Date: Wed Jan 31 16:16:45 2018 -0800

mm: memcontrol: fix excessive complexity in memory.stat reporting

@liyi-ibm (Owner)

Committed the above patch (and its dependency) as: aa243dd, 23da735, 848ed2a

@liyi-ibm (Owner)

Note: during page allocation stalls we may still see plenty of available memory, but that memory cannot be reclaimed.

In another case, an OOM is followed by a stall; there the stall happens because the system really is out of memory:

[Wed Dec  5 22:03:18 2018] collect_kprobe.: page allocation stalls for 14860ms, order:0, mode:0x15080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), nodemask=(null)
[Wed Dec  5 22:03:18 2018] collect_kprobe. cpuset=/ mems_allowed=0,8
[Wed Dec  5 22:03:18 2018] CPU: 44 PID: 6412 Comm: collect_kprobe. Not tainted 4.14.49-memctrl #1
[Wed Dec  5 22:03:18 2018] Call Trace:
[Wed Dec  5 22:03:18 2018] [c000001fdd04b950] [c000000000ad8b14] dump_stack+0xe8/0x154 (unreliable)
[Wed Dec  5 22:03:18 2018] [c000001fdd04b990] [c00000000028d8a8] warn_alloc+0x128/0x1c0
[Wed Dec  5 22:03:18 2018] [c000001fdd04ba40] [c00000000028e3b4] __alloc_pages_nodemask+0x9e4/0x1080
[Wed Dec  5 22:03:18 2018] [c000001fdd04bc30] [c000000000317140] alloc_pages_current+0xa0/0x130
[Wed Dec  5 22:03:18 2018] [c000001fdd04bc70] [c000000000287cc8] __get_free_pages+0x28/0x90
[Wed Dec  5 22:03:18 2018] [c000001fdd04bc90] [c0000000000f8af0] mm_init+0x150/0x2f0
[Wed Dec  5 22:03:18 2018] [c000001fdd04bcd0] [c0000000000fa738] copy_process.isra.31.part.32+0x9a8/0x19f0
[Wed Dec  5 22:03:18 2018] [c000001fdd04bdc0] [c0000000000fb97c] _do_fork+0xdc/0x4b0
[Wed Dec  5 22:03:18 2018] [c000001fdd04be30] [c00000000000c304] ppc_clone+0x8/0xc
[Wed Dec  5 22:03:18 2018] Mem-Info:
[Wed Dec  5 22:03:18 2018] active_anon:1930707 inactive_anon:1789300 isolated_anon:0
 active_file:144 inactive_file:299 isolated_file:0
 unevictable:0 dirty:0 writeback:157 unstable:0
 slab_reclaimable:47303 slab_unreclaimable:201360
 mapped:639 shmem:526 pagetables:9075 bounce:0
 free:8106 free_pcp:27 free_cma:4524
[Wed Dec  5 22:03:18 2018] Node 0 active_anon:64086656kB inactive_anon:60984640kB active_file:6080kB inactive_file:9600kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:16832kB dirty:0kB writeback:5952kB shmem:33216kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 14336kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed Dec  5 22:03:18 2018] Node 8 active_anon:59478592kB inactive_anon:53530560kB active_file:3136kB inactive_file:9536kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:24064kB dirty:0kB writeback:4096kB shmem:448kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Wed Dec  5 22:03:18 2018] Node 0 DMA free:165056kB min:179712kB low:312896kB high:446080kB active_anon:64086656kB inactive_anon:60984640kB active_file:7552kB inactive_file:4928kB unevictable:0kB writepending:0kB present:134217728kB managed:133227968kB mlocked:0kB kernel_stack:43136kB pagetables:285184kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Wed Dec  5 22:03:18 2018] lowmem_reserve[]: 0 0 0 0
[Wed Dec  5 22:03:18 2018] Node 8 DMA free:353728kB min:180672kB low:314560kB high:448448kB active_anon:59478592kB inactive_anon:53530560kB active_file:15552kB inactive_file:17280kB unevictable:0kB writepending:0kB present:134217728kB managed:133945792kB mlocked:0kB kernel_stack:107600kB pagetables:295616kB bounce:0kB free_pcp:1728kB local_pcp:0kB free_cma:289536kB
[Wed Dec  5 22:03:18 2018] lowmem_reserve[]: 0 0 0 0
[Wed Dec  5 22:03:18 2018] Node 0 DMA: 2583*64kB (UME) 6*128kB (ME) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 166080kB
[Wed Dec  5 22:03:18 2018] Node 8 DMA: 2980*64kB (UMEC) 352*128kB (MEC) 261*256kB (C) 66*512kB (C) 10*1024kB (C) 2*2048kB (C) 0*4096kB 0*8192kB 0*16384kB = 350720kB
[Wed Dec  5 22:03:18 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Dec  5 22:03:18 2018] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed Dec  5 22:03:18 2018] Node 8 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Wed Dec  5 22:03:18 2018] Node 8 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Wed Dec  5 22:03:18 2018] 2898 total pagecache pages
[Wed Dec  5 22:03:18 2018] 763 pages in swap cache
[Wed Dec  5 22:03:18 2018] Swap cache stats: add 71856, delete 71093, find 26068/42083
[Wed Dec  5 22:03:18 2018] Free swap  = 0kB
[Wed Dec  5 22:03:18 2018] Total swap = 524224kB
[Wed Dec  5 22:03:18 2018] 4194304 pages RAM
[Wed Dec  5 22:03:18 2018] 0 pages HighMem/MovableOnly
[Wed Dec  5 22:03:18 2018] 19714 pages reserved
[Wed Dec  5 22:03:18 2018] 209920 pages cma reserved
[Wed Dec  5 22:03:18 2018] 0 pages hwpoisoned

@liyi-ibm (Owner)

[Wed Dec  5 22:03:19 2018] Node 8 active_anon:59477376kB inactive_anon:53530560kB active_file:27520kB inactive_file:14848kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:30400kB dirty:11264kB writeback:10432kB shmem:448kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no

Active/inactive file memory has shrunk to almost nothing, and anonymous memory can't be reclaimed because there is no free swap.
Reclaimable slab is somewhat larger, but "reclaimable" slab doesn't mean it can actually be reclaimed right now.
This looks genuinely out of memory.
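For concreteness, the figures in the dump above can be checked with back-of-the-envelope arithmetic (a sketch only; the 64kB page size is inferred from the 64kB granularity of the buddy lists in the dump):

```python
PAGE_KB = 64  # page size inferred from the 64kB buddy-list granularity above

# Figures copied from the Mem-Info dump (pages unless suffixed _kb)
active_file_pages = 144
inactive_file_pages = 299
free_swap_kb = 0
node0_dma_free_kb = 165056
node0_dma_min_kb = 179712

# Reclaimable page cache is tiny: well under 30 MB on a 256 GB box
file_cache_kb = (active_file_pages + inactive_file_pages) * PAGE_KB
print(f"file cache: {file_cache_kb} kB")  # → file cache: 28352 kB

# Anonymous memory cannot be reclaimed: swap is completely full
print(f"free swap: {free_swap_kb} kB")    # → free swap: 0 kB

# Node 0 is below its min watermark, so allocators must direct-reclaim
# (and stall) or fall through to the OOM killer
print("node0 below min:", node0_dma_free_kb < node0_dma_min_kb)  # → True
```

With nothing left to reclaim from page cache or swap, direct reclaim spins without making progress, which is exactly the 14860ms stall reported.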

@liyi-ibm (Owner)

When the stall is not caused by OOM, there is free swap and plenty of inactive file memory:

Nov 21 14:39:31 tdw-9-10-28-232 kernel: java: page allocation stalls for 16180ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
Nov 21 14:39:31 tdw-9-10-28-232 kernel: java cpuset=/ mems_allowed=0,8
Nov 21 14:39:31 tdw-9-10-28-232 kernel: CPU: 77 PID: 129639 Comm: java Tainted: G        W       4.14.49-3.ppc64le #1
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Call Trace:
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae578f0] [c000000000acb99c] dump_stack+0xb0/0xf4 (unreliable)
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae57930] [c000000000283e28] warn_alloc+0x128/0x1c0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae579e0] [c000000000284934] __alloc_pages_nodemask+0x9e4/0x1080
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae57bd0] [c00000000030e878] alloc_pages_vma+0xa8/0x2a0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae57c40] [c0000000002d2730] __handle_mm_fault+0xef0/0x1ee0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae57d20] [c0000000002d3848] handle_mm_fault+0x128/0x200
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae57d60] [c00000000006498c] __do_page_fault+0x1cc/0x8c0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: [c00020081ae57e30] [c00000000000aca4] handle_page_fault+0x18/0x38
Nov 21 14:39:31 tdw-9-10-28-232 kernel: warn_alloc_show_mem: 4 callbacks suppressed
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Mem-Info:
Nov 21 14:39:31 tdw-9-10-28-232 kernel: active_anon:1420266 inactive_anon:642206 isolated_anon:0#012 active_file:263518 inactive_file:607740 isolated_file:231#012 unevictable:0 dirty:547 writeback:0 unstable:0#012 slab_reclaimable:89790 slab_unreclaimable:182653#012 mapped:2629 shmem:11995 pagetables:6249 bounce:0#012 free:5572 free_pcp:860 free_cma:44
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 0 active_anon:44499968kB inactive_anon:19903296kB active_file:7169536kB inactive_file:15138624kB unevictable:0kB isolated(anon):0kB isolated(file):6400kB mapped:105536kB dirty:16128kB writeback:0kB shmem:527488kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 8 active_anon:46397056kB inactive_anon:21197888kB active_file:9695616kB inactive_file:23756736kB unevictable:0kB isolated(anon):0kB isolated(file):8384kB mapped:62720kB dirty:18880kB writeback:0kB shmem:240192kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 0 DMA free:179584kB min:179712kB low:312896kB high:446080kB active_anon:44461568kB inactive_anon:20003904kB active_file:7169536kB inactive_file:15053248kB unevictable:0kB writepending:4864kB present:134217728kB managed:133228160kB mlocked:0kB kernel_stack:149648kB pagetables:223296kB bounce:0kB free_pcp:26304kB local_pcp:0kB free_cma:0kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: lowmem_reserve[]: 0 0 0 0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 8 DMA free:177024kB min:180672kB low:314560kB high:448448kB active_anon:46378112kB inactive_anon:21287744kB active_file:9683072kB inactive_file:23644288kB unevictable:0kB writepending:6720kB present:134217728kB managed:133945792kB mlocked:0kB kernel_stack:95584kB pagetables:176640kB bounce:0kB free_pcp:28736kB local_pcp:0kB free_cma:2816kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: lowmem_reserve[]: 0 0 0 0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 0 DMA: 2126*64kB (UME) 432*128kB (UME) 6*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 192896kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 8 DMA: 921*64kB (UMEHC) 834*128kB (UMEHC) 112*256kB (UMH) 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 194368kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 8 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Node 8 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 883380 total pagecache pages
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 1 pages in swap cache
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Swap cache stats: add 2, delete 1, find 0/0
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Free swap  = 524096kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: Total swap = 524224kB
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 4194304 pages RAM
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 0 pages HighMem/MovableOnly
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 19711 pages reserved
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 209920 pages cma reserved
Nov 21 14:39:31 tdw-9-10-28-232 kernel: 0 pages hwpoisoned
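A quick sanity check on this dump (again a sketch, with the 64kB page size inferred from the buddy-list granularity above) shows the opposite picture on the cache side: tens of GB of easily reclaimable file pages and almost-free swap. Yet free memory on both nodes still sits at or below the min watermark, which is enough to force allocators into direct reclaim, and under heavy concurrent allocation that reclaim can take long enough to log a stall:

```python
PAGE_KB = 64  # 64K pages on this ppc64le system, per the dump granularity

# Figures copied from the dump above
node0 = {"free_kb": 179584, "min_kb": 179712}
node8 = {"free_kb": 177024, "min_kb": 180672}
inactive_file_pages = 607740
free_swap_kb = 524096

# Plenty of easily reclaimable page cache this time
inactive_file_gb = inactive_file_pages * PAGE_KB // 1024 // 1024
print(f"inactive file: ~{inactive_file_gb} GB")  # → ~37 GB

# But both nodes are below their min watermark, so allocations
# still have to direct-reclaim before they can succeed
for name, n in (("node0", node0), ("node8", node8)):
    print(f"{name} below min: {n['free_kb'] < n['min_kb']}")  # → True, True
```

Node 0 is only 128kB under its watermark, so this is watermark pressure rather than a true memory shortage; raising `min_free_kbytes` or reducing reclaim latency (as the backported memcontrol patch does) are the usual levers here.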

liyi-ibm pushed a commit that referenced this issue Dec 17, 2018
…ilure

While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.

Commit c96f547 ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operated on @p->delays.  If
%current succeeded init while @p didn't, it leads to the following
crash.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
 IP: __delayacct_blkio_end+0xc/0x40
 PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
 RIP: 0010:__delayacct_blkio_end+0xc/0x40
 Call Trace:
  try_to_wake_up+0x2c0/0x600
  autoremove_wake_function+0xe/0x30
  __wake_up_common+0x74/0x120
  wake_up_page_bit+0x9c/0xe0
  mpage_end_io+0x27/0x70
  blk_update_request+0x78/0x2c0
  scsi_end_request+0x2c/0x1e0
  scsi_io_completion+0x20b/0x5f0
  blk_mq_complete_request+0xa2/0x100
  ata_scsi_qc_complete+0x79/0x400
  ata_qc_complete_multiple+0x86/0xd0
  ahci_handle_port_interrupt+0xc9/0x5c0
  ahci_handle_port_intr+0x54/0xb0
  ahci_single_level_irq_intr+0x3b/0x60
  __handle_irq_event_percpu+0x43/0x190
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x2a/0x50
  handle_edge_irq+0x80/0x1c0
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xc0
  common_interrupt+0xf/0xf

Fix it by updating delayacct_blkio_end() check @p->delays instead.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: c96f547 ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Dave Jones <[email protected]>
Debugged-by: Dave Jones <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Josh Snyder <[email protected]>
Cc: <[email protected]>	[4.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit 9954b80 ]

platform_domain_notifier contains a variable sized array, which the
pm_clk_notify() notifier treats as a NULL terminated array:

     for (con_id = clknb->con_ids; *con_id; con_id++)
             pm_clk_add(dev, *con_id);

Omitting the initialiser for con_ids means that the array is zero
sized, and there is no NULL terminator.  This leads to pm_clk_notify()
overrunning into what ever structure follows, which may not be NULL.
This leads to an oops:

Unable to handle kernel NULL pointer dereference at virtual address 0000008c
pgd = c0003000
[0000008c] *pgd=80000800004003c, *pmd=00000000c
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
Modules linked in:c
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0+ #9
Hardware name: Keystone
PC is at strlen+0x0/0x34
LR is at kstrdup+0x18/0x54
pc : [<c0623340>]    lr : [<c0111d6c>]    psr: 20000013
sp : eec73dc0  ip : eed780c0  fp : 00000001
r10: 00000000  r9 : 00000000  r8 : eed71e10
r7 : 0000008c  r6 : 0000008c  r5 : 014000c0  r4 : c03a6ff4
r3 : c09445d0  r2 : 00000000  r1 : 014000c0  r0 : 0000008c
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 30c5387d  Table: 00003000  DAC: fffffffd
Process swapper/0 (pid: 1, stack limit = 0xeec72210)
Stack: (0xeec73dc0 to 0xeec74000)
...
[<c0623340>] (strlen) from [<c0111d6c>] (kstrdup+0x18/0x54)
[<c0111d6c>] (kstrdup) from [<c03a6ff4>] (__pm_clk_add+0x58/0x120)
[<c03a6ff4>] (__pm_clk_add) from [<c03a731c>] (pm_clk_notify+0x64/0xa8)
[<c03a731c>] (pm_clk_notify) from [<c004614c>] (notifier_call_chain+0x44/0x84)
[<c004614c>] (notifier_call_chain) from [<c0046320>] (__blocking_notifier_call_chain+0x48/0x60)
[<c0046320>] (__blocking_notifier_call_chain) from [<c0046350>] (blocking_notifier_call_chain+0x18/0x20)
[<c0046350>] (blocking_notifier_call_chain) from [<c0390234>] (device_add+0x36c/0x534)
[<c0390234>] (device_add) from [<c047fc00>] (of_platform_device_create_pdata+0x70/0xa4)
[<c047fc00>] (of_platform_device_create_pdata) from [<c047fea0>] (of_platform_bus_create+0xf0/0x1ec)
[<c047fea0>] (of_platform_bus_create) from [<c047fff8>] (of_platform_populate+0x5c/0xac)
[<c047fff8>] (of_platform_populate) from [<c08b1f04>] (of_platform_default_populate_init+0x8c/0xa8)
[<c08b1f04>] (of_platform_default_populate_init) from [<c000a78c>] (do_one_initcall+0x3c/0x164)
[<c000a78c>] (do_one_initcall) from [<c087bd9c>] (kernel_init_freeable+0x10c/0x1d0)
[<c087bd9c>] (kernel_init_freeable) from [<c0628db0>] (kernel_init+0x8/0xf0)
[<c0628db0>] (kernel_init) from [<c00090d8>] (ret_from_fork+0x14/0x3c)
Exception stack(0xeec73fb0 to 0xeec73ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Code: e3520000 1afffff7 e12fff1e c0801730 (e5d02000)
---[ end trace cafa8f148e262e80 ]---

Fix this by adding the necessary initialiser.

Fixes: fc20ffe ("ARM: keystone: add PM domain support for clock management")
Signed-off-by: Russell King <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>

Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit 2efd4fc ]

Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:

  BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
  CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
  Google 01/01/2011
  Call Trace:
    __dump_stack lib/dump_stack.c:77 [inline]
    dump_stack+0x185/0x1d0 lib/dump_stack.c:113
    kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
    kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
    kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
    copy_to_user include/linux/uaccess.h:184 [inline]
    put_cmsg+0x5ef/0x860 net/core/scm.c:242
    ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
    ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
    rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
    [..]

This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.

With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.

Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.

Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: [email protected]
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
…ilure

commit b512719 upstream.

While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.

Commit c96f547 ("delayacct: Account blkio completion on the correct
task"), while updating delayacct_blkio_end() to take the target task
instead of always using %current, made the function test NULL on
%current->delays and then continue to operated on @p->delays.  If
%current succeeded init while @p didn't, it leads to the following
crash.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
 IP: __delayacct_blkio_end+0xc/0x40
 PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
 Oops: 0000 [#1] SMP PTI
 CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
 RIP: 0010:__delayacct_blkio_end+0xc/0x40
 Call Trace:
  try_to_wake_up+0x2c0/0x600
  autoremove_wake_function+0xe/0x30
  __wake_up_common+0x74/0x120
  wake_up_page_bit+0x9c/0xe0
  mpage_end_io+0x27/0x70
  blk_update_request+0x78/0x2c0
  scsi_end_request+0x2c/0x1e0
  scsi_io_completion+0x20b/0x5f0
  blk_mq_complete_request+0xa2/0x100
  ata_scsi_qc_complete+0x79/0x400
  ata_qc_complete_multiple+0x86/0xd0
  ahci_handle_port_interrupt+0xc9/0x5c0
  ahci_handle_port_intr+0x54/0xb0
  ahci_single_level_irq_intr+0x3b/0x60
  __handle_irq_event_percpu+0x43/0x190
  handle_irq_event_percpu+0x20/0x50
  handle_irq_event+0x2a/0x50
  handle_edge_irq+0x80/0x1c0
  handle_irq+0xaf/0x120
  do_IRQ+0x41/0xc0
  common_interrupt+0xf/0xf

Fix it by updating delayacct_blkio_end() check @p->delays instead.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: c96f547 ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Dave Jones <[email protected]>
Debugged-by: Dave Jones <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Josh Snyder <[email protected]>
Cc: <[email protected]>	[4.15+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit 4f4616c ]

Similar to what we do when we remove a PCI function, set the
QEDF_UNLOADING flag to prevent any requests from being queued while a
vport is being deleted.  This prevents any requests from getting stuck
in limbo when the vport is unloaded or deleted.

Fixes the crash:

PID: 106676  TASK: ffff9a436aa90000  CPU: 12  COMMAND: "multipathd"
 #0 [ffff9a43567d3550] machine_kexec+522 at ffffffffaca60b2a
 #1 [ffff9a43567d35b0] __crash_kexec+114 at ffffffffacb13512
 #2 [ffff9a43567d3680] crash_kexec+48 at ffffffffacb13600
 #3 [ffff9a43567d3698] oops_end+168 at ffffffffad117768
 #4 [ffff9a43567d36c0] no_context+645 at ffffffffad106f52
 #5 [ffff9a43567d3710] __bad_area_nosemaphore+116 at ffffffffad106fe9
 #6 [ffff9a43567d3760] bad_area+70 at ffffffffad107379
 #7 [ffff9a43567d3788] __do_page_fault+1247 at ffffffffad11a8cf
 #8 [ffff9a43567d37f0] do_page_fault+53 at ffffffffad11a915
 #9 [ffff9a43567d3820] page_fault+40 at ffffffffad116768
    [exception RIP: qedf_init_task+61]
    RIP: ffffffffc0e13c2d  RSP: ffff9a43567d38d0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: ffffbe920472c738  RCX: ffff9a434fa0e3e8
    RDX: ffff9a434f695280  RSI: ffffbe920472c738  RDI: ffff9a43aa359c80
    RBP: ffff9a43567d3950   R8: 0000000000000c15   R9: ffff9a3fb09b9880
    R10: ffff9a434fa0e3e8  R11: ffff9a43567d35ce  R12: 0000000000000000
    R13: ffff9a434f695280  R14: ffff9a43aa359c80  R15: ffff9a3fb9e005c0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Signed-off-by: Chad Dupuis <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
commit 89da619 upstream.

Kernel panic when with high memory pressure, calltrace looks like,

PID: 21439 TASK: ffff881be3afedd0 CPU: 16 COMMAND: "java"
 #0 [ffff881ec7ed7630] machine_kexec at ffffffff81059beb
 #1 [ffff881ec7ed7690] __crash_kexec at ffffffff81105942
 #2 [ffff881ec7ed7760] crash_kexec at ffffffff81105a30
 #3 [ffff881ec7ed7778] oops_end at ffffffff816902c8
 #4 [ffff881ec7ed77a0] no_context at ffffffff8167ff46
 #5 [ffff881ec7ed77f0] __bad_area_nosemaphore at ffffffff8167ffdc
 #6 [ffff881ec7ed7838] __node_set at ffffffff81680300
 #7 [ffff881ec7ed7860] __do_page_fault at ffffffff8169320f
 #8 [ffff881ec7ed78c0] do_page_fault at ffffffff816932b5
 #9 [ffff881ec7ed78f0] page_fault at ffffffff8168f4c8
    [exception RIP: _raw_spin_lock_irqsave+47]
    RIP: ffffffff8168edef RSP: ffff881ec7ed79a8 RFLAGS: 00010046
    RAX: 0000000000000246 RBX: ffffea0019740d00 RCX: ffff881ec7ed7fd8
    RDX: 0000000000020000 RSI: 0000000000000016 RDI: 0000000000000008
    RBP: ffff881ec7ed79a8 R8: 0000000000000246 R9: 000000000001a098
    R10: ffff88107ffda000 R11: 0000000000000000 R12: 0000000000000000
    R13: 0000000000000008 R14: ffff881ec7ed7a80 R15: ffff881be3afedd0
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

It happens in the pagefault and results in double pagefault
during compacting pages when memory allocation fails.

Analysed the vmcore, the page leads to second pagefault is corrupted
with _mapcount=-256, but private=0.

It's caused by the race between migration and ballooning, and lock
missing in virtballoon_migratepage() of virtio_balloon driver.
This patch fix the bug.

Fixes: e225042 ("virtio_balloon: introduce migration primitives to balloon pages")
Cc: [email protected]
Signed-off-by: Jiang Biao <[email protected]>
Signed-off-by: Huang Chong <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
liyi-ibm pushed a commit that referenced this issue Dec 28, 2018
[ Upstream commit 934140a ]

cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview.  However, it has no ref on that monitor object or on the
associated operation.

What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object.  However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.

If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.

Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock.  The op can then be enqueued after the lock is
dropped.

The bug can manifest in one of a couple of ways.  The first manifestation
looks like:

 FS-Cache:
 FS-Cache: Assertion failed
 FS-Cache: 6 == 5 is false
 ------------[ cut here ]------------
 kernel BUG at fs/fscache/operation.c:494!
 RIP: 0010:fscache_put_operation+0x1e3/0x1f0
 ...
 fscache_op_work_func+0x26/0x50
 process_one_work+0x131/0x290
 worker_thread+0x45/0x360
 kthread+0xf8/0x130
 ? create_worker+0x190/0x190
 ? kthread_cancel_work_sync+0x10/0x10
 ret_from_fork+0x1f/0x30

This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().

The bug can also manifest like the following:

 kernel BUG at fs/fscache/operation.c:69!
 ...
    [exception RIP: fscache_enqueue_operation+246]
 ...
 #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
 #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
 #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028

I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.

Fixes: 9ae326a ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue <[email protected]>
Reported-by: Vegard Nossum <[email protected]>
Reported-by: Anthony DeRobertis <[email protected]>
Reported-by: NeilBrown <[email protected]>
Reported-by: Daniel Axtens <[email protected]>
Reported-by: Kiran Kumar Modukuri <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Daniel Axtens <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>