From b5bfe7dca3e00a7c7de900b2d671c06cd66d7dee Mon Sep 17 00:00:00 2001
From: Kees Cook
Date: Fri, 18 Sep 2020 21:20:00 -0700
Subject: [PATCH 01/14] mailmap: add older email addresses for Kees Cook

This adds explicit mailmap entries for my older/other email addresses.

Reported-by: Joe Perches
Signed-off-by: Kees Cook
Signed-off-by: Andrew Morton
Cc: Jonathan Corbet
Link: https://lkml.kernel.org/r/20200910193939.3798377-1-keescook@chromium.org
Signed-off-by: Linus Torvalds
---
 .mailmap | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/.mailmap b/.mailmap
index 50096b96c85dfc..a780211468e4e4 100644
--- a/.mailmap
+++ b/.mailmap
@@ -169,6 +169,10 @@ Juha Yrjola
 Julien Thierry
 Kamil Konieczny
 Kay Sievers
+Kees Cook
+Kees Cook
+Kees Cook
+Kees Cook
 Kenneth W Chen
 Konstantin Khlebnikov
 Konstantin Khlebnikov

From 62fdb1632bcbed30c40f6bd2b58297617e442658 Mon Sep 17 00:00:00 2001
From: Hugh Dickins
Date: Fri, 18 Sep 2020 21:20:03 -0700
Subject: [PATCH 02/14] ksm: reinstate memcg charge on copied pages

Patch series "mm: fixes to past from future testing".

Here's a set of independent fixes against 5.9-rc2: prompted by testing
Alex Shi's "warning on !memcg" and lru_lock series, but I think fit for
5.9 - though maybe only the first for stable.

This patch (of 5):

In 5.8 some instances of memcg charging in do_swap_page() and unuse_pte()
were removed, on the understanding that swap cache is now already charged
at those points; but a case was missed, when ksm_might_need_to_copy() has
decided it must allocate a substitute page: such pages were never charged.
Fix it inside ksm_might_need_to_copy().

This was discovered by Alex Shi's prospective commit "mm/memcg: warning
on !memcg after readahead page charged".

But there is another surprise: this also fixes some rarer uncharged
PageAnon cases, when KSM is configured in, but has never been activated.
ksm_might_need_to_copy()'s anon_vma->root and linear_page_index() check
sometimes catches a case which would need to have been copied if KSM were
turned on. Or that's my optimistic interpretation (of my own old code),
but it leaves some doubt as to whether everything is working as intended
there - might it hint at rare anon ptes which rmap cannot find? A
question not easily answered: put in the fix for missed memcg charges.
Cc: Matthew Wilcox
Fixes: 4c6355b25e8b ("mm: memcontrol: charge swapin pages on instantiation")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Acked-by: Johannes Weiner
Cc: Alex Shi
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Qian Cai
Cc: [5.8]
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301343270.5954@eggly.anvils
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301358020.5954@eggly.anvils
Signed-off-by: Linus Torvalds
---
 mm/ksm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index 235f55d015410d..9afccc36dbd209 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2586,6 +2586,10 @@ struct page *ksm_might_need_to_copy(struct page *page,
 		return page;		/* let do_swap_page report the error */
 
 	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+	if (new_page && mem_cgroup_charge(new_page, vma->vm_mm, GFP_KERNEL)) {
+		put_page(new_page);
+		new_page = NULL;
+	}
 	if (new_page) {
 		copy_user_highpage(new_page, page, address, vma);

From a333e3e73b6648d3bd3ef6b971a59a6363bfcfc5 Mon Sep 17 00:00:00 2001
From: Hugh Dickins
Date: Fri, 18 Sep 2020 21:20:06 -0700
Subject: [PATCH 03/14] mm: migration of hugetlbfs page skip memcg

hugetlbfs pages do not participate in memcg: so although they do find
most of migrate_page_states() useful, it would be better if they did not
call into mem_cgroup_migrate() - where Qian Cai reported that LTP's
move_pages12 triggers the warning in Alex Shi's prospective commit
"mm/memcg: warning on !memcg after readahead page charged".

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Acked-by: Johannes Weiner
Cc: Alex Shi
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Qian Cai
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301359460.5954@eggly.anvils
Signed-off-by: Linus Torvalds
---
 mm/migrate.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 941b89383cf3de..aecb1433cf3cc5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -668,7 +668,8 @@ void migrate_page_states(struct page *newpage, struct page *page)
 
 	copy_page_owner(page, newpage);
 
-	mem_cgroup_migrate(page, newpage);
+	if (!PageHuge(page))
+		mem_cgroup_migrate(page, newpage);
 }
 EXPORT_SYMBOL(migrate_page_states);

From 8d8869ca5d2d9d86db96271ab063fdcfa9baf5b4 Mon Sep 17 00:00:00 2001
From: Hugh Dickins
Date: Fri, 18 Sep 2020 21:20:12 -0700
Subject: [PATCH 04/14] mm: fix check_move_unevictable_pages() on THP

check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec(). Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".

That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head
(since tails already appear evictable once the head has been updated).

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.

Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Acked-by: Yang Shi
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Qian Cai
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301405000.5954@eggly.anvils
Signed-off-by: Linus Torvalds
---
 mm/vmscan.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9727dd8e2581b2..466fc3144fffc6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4268,8 +4268,14 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 	for (i = 0; i < pvec->nr; i++) {
 		struct page *page = pvec->pages[i];
 		struct pglist_data *pagepgdat = page_pgdat(page);
+		int nr_pages;
+
+		if (PageTransTail(page))
+			continue;
+
+		nr_pages = thp_nr_pages(page);
+		pgscanned += nr_pages;
 
-		pgscanned++;
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -4288,7 +4294,7 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 			ClearPageUnevictable(page);
 			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
 			add_page_to_lru_list(page, lruvec, lru);
-			pgrescued++;
+			pgrescued += nr_pages;
 		}
 	}

From 0964730bf46b4e271c5ecad5badbbd95737c087b Mon Sep 17 00:00:00 2001
From: Hugh Dickins
Date: Fri, 18 Sep 2020 21:20:15 -0700
Subject: [PATCH 05/14] mlock: fix unevictable_pgs event counts on THP

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events should count every subpage of a THP, including
unevictable_pgs_culled and unevictable_pgs_rescued; but
lru_cache_add_inactive_or_unevictable() was not doing so for
unevictable_pgs_mlocked, and mm/mlock.c was not doing so for
unevictable_pgs mlocked, munlocked, cleared and stranded.

Fix them; but THPs don't go the pagevec way in mlock.c, so no fixes
needed on that path.

Fixes: 5d91f31faf8e ("mm: swap: fix vmstats for huge page")
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Reviewed-by: Shakeel Butt
Acked-by: Yang Shi
Cc: Alex Shi
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Qian Cai
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008301408230.5954@eggly.anvils
Signed-off-by: Linus Torvalds
---
 mm/mlock.c | 24 +++++++++++++++---------
 mm/swap.c  |  6 +++---
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 93ca2bf30b4fd2..884b1216da6a6d 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -58,11 +58,14 @@ EXPORT_SYMBOL(can_do_mlock);
  */
 void clear_page_mlock(struct page *page)
 {
+	int nr_pages;
+
 	if (!TestClearPageMlocked(page))
 		return;
 
-	mod_zone_page_state(page_zone(page), NR_MLOCK, -thp_nr_pages(page));
-	count_vm_event(UNEVICTABLE_PGCLEARED);
+	nr_pages = thp_nr_pages(page);
+	mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
+	count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
 	/*
 	 * The previous TestClearPageMlocked() corresponds to the smp_mb()
 	 * in __pagevec_lru_add_fn().
@@ -76,7 +79,7 @@ void clear_page_mlock(struct page *page)
 		 * We lost the race. the page already moved to evictable list.
 		 */
 		if (PageUnevictable(page))
-			count_vm_event(UNEVICTABLE_PGSTRANDED);
+			count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
 	}
 }
 
@@ -93,9 +96,10 @@ void mlock_vma_page(struct page *page)
 	VM_BUG_ON_PAGE(PageCompound(page) && PageDoubleMap(page), page);
 
 	if (!TestSetPageMlocked(page)) {
-		mod_zone_page_state(page_zone(page), NR_MLOCK,
-				    thp_nr_pages(page));
-		count_vm_event(UNEVICTABLE_PGMLOCKED);
+		int nr_pages = thp_nr_pages(page);
+
+		mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
+		count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
 		if (!isolate_lru_page(page))
 			putback_lru_page(page);
 	}
@@ -138,7 +142,7 @@ static void __munlock_isolated_page(struct page *page)
 
 	/* Did try_to_unlock() succeed or punt? */
 	if (!PageMlocked(page))
-		count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+		count_vm_events(UNEVICTABLE_PGMUNLOCKED, thp_nr_pages(page));
 
 	putback_lru_page(page);
 }
@@ -154,10 +158,12 @@ static void __munlock_isolated_page(struct page *page)
  */
 static void __munlock_isolation_failed(struct page *page)
 {
+	int nr_pages = thp_nr_pages(page);
+
 	if (PageUnevictable(page))
-		__count_vm_event(UNEVICTABLE_PGSTRANDED);
+		__count_vm_events(UNEVICTABLE_PGSTRANDED, nr_pages);
 	else
-		__count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+		__count_vm_events(UNEVICTABLE_PGMUNLOCKED, nr_pages);
 }
 
 /**

diff --git a/mm/swap.c b/mm/swap.c
index d16d65d9b4e099..e7bdf094f76a03 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -494,14 +494,14 @@ void lru_cache_add_inactive_or_unevictable(struct page *page,
 
 	unevictable = (vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) == VM_LOCKED;
 	if (unlikely(unevictable) && !TestSetPageMlocked(page)) {
+		int nr_pages = thp_nr_pages(page);
 		/*
 		 * We use the irq-unsafe __mod_zone_page_stat because this
 		 * counter is not modified from interrupt context, and the pte
 		 * lock is held(spinlock), which implies preemption disabled.
 		 */
-		__mod_zone_page_state(page_zone(page), NR_MLOCK,
-				    thp_nr_pages(page));
-		count_vm_event(UNEVICTABLE_PGMLOCKED);
+		__mod_zone_page_state(page_zone(page), NR_MLOCK, nr_pages);
+		count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
 	}
 	lru_cache_add(page);
 }

From bb3e96d63eb75a2f4ff790b089f6b93614c729a1 Mon Sep 17 00:00:00 2001
From: Byron Stanoszek
Date: Fri, 18 Sep 2020 21:20:18 -0700
Subject: [PATCH 06/14] tmpfs: restore functionality of nr_inodes=0

Commit e809d5f0b5c9 ("tmpfs: per-superblock i_ino support") made changes
to shmem_reserve_inode() in mm/shmem.c; however, the original test for
(sbinfo->max_inodes) got dropped. This causes mounting tmpfs with option
nr_inodes=0 to fail:

  # mount -ttmpfs -onr_inodes=0 none /ext0
  mount: /ext0: mount(2) system call failed: Cannot allocate memory.

This patch restores the nr_inodes=0 functionality.
Fixes: e809d5f0b5c9 ("tmpfs: per-superblock i_ino support")
Signed-off-by: Byron Stanoszek
Signed-off-by: Andrew Morton
Acked-by: Hugh Dickins
Acked-by: Chris Down
Link: https://lkml.kernel.org/r/20200902035715.16414-1-gandalf@winds.org
Signed-off-by: Linus Torvalds
---
 mm/shmem.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 271548ca20f314..8e2b35ba93ad17 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -279,11 +279,13 @@ static int shmem_reserve_inode(struct super_block *sb, ino_t *inop)
 
 	if (!(sb->s_flags & SB_KERNMOUNT)) {
 		spin_lock(&sbinfo->stat_lock);
-		if (!sbinfo->free_inodes) {
-			spin_unlock(&sbinfo->stat_lock);
-			return -ENOSPC;
+		if (sbinfo->max_inodes) {
+			if (!sbinfo->free_inodes) {
+				spin_unlock(&sbinfo->stat_lock);
+				return -ENOSPC;
+			}
+			sbinfo->free_inodes--;
 		}
-		sbinfo->free_inodes--;
 		if (inop) {
 			ino = sbinfo->next_ino++;
 			if (unlikely(is_zero_ino(ino)))

From b0399092ccebd9feef68d4ceb8d6219a8c0caa05 Mon Sep 17 00:00:00 2001
From: Muchun Song
Date: Fri, 18 Sep 2020 21:20:21 -0700
Subject: [PATCH 07/14] kprobes: fix kill kprobe which has been marked as gone

If a kprobe is marked as gone, we should not kill it again. Otherwise,
we can disarm the kprobe more than once. In that case, the statistics
of kprobe_ftrace_enabled can become unbalanced, which can lead to the
kprobe not working.

Fixes: e8386a0cb22f ("kprobes: support probing module __exit function")
Co-developed-by: Chengming Zhou
Signed-off-by: Muchun Song
Signed-off-by: Chengming Zhou
Signed-off-by: Andrew Morton
Acked-by: Masami Hiramatsu
Cc: "Naveen N . Rao"
Cc: Anil S Keshavamurthy
Cc: David S. Miller
Cc: Song Liu
Cc: Steven Rostedt
Cc:
Link: https://lkml.kernel.org/r/20200822030055.32383-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds
---
 kernel/kprobes.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 287b263c9cb957..049da84e19525a 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2140,6 +2140,9 @@ static void kill_kprobe(struct kprobe *p)
 
 	lockdep_assert_held(&kprobe_mutex);
 
+	if (WARN_ON_ONCE(kprobe_gone(p)))
+		return;
+
 	p->flags |= KPROBE_FLAG_GONE;
 	if (kprobe_aggrprobe(p)) {
 		/*
@@ -2419,7 +2422,10 @@ static int kprobes_module_callback(struct notifier_block *nb,
 	mutex_lock(&kprobe_mutex);
 	for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
 		head = &kprobe_table[i];
-		hlist_for_each_entry(p, head, hlist)
+		hlist_for_each_entry(p, head, hlist) {
+			if (kprobe_gone(p))
+				continue;
+
 			if (within_module_init((unsigned long)p->addr, mod) ||
 			    (checkcore &&
 			     within_module_core((unsigned long)p->addr, mod))) {
@@ -2436,6 +2442,7 @@ static int kprobes_module_callback(struct notifier_block *nb,
 				 */
 				kill_kprobe(p);
 			}
+		}
 	}
 	if (val == MODULE_STATE_GOING)
 		remove_module_kprobe_blacklist(mod);

From ec0abae6dcdf7ef88607c869bf35a4b63ce1b370 Mon Sep 17 00:00:00 2001
From: Ralph Campbell
Date: Fri, 18 Sep 2020 21:20:24 -0700
Subject: [PATCH 08/14] mm/thp: fix __split_huge_pmd_locked() for migration PMD

A migrating transparent huge page has to already be unmapped. Otherwise,
the page could be modified while it is being copied to a new page and
data could be lost. The function __split_huge_pmd() checks for a PMD
migration entry before calling __split_huge_pmd_locked(), leading one to
think that __split_huge_pmd_locked() can handle splitting a migrating
PMD.

However, the code always increments the page->_mapcount and adjusts the
memory control group accounting assuming the page is mapped.
Also, if the PMD entry is a migration PMD entry, the call to
is_huge_zero_pmd(*pmd) is incorrect because it calls pmd_pfn(pmd) instead
of migration_entry_to_pfn(pmd_to_swp_entry(pmd)).

Fix these problems by checking for a PMD migration entry.

Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ralph Campbell
Signed-off-by: Andrew Morton
Reviewed-by: Yang Shi
Reviewed-by: Zi Yan
Cc: Jerome Glisse
Cc: John Hubbard
Cc: Alistair Popple
Cc: Christoph Hellwig
Cc: Jason Gunthorpe
Cc: Bharata B Rao
Cc: Ben Skeggs
Cc: Shuah Khan
Cc: [4.14+]
Link: https://lkml.kernel.org/r/20200903183140.19055-1-rcampbell@nvidia.com
Signed-off-by: Linus Torvalds
---
 mm/huge_memory.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7ff29cc3d55cb5..faadc449cca590 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2022,7 +2022,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 			put_page(page);
 		add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
 		return;
-	} else if (is_huge_zero_pmd(*pmd)) {
+	} else if (pmd_trans_huge(*pmd) && is_huge_zero_pmd(*pmd)) {
 		/*
 		 * FIXME: Do we want to invalidate secondary mmu by calling
 		 * mmu_notifier_invalidate_range() see comments below inside
@@ -2116,30 +2116,34 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		pte = pte_offset_map(&_pmd, addr);
 		BUG_ON(!pte_none(*pte));
 		set_pte_at(mm, addr, pte, entry);
-		atomic_inc(&page[i]._mapcount);
-		pte_unmap(pte);
-	}
-
-	/*
-	 * Set PG_double_map before dropping compound_mapcount to avoid
-	 * false-negative page_mapped().
-	 */
-	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
-		for (i = 0; i < HPAGE_PMD_NR; i++)
+		if (!pmd_migration)
 			atomic_inc(&page[i]._mapcount);
+		pte_unmap(pte);
 	}
 
-	lock_page_memcg(page);
-	if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
-		/* Last compound_mapcount is gone. */
-		__dec_lruvec_page_state(page, NR_ANON_THPS);
-		if (TestClearPageDoubleMap(page)) {
-			/* No need in mapcount reference anymore */
+	if (!pmd_migration) {
+		/*
+		 * Set PG_double_map before dropping compound_mapcount to avoid
+		 * false-negative page_mapped().
+		 */
+		if (compound_mapcount(page) > 1 &&
+		    !TestSetPageDoubleMap(page)) {
 			for (i = 0; i < HPAGE_PMD_NR; i++)
-				atomic_dec(&page[i]._mapcount);
+				atomic_inc(&page[i]._mapcount);
+		}
+
+		lock_page_memcg(page);
+		if (atomic_add_negative(-1, compound_mapcount_ptr(page))) {
+			/* Last compound_mapcount is gone. */
+			__dec_lruvec_page_state(page, NR_ANON_THPS);
+			if (TestClearPageDoubleMap(page)) {
+				/* No need in mapcount reference anymore */
+				for (i = 0; i < HPAGE_PMD_NR; i++)
+					atomic_dec(&page[i]._mapcount);
+			}
 		}
+		unlock_page_memcg(page);
 	}
-	unlock_page_memcg(page);
 
 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);

From 1ec882fc81e3177faf055877310dbdb0c68eb7db Mon Sep 17 00:00:00 2001
From: Christophe Leroy
Date: Fri, 18 Sep 2020 21:20:28 -0700
Subject: [PATCH 09/14] selftests/vm: fix display of page size in map_hugetlb

The displayed size is in bytes while the text says it is in kB. Shift it
by 10 to really display kBytes.
Fixes: fa7b9a805c79 ("tools/selftest/vm: allow choosing mem size and page size in map_hugetlb")
Signed-off-by: Christophe Leroy
Signed-off-by: Andrew Morton
Cc:
Link: https://lkml.kernel.org/r/e27481224564a93d14106e750de31189deaa8bc8.1598861977.git.christophe.leroy@csgroup.eu
Signed-off-by: Linus Torvalds
---
 tools/testing/selftests/vm/map_hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/map_hugetlb.c b/tools/testing/selftests/vm/map_hugetlb.c
index 6af951900aa39f..312889edb84ab7 100644
--- a/tools/testing/selftests/vm/map_hugetlb.c
+++ b/tools/testing/selftests/vm/map_hugetlb.c
@@ -83,7 +83,7 @@ int main(int argc, char **argv)
 	}
 
 	if (shift)
-		printf("%u kB hugepages\n", 1 << shift);
+		printf("%u kB hugepages\n", 1 << (shift - 10));
 	else
 		printf("Default size hugepages\n");
 	printf("Mapping %lu Mbytes\n", (unsigned long)length >> 20);

From 9683182612214aa5f5e709fad49444b847cd866a Mon Sep 17 00:00:00 2001
From: Pavel Tatashin
Date: Fri, 18 Sep 2020 21:20:31 -0700
Subject: [PATCH 10/14] mm/memory_hotplug: drain per-cpu pages again during memory offline

There is a race during page offline that can lead to infinite loop: a
page never ends up on a buddy list and __offline_pages() keeps retrying
infinitely or until a termination signal is received.

Thread#1 - a new process:

load_elf_binary
 begin_new_exec
  exec_mmap
   mmput
    exit_mmap
     tlb_finish_mmu
      tlb_flush_mmu
       release_pages
        free_unref_page_list
         free_unref_page_prepare
          set_pcppage_migratetype(page, migratetype);
             // Set page->index migration type below MIGRATE_PCPTYPES

Thread#2 - hot-removes memory
__offline_pages
  start_isolate_page_range
    set_migratetype_isolate
      set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        Set migration type to MIGRATE_ISOLATE-> set
        drain_all_pages(zone);
             // drain per-cpu page lists to buddy allocator.

Thread#1 - continue
         free_unref_page_commit
           migratetype = get_pcppage_migratetype(page);
              // get old migration type
           list_add(&page->lru, &pcp->lists[migratetype]);
              // add new page to already drained pcp list

Thread#2
Never drains pcp again, and therefore gets stuck in the loop.

The fix is to try to drain per-cpu lists again after
check_pages_isolated_cb() fails.

Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")
Signed-off-by: Pavel Tatashin
Signed-off-by: Andrew Morton
Acked-by: David Rientjes
Acked-by: Vlastimil Babka
Acked-by: Michal Hocko
Acked-by: David Hildenbrand
Cc: Oscar Salvador
Cc: Wei Yang
Cc:
Link: https://lkml.kernel.org/r/20200903140032.380431-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20200904151448.100489-2-pasha.tatashin@soleen.com
Link: http://lkml.kernel.org/r/20200904070235.GA15277@dhcp22.suse.cz
Signed-off-by: Linus Torvalds
---
 mm/memory_hotplug.c | 14 ++++++++++++++
 mm/page_isolation.c |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e9d5ab5d3ca097..b11a269e23561b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1575,6 +1575,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		/* check again */
 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
 					    NULL, check_pages_isolated_cb);
+		/*
+		 * per-cpu pages are drained in start_isolate_page_range, but if
+		 * there are still pages that are not free, make sure that we
+		 * drain again, because when we isolated range we might
+		 * have raced with another thread that was adding pages to pcp
+		 * list.
+		 *
+		 * Forward progress should be still guaranteed because
+		 * pages on the pcp list can only belong to MOVABLE_ZONE
+		 * because has_unmovable_pages explicitly checks for
+		 * PageBuddy on freed pages on other zones.
+		 */
+		if (ret)
+			drain_all_pages(zone);
 	} while (ret);
 
 	/* Ok, all of our target is isolated.

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 242c03121d7317..63a3db10a8c0c2 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
 * pageblocks we may have modified and return -EBUSY to caller. This
 * prevents two threads from simultaneously working on overlapping ranges.
 *
+ * Please note that there is no strong synchronization with the page allocator
+ * either. Pages might be freed while their page blocks are marked ISOLATED.
+ * In some cases pages might still end up on pcp lists and that would allow
+ * for their allocation even when they are in fact isolated already. Depending
+ * on how strong of a guarantee the caller needs drain_all_pages might be needed
+ * (e.g. __offline_pages will need to call it after check for isolated range for
+ * a next retry).
+ *
 * Return: the number of isolated pageblocks on success and -EBUSY if any part
 * of range cannot be isolated.
 */

From 7bb82ac30c3dd4ecf1485685cbe84d2ba10dddf4 Mon Sep 17 00:00:00 2001
From: Tobias Klauser
Date: Fri, 18 Sep 2020 21:20:34 -0700
Subject: [PATCH 11/14] ftrace: let ftrace_enable_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
signature of ftrace_enable_sysctl to match ctl_table.proc_handler which
fixes the following sparse warning:

  kernel/trace/ftrace.c:7544:43: warning: incorrect type in argument 3 (different address spaces)
  kernel/trace/ftrace.c:7544:43:    expected void *
  kernel/trace/ftrace.c:7544:43:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser
Signed-off-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Al Viro
Link: https://lkml.kernel.org/r/20200907093207.13540-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds
---
 include/linux/ftrace.h | 3 +--
 kernel/trace/ftrace.c  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ce2c06f72e863f..e5c2d5cc6e6ad7 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -85,8 +85,7 @@ static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *val
 
 extern int ftrace_enabled;
 extern int
 ftrace_enable_sysctl(struct ctl_table *table, int write,
-		     void __user *buffer, size_t *lenp,
-		     loff_t *ppos);
+		     void *buffer, size_t *lenp, loff_t *ppos);
 
 struct ftrace_ops;

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 275441254bb57b..e9fa580f3083a1 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7531,8 +7531,7 @@ static bool is_permanent_ops_registered(void)
 
 int
 ftrace_enable_sysctl(struct ctl_table *table, int write,
-		     void __user *buffer, size_t *lenp,
-		     loff_t *ppos)
+		     void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret = -ENODEV;

From 4773ef33fc6e59bad2e5d19e334de2fa79c27b74 Mon Sep 17 00:00:00 2001
From: Tobias Klauser
Date: Fri, 18 Sep 2020 21:20:37 -0700
Subject: [PATCH 12/14] stackleak: let stack_erasing_sysctl take a kernel pointer buffer

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed
ctl_table.proc_handler to take a kernel pointer. Adjust the signature of
stack_erasing_sysctl to match ctl_table.proc_handler which fixes the
following sparse warning:

  kernel/stackleak.c:31:50: warning: incorrect type in argument 3 (different address spaces)
  kernel/stackleak.c:31:50:    expected void *
  kernel/stackleak.c:31:50:    got void [noderef] __user *buffer

Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser
Signed-off-by: Andrew Morton
Cc: Christoph Hellwig
Cc: Al Viro
Link: https://lkml.kernel.org/r/20200907093253.13656-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds
---
 include/linux/stackleak.h | 2 +-
 kernel/stackleak.c        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h
index 3d5c3271a9a8cc..a59db2f08e76b7 100644
--- a/include/linux/stackleak.h
+++ b/include/linux/stackleak.h
@@ -25,7 +25,7 @@ static inline void stackleak_task_init(struct task_struct *t)
 
 #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE
 int stack_erasing_sysctl(struct ctl_table *table, int write,
-			void __user *buffer, size_t *lenp, loff_t *ppos);
+			void *buffer, size_t *lenp, loff_t *ppos);
 #endif
 
 #else /* !CONFIG_GCC_PLUGIN_STACKLEAK */

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index a8fc9ae1d03d95..ce161a8e8d9758 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -20,7 +20,7 @@ static DEFINE_STATIC_KEY_FALSE(stack_erasing_bypass);
 
 int stack_erasing_sysctl(struct ctl_table *table, int write,
-			void __user *buffer, size_t *lenp, loff_t *ppos)
+			void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret = 0;
 	int state = !static_branch_unlikely(&stack_erasing_bypass);

From 9ca48e20ec5cb3427ca57ffac9ed2b87090ab488 Mon Sep 17 00:00:00 2001
From: Tobias Klauser
Date: Fri, 18 Sep 2020 21:20:39 -0700
Subject: [PATCH 13/14] fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype

Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer. Adjust the
definition of dirtytime_interval_handler to match its prototype in
linux/writeback.h which fixes the following sparse error/warning:

  fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces)
  fs/fs-writeback.c:2189:50:    expected void *
  fs/fs-writeback.c:2189:50:    got void [noderef] __user *buffer
  fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)):
  fs/fs-writeback.c:2184:5:    int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
  fs/fs-writeback.c: note: in included file:
  ./include/linux/writeback.h:374:5: note: previously declared as:
  ./include/linux/writeback.h:374:5:    int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser
Signed-off-by: Andrew Morton
Reviewed-by: Jan Kara
Cc: Christoph Hellwig
Cc: Al Viro
Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 149227160ff0b0..58b27e4070a30d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2184,7 +2184,7 @@ static int __init start_dirtytime_writeback(void)
 __initcall(start_dirtytime_writeback);
 
 int dirtytime_interval_handler(struct ctl_table *table, int write,
-			       void __user *buffer, size_t *lenp, loff_t *ppos)
+			       void *buffer, size_t *lenp, loff_t *ppos)
 {
 	int ret;

From 2645d432051cd4e6c04ee8af23be07c92f1f52a2 Mon Sep 17 00:00:00 2001
From: Changbin Du
Date: Fri, 18 Sep 2020 21:20:42 -0700
Subject: [PATCH 14/14] kcsan: kconfig: move to menu 'Generic Kernel Debugging Instruments'

This moves the KCSAN kconfig items under menu 'Generic Kernel Debugging
Instruments' where UBSAN resides.

Signed-off-by: Changbin Du
Signed-off-by: Andrew Morton
Tested-by: Randy Dunlap
Reviewed-by: Randy Dunlap
Cc: Greg Kroah-Hartman
Cc: Marco Elver
Link: https://lkml.kernel.org/r/20200904152224.5570-1-changbin.du@gmail.com
Signed-off-by: Linus Torvalds
---
 lib/Kconfig.debug | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e068c3c7189a19..0c781f912f9f0b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -520,8 +520,8 @@ config DEBUG_FS_ALLOW_NONE
 endchoice
 
 source "lib/Kconfig.kgdb"
-
 source "lib/Kconfig.ubsan"
+source "lib/Kconfig.kcsan"
 
 endmenu
@@ -1620,8 +1620,6 @@ config PROVIDE_OHCI1394_DMA_INIT
 
 source "samples/Kconfig"
 
-source "lib/Kconfig.kcsan"
-
 config ARCH_HAS_DEVMEM_IS_ALLOWED
 	bool