mm/memory_hotplug: drain per-cpu pages again during memory offline
There is a race during page offline that can lead to an infinite loop:
a page never ends up on a buddy list, so __offline_pages() keeps
retrying indefinitely, or until a termination signal is received.

Thread#1 - a new process:

load_elf_binary
 begin_new_exec
  exec_mmap
   mmput
    exit_mmap
     tlb_finish_mmu
      tlb_flush_mmu
       release_pages
        free_unref_page_list
         free_unref_page_prepare
          set_pcppage_migratetype(page, migratetype);
             // set page->index to a migratetype below MIGRATE_PCPTYPES

Thread#2 - hot-removes memory
__offline_pages
  start_isolate_page_range
    set_migratetype_isolate
      set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        // set pageblock migration type to MIGRATE_ISOLATE
        drain_all_pages(zone);
             // drain per-cpu page lists to buddy allocator.

Thread#1 - continue
         free_unref_page_commit
           migratetype = get_pcppage_migratetype(page);
              // get old migration type
           list_add(&page->lru, &pcp->lists[migratetype]);
              // add new page to already drained pcp list

Thread#2
Never drains the pcp list again, and therefore gets stuck in the retry loop.

The fix is to try to drain per-cpu lists again after
check_pages_isolated_cb() fails.

Fixes: c52e759 ("mm: remove extra drain pages on pcp list")
Signed-off-by: Pavel Tatashin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Linus Torvalds <[email protected]>
soleen authored and torvalds committed Sep 19, 2020
1 parent 1ec882f commit 9683182
Showing 2 changed files with 22 additions and 0 deletions.
mm/memory_hotplug.c: 14 additions & 0 deletions

@@ -1575,6 +1575,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		/* check again */
 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
					    NULL, check_pages_isolated_cb);
+		/*
+		 * per-cpu pages are drained in start_isolate_page_range, but if
+		 * there are still pages that are not free, make sure that we
+		 * drain again, because when we isolated range we might
+		 * have raced with another thread that was adding pages to pcp
+		 * list.
+		 *
+		 * Forward progress should be still guaranteed because
+		 * pages on the pcp list can only belong to MOVABLE_ZONE
+		 * because has_unmovable_pages explicitly checks for
+		 * PageBuddy on freed pages on other zones.
+		 */
+		if (ret)
+			drain_all_pages(zone);
 	} while (ret);

 	/* Ok, all of our target is isolated.
mm/page_isolation.c: 8 additions & 0 deletions

@@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
+ * Please note that there is no strong synchronization with the page allocator
+ * either. Pages might be freed while their page blocks are marked ISOLATED.
+ * In some cases pages might still end up on pcp lists and that would allow
+ * for their allocation even when they are in fact isolated already. Depending
+ * on how strong of a guarantee the caller needs drain_all_pages might be needed
+ * (e.g. __offline_pages will need to call it after check for isolated range for
+ * a next retry).
+ *
  * Return: the number of isolated pageblocks on success and -EBUSY if any part
  * of range cannot be isolated.
  */
