mm: compaction: avoid GFP_NOFS deadlock
Sync compaction takes the page lock and buffer locks during migration. If
the caller is a filesystem that already holds those locks, this deadlocks.
The too_many_isolated() check can also deadlock: all regular compactors
may be stuck on an fs lock while the fs compactor is stuck in
too_many_isolated(), waiting for the isolation count to drop.

Fix this up by making NOFS allocations only do async migration with
trylocks, and copy the too_many_isolated() exemption from reclaim.

This doesn't appear to happen in practice today. It might become more
likely once folios make it into the filesystems. It triggers with the
next patches in this series, which have order-0 requests defragment
blocks.

Signed-off-by: Johannes Weiner <[email protected]>
hnaz committed Mar 9, 2023
1 parent 1070c53 commit 4c525a2
Showing 1 changed file with 19 additions and 4 deletions.
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
-static bool too_many_isolated(pg_data_t *pgdat)
+static bool too_many_isolated(struct compact_control *cc)
 {
+	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	bool too_many;
 
 	unsigned long active, inactive, isolated;
@@ -758,6 +759,16 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 		   node_page_state(pgdat, NR_ISOLATED_ANON);
 
+	/*
+	 * GFP_NOFS callers are allowed to isolate more pages, so they
+	 * won't get blocked by normal direct-reclaimers, forming a
+	 * circular deadlock. GFP_NOIO won't get here.
+	 */
+	if (cc->gfp_mask & __GFP_FS) {
+		inactive >>= 3;
+		active >>= 3;
+	}
+
 	too_many = isolated > (inactive + active) / 2;
 	if (!too_many)
 		wake_throttle_isolated(pgdat);
@@ -806,7 +817,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * list by either parallel reclaimers or compaction. If there are,
 	 * delay for some time until fewer pages are isolated
 	 */
-	while (unlikely(too_many_isolated(pgdat))) {
+	while (unlikely(too_many_isolated(cc))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
 			return -EAGAIN;
@@ -2507,8 +2518,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.search_order = order,
 		.gfp_mask = gfp_mask,
 		.zone = zone,
-		.mode = (prio == COMPACT_PRIO_ASYNC) ?
-					MIGRATE_ASYNC : MIGRATE_SYNC_LIGHT,
 		.alloc_flags = alloc_flags,
 		.highest_zoneidx = highest_zoneidx,
 		.direct_compaction = true,
@@ -2521,6 +2530,12 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.page = NULL,
 	};
 
+	/* Use trylocks in migration if this is a filesystem allocation */
+	if (prio == COMPACT_PRIO_ASYNC || !(gfp_mask & __GFP_FS))
+		cc.mode = MIGRATE_ASYNC;
+	else
+		cc.mode = MIGRATE_SYNC_LIGHT;
+
 	/*
 	 * Make sure the structs are really initialized before we expose the
 	 * capture control, in case we are interrupted and the interrupt handler
