Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix a race between GC threads in concurrent scavenging
While debugging #15285 I realized that free block lists (free_list in BlockAlloc.c) get corrupted when multiple scavenge threads allocate and release blocks concurrently. Here's a picture of one such race: Thread 2 (Thread 32573.32601): #0 check_tail (bd=0x940d40 <stg_TSO_info>) at rts/sm/BlockAlloc.c:860 #1 0x0000000000928ef7 in checkFreeListSanity () at rts/sm/BlockAlloc.c:896 #2 0x0000000000928979 in freeGroup (p=0x7e998ce02880) at rts/sm/BlockAlloc.c:721 #3 0x0000000000928a17 in freeChain (bd=0x7e998ce02880) at rts/sm/BlockAlloc.c:738 #4 0x0000000000926911 in freeChain_sync (bd=0x7e998ce02880) at rts/sm/GCUtils.c:80 #5 0x0000000000934720 in scavenge_capability_mut_lists (cap=0x1acae80) at rts/sm/Scav.c:1665 #6 0x000000000092b411 in gcWorkerThread (cap=0x1acae80) at rts/sm/GC.c:1157 #7 0x000000000090be9a in yieldCapability (pCap=0x7f9994e69e20, task=0x7e9984000b70, gcAllowed=true) at rts/Capability.c:861 #8 0x0000000000906120 in scheduleYield (pcap=0x7f9994e69e50, task=0x7e9984000b70) at rts/Schedule.c:673 #9 0x0000000000905500 in schedule (initialCapability=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:293 #10 0x0000000000908d4f in scheduleWorker (cap=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:2554 #11 0x000000000091a30a in workerStart (task=0x7e9984000b70) at rts/Task.c:444 #12 0x00007f99937fa6db in start_thread (arg=0x7f9994e6a700) at pthread_create.c:463 #13 0x000061654d59f88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 1 (Thread 32573.32573): #0 checkFreeListSanity () at rts/sm/BlockAlloc.c:887 #1 0x0000000000928979 in freeGroup (p=0x7e998d303540) at rts/sm/BlockAlloc.c:721 #2 0x0000000000926f23 in todo_block_full (size=513, ws=0x1aa8ce0) at rts/sm/GCUtils.c:264 #3 0x00000000009583b9 in alloc_for_copy (size=513, gen_no=0) at rts/sm/Evac.c:80 #4 0x000000000095850d in copy_tag_nolock (p=0x7e998c675f28, info=0x421d98 <Main_Large_con_info>, src=0x7e998d075d80, size=513, gen_no=0, tag=1) at rts/sm/Evac.c:153 #5 0x0000000000959177 in evacuate (p=0x7e998c675f28) at rts/sm/Evac.c:715 #6 0x0000000000932388 in scavenge_small_bitmap (p=0x7e998c675f28, size=1, bitmap=0) at rts/sm/Scav.c:271 #7 0x0000000000934aaf in scavenge_stack (p=0x7e998c675f28, stack_end=0x7e998c676000) at rts/sm/Scav.c:1908 #8 0x0000000000934295 in scavenge_one (p=0x7e998c66e000) at rts/sm/Scav.c:1466 #9 0x0000000000934662 in scavenge_mutable_list (bd=0x7e998d300440, gen=0x1b1d880) at rts/sm/Scav.c:1643 #10 0x0000000000934700 in scavenge_capability_mut_lists (cap=0x1aaa340) at rts/sm/Scav.c:1664 #11 0x00000000009299b6 in GarbageCollect (collect_gen=0, do_heap_census=false, gc_type=2, cap=0x1aaa340, idle_cap=0x1b38aa0) at rts/sm/GC.c:378 #12 0x0000000000907a4a in scheduleDoGC (pcap=0x7ffdec5b5310, task=0x1b36650, force_major=false) at rts/Schedule.c:1798 #13 0x0000000000905de7 in schedule (initialCapability=0x1aaa340, task=0x1b36650) at rts/Schedule.c:546 #14 0x0000000000908bc4 in scheduleWaitThread (tso=0x7e998c0067c8, ret=0x0, pcap=0x7ffdec5b5430) at rts/Schedule.c:2537 ghc#15 0x000000000091b5a0 in rts_evalLazyIO (cap=0x7ffdec5b5430, p=0x9c11f0, ret=0x0) at rts/RtsAPI.c:530 ghc#16 0x000000000091ca56 in hs_main (argc=1, argv=0x7ffdec5b5628, main_closure=0x9c11f0, rts_config=...) at rts/RtsMain.c:72 ghc#17 0x0000000000421ea0 in main () In particular, dbl_link_onto() which is used to add a freed block to a doubly-linked free list is not thread safe and corrupts the list when called concurrently. Note that thread 1 is to blame here as thread 2 is properly taking the spinlock. With this patch we now take the spinlock when freeing a todo block in GC, avoiding this race. Test Plan: - Tried slow validate locally: this patch does not introduce new failures. - circleci: https://circleci.com/gh/ghc/ghc-diffs/283 The test got killed because it took 5 hours but T7919 (which was previously failing on circleci) passed. Reviewers: simonmar, bgamari, erikd Reviewed By: simonmar Subscribers: rwbarton, carter GHC Trac Issues: #15285 Differential Revision: https://phabricator.haskell.org/D5115 (cherry picked from commit c6fbac6)
- Loading branch information