
Enable new exception handling on win-x86 #115957



Open
wants to merge 6 commits into
base: main


filipnavara
Member

@filipnavara filipnavara commented May 24, 2025

Switches the JIT to the funclet model used by other platforms, and switches the VM to the new exception handling introduced in .NET 9.

Fixes #113985

@filipnavara filipnavara requested a review from jkotas May 24, 2025 04:25
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 24, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 24, 2025
@filipnavara filipnavara added area-ExceptionHandling-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 24, 2025
@jkotas
Member

jkotas commented May 24, 2025

/azp run runtime-coreclr outerloop

@jkotas
Member

jkotas commented May 24, 2025

/azp run runtime-coreclr gcstress0x3-gcstress0xc


Azure Pipelines successfully started running 1 pipeline(s).


@am11
Member

am11 commented May 24, 2025

!CREATE_CHECK_STRING(pMT && pMT->Validate()) is #110512.

@filipnavara
Member Author

!CREATE_CHECK_STRING(pMT && pMT->Validate()) is #110512.

I'm not so sure it's the same root cause. I'll look at the dumps later on.

@filipnavara
Member Author

filipnavara commented May 24, 2025

The JIT.opt stress test is failing when reporting GC references at the nop instruction in the funclet prolog that was added in #115630.

More than one value in the dump seems off, but at least it explains why I didn't see this in the earlier gcstress pass.

Disassembly:

0eedf182 59             pop     ecx
0eedf183 5b             pop     ebx
0eedf184 5e             pop     esi
0eedf185 5f             pop     edi
0eedf186 5d             pop     ebp
0eedf187 c3             ret     
0eedf188 90             nop                                       <--- GC stress here
0eedf189 8945f0         mov     dword ptr [ebp-10h], eax
0eedf18c 8d0517a2d10e   lea     eax, ds:[0ED1A217h]
0eedf192 c3             ret     
0eedf193 0000           add     byte ptr [eax], al
0eedf195 0000           add     byte ptr [eax], al
0eedf197 00cf           add     bh, cl

Locals in EnumGcRefsX86:

methodStart = 0x000000000eedf0d8 <-- likely wrong
curOffs = 0xb0
funcletStart = 0x000000000ed1a220
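These locals line up with the disassembly above and the !gcinfo output below, which can be checked with plain arithmetic. The sketch below only uses numbers copied from this comment; the constant names and the AbsoluteAddress helper are hypothetical. It shows that methodStart + curOffs is exactly the nop GC stress stopped on, that the JIT entry point 0ED1A170 plus the same 0xB0 is funcletStart, and that epilog #1 (at 00AA, 6 bytes) ends exactly at 0xB0, where the funclet begins.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the EnumGcRefsX86 locals, the disassembly and the !gcinfo output.
constexpr std::uint32_t kMethodStart  = 0x0eedf0d8; // methodStart local ("likely wrong")
constexpr std::uint32_t kCurOffs      = 0xb0;       // curOffs local
constexpr std::uint32_t kFuncletStart = 0x0ed1a220; // funcletStart local
constexpr std::uint32_t kEntryPoint   = 0x0ed1a170; // "entry point" printed by !gcinfo
constexpr std::uint32_t kFaultingNop  = 0x0eedf188; // address of the nop in the disassembly
constexpr std::uint32_t kEpilog1Offs  = 0x00aa;     // "epilog # 1 at 00AA"
constexpr std::uint32_t kEpilogSize   = 6;          // "epilog size = 6" (5 pops + ret)

// Absolute code address = base address + code offset.
constexpr std::uint32_t AbsoluteAddress(std::uint32_t base, std::uint32_t offset) {
    return base + offset;
}

// methodStart + curOffs is exactly the nop that GC stress stopped on.
static_assert(AbsoluteAddress(kMethodStart, kCurOffs) == kFaultingNop, "curOffs maps to the nop");
// The JIT entry point + curOffs is exactly funcletStart.
static_assert(AbsoluteAddress(kEntryPoint, kCurOffs) == kFuncletStart, "curOffs is the funclet start");
// Epilog #1 ends where the funclet begins.
static_assert(kEpilog1Offs + kEpilogSize == kCurOffs, "funclet follows the last epilog");
```

This only confirms that the offsets are self-consistent; it does not say which base address the stack walker should have used.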

GC info:

!ip2md 0x000000000ed1a220
MethodDesc:   097098f0
Method Name:          Xunit.Assert.RecordException(System.Func`1<System.Object>, System.String)
Class:                0970afb8
MethodTable:          0970afb8
mdToken:              060000CE
Module:               09707674
IsJitted:             yes
Current CodeAddr:     0ed1a170
Version History:
  ILCodeVersion:      00000000
  ReJIT ID:           0
  IL Addr:            09735450
     CodeAddr:           0ed1a170  (Optimized)
     NativeCodeVersion:  00000000
Source file:  /_/src/arcade/src/Microsoft.DotNet.XUnitAssert/src/Record.cs @ 72
0:000> !gcinfo 097098f0
entry point 0ED1A170
Normal JIT generated code
GC info 0EEAE34C
Method info block:
    method      size   = 00BB
    prolog      size   = 11 
    epilog      size   =  6 
    epilog     count   =  2 
    epilog      end    = no  
    callee-saved regs  = EDI ESI EBX EBP 
    ebp frame          = yes  
    fully interruptible= yes  
    double align       = no  
    arguments size     =  0 DWORDs
    stack frame size   =  1 DWORDs
    untracked count    =  0 
    var ptr tab count  =  2 
    exception handlers = yes
    epilog # 0    at   0027
    epilog # 1    at   00AA
    argTabOffset = 7  
81 3B AB B1 9C | 
BF C3 CF 01 02 | 
27 81 03 07    | 

Pointer table:
10 81 27 0A    | 00A7..00B1  [EBP-10H] a  pointer
10 0D 07    ...| 00B4..00BB  [EBP-10H] a  pointer
F0 41    4F ...| 0009        reg EAX becoming live
72    0B 52 ...| 000B        reg ESI becoming live
4F    52 00 ...| 0012        reg ECX becoming live
0B    00 F0 ...| 0015        reg ECX becoming dead
52    F0 42 ...| 0017        reg EDX becoming live
00    42 10 ...| 0017        reg EAX becoming dead
F0 42    F0 ...| 0021        reg EAX becoming live
10    04 30 ...| 0021        reg EDX becoming dead
F0 04    F0 ...| 002D        reg EAX becoming dead
30    42 72 ...| 002D        reg ESI becoming dead
F0 42    F1 ...| 0037        reg EAX becoming live
72    49 F0 ...| 0039        reg ESI becoming live
F1 49    0B ...| 004A        reg ECX becoming live
F0 0B    4A ...| 0055        reg ECX becoming dead
52    06 08 ...| 0057        reg EDX becoming live
4A    08 10 ...| 0059        reg ECX becoming live
06    10 4A ...| 005F        reg EAX becoming dead
08    4A 0D ...| 005F        reg ECX becoming dead
10    0D 30 ...| 005F        reg EDX becoming dead
4A    30 71 ...| 0061        reg ECX becoming live
0D    71 F0 ...| 0066        reg ECX becoming dead
30    F0 42 ...| 0066        reg ESI becoming dead
71    42 7A ...| 0067        reg ESI becoming live
F0 42    F0 ...| 0071        reg EAX becoming live
7A    58 F1 ...| 0073        reg EDI becoming live
F0 58    51 ...| 007B        reg EBX becoming live
F1 51    4A ...| 008C        reg EDX becoming live
81    0E 10 ...| 008D        push ptr  0
4A    10 18 ...| 008F        reg ECX becoming live
0E    18 30 ...| 0095        reg ECX becoming dead
10    30 C8 ...| 0095        reg EDX becoming dead
18    C8 52 ...| 0095        reg EBX becoming dead
30    52 4A ...| 0095        reg ESI becoming dead
C8    4A 06 ...| 0095        pop  1 ptrs
52    06 08 ...| 0097        reg EDX becoming live
4A    08 10 ...| 0099        reg ECX becoming live
06    10 4A ...| 009F        reg EAX becoming dead
08    4A 0D ...| 009F        reg ECX becoming dead
10    0D 38 ...| 009F        reg EDX becoming dead
4A    38 44 ...| 00A1        reg ECX becoming live
0D    44 F1 ...| 00A6        reg ECX becoming dead
38    F1 00 ...| 00A6        reg EDI becoming dead
44    00 FF ...| 00AA        reg EAX becoming live
F1 00    4C ...| 00BA        reg EAX becoming dead
FF    00 00 ...| 

Presumably it's trying to report the 10 81 27 0A | 00A7..00B1 [EBP-10H] a pointer entry at the time of the crash, and that slot is indeed not valid yet. It only becomes valid after the following mov dword ptr [ebp-10h], eax instruction has executed.
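A minimal model of that lookup follows. It assumes the BEGIN..END ranges printed in the pointer table are half-open [BEGIN, END), which the dump itself does not state, and LiveRange / IsReportedLive are hypothetical names, not VM code. With the two [EBP-10H] entries, offset 0xB0 (the funclet prolog nop) is reported live, the store at 0xB1 is not, and the slot is live again at 0xB4, right after the 3-byte mov.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of tracked stack-slot entries from the x86 "Pointer table" dump.
// Assumption: a range printed as "BEGIN..END" is half-open, i.e. [BEGIN, END).
struct LiveRange {
    std::uint32_t begin;
    std::uint32_t end;
    bool Contains(std::uint32_t offset) const { return offset >= begin && offset < end; }
};

// The two [EBP-10H] entries from the dump above.
const std::vector<LiveRange> kEbpMinus10 = {{0xA7, 0xB1}, {0xB4, 0xBB}};

constexpr std::uint32_t kFuncletPrologNop = 0xB0; // nop interrupted by GC stress
constexpr std::uint32_t kStoreOffset = 0xB1;      // mov dword ptr [ebp-10h], eax
constexpr std::uint32_t kStoreLength = 3;         // encoded as 89 45 f0

// True if any entry for the slot covers the given code offset.
bool IsReportedLive(const std::vector<LiveRange>& ranges, std::uint32_t offset) {
    for (const LiveRange& r : ranges) {
        if (r.Contains(offset)) return true;
    }
    return false;
}
```

Under this reading the first entry simply extends one instruction too far: it still covers the funclet's first instruction, before the funclet has written the slot.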

@filipnavara
Member Author

I'm starting to suspect that the liveness of the variable may be misreported on other platforms too. We just get away with it because the funclet prolog is marked as nogc, which excludes it from GC stress. Until recently we didn't have the capability to report these no-GC regions on x86, and to keep the GC info small I opted never to emit them for any prolog/epilog, whether of the main function or of a funclet. We can change that to emit them for funclet prologs/epilogs, but I would like someone to look at the liveness issue first before patching it up with a bandaid. /cc @dotnet/jit-contrib

x86 JIT dump: https://gist.githubusercontent.com/filipnavara/87489fc9b6224706111055ce1ee79583/raw/80122358359dbe95e53fd7708c59611f7024aa87/gistfile1.txt
x64 JIT dump: https://gist.githubusercontent.com/filipnavara/c76975c784519ed718e1701a24e6a458/raw/88a2fc3e1eddf932d62065cd63d578a38eca3a11/gistfile1.txt

@SingleAccretion
Contributor

SingleAccretion commented May 24, 2025

We just get away with it because the funclet prolog is marked as nogc which excludes it from GC stress. Until recently we didn't have the capability to report these no-GC regions on x86 and to keep the GC info small I opted to never emit it for any prolog/epilog, main function or funclet. We can change that to emit it for funclet prologs/epilogs but I would like someone to look at the liveness issue first before patching it up with a bandaid.

It is an expected invariant that prologs/epilogs are no-GC regions. Why did this issue not manifest for filter funclets in the old scheme (because filter-live variables are always pinned?)?

@filipnavara
Member Author

filipnavara commented May 24, 2025

It is an expected invariant that prolog/epilog are no-GC regions.

Yeah, and that's fine. x86 GC info has always encoded the (non-funclet) prologs/epilogs, and the VM side treats them as no-GC.

Until #115630 the funclet prolog was empty on win-x86, so we didn't run into this problem: there was no code in the no-GC region. Likewise, the funclet epilog is a single "ret" instruction, so we don't run into the problem there either. (These assumptions do not hold for linux-x86, which has an additional stack alignment instruction and would likely have run into this problem if anyone had gotten far enough to run GC stress tests.)

I did, however, expect that calling gcInfo.gcResetForBB() from genFuncletProlog would reset the liveness of the variables. I'm not sure that's actually happening (properly). Observably, the variable's slot is invalidated only right before the IN002d: 0000B1 mov gword ptr [V06 ebp-0x10], eax instruction, and then it becomes valid again. I would have expected the slot's previous live range to end at the beginning of genFuncletProlog and the new one to start at 0xB4, as it does. (UPD: Or rather, I would have expected genEpilog to end the liveness; whichever function does it, it is unexpected that the variable stays live past the ret instruction and into the following block.)

Why did this issue not manifest for filter funclets in the old scheme (because filter-live variables are always pinned?)?

Not sure I know the precise answer. I remember stumbling upon some specific JIT32_GCENCODER blocks but cannot find them right now.

@SingleAccretion
Contributor

SingleAccretion commented May 24, 2025

(UPD: Or rather, I would have expected the genEpilog to end the liveness; whoever does it, it's unexpected that the variable is alive past the ret instruction and into the following block.)

Looks like we have a comment that this is indeed intentional:

void CodeGen::genReserveFuncletProlog(BasicBlock* block)
{
    assert(compiler->UsesFunclets());
    assert(block != nullptr);

    /* Currently, no registers are live on entry to the prolog, except maybe
       the exception object. There might be some live stack vars, but they
       cannot be accessed until after the frame pointer is re-established.
       In order to potentially prevent emitting a death before the prolog
       and a birth right after it, we just report it as live during the
       prolog, and rely on the prolog being non-interruptible. Trust
       genCodeForBBlist to correctly initialize all the sets.

       We might need to relax these asserts if the VM ever starts
       restoring any registers, then we could have live-in reg vars...
    */

    noway_assert((gcInfo.gcRegGCrefSetCur & RBM_EXCEPTION_OBJECT) == gcInfo.gcRegGCrefSetCur);
    noway_assert(gcInfo.gcRegByrefSetCur == 0);

    JITDUMP("Reserving funclet prolog IG for block " FMT_BB "\n", block->bbNum);

    GetEmitter()->emitCreatePlaceholderIG(IGPT_FUNCLET_PROLOG, block, gcInfo.gcVarPtrSetCur, gcInfo.gcRegGCrefSetCur,
                                          gcInfo.gcRegByrefSetCur, false);
}
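The trade-off described in that comment can be sketched as range coalescing: keeping a slot live across a non-interruptible prolog turns "a death before the prolog and a birth right after it" into one longer live range. This is a hypothetical model with made-up offsets; Range, GapIsNoGc and CoalesceAcrossNoGc are illustrative names, not JIT APIs. It is only sound if no GC can happen inside the covered gap, which is the invariant the win-x86 funclet prolog violated while its no-GC region was not encoded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (not JIT code): code offsets as half-open [begin, end) ranges.
struct Range {
    std::uint32_t begin;
    std::uint32_t end;
};

// True if the gap [gapBegin, gapEnd) lies entirely inside one no-GC region.
bool GapIsNoGc(std::uint32_t gapBegin, std::uint32_t gapEnd, const std::vector<Range>& noGc) {
    for (const Range& r : noGc) {
        if (gapBegin >= r.begin && gapEnd <= r.end) return true;
    }
    return false;
}

// Merges consecutive live ranges (sorted by begin) when they touch, overlap, or are
// separated only by a no-GC region, trading a death+birth pair for one longer range.
std::vector<Range> CoalesceAcrossNoGc(const std::vector<Range>& live, const std::vector<Range>& noGc) {
    std::vector<Range> out;
    for (const Range& r : live) {
        if (!out.empty() && (r.begin <= out.back().end || GapIsNoGc(out.back().end, r.begin, noGc))) {
            if (r.end > out.back().end) out.back().end = r.end; // extend the previous range
        } else {
            out.push_back(r); // a real gap where a GC could observe the slot as dead
        }
    }
    return out;
}
```

For example, live ranges [0x10, 0x20) and [0x24, 0x30) around a prolog occupying [0x20, 0x24) collapse into a single [0x10, 0x30) entry.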

@filipnavara
Member Author

Looks like we have a comment that this is indeed intentional:

Thanks. I adjusted the emission of no-GC regions for funclet prologs/epilogs, but I will look into this at some later point to see if we can improve the codegen. We no longer establish the frame pointer in funclet prologs on any platform, so that reasoning no longer makes much sense.
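To illustrate what emitting those regions changes, here is a GC-stress-style filter over the instruction start offsets from 0xAA to the end of the method in the earlier disassembly. It is a hypothetical sketch: the region bounds assume the funclet prolog is just the nop at 0xB0 and the funclet epilog is the single ret at 0xBA, and StressableOffsets is not a real VM function. With only the main epilog encoded, the nop at 0xB0 is a candidate stress point; with the funclet regions added, only 0xB1 and 0xB4 remain.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open [begin, end) code offset range.
struct Region {
    std::uint32_t begin;
    std::uint32_t end;
};

bool InNoGcRegion(std::uint32_t offset, const std::vector<Region>& noGc) {
    for (const Region& r : noGc) {
        if (offset >= r.begin && offset < r.end) return true;
    }
    return false;
}

// Hypothetical filter: keeps only instruction offsets where a GC could be injected.
std::vector<std::uint32_t> StressableOffsets(const std::vector<std::uint32_t>& instrStarts,
                                             const std::vector<Region>& noGc) {
    std::vector<std::uint32_t> out;
    for (std::uint32_t off : instrStarts) {
        if (!InNoGcRegion(off, noGc)) out.push_back(off);
    }
    return out;
}

// Instruction starts from 0xAA (5 pops, ret, nop, mov, lea, ret) per the disassembly.
const std::vector<std::uint32_t> kInstrStarts = {0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF, 0xB0, 0xB1, 0xB4, 0xBA};
// Before: only main-function epilog #1 is known to be no-GC.
const std::vector<Region> kMainEpilogOnly = {{0xAA, 0xB0}};
// After: the funclet prolog (nop) and funclet epilog (ret) are no-GC as well (assumed bounds).
const std::vector<Region> kWithFuncletRegions = {{0xAA, 0xB0}, {0xB0, 0xB1}, {0xBA, 0xBB}};
```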

@jkotas
Member

jkotas commented May 24, 2025

/azp run runtime-coreclr gcstress0x3-gcstress0xc

@jkotas
Member

jkotas commented May 24, 2025

/azp run runtime-coreclr outerloop


Azure Pipelines successfully started running 1 pipeline(s).


@am11
Member

am11 commented May 24, 2025

ASSERT_AND_CHECK(SyncTableEntry::GetSyncTableEntry()[sbIndex].m_Object == obj);

failed:

> C:\h\w\A2A90912\w\AFCF0954\e\Interop\Interop\../BestFitMapping/BestFitMapping/BestFitMapping.cmd
Xunit.Sdk.TrueException: 
Assert failure(PID 9828 [0x00002664], Thread: 7420 [0x1cfc]): SyncTableEntry::GetSyncTableEntry()[sbIndex].m_Object == obj

CORECLR! ObjHeader::Validate + 0x147 (0x73f2686f)
CORECLR! Object::ValidateInner + 0x20F (0x73ed9e1f)
CORECLR! Object::Validate + 0x98 (0x73ed9bd8)
CORECLR! WKS::GCHeap::Promote + 0x8F (0x741fedaf)
CORECLR! GcEnumObject + 0x6F (0x73ff522f)
CORECLR! EnumGcRefsX86 + 0x10C6 (0x73e00776)
CORECLR! EECodeManager::EnumGcRefs + 0x19B (0x73dff66b)
CORECLR! GcStackCrawlCallBack + 0x2BC (0x73ff55ec)
CORECLR! Thread::MakeStackwalkerCallback + 0x4B (0x73f17fdd)
CORECLR! Thread::StackWalkFramesEx + 0xEC (0x73f1909f)
    File: D:\a\_work\1\s\src\coreclr\vm\syncblk.cpp:2019
    Image: C:\h\w\A2A90912\p\corerun.exe

@filipnavara
Member Author

Yeah, I downloaded the artifacts for BestFitMapping to investigate. I didn't expect a clean run just yet, but it's already looking better than the first attempt.

@filipnavara
Member Author

The BestFitMapping is not like the other failures I have seen. The object is valid, but it has a header pointing to a sync block, and the sync block table has 2 free entries. That's not really my area of expertise, so any advice is welcome.

The other failure, Interop/COM/NETClients/IDispatch/NETClientIDispatch/NETClientIDispatch, was not reproducible on my machine the last time it occurred. However, it does happen consistently on the CI machines. I'll have another look.

@am11
Member

am11 commented May 25, 2025

The BestFitMapping is not like the other failures I have seen.

Previous discussion at #74741.

The other failure - Interop/COM/NETClients/IDispatch/NETClientIDispatch/NETClientIDispatch

Is it failing on x86? https://helix.dot.net/api/2019-06-17/jobs/05d45426-3dee-421e-a098-6215d618b5ce/workitems/Interop.0.1/console suggests it has passed.

@jkotas
Member

jkotas commented May 25, 2025

Interop/COM/NETClients/IDispatch/NETClientIDispatch/NETClientIDispatch

The failure log is here: https://helixr1107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-115957-merge-e222de0ef7c24526a4/Interop.0.1/1/console.0d3011b6.log

It is failing at:

Assert failure(PID 7312 [0x00001c90], Thread: 6796 [0x1a8c]): stkOffs < 0

CORECLR! EnumGcRefsX86 + 0x11B1 (0x72fd0861)
CORECLR! EECodeManager::EnumGcRefs + 0x19B (0x72fcf66b)

We are doing a GC in the middle of this filter:

catch (TargetInvocationException e) when (unwrapExceptions)

My guess is that the following condition has something to do with it

#if defined(FEATURE_EH_FUNCLETS) // funclets
// Filters are the only funclet that run during the 1st pass, and must have
// both the leaf and the parent frame reported. In order to avoid double
// reporting of the untracked variables, do not report them for the filter.
if (!isFilterFunclet)
#endif // FEATURE_EH_FUNCLETS

@filipnavara
Member Author

My guess is that the following condition has something to do with it

That sounds extremely plausible. It fails to skip over the table data for untracked variables.
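A simplified illustration of that bug class follows, using a made-up table layout (a count, then that many untracked stack offsets, then tracked entries) that is not the real x86 GC info encoding. When reporting is suppressed for a filter funclet, the decoder still has to consume the untracked entries; otherwise the next read lands in the wrong data, the kind of mis-parse that could trip an assert such as stkOffs < 0. Cursor, ReadUntracked and ReadUntrackedBuggy are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout (NOT the real encoding):
//   [untrackedCount] [untracked stkOffs x untrackedCount] [tracked stkOffs ...]
// EBP-relative locals have negative offsets, e.g. [EBP-10H] is -0x10.
class Cursor {
public:
    explicit Cursor(const std::vector<std::int32_t>& data) : data_(data) {}
    std::int32_t Next() { return data_.at(pos_++); }
private:
    const std::vector<std::int32_t>& data_;
    std::size_t pos_ = 0;
};

// Correct: always walks the untracked section, reporting only when asked to.
void ReadUntracked(Cursor& c, bool report, std::vector<std::int32_t>& reported) {
    std::int32_t count = c.Next();
    for (std::int32_t i = 0; i < count; ++i) {
        std::int32_t stkOffs = c.Next(); // consume the entry even if it is not reported
        if (report) reported.push_back(stkOffs);
    }
}

// Buggy: skips the whole section without advancing the cursor.
void ReadUntrackedBuggy(Cursor& c, bool report, std::vector<std::int32_t>& reported) {
    if (!report) return; // cursor is left pointing at the untracked data
    ReadUntracked(c, report, reported);
}

// The tracked section is expected to start with a negative frame offset.
std::int32_t FirstTrackedOffset(Cursor& c) { return c.Next(); }
```

With the table {2, -0x8, -0xC, -0x10}, the correct reader finds -0x10 as the first tracked offset, while the buggy one reads the count 2, a positive value where a negative offset was expected.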

@filipnavara
Member Author

The BestFitMapping is not like the other failures I have seen.

Previous discussion at #74741.

Interesting. It does sound awfully similar. The difference is that we are dealing with WKS GC here and that we are just about to mark the object, so the initial conditions are quite different.

@jkotas
Member

jkotas commented May 25, 2025

Methodical_r1.0.1

Failure log: https://helixr1107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-115957-merge-e222de0ef7c24526a4/Methodical_r1.0.1/1/console.8c41ce38.log

Failing at:

04d2d03c 73ededaf coreclr!Object::Validate+0x98
04d2d05c 73cd522f coreclr!WKS::GCHeap::Promote+0x8f
04d2d078 73adfbdc coreclr!GcEnumObject+0x6f
04d2d3e0 73adf66b coreclr!EnumGcRefsX86+0x52c
04d2d48c 73cd55ec coreclr!EECodeManager::EnumGcRefs+0x19b

We just returned from a fault funclet:

fault
{
IL_01c0: ldc.i4.6
IL_01c1: newarr [mscorlib]System.String
IL_01c6: stloc.s V_9
IL_01c8: ldloc.s V_9
IL_01ca: ldc.i4.0
IL_01cb: ldstr "Current totals "
IL_01d0: stelem.ref
IL_01d1: ldloc.s V_9
IL_01d3: ldc.i4.1
IL_01d4: ldloc.3
IL_01d5: callvirt instance class System.String [mscorlib]System.Object::ToString()
IL_01da: stelem.ref
IL_01db: ldloc.s V_9
IL_01dd: ldc.i4.2
IL_01de: ldstr " and "
IL_01e3: stelem.ref
IL_01e4: ldloc.s V_9
IL_01e6: ldc.i4.3
IL_01e7: ldloc.s total2
IL_01e9: unbox [mscorlib]System.Double
IL_01ee: ldind.r8
IL_01ef: stloc.s V_7
IL_01f1: ldloca.s V_7
IL_01f3: call instance class System.String [mscorlib]System.Double::ToString()
IL_01f8: stelem.ref
IL_01f9: ldloc.s V_9
IL_01fb: ldc.i4.4
IL_01fc: ldstr " and "
IL_0201: stelem.ref
IL_0202: ldloc.s V_9
IL_0204: ldc.i4.5
IL_0205: ldloc.s total3
IL_0207: callvirt instance class System.String [mscorlib]System.Object::ToString()
IL_020c: stelem.ref
IL_020d: ldloc.s V_9
IL_020f: call class System.String [mscorlib]System.String::Concat(class System.String[])
IL_0214: call void [System.Console]System.Console::WriteLine(class System.String)
IL_0219: endfinally
} // end handler
and the GC was triggered when returning from RhpCallFinallyFunclet back to the main second-pass EH loop:
// N.B. -- We need to suppress GC "in-between" calls to finallys in this loop because we do
// not have the correct next-execution point live on the stack and, therefore, may cause a GC
// hole if we allow a GC between invocation of finally funclets (i.e. after one has returned
// here to the dispatcher, but before the next one is invoked). Once they are running, it's
// fine for them to trigger a GC, obviously.
//
// As a result, RhpCallFinallyFunclet will set this state in the runtime upon return from the
// funclet, and we need to reset it if/when we fall out of the loop and we know that the
// method will no longer get any more GC callbacks.
byte* pFinallyHandler = ehClause._handlerAddress;
exInfo._idxCurClause = curIdx;
#if NATIVEAOT
InternalCalls.RhpCallFinallyFunclet(pFinallyHandler, exInfo._frameIter.RegisterSet);
#else // NATIVEAOT
fixed (EH.ExInfo* pExInfo = &exInfo)
{
InternalCalls.RhpCallFinallyFunclet(pFinallyHandler, exInfo._frameIter.RegisterSet, pExInfo);
}
#endif // NATIVEAOT

The problem is that we are trying to report GC references based on the IP of the call inside the try region (the exception was thrown inside this call). These GC references are no longer valid:

CHK_AND_REPORT_REG(REGI_EDI, regs & RM_EDI, iregs & RM_EDI, Edi);

Fault funclets should work exactly the same as finally funclets. Fault funclets are rare since C# does not generate them. Are we missing a check for fault funclets somewhere?
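The question amounts to making sure every check that singles out finally clauses also accepts fault clauses. A minimal sketch, assuming an enum that mirrors the four ECMA-335 handler kinds; ClauseKind, IsFinallyLike and IsFinallyLikeMissingFault are hypothetical names, not runtime APIs.

```cpp
#include <cassert>

// Hypothetical clause kinds mirroring the four ECMA-335 exception handler types.
enum class ClauseKind { Catch, Filter, Finally, Fault };

// Finally and fault handlers both run during the second pass and, per the comment
// above, should behave the same; a check that special-cases "returned from a
// finally" should therefore accept faults too.
constexpr bool IsFinallyLike(ClauseKind kind) {
    return kind == ClauseKind::Finally || kind == ClauseKind::Fault;
}

// The kind of check the question is worried about: fault clauses fall through.
constexpr bool IsFinallyLikeMissingFault(ClauseKind kind) {
    return kind == ClauseKind::Finally;
}

static_assert(IsFinallyLike(ClauseKind::Fault), "fault is handled like finally");
static_assert(!IsFinallyLikeMissingFault(ClauseKind::Fault), "the buggy variant misses faults");
```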

@filipnavara
Member Author

filipnavara commented May 25, 2025

Failure log: https://helixr1107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-115957-merge-e222de0ef7c24526a4/Methodical_r1.0.1/1/console.8c41ce38.log

Isn't that from one of the earlier runs that didn't produce no-GC regions for funclet prologs? The symptoms match it.

Nvm, I see it now.

@filipnavara
Member Author

I suspect the Methodical_r1 test may be failing because this code path is not implemented for !USE_GC_INFO_DECODER:

#if defined(FEATURE_EH_FUNCLETS) && defined(USE_GC_INFO_DECODER)
if (pCF->ShouldParentToFuncletUseUnwindTargetLocationForGCReporting())
{
GCInfoToken gcInfoToken = pCF->GetGCInfoToken();
GcInfoDecoder _gcInfoDecoder(
gcInfoToken,
DECODE_CODE_LENGTH
);
if(_gcInfoDecoder.WantsReportOnlyLeaf())
{
// We're in a special case of unwinding from a funclet, and resuming execution in
// another catch funclet associated with same parent function. We need to report roots.
// Reporting at the original throw site gives incorrect liveness information. We choose to
// report the liveness information at the first interruptible instruction of the catch funclet
// that we are going to execute. We also only report stack slots, since no registers can be
// live at the first instruction of a handler, except the catch object, which the VM protects
// specially. If the catch funclet has not interruptible point, we fall back and just report
// what we used to: at the original throw instruction. This might lead to bad GC behavior
// if the liveness is not correct.
const EE_ILEXCEPTION_CLAUSE& ehClauseForCatch = pCF->GetEHClauseForCatch();
relOffsetOverride = FindFirstInterruptiblePoint(pCF, ehClauseForCatch.HandlerStartPC,
ehClauseForCatch.HandlerEndPC);
_ASSERTE(relOffsetOverride != NO_OVERRIDE_OFFSET);
STRESS_LOG3(LF_GCROOTS, LL_INFO1000, "Setting override offset = %u for method %pM ControlPC = %p\n",
relOffsetOverride, pMD, GetControlPC(pCF->GetRegisterSet()));
}
}
#endif // FEATURE_EH_FUNCLETS && USE_GC_INFO_DECODER

Otherwise, based on the state of the variables, it would have been taken.
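The idea behind FindFirstInterruptiblePoint, as described in the comment in that code, can be sketched as an interval intersection: the lowest offset inside the catch handler's [HandlerStartPC, HandlerEndPC) range that is covered by some interruptible range, or a sentinel when there is none. This is a hypothetical reimplementation over made-up data; kNoOverrideOffset only stands in for NO_OVERRIDE_OFFSET (its real value is not shown here), and half-open ranges are an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for NO_OVERRIDE_OFFSET.
constexpr std::uint32_t kNoOverrideOffset = 0xFFFFFFFFu;

// Half-open [begin, end) range of interruptible code offsets.
struct Interval {
    std::uint32_t begin;
    std::uint32_t end;
};

// Lowest offset in [handlerStart, handlerEnd) covered by an interruptible interval.
std::uint32_t FirstInterruptiblePoint(const std::vector<Interval>& interruptible,
                                      std::uint32_t handlerStart, std::uint32_t handlerEnd) {
    std::uint32_t best = kNoOverrideOffset;
    for (const Interval& iv : interruptible) {
        std::uint32_t lo = std::max(iv.begin, handlerStart); // clip to the handler
        std::uint32_t hi = std::min(iv.end, handlerEnd);
        if (lo < hi) best = std::min(best, lo);               // non-empty overlap
    }
    return best;
}
```

For interruptible ranges [0x10, 0x20) and [0x40, 0x60), a handler at [0x30, 0x50) yields 0x40, while a handler at [0x20, 0x40) has no interruptible point and yields the sentinel.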

@jkotas
Member

jkotas commented May 25, 2025

BestFitMapping

Log https://helixr1107v0xdcypoyl9e7f.blob.core.windows.net/dotnet-runtime-refs-pull-115957-merge-05d454263dee421ea0/Interop.0.1/1/console.b2222a36.log

Failure:

Assert failure(PID 9828 [0x00002664], Thread: 7420 [0x1cfc]): SyncTableEntry::GetSyncTableEntry()[sbIndex].m_Object == obj

CORECLR! ObjHeader::Validate + 0x147 (0x73f2686f)
CORECLR! Object::ValidateInner + 0x20F (0x73ed9e1f)
CORECLR! Object::Validate + 0x98 (0x73ed9bd8)
CORECLR! WKS::GCHeap::Promote + 0x8F (0x741fedaf)
CORECLR! GcEnumObject + 0x6F (0x73ff522f)
CORECLR! EnumGcRefsX86 + 0x10C6 (0x73e00776)

This assert is one of the typical GC hole symptoms. It is likely to be hit any time we miss reporting an object with a sync block. The offending object in this case is a marshalled delegate. Marshalled delegates have sync blocks.

The object is valid, but it has a header pointing to a sync block.

It likely means that the object was collected, but the GC has not yet reused its space for a new object. We likely missed reporting the slot in some earlier GC, but are reporting it again now.
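A toy model of the invariant behind that assert: an object whose header carries a sync block index must be pointed back to by that sync table entry. If the object was collected and its sync block freed, but the memory was not reused yet, a stale root still finds a plausible-looking object whose header no longer matches the table. ToyObject, ToySyncEntry and HeaderIsConsistent are hypothetical simplifications, not the CoreCLR types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyObject;

// Stand-in for one sync table slot: points back at the object that owns it.
struct ToySyncEntry {
    ToyObject* owner = nullptr; // cleared when the sync block is freed
};

// Stand-in for an object header that may carry a sync block index (0 = none here).
struct ToyObject {
    std::uint32_t syncBlockIndex = 0;
};

// Same shape as the failing check: table[index] must point back at this object.
bool HeaderIsConsistent(const ToyObject& obj, const std::vector<ToySyncEntry>& table) {
    if (obj.syncBlockIndex == 0) return true;
    return table.at(obj.syncBlockIndex).owner == &obj;
}
```

In this model, clearing the entry's owner while the object's memory (and header) stays untouched is enough to make a later validation of the stale reference fail.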

The callstack looks like this: BestFitMapping!ILStubClass.IL_STUB_PInvoke -> native code -> BestFitMapping!ILStubClass.IL_STUB_ReversePInvoke

  • IL_STUB_ReversePInvoke has finally block. We just returned from calling the finally block (non-exceptionally).
  • IL_STUB_PInvoke has a local that stores the invalid delegate object.

Let's take a look at what !DumpLog says about the previous GC:

1cfc   1.930619591 : `GC`GCROOTS`         Ending scan of Thread 02FAD110 ID = 0x2 {
1cfc   1.930618407 : `GCROOTS`            STACKWALK: SKIPPING_TO_FUNCLET_PARENT: not making callback for this frame, SPOfParent = 00B7E8D4,                                      isILStub = 0, m_crawl.pFunc = 08AE0314 (__GeneratedMainWrapper.Main())
1cfc   1.930618280 : `EH`GCROOTS`         CrawlFrame (00B7BE54): Frameless: Yes CallerSP: 00B7E980
1cfc   1.930615468 : `GCROOTS`            STACKWALK: SKIPPING_TO_FUNCLET_PARENT: not making callback for this frame, SPOfParent = 00B7E8D4,                                      isILStub = 0, m_crawl.pFunc = 08AE2960 (BestFitMapping.TestEntryPoint())
1cfc   1.930615330 : `EH`GCROOTS`         CrawlFrame (00B7BE54): Frameless: Yes CallerSP: 00B7E968
1cfc   1.930612350 : `GCROOTS`            STACKWALK: SKIPPING_TO_FUNCLET_PARENT: not making callback for this frame, SPOfParent = 00B7E8D4,                                      isILStub = 1, m_crawl.pFunc = 08DA84E4 (ILStubClass.IL_STUB_PInvoke(SCallBackInOutByRef))
1cfc   1.930612200 : `EH`GCROOTS`         CrawlFrame (00B7BE54): Frameless: Yes CallerSP: 00B7E938
1cfc   1.930608805 : `GC`GCROOTS`             GC Root 00B7E8B8 RELOCATED 0553E480 -> 05538754  MT = 035B8FC4 (System.String)
1cfc   1.930606588 : `GCROOTS`            Scanning Frameless method 08DA89DC (ILStubClass.IL_STUB_ReversePInvoke(IntPtr*)) EIP = 04D973C7 &EIP = 00B7DE90
1cfc   1.930606158 : `EH`GCROOTS`         CrawlFrame (00B7BE54): Frameless: Yes CallerSP: 00B7E87C
1cfc   1.930605891 : `GCROOTS`            STACKWALK: Found Non-Filter funclet @ SP: 00B7E878, m_crawl.pFunc = 08DA89DC; FuncletParentCallerSP: 00B7E8D4
1cfc   1.930605836 : `EH`GCROOTS`         CrawlFrame (00B7BE54): Frameless: Yes CallerSP: 00B7E87C
1cfc   1.930599724 : `GCROOTS`            Scanning ExplicitFrame 00B7D0E8 AssocMethod = 00000000 FrameIdentifier = ResumableFrame

This GC was triggered in the finally funclet inside IL_STUB_ReversePInvoke. But then the stackwalk got confused and skipped reporting for ILStubClass.IL_STUB_PInvoke. That's the GC hole.
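One way to picture that skip is a hypothetical, much simplified model of the SKIPPING_TO_FUNCLET_PARENT state (not the actual stackwalker logic): after the funclet is reported, older frames are skipped until one whose caller SP equals the parent's, and that parent is treated as already covered by the funclet's report. Fed the CallerSP values from the log above, no frame matches SPOfParent = 00B7E8D4, so IL_STUB_PInvoke and everything older is skipped. Frame and ReportedAfterFunclet are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Frame {
    std::string name;
    std::uint32_t callerSP;
};

// Hypothetical model: frames older than a reported non-filter funclet are skipped
// until the parent (exact caller SP match) is seen; only frames after it are reported.
std::vector<std::string> ReportedAfterFunclet(const std::vector<Frame>& olderFrames,
                                              std::uint32_t parentCallerSP) {
    std::vector<std::string> reported;
    bool skipping = true;
    for (const Frame& f : olderFrames) {
        if (skipping) {
            // Exact match only; the parent itself is treated as covered by the funclet.
            if (f.callerSP == parentCallerSP) skipping = false;
            continue;
        }
        reported.push_back(f.name);
    }
    return reported;
}
```

If a frame with the parent's caller SP had appeared first, IL_STUB_PInvoke would have been reported; since none did, every older frame was skipped, which is the GC hole.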

@am11
Member

am11 commented May 25, 2025

BestFitMapping was fixed by b9cd5ac. Could you rerun gcstress?

@jkotas
Member

jkotas commented May 25, 2025

BestFitMapping was fixed by b9cd5ac.

This was a fix for Interop/COM/NETClients/IDispatch/NETClientIDispatch/NETClientIDispatch. I do not think BestFitMapping is fixed.

@jkotas
Member

jkotas commented May 25, 2025

/azp run runtime-coreclr gcstress0x3-gcstress0xc


Azure Pipelines successfully started running 1 pipeline(s).

@filipnavara
Member Author

filipnavara commented May 25, 2025

I have a fix locally for Methodical_r1.0.1. I'll push it after going through more tests and once the current GCStress run finishes.

UPD: Hmm, still needs some refining.

UPD2: As usual, the refining consisted of actually saving a file before starting the compilation... (ac4373f)

Labels
area-ExceptionHandling-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New exception handling on win-x86
4 participants