-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement InterlockedCompareExchange
using CAS
#21
base: master
Are you sure you want to change the base?
Conversation
This implementation is only usable on 486 or newer CPUs, so there is also code that detects at runtime whether the AC bit of the EFLAGS register can be set. On 386, this register always returns the same value. The CAS implementation uses the "lock cmpxchg" instruction, which is only executed if the above runtime check is satisfied. The overhead of calling this implementation is a "cmp" followed by a "jnz" instruction, which should have a negligible cost when execution already had to cross a DLL boundary to get here. Using the command find -name "*.dll" -o -name "*.vxd" | perl -nE "print if grep { $_ =~ /InterlockedCompareExchange/ } `dumpbin /imports $_`;" Here is a list of .dll and .vxd files in the installer that import the "InterlockedCompareExchange" symbol and thus benefit from this: $WINDIR/Microsoft.NET/Framework/sbscmp10.dll $WINDIR/Microsoft.NET/Framework/sbscmp20_mscorwks.dll $WINDIR/Microsoft.NET/Framework/sbscmp20_perfcounter.dll $WINDIR/Microsoft.NET/Framework/SharedReg12.dll $WINDIR/Microsoft.NET/Framework/v1.0.3705/mscormmc.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/AdoNetDiag.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/alink.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_filter.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_isapi.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/Aspnet_perf.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/CORPerfMonExt.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/cscomp.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/Culture.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/dfdll.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/diasymreader.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/fusion.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/MmcAspExt.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordacwks.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbc.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbi.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorie.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorjit.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorld.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorpe.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsec.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsn.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsvc.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscortim.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorwks.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/normalization.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/PerfCounter.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/peverify.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/sbscmp20_mscorlib.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/shfusion.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/ShFusRes.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.OracleClient.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.EnterpriseServices.Wrapper.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Transactions.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/VsaVb7rt.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/webengine.dll $WINDIR/Microsoft.NET/Framework/v2.0.50727/WMINet_Utils.dll $WINDIR/RegisteredPackages/{D5D40355-5FB0-48fb-A231-CDC637FA16E0}/NETFXMigration.dll $WINDIR/System/dfshim.dll $WINDIR/System/mscoree.dll $WINDIR/System/mscories.dll $WINDIR/System/msvcm80.dll $WINDIR/System/msvcp80.dll $WINDIR/System/WBEM/Wmidcad.dll $WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcm80.dll $WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcp80.dll $WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcm80.dll $WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcp80.dll
I have also cleaned the CMakeLists.txt up a little. I don't know what toolchain you used to build the project, but now with an extra |
I just noticed that you are using the MSVC420 (nice) toolchain, which is missing MASM. You would need to update CI to also grab MASM6. |
Alright, I have installed Win95 in a VM, then installed MASM 6.11 on it, then upgraded that to 6.11d then 6.14. I extracted the installation folder from the .vdi file and with the :: call C:\MSVC420\bin\VCVARS32.BAT x86
:: set "PATH=%PATH%;C:\MASM611\BIN"
C:\cmake-3.13.5-win64-x64\bin\cmake.exe -B build -G "NMake Makefiles" -D CMAKE_BUILD_TYPE=Release -D CMAKE_ASM_MASM_FLAGS=/coff
C:\cmake-3.13.5-win64-x64\bin\cmake.exe --build build And with a modern toolchain: :: call "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\VsDevCmd.bat" -arch=x86 -host_arch=amd64 -no_logo
cmake -B build-modern -G Ninja -D CMAKE_BUILD_TYPE=Release -D "CMAKE_ASM_MASM_FLAGS=/nologo /quiet" -D "CMAKE_C_FLAGS_RELEASE=/O1 /Ob1 /DNDEBUG=1 /Gy /Gw /GL /arch:IA32 /GS- /Gs1000000000" -D "CMAKE_MODULE_LINKER_FLAGS_RELEASE=/INCREMENTAL:NO /Brepro /emitpogophaseinfo /opt:icf /opt:ref /nocoffgrpinfo /manifest:no /emitvolatilemetadata:no /LTCG"
cmake --build build-modern |
Would be possible to implement exactly this code but using XCHG for supporting 386 / x86 processors? XCHG would work for all cpus (slower than CMPXCHG, of course). If we want to optimize we can compile for 386/486 processors, or conditionally run either version at runtime. |
That defeats the purpose. I don't see how an atomic unconditional exchange without a return value would help implementing an atomic conditional exchange with a return value. For a 386 target, the existing solution is "good enough", since it wouldn't be really atomic even with XCHG. There is a reason why Microsoft had a really heavyweight solution to this. |
Hi there! Been thinking about this and I get the impression it probably isn't worth the effort. It appears the .NET JIT itself will happily emit I think a better option is to correctly implement InterlockedCompareExchange using |
Sounds good to me. I originally intended to put this PR up as a draft, mainly to gather feedback from Matt, but he hasn't commented on this topic yet. Explicitly excluding 386 would be a kind of compromise I wouldn't want to force, even though the JIT apparently does not care. |
Ahh sorry for not commenting, I was following the discussion but forgot to weigh in myself. I think the x86 asm solution is great (though inlined asm probably is preferable if that's doable). As for 386s, that's a tough one. We can probably just fallback to the current implementation for the time being, but I'm currently of the opinion that this project should start angling towards becoming a more general kernel extender since there are limitations to what we can do (even with .NET apps) only patching the runtime. In that scenario, perhaps there would be a way to bring 98's driver-based solution over for 386? |
Sure, but that's outside my current abilities or the scope of this PR. |
Actually, inline asm is causing values to be left on the stack: Ghidra **************************************************************
* FUNCTION *
**************************************************************
LONG __stdcall InterlockedCompareExchange(LONG * Destina
LONG EAX:4 <RETURN>
LONG * Stack[0x4]:4 Destination XREF[2]: 10001383(R),
10001393(R)
LONG Stack[0x8]:4 Exchange XREF[1]: 100013b0(R)
LONG Stack[0xc]:4 Comperand XREF[1]: 100013a5(R)
undefined4 Stack[0x0]:4 local_res0 XREF[1]: 1000137f(R)
undefined4 Stack[-0x4]:4 local_4 XREF[1]: 1000137b(R)
0x1370 223 InterlockedCompareExchange
Ordinal_223 XREF[2]: Entry Point(*), 100023a0(*)
InterlockedCompareExchange
10001370 83 3d 10 CMP dword ptr [DAT_10007010],0x0
70 00 10 00
10001377 56 PUSH ESI <-- These are
10001378 57 PUSH EDI <-- never popped.
10001379 74 18 JZ LAB_10001393
1000137b 8b 4c 24 04 MOV ECX,dword ptr [ESP + local_4]
1000137f 8b 54 24 08 MOV EDX,dword ptr [ESP + local_res0]
10001383 8b 44 24 0c MOV EAX,dword ptr [ESP + Destination]
10001387 f0 0f b1 11 CMPXCHG. dword ptr [ECX],EDX
1000138b c2 0c 00 RET 0xc Pretty sure this is a compiler bug. I made sure I call another function to avoid the prologue/epilogue causing issues. Might just rewrite the whole function using |
This avoids having to rely on MASM.
This implementation is only usable on 486 or newer CPUs, so there is also code that detects at runtime whether the AC bit of the EFLAGS register can be set. On 386, this register always returns the same value.
The CAS implementation uses the
lock cmpxchg
instruction, which is only executed if the above runtime check is satisfied. The overhead of calling this implementation is acmp
followed by ajnz
instruction, which should have a negligible cost when execution already had to cross a DLL boundary to get here.Using the command
Here is a list of .dll and .vxd files in the installer that import the "InterlockedCompareExchange" symbol and thus benefit from this:
List of .dll files
Fixes #19