Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement InterlockedCompareExchange using CAS #21

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

friendlyanon
Copy link

@friendlyanon friendlyanon commented Apr 14, 2024

This implementation is only usable on 486 or newer CPUs, so there is also code that detects at runtime whether the AC bit of the EFLAGS register can be set. On 386, this register always returns the same value.

The CAS implementation uses the lock cmpxchg instruction, which is only executed if the above runtime check is satisfied. The overhead of calling this implementation is a cmp followed by a jnz instruction, which should have a negligible cost when execution already had to cross a DLL boundary to get here.

Using the command

find -name "*.dll" -o -name "*.vxd" | perl -nE "print if grep { $_ =~ /InterlockedCompareExchange/ } `dumpbin /imports $_`;"

Here is a list of .dll and .vxd files in the installer that import the "InterlockedCompareExchange" symbol and thus benefit from this:

List of .dll files
$WINDIR/Microsoft.NET/Framework/sbscmp10.dll
$WINDIR/Microsoft.NET/Framework/sbscmp20_mscorwks.dll
$WINDIR/Microsoft.NET/Framework/sbscmp20_perfcounter.dll
$WINDIR/Microsoft.NET/Framework/SharedReg12.dll
$WINDIR/Microsoft.NET/Framework/v1.0.3705/mscormmc.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/AdoNetDiag.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/alink.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_filter.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_isapi.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/Aspnet_perf.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/CORPerfMonExt.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/cscomp.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/Culture.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/dfdll.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/diasymreader.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/fusion.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/MmcAspExt.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordacwks.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbc.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbi.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorie.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorjit.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorld.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorpe.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsec.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsn.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsvc.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscortim.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorwks.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/normalization.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/PerfCounter.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/peverify.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/sbscmp20_mscorlib.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/shfusion.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/ShFusRes.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.OracleClient.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.EnterpriseServices.Wrapper.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Transactions.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/VsaVb7rt.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/webengine.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/WMINet_Utils.dll
$WINDIR/RegisteredPackages/{D5D40355-5FB0-48fb-A231-CDC637FA16E0}/NETFXMigration.dll
$WINDIR/System/dfshim.dll
$WINDIR/System/mscoree.dll
$WINDIR/System/mscories.dll
$WINDIR/System/msvcm80.dll
$WINDIR/System/msvcp80.dll
$WINDIR/System/WBEM/Wmidcad.dll
$WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcm80.dll
$WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcp80.dll
$WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcm80.dll
$WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcp80.dll

Fixes #19

This implementation is only usable on 486 or newer CPUs, so there is
also code that detects at runtime whether the AC bit of the EFLAGS
register can be set. On 386, this register always returns the same
value.

The CAS implementation uses the "lock cmpxchg" instruction, which is
only executed if the above runtime check is satisfied. The overhead of
calling this implementation is a "cmp" followed by a "jnz" instruction,
which should have a negligible cost when execution already had to cross
a DLL boundary to get here.

Using the command

find -name "*.dll" -o -name "*.vxd" | perl -nE "print if grep { $_ =~ /InterlockedCompareExchange/ } `dumpbin /imports $_`;"

Here is a list of .dll and .vxd files in the installer that import the
"InterlockedCompareExchange" symbol and thus benefit from this:

$WINDIR/Microsoft.NET/Framework/sbscmp10.dll
$WINDIR/Microsoft.NET/Framework/sbscmp20_mscorwks.dll
$WINDIR/Microsoft.NET/Framework/sbscmp20_perfcounter.dll
$WINDIR/Microsoft.NET/Framework/SharedReg12.dll
$WINDIR/Microsoft.NET/Framework/v1.0.3705/mscormmc.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/AdoNetDiag.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/alink.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_filter.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/aspnet_isapi.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/Aspnet_perf.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/CORPerfMonExt.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/cscomp.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/Culture.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/dfdll.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/diasymreader.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/fusion.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/MmcAspExt.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordacwks.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbc.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscordbi.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorie.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorjit.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorld.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorpe.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsec.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsn.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorsvc.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscortim.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/mscorwks.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/normalization.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/PerfCounter.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/peverify.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/sbscmp20_mscorlib.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/shfusion.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/ShFusRes.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Data.OracleClient.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.EnterpriseServices.Wrapper.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/System.Transactions.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/VsaVb7rt.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/webengine.dll
$WINDIR/Microsoft.NET/Framework/v2.0.50727/WMINet_Utils.dll
$WINDIR/RegisteredPackages/{D5D40355-5FB0-48fb-A231-CDC637FA16E0}/NETFXMigration.dll
$WINDIR/System/dfshim.dll
$WINDIR/System/mscoree.dll
$WINDIR/System/mscories.dll
$WINDIR/System/msvcm80.dll
$WINDIR/System/msvcp80.dll
$WINDIR/System/WBEM/Wmidcad.dll
$WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcm80.dll
$WINDIR/winsxs/x86_microsoft.vc80.crt_1fc8b3b9a1e18e3b_8.0.50727.42_none_db5f52fb98cb24ad/msvcp80.dll
$WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcm80.dll
$WINDIR/winsxs/x86_Microsoft.VC80.CRT_1fc8b3b9a1e18e3b_8.0.50727.42_x-ww_0de06acd/msvcp80.dll
@friendlyanon
Copy link
Author

I have also cleaned the CMakeLists.txt up a little. I don't know what toolchain you used to build the project, but now with an extra .lib file for the missing exports (EnumSystemLocalesA EnumSystemLocalesW GetThreadLocale SetThreadLocale from kernel32) the project also builds with the VS2022 toolchain and a modern Windows SDK. I am doing something similar in simcity-noinstall, which technically also targets Windows 95.

@friendlyanon friendlyanon marked this pull request as ready for review April 14, 2024 21:08
@friendlyanon
Copy link
Author

I just noticed that you are using the MSVC420 (nice) toolchain, which is missing MASM. You would need to update CI to also grab MASM6.

@friendlyanon
Copy link
Author

Alright, I have installed Win95 in a VM, then installed MASM 6.11 on it, then upgraded that to 6.11d then 6.14. I extracted the installation folder from the .vdi file and with the bin directory added to PATH, CMake will find ML.EXE and properly compile the assembly files when configuring with -D CMAKE_ASM_MASM_FLAGS=/coff. Full build commands with MSVC420 and MASM614:

:: call C:\MSVC420\bin\VCVARS32.BAT x86
:: set "PATH=%PATH%;C:\MASM611\BIN"
C:\cmake-3.13.5-win64-x64\bin\cmake.exe -B build -G "NMake Makefiles" -D CMAKE_BUILD_TYPE=Release -D CMAKE_ASM_MASM_FLAGS=/coff
C:\cmake-3.13.5-win64-x64\bin\cmake.exe --build build

And with a modern toolchain:

:: call "C:\Program Files\Microsoft Visual Studio\2022\Community\Common7\Tools\VsDevCmd.bat" -arch=x86 -host_arch=amd64 -no_logo
cmake -B build-modern -G Ninja -D CMAKE_BUILD_TYPE=Release -D "CMAKE_ASM_MASM_FLAGS=/nologo /quiet" -D "CMAKE_C_FLAGS_RELEASE=/O1 /Ob1 /DNDEBUG=1 /Gy /Gw /GL /arch:IA32 /GS- /Gs1000000000" -D "CMAKE_MODULE_LINKER_FLAGS_RELEASE=/INCREMENTAL:NO /Brepro /emitpogophaseinfo /opt:icf /opt:ref /nocoffgrpinfo /manifest:no /emitvolatilemetadata:no /LTCG"
cmake --build build-modern

@andresvettori
Copy link

Would be possible to implement exactly this code but using XCHG for supporting 386 / x86 processors? XCHG would work for all cpus (slower than CMPXCHG, of course). If we want to optimize we can compile for 386/486 processors, or conditionally run either version at runtime.
What do you think?

@friendlyanon
Copy link
Author

That defeats the purpose. I don't see how an atomic unconditional exchange without a return value would help implementing an atomic conditional exchange with a return value. For a 386 target, the existing solution is "good enough", since it wouldn't be really atomic even with XCHG. There is a reason why Microsoft had a really heavyweight solution to this.

@WamWooWam
Copy link
Contributor

Hi there! Been thinking about this and I get the impression it probably isn't worth the effort. It appears the .NET JIT itself will happily emit lock cmpxchg instructions when using APIs like System.Threading.Interlocked.CompareExchange, as the .NET 2.x JIT appears to only ever have targeted i586-class (Intel Pentium) CPUs and later.

I think a better option is to correctly implement InterlockedCompareExchange using lock cmpxchg and inline assembly (to avoid bringing in MASM) and to only support i486-class CPUs and above for this project, as patching all uses of cpmxchg out of the .NET JIT would likely be a massive undertaking that could only ever emit incorrect code, and would disproportionately cause threading/race condition issues on other platforms like NT4 and Windows 98, where these features are supported by minimum system requirements.

@friendlyanon
Copy link
Author

Sounds good to me. I originally intended to put this PR up as a draft, mainly to gather feedback from Matt, but he hasn't commented on this topic yet. Explicitly excluding 386 would be a kind of compromise I wouldn't want to force, even though the JIT apparently does not care.

@itsmattkc
Copy link
Owner

Ahh sorry for not commenting, I was following the discussion but forgot to weigh in myself. I think the x86 asm solution is great (though inlined asm probably is preferable if that's doable).

As for 386s, that's a tough one. We can probably just fallback to the current implementation for the time being, but I'm currently of the opinion that this project should start angling towards becoming a more general kernel extender since there are limitations to what we can do (even with .NET apps) only patching the runtime. In that scenario, perhaps there would be a way to bring 98's driver-based solution over for 386?

@friendlyanon
Copy link
Author

perhaps there would be a way to bring 98's driver-based solution over for 386?

Sure, but that's outside my current abilities or the scope of this PR.

@friendlyanon
Copy link
Author

friendlyanon commented May 1, 2024

Actually, inline asm is causing values to be left on the stack:

Ghidra
                     **************************************************************
                     *                          FUNCTION                          *
                     **************************************************************
                     LONG __stdcall InterlockedCompareExchange(LONG * Destina
     LONG              EAX:4          <RETURN>
     LONG *            Stack[0x4]:4   Destination                             XREF[2]:     10001383(R), 
                                                                                           10001393(R)  
     LONG              Stack[0x8]:4   Exchange                                XREF[1]:     100013b0(R)  
     LONG              Stack[0xc]:4   Comperand                               XREF[1]:     100013a5(R)  
     undefined4        Stack[0x0]:4   local_res0                              XREF[1]:     1000137f(R)  
     undefined4        Stack[-0x4]:4  local_4                                 XREF[1]:     1000137b(R)  
                     0x1370  223  InterlockedCompareExchange
                     Ordinal_223                                     XREF[2]:     Entry Point(*), 100023a0(*)  
                     InterlockedCompareExchange
10001370 83 3d 10        CMP        dword ptr [DAT_10007010],0x0
         70 00 10 00
10001377 56              PUSH       ESI                    <-- These are
10001378 57              PUSH       EDI                    <-- never popped.
10001379 74 18           JZ         LAB_10001393
1000137b 8b 4c 24 04     MOV        ECX,dword ptr [ESP + local_4]
1000137f 8b 54 24 08     MOV        EDX,dword ptr [ESP + local_res0]
10001383 8b 44 24 0c     MOV        EAX,dword ptr [ESP + Destination]
10001387 f0 0f b1 11     CMPXCHG.   dword ptr [ECX],EDX
1000138b c2 0c 00        RET        0xc

Pretty sure this is a compiler bug. I made sure I call another function to avoid the prologue/epilogue causing issues. Might just rewrite the whole function using __declspec(naked) and omit prologue/epilogue and logging altogether.

This avoids having to rely on MASM.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

InterlockedCompareExchange atomic implementation
4 participants