UCRTbase.dll toupper() is 133x slower wall time than perl/msvcrt.dll #23037
Comments
https://bugs.python.org/issue35195 Python identified this problem in 2018. The Python ticket remains open as of Feb 2025. I don't know enough of the arch/API/design/tech background to tell from the comments in the CPython tickets whether there is a proposed fix, a rejected fix, or an unfairly rejected fix in those 2 tickets. |
UCRT works; many bugs went away when we converted to use it. |
I'm not so worried about the performance of toupper() here, but there are a few other problems with this code:
Fixing all this would eliminate the toupper()/isupper() calls. I don't know offhand what the appropriate Win32 API would be. |
Forgot to add this in the OP. Since 5.37.10 and commit 8a548d1, the P5P repo's .t files, and less so CPAN, will call Line 857 in 16196ae
make test. Copy-pasted from a GH runner: blead perl has 1.2 million tests.
100K * 4 ms = 400 s, about 6.7 minutes faster core

55 milliseconds is 1.7 frames at 30 frames per second. Blead perl currently has a 33,000 OP*s executed timer before the first time it polls the Win32 GUI loop. It's crazy: "link av.obj hv.obj perl.obj /delayload:user32.dll -o perl541.dll" really helps with blead perl core self

So the question now is, does WinPerl selectively replace cherry-picked, problematic, slow libc calls in

Because perl.exe has the choice of which one to call at runtime, they both are available at all times inside a perl process. The call stacks, profiler reports, and my benchmarks show a multiple-orders-of-magnitude performance difference between 2 different implementations of the exact same C standard lib function.

Next question, why is WinPerl even C linking against MS's

Would slurping/looping U8 values 0x00-0xFF, 1x on process start, through MS UCRT's

Nobody can justify enumerating all 250 country codes on earth in a SQL DB/for loop+

You can't upper case an ASCII string by, for each 8-bit character, posting a new job ad on LinkedIn, interviewing and hiring a new developer, agreeing on a consulting contract and fee schedule, having him read the ASCII char, write 01000001 with a pen, and hand you the paper with 01000001 written on it, then handing him a check for $500 and terminating his employment at your company. He was paid $500 for 15-25 seconds of work. Great company to work for. 5 stars employer. That's what UCRT is doing internally.

3rd possible fix, the most difficult fix, which is beyond my expertise: figure out why

The API docs for

So did perl.exe/perl5xx.dll/perl5porters do something wrong and explicitly disable the cache logic inside ucrtbase.dll? Or is this a bug inside ucrtbase.dll, which only Microsoft can fix, so that a member of the public must file a public bug ticket with MS, and MS devs must recompile and publish a new, higher build number of ucrtbase.dll? Diagnosing this is beyond scope for me. IDK enough. |
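The idea floated above, looping U8 values 0x00-0xFF once at process start through the C library and keeping the answers, can be sketched roughly as follows. This is a minimal sketch, not Perl's code: the names `upper_table`, `upper_table_init` and `cached_toupper` are hypothetical, and it assumes a single-byte LC_CTYPE locale that does not change after the table is built (the table would have to be rebuilt after any `setlocale()` call that affects LC_CTYPE).

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical names for illustration only; none of this is from the Perl source. */
static unsigned char upper_table[UCHAR_MAX + 1];
static int upper_table_ready = 0;

/* Ask the C library once per byte value and remember the answer.
 * Assumption: call this at startup, before other threads exist, and again
 * after any setlocale() that changes LC_CTYPE. */
static void upper_table_init(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        upper_table[c] = (unsigned char)toupper(c);
    }
    upper_table_ready = 1;
}

/* Hot path: one array load, no locale lookup.
 * The lazy init here is not thread-safe; it is only a fallback. */
static unsigned char cached_toupper(unsigned char c)
{
    if (!upper_table_ready) {
        upper_table_init();
    }
    return upper_table[c];
}
```

The cost is 256 library calls per table build instead of one per character processed. Whether that trade is acceptable depends on how often the locale changes, which this sketch does not try to detect.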
Maybe my PS I've spent 3 days searching ReactOS for what the limit is for U8s per "char" for an "MBCS" code page on a technical MS NLS C API level. I believe
BTW I believe
IDK enough. Maybe this toupper()/isupper() bug has something to do with the newish many-reader/single-writer locking code in Perl that serializes access to the process-global locale across OS threads to prevent races.
What are Perl's mandatory requirements, in C, for the vendor C std lib's toupper()/isupper()? https://en.cppreference.com/w/cpp/string/byte/toupper says no
As you and I both agreed on IRC, there is some really poor-quality Win32-only code inside https://github.com/Perl/perl5/blob/blead/win32/perlhost.h that turns the But I'm less concerned about the performance of creating ithread #2 in a WinOS proc than about the perl interp executing this broken slow
|
Another idea, on WinPerl, is a codebase wide grep 9 stack That branch in

If libperl.dll always passes a locale_t as arg 2, then that Perl process-wide, thread-wide locale-setting race bug goes away. WinPerl currently serializes multi-OS-thread access using a very poor DIY-by-Perl re-implementation of MS's Slim reader/writer (SRW) API (https://learn.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks). That whole API thing will basically disappear from WinPerl/libperl.dll through macros/etc. Maybe the exported lock variables stay for less-than-perfect CPAN XS code, but nothing in libperl.dll will ever obtain that serializing lock again. And MS UCRT devs probably can't even see the

It doesn't matter in 2025, but IIRC |
Module:
Description
A certain profiling call stack caught my eye, and the final report from my profiler said 8% of all CPU time of perl is spent inside isupper()/toupper() from ucrtbase.dll. These float between place 4 and place 8 as the highest CPU hogs on random core .t files. upper() reaching #1 was jaw-dropping, hence I investigated.
Some research: this is 1 call about 1 U8, BTW. ::LocaleUpdate has 6 FlsGetValue calls (wrapped with glerr preserving), toupper() fires ::LocaleUpdate() every time, errno in ucrt adds another 4-5 FlsGetValue calls, and __acrt_LCMapStringA() fires ::LocaleUpdate again,
soon after
a few CPU instruction addresses later (remember, lines of code have loops)
kernelbase.dll tries building a tree of nodes or iterating all country codes on earth. The data being searched by KernelBase.dll!GetNamedLocaleHashNode looks like
but this is raw memory with unprintables regexed out. I think it's country codes, but I'm not going to reverse engineer it
Benchmarks: it's horrible
With pseudo threads on 3 cores. IDK enough to say whether this is scaling, or whether lock contention on the Perl side or the MS side is happening
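The original benchmark code is not reproduced here. A rough harness for this kind of comparison could look like the sketch below, which times the C library's `toupper()` against an ASCII-only baseline over the same bytes. `ascii_toupper` and `time_upper` are hypothetical helper names, and the iteration count is left to the caller. Note that `clock()` reports processor time on POSIX systems, while the Microsoft CRT documents it as elapsed wall-clock time, so numbers are only comparable within one platform.

```c
#include <ctype.h>
#include <time.h>

/* ASCII-only baseline (hypothetical helper): no locale is consulted. */
static int ascii_toupper(int c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

/* Run iters passes over bytes 0x00-0x7F through fn and return the elapsed
 * time in seconds as reported by clock(). The running sum is stored in
 * *sink so the compiler cannot discard the loop as dead code. */
static double time_upper(int (*fn)(int), long iters, unsigned long *sink)
{
    unsigned long sum = 0;
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        for (int c = 0; c < 128; c++) {
            sum += (unsigned long)fn(c);
        }
    }
    clock_t end = clock();
    *sink = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_upper(toupper, n, &s1)` and `time_upper(ascii_toupper, n, &s2)` with the same `n` gives the two timings. In the default "C" locale the two checksums should agree, which is a cheap sanity check that both paths did the same work.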
Steps to Reproduce
Expected behavior
Half joke, half serious: remove UCRT from the default WinPerl build config and link against msvcrt.dll.
Perl configuration