pp_ref() builtin_pp_reftype(): strlen()+Newx()+memcpy()->100% pre-made COWs #23391

Open
wants to merge 1 commit into
base: blead
bulk88
Contributor

@bulk88 bulk88 commented Jun 29, 2025

  • ref() PP keyword has extremely high usage. Grepping my blead repo shows:
    Searched "ref(" 4347 hits in 605 files of 5879 searched

  • The strings that keyword ref() returns are part of the Perl 5 BNF grammar.
    This is not up for debate. Changing their spelling, lowercasing them,
    i18n-ing them dynamically in real time against glibc.so's current OS
    process global locale, or wiring inotify/kqueue into the runloop to
    monitor /etc or /var, is not up for debate either, just so this race
    condition works as designed in a unit test:

    $perl -E "dire('hello')"
    Routine indéfinie &cœur::dire aufgerufen bei -e Zeile 1
    
  • sv_reftype() and sv_ref() have very badly designed prototypes. The
    first time a new Perl-in-C dev reads their source code, they will think
    these 2 cause infinite C stack recursion and a SEGV. Most automated C
    static analysis tools will probably complain that these 2 functions do
    infinite recursion too.

  • The 2 functions don't return a string length, forcing all callers to
    execute a libc strlen() call on a string, that could be 8 bytes, or 80 MB.

  • The 2 functions don't split, parse, cat, or glue multiple strings to
    create their output. All null term-ed strings that they return, are
    already sitting in virtual address space. Either const HW RO, or
    RCed HEK*s from the PL_strtab pool, that were found inside something
    similar to a GV*/HV*/HE*/CV*/AV*/GP*/OP*/SV* in an OP* (no threads).

  • COW 255 buffers from Newx() under 9 chars can't COW currently by policy.
    CODE is 4, SCALAR is 6. HASH is 4. ARRAY is 5. But very short SV HEK* COWs
    will COW propagate without problems.

  • PP code if(ref($self) eq 'HASH') {} should never involve all 3-4 calls
    Newx()/Realloc()/strlen()/memcpy().

    So this commit fixes all of this, and makes pp_ref()/PP KW ref() closer
    in speed to C/C++/Asm style object type checking, which is almost always
    going to be 1 or 2 or 3 ptr equality tests against a C constant like
    &some_vtbl_some_class, or in Microsoft ecosystem SW, an equality test of
    a 16 byte GUID in memory against a 16 byte SSE literal stored in an SSE
    opcode (TLDR ver). Just convert backends sv_ref()/sv_reftype() to HEK*
    retvals, and convert the front end pp_*() ops to fetch HEK*s and return
    SV*s with POK_on SvPVX()== HEK*. In all likelihood, if the right side of
    the PP code is if (ref($self) eq 'HASH') {}, then during the execution of
    memcmp(pv1, pv2, len) as part of pp_seq, pv1 and pv2 are the same mem
    addr. But I didn't single step the eq operator to verify that yet.

  • inside PP(pp_reftype) previously the branch sv_setsv(TARG, &PL_sv_undef);
    did not fire SMG (set magic); after this commit it does. IDK why it
    wasn't firing before, or the consequences of SMG firing now on the
    sv_set_undef(rsv); path.

  • I suspect "sv_setsv(TARG, &PL_sv_undef);" and "sv_set_undef(rsv);" are
    not perfect behavior copies of each other in extreme/bizarre/user error
    and bad CPAN XS code situations, but I haven't found any side effects of
    the switch from sv_setsv(TARG, &PL_sv_undef); to sv_set_undef(rsv).

    Untested hypothetical cases like
    sv_setsv(gv_star, &PL_sv_undef); sv_setsv(hv_star, &PL_sv_undef);
    sv_setsv(svt_regexp_star, &PL_sv_undef);
    sv_setsv(svt_invlist_star, &PL_sv_undef);
    sv_setsv(svt_object_star, &PL_sv_undef);
    sv_setsv(svt_io_star, &PL_sv_undef);

  • sv_sethek() has a severe pathological performance problem if args
    SV* dsv and HEK* src_hek test true for

    if(SvPVX(dsv) == HEK_KEY(src_hek)) {}.
    

    But it's still better than a strlen()/Newx()/memcpy()/push_save_stack()/
    delayed_Safefree(); cycle. Any fix for this would be for the future.

  • these 2 functions are experimental for now, hence undocumented and not
    public API. If they are made public, arg const int ob should be removed
    because of its confusing faux-infinite recursion (not real life
    infinite recursion). The functions are exported so P5P hackers and
    CPAN XS devs (unsanctioned by P5P) can benchmark and research these 2 new
    functions using Inline::C/EU::PXS.

  • future improvements not done here, make sv_reftype() and sv_ref() wrappers
    around their HEK* counterparts. Note the HEK* must be RC++ed and stuffed
    in a new SV*, or a PAD TARG SV*, before the rpp_replace_1_1_NN(TARG); call
    because in artificial situations/fuzzing, strange things can happen during
    a SvREFCNT_dec_NN(); call, and the HEK* sitting in a C auto might
    get freed during the SvREFCNT_dec_NN();

  • another improvement, sv_sethek(rsv, hek); is somewhat heavy, and doesn't
    have a shortcut, to RC-- an existing SVPV HEK* COW itself, instead it
    uses SV_THINKFIRST_***() and sv_force_normal***() to RC-- an existing
    SVPV HEK* COW. If the SV* PAD TARG, is being used over and over by ref()
    opcode, it's always going to have a stale HEK* SVPVX() that needs to be
    RC--ed.

  • another improvement, check if(sv_reftypehek() == SvPVX(targ)) before
    calling sv_sethek(rsv, hek);

  • another improvement, beyond scope for me, make into 1 OP*/opcode:

    if(ref($self) eq 'HASH')
    

    and

    if(ref($self) eq 'ARRAY')
    
  • another improvement, don't deref my_perl->Iop/PL_ptr many times in a row.
    I didn't do any CPU opcode/instruction stripping in this commit. That's
    for a future commit.

  • another improvement, investigate if most of large switch() inside
    Perl_sv_reftypehek() can be turned into a
    const I8 arr_of_PL_sv_consts_idxs[]; with a couple tiny special cases.

  • todo invert if (!rsv) { branch, so hot path (yes cached in PL_sv_consts)
    comes first in machine code/asm order.


  • This set of changes requires a perldelta entry, and I need help writing it.
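
To make the argument in the bullets above concrete, here is a minimal standalone sketch in plain C. It does not use the Perl API: `type_name`, `reftype_name()` and `name_eq()` are hypothetical stand-ins for a HEK-like string that carries its own length, a lookup that returns a pre-made name, and a comparison that tries pointer equality before touching any bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a shared string that knows its own length,
 * so callers never need strlen(). Not Perl's real HEK layout. */
typedef struct {
    size_t len;
    const char *key;
} type_name;

enum ref_kind { KIND_SCALAR, KIND_ARRAY, KIND_HASH, KIND_CODE, KIND_COUNT };

/* Pre-made names: nothing is allocated or copied at lookup time. */
static const type_name type_names[KIND_COUNT] = {
    [KIND_SCALAR] = { 6, "SCALAR" },
    [KIND_ARRAY]  = { 5, "ARRAY"  },
    [KIND_HASH]   = { 4, "HASH"   },
    [KIND_CODE]   = { 4, "CODE"   },
};

/* Lookup is a single array index. */
const type_name *reftype_name(enum ref_kind kind) {
    return &type_names[kind];
}

/* Equality test: same pointer means same string; otherwise compare the
 * stored lengths first and only then the bytes. */
int name_eq(const type_name *a, const type_name *b) {
    if (a == b)
        return 1;
    if (a->len != b->len)
        return 0;
    return memcmp(a->key, b->key, a->len) == 0;
}
```

Calling `name_eq(reftype_name(KIND_HASH), reftype_name(KIND_HASH))` returns on the first pointer test, which is the cheap case the description is aiming for.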

@iabyn
Contributor

iabyn commented Jun 30, 2025 via email

@bulk88
Contributor Author

bulk88 commented Jun 30, 2025

On Sun, Jun 29, 2025 at 03:56:07PM -0700, bulk88 wrote:
-ref() PP keyword has extremely high usage. Grepping my blead repo shows:
[ snip 100 further lines]

Please try to use meaningful commit summary lines and messages.

All tech decisions are documented with rationale. Read them bullet point by bullet point.

If I am the only Subject Matter Expert who knows the Perl VM C code, I can't really help out a React JSX SME or Go SME guru who tries to review the Perl C VM code. At that point I would have to offer a 6 hour pre-conference class at a TPRC or YAPCEU event on P5 VM C level design/optimization/O(n) complexity of interp internals to my students. Not a joke.

I tried to read the commit message. I had no idea what the commit was about, apart from something to do with a badly designed sv_ref/sv_reftype API perhaps? Looking at the actual diff I guess the commit is about adding two new functions, sv_refhek and sv_reftypehek and then making use of them to speed up pp_ref() etc.? And perhaps adding some new SV constants?

Correct. I didn't invent SV_CONST()/PL but my eyes are telling me its design has similarities to 1990s/2000s era Spidermonkey's analogs of SV* POK Newx() vs SV* POK HEK* COW vs SV* POK SVppv_STATIC COW.

Current sv_ref/sv_reftype's prototype/fn signature is atrocious. Why do those 2 functions not return the string length to the caller through any mechanism? Whoever typed that in and saved it, I would never hire that person to work as an IT employee.

Utf8 isn't original to P5, but those 2 can't return a yes/no utf8 flag either. Also the backing storage and lifetime of those char*s is undefined according to public API AFAIK. Clearly anyone can look at the source code and see the 1 word all upper case char*s are C "" lit strings, and the "::" strings are HEK*s, but that is reverse engineering/non-public API.

Returning HEKs always with the 2 new fns fixes pretty much every design problem I can think of. Returning new SV heads with RC=1, or new SV heads with RC=1+mortal, or accepting an in SV* to set, I believe is a lot of unnecessary overhead, since those SV heads and SVPV bodies would be constantly alloced then dtored in the caller frame, or at the next save_stack or mortal_stack boundary, and 50% of the callers of sv_ref/sv_reftype are printf() style functions that want a "%s" or "%" SVf or "%" HEKf ptr for a very brief moment in time. They aren't interested in long term RC++ed storage of the string. But if the caller wants long term RC++ storage, they can get it very quickly and with COW benefits by calling sv_sethek() or newSVhek().
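
A rough standalone sketch of the two caller styles described above, in plain C rather than the Perl API. `shared_str`, `format_borrowed()`, `share()` and `unshare()` are hypothetical names: one caller only borrows the pointer and length for a printf-style format, the other takes a counted reference because it wants to keep the string.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical refcounted, length-carrying shared string. */
typedef struct {
    unsigned refcnt;
    size_t len;
    const char *key;
} shared_str;

/* Short-lived use: borrow pointer and length for formatting only.
 * No allocation, no copy, no refcount traffic. */
int format_borrowed(char *out, size_t outsz, const shared_str *s) {
    return snprintf(out, outsz, "Not a %.*s reference", (int)s->len, s->key);
}

/* Long-lived use: take a counted reference instead of copying bytes. */
const shared_str *share(shared_str *s) {
    s->refcnt++;
    return s;
}

/* Drop a counted reference taken with share(). */
void unshare(shared_str *s) {
    if (s->refcnt > 0)
        s->refcnt--;
}
```

The borrowing caller pays nothing beyond the format call itself, while the retaining caller pays one increment, which is the trade-off the paragraph argues for.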

Also I decided returning the global/permanent SV*s, back to callers is a bad idea, I would have to mark the SV*s SvREADONLY(), and there is a risk of SvREADONLY() marked SV* winding up inside an AV* or inside a HE* without a high level PP level copy/assignment op (aka newSVsv()) to strip the SvREADONLY()-ness flag.

$pvref = \ref($self);
${$pvref} .= ' class is unknown.'; 
die ${$pvref};

Now what? Line 2 fatal errored. But if it's a SVPV holding a HEK, it is silently deCOWed on line 2 without problems. That's why the new API returns HEKs and doesn't use SV APIs.
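
The copy-on-write behaviour relied on here can be modelled in a few lines of standalone C. This is only an illustration of the idea, not Perl's sv.c logic; `cow_str`, `cow_from_shared()`, `cow_append()` and `cow_free()` are made-up names. A write to a string that aliases shared storage first takes a private copy, so the shared bytes are never modified and the write does not fail.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write string handle. */
typedef struct {
    char *buf;      /* aliases shared storage, or owns a private copy */
    size_t len;
    int is_shared;  /* 1 while buf points at shared, immutable bytes */
} cow_str;

/* Wrap shared storage without copying it. */
cow_str cow_from_shared(const char *shared, size_t len) {
    cow_str s = { (char *)shared, len, 1 };
    return s;
}

/* Append text. If the handle still aliases shared storage, copy out of
 * it first ("de-COW"). Returns 0 on success, -1 on allocation failure. */
int cow_append(cow_str *s, const char *extra) {
    size_t extra_len = strlen(extra);
    char *fresh;

    if (s->is_shared) {
        fresh = malloc(s->len + extra_len + 1);
        if (!fresh)
            return -1;
        memcpy(fresh, s->buf, s->len);
    } else {
        fresh = realloc(s->buf, s->len + extra_len + 1);
        if (!fresh)
            return -1;
    }
    memcpy(fresh + s->len, extra, extra_len + 1); /* includes the NUL */
    s->buf = fresh;
    s->len += extra_len;
    s->is_shared = 0;
    return 0;
}

/* Release a private copy; shared storage is left alone. */
void cow_free(cow_str *s) {
    if (!s->is_shared)
        free(s->buf);
    s->buf = NULL;
    s->len = 0;
}
```

A read-only flag would instead have to reject the append outright, which is the fatal error on line 2 of the Perl snippet.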

The SpiderMonkey JS engine's src code's initial commit is 1 year or max 2 years, after Perl 5's initial commit. So SM JS engine and Perl 5 engine are the same exact age. Since Netscape's/Mozilla's/Firefox's JS engine is very well used, tried, true, and tested for decades, borrowing design choices from it, can not be a bad idea.

Perl's PL_sv_consts is way too short. This PR improves the situation. This PR is a small step in a hot code path towards the goal of solving this meta bug I made #22872

Rest of this is FF JS VM vs P5 VM management of CC/link time constants and how they appear on a C runtime level and at a ECMAScript/PP level.

Spidermonkey calls them "Atoms", Perl calls them "HEK *"s or "U32 hash"s. Spidermonkey uses words like "Pinned" and "JSExternalString", to mean Perl's SVppv_STATIC or RO or RW C data globals.

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/String.h

Here is a list of what Spidermonkey says are critical "" string/token/identifier literals that are required to run the JS engine.

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/CommonPropertyNames.h

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/Keywords.h

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/jsatom.cpp#L56

Spidermonkey Immortals

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/Id.cpp
https://github.com/ricardoquesada/Spidermonkey/blob/master/js/public/Id.h#L149
https://github.com/ricardoquesada/Spidermonkey/blob/master/js/public/Value.h#L1985

Spidermonkey has C global RW HEK*s structs baked into the engine (libperl.so or libspidermonkey.so) at CC time.

https://github.com/ricardoquesada/Spidermonkey/blob/4a75ea2543408bd1b2c515aa95901523eeef7858/js/src/jsatom.cpp#L268

Notice Spidermonkey has 1 byte long (latin 1) Immortal SV*s for 0-9 A-Z and a-z, and IDK if I'm reading the code right, but they also have an array of (26+26+10) x (26+26+10)= 3844 immortal SV*s covering all 2 byte permutations of ( 0-9 A-Z and a-z) x (0-9 A-Z and a-z). This would allow C-like speed in SM JS or Perl char by char string processing with PP substr(), or C-like speed and C-like memory usage, for splitting a SVPV* into an AV* of 1 byte long SVPV*s.
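
As a sketch of what such a table of pre-made two-character strings could look like, here is a small standalone C model. It is an assumption-laden illustration, not SpiderMonkey's or Perl's code: the names `two_char_strs` and `two_char_lookup()` are invented, and the character mapping assumes an ASCII-compatible character set.

```c
#include <stddef.h>

#define ALNUM_COUNT 62  /* 0-9, A-Z, a-z */

/* 62 * 62 = 3844 pre-made, NUL-terminated two-character strings. */
static char two_char_strs[ALNUM_COUNT * ALNUM_COUNT][3];
static int two_char_ready = 0;

static const char alnum[ALNUM_COUNT + 1] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Map a character to its slot 0..61, or -1 if it is not alphanumeric.
 * Assumes contiguous digit and letter ranges (true for ASCII). */
static int alnum_index(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return 10 + (c - 'A');
    if (c >= 'a' && c <= 'z') return 36 + (c - 'a');
    return -1;
}

/* Fill the table once. */
static void two_char_init(void) {
    for (int i = 0; i < ALNUM_COUNT; i++) {
        for (int j = 0; j < ALNUM_COUNT; j++) {
            char *slot = two_char_strs[i * ALNUM_COUNT + j];
            slot[0] = alnum[i];
            slot[1] = alnum[j];
            slot[2] = '\0';
        }
    }
    two_char_ready = 1;
}

/* Return the shared pre-made string for (a, b), or NULL when either
 * character is outside the table and a fresh allocation would be needed. */
const char *two_char_lookup(char a, char b) {
    int i = alnum_index(a);
    int j = alnum_index(b);
    if (i < 0 || j < 0)
        return NULL;
    if (!two_char_ready)
        two_char_init();
    return two_char_strs[i * ALNUM_COUNT + j];
}
```

Every lookup of the same pair returns the same address, so repeated substrings cost one table index instead of one allocation each.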

Currently in Perl, splitting a SVPV into an AV* has an 8+24+16+16=64 bytes per 1 original char expansion ratio overhead (72 bytes once the OS malloc header is counted).
If some permutation, or all permutations, of lower 7 bit or high ASCII Latin-1, printable and/or unprintable, \w, \d, ., \s, were SV* IMMs, the expansion ratio of splitting a SVPV into an AV* would just be 8 bytes per original 1 byte.

https://github.com/ricardoquesada/Spidermonkey/blob/4a75ea2543408bd1b2c515aa95901523eeef7858/js/src/vm/String.cpp#L710

Detailed math for the 8+24+16+16=64 bytes: 8 SV* in AV* + 24 SV head + 16 XPV body + 16 min buf alloc rule of newSVpvn = 64 bytes. Adding the 8 byte OS malloc header gives 72 bytes.
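
The per-character figures can be checked mechanically. The sketch below just adds up the byte counts quoted in this comment (they are the comment's numbers, assumed to describe a 64-bit build, and are not measured here); the enum and function names are made up for the illustration.

```c
#include <stddef.h>

/* Byte counts as quoted in the comment above (assumed 64-bit build). */
enum {
    AV_SLOT_BYTES       = 8,   /* SV* stored in the AV* */
    SV_HEAD_BYTES       = 24,  /* SV head */
    XPV_BODY_BYTES      = 16,  /* XPV body */
    MALLOC_HEADER_BYTES = 8,   /* OS malloc header for the buffer */
    MIN_BUF_BYTES       = 16   /* minimum buffer allocation */
};

/* Cost of one split-out character when each gets its own SV. */
size_t split_cost_per_char(int include_malloc_header) {
    size_t total = AV_SLOT_BYTES + SV_HEAD_BYTES
                 + XPV_BODY_BYTES + MIN_BUF_BYTES;
    if (include_malloc_header)
        total += MALLOC_HEADER_BYTES;
    return total;
}

/* Cost when the AV* slot just points at a pre-made immortal string. */
size_t immortal_cost_per_char(void) {
    return AV_SLOT_BYTES;
}
```

With these inputs the sums are 64 bytes without the malloc header, 72 bytes with it, and 8 bytes per character in the immortal case.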

offtopic: stolen buzzword/tech word from Perl VM lol https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/SelfHosting.cpp#L247
https://github.com/ricardoquesada/Spidermonkey/blob/master/js/public/Value.h#L313

Note how SM burns in/attaches/binds XSUBs or does its newXS(); calls. They are const RO arrays of structs, not a long list of serial fn calls in machine code with 2-5 args the way BOOT:{} and EU::PXS do it.
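
The table-driven registration style mentioned above can be sketched in standalone C as follows. Everything here is hypothetical (`native_spec`, `native_table`, `find_native()` and the two toy functions); the point is only that the bindings live in one const array that the linker can place in read-only memory, instead of being built by a sequence of registration calls at startup.

```c
#include <stddef.h>
#include <string.h>

typedef int (*native_fn)(int);

/* One row per native function: name, entry point, argument count. */
typedef struct {
    const char *name;
    native_fn fn;
    unsigned nargs;
} native_spec;

static int native_double(int x) { return x * 2; }
static int native_negate(int x) { return -x; }

/* The whole binding list is constant data, fixed at compile/link time. */
static const native_spec native_table[] = {
    { "double", native_double, 1 },
    { "negate", native_negate, 1 },
};

/* Look a function up by name; returns NULL if it is not in the table. */
native_fn find_native(const char *name) {
    size_t count = sizeof native_table / sizeof native_table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(native_table[i].name, name) == 0)
            return native_table[i].fn;
    }
    return NULL;
}
```

Adding a binding means adding one row to the array, with no extra machine code executed at boot.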

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/SelfHosting.cpp#L793

SM's analogs of CV* heads/CV* body structs are stored in const RO C global mmapped/disk backed memory, unlike Perl which uses no-malloc-header-bloat arena pool slots from malloc() memory.

https://github.com/ricardoquesada/Spidermonkey/blob/4a75ea2543408bd1b2c515aa95901523eeef7858/js/src/jsapi.h#L2428

Here is your (davem's) short string experiment perl branch, as production code in SM

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/String-inl.h#L46

I think machine integers 0-99 can be converted to base 10 RCed ASCII string objects in O(1) time; it's just a single pointer dereference to turn ints 0-99 or 0-256 into JS VM ASCII string objects
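
A minimal standalone model of that idea, assuming a table covering only 0-99: the decimal strings are built once, and every later conversion is an array index. The names `small_int_strs`, `small_int_init()` and `small_int_str()` are invented for this sketch and do not come from either engine.

```c
#include <stddef.h>

/* Pre-made decimal strings for 0..99, at most 2 digits plus NUL. */
static char small_int_strs[100][3];
static int small_int_ready = 0;

/* Build all 100 strings once. */
static void small_int_init(void) {
    for (int i = 0; i < 100; i++) {
        char *s = small_int_strs[i];
        if (i < 10) {
            s[0] = (char)('0' + i);
            s[1] = '\0';
        } else {
            s[0] = (char)('0' + i / 10);
            s[1] = (char)('0' + i % 10);
            s[2] = '\0';
        }
    }
    small_int_ready = 1;
}

/* O(1) conversion for 0..99; returns NULL for values outside the table,
 * where the caller would fall back to a general formatter. */
const char *small_int_str(int n) {
    if (n < 0 || n > 99)
        return NULL;
    if (!small_int_ready)
        small_int_init();
    return small_int_strs[n];
}
```

After the first call, converting a small integer involves no division, no allocation and no formatting.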

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/String.h#L1043

This is for another ticket, but SM decided on > 1/4th unused space, or 75% mark, to do a realloc() to shrink operation. Perl's current logic is much much more complicated for deciding when to COW, deCOW, and do shrinking realloc().
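
Read literally, the rule described above (shrink once more than a quarter of the buffer is unused) fits in one comparison. The function below is a sketch of that reading only, with a hypothetical name; it is not taken from SpiderMonkey's StringBuffer.cpp.

```c
#include <stddef.h>

/* Return 1 when more than 1/4 of the capacity is unused, i.e. the
 * buffer is filled to less than the 75% mark; otherwise return 0. */
int should_shrink(size_t used, size_t capacity) {
    if (used > capacity)
        return 0; /* inconsistent input: never shrink */
    return (capacity - used) > capacity / 4;
}
```

For a 100 byte buffer, 70 bytes used triggers a shrink while 80 bytes used does not.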

https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/vm/StringBuffer.cpp#L30

offtopic, the JS stack, internally is the OS's C stack with some tiny Asm tricks, generic RISC and stack grows up HPUX PARISC compliant https://github.com/ricardoquesada/Spidermonkey/blob/master/js/src/jsnativestack.cpp

The commit also seems to have snuck in an unrelated change to pp_const().

It is unrelated, but a tiny meaningless change, not worth a PR on its own and then a 2 line long commit in the P5P repo.

I can BP that line now to see what is inside the SV*. It makes no machine code difference in -O1/-O2 before and after. But I can now set a BP on the line, and see what is inside the SV* struct. If someone doesn't like the change, it means they don't know what a C debugger is, or how to use one, and can't call themselves a professional C dev if the only C level diag tool they know how to use is printf().

@xenu
Copy link
Member

xenu commented Jul 1, 2025

Let me summarise the above:

"I am very smart."
