Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent return of dangling Vector/QAngle to VScript #321

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

z33ky
Copy link

@z33ky z33ky commented Sep 5, 2024

When a Vector or QAngle rvalue reference is returned from a VScript function, the constructed ScriptVariant_t must not store the pointer to it since it is, per convention, a temporary reference. Only do that for lvalue-references, but do a copy when constructing from or assigning a rvalue reference.


I'm not sure ScriptVairant_t really needs to try and prevent copies of Vector and QAngle. They're pretty small and trivial classes, so are cheap to copy. When a copy must be made it surely beats a dynamic allocation & deallocation.
Free() refers to FIELD_HSCRIPT, although it will never set SV_FREE. Perhaps this is just outdated?
Strings will need to be copied (strdup()ed) due to their dynamic length, but it could try to do a small-string optimization.

The types of new and delete are misaligned, since Free() only ever does delete m_pszString. Since HSCRIPT, Vector and QAngle are all trivial to destruct this shouldn't have any ill effects.
At most I'm worried about strdup() being paired with delete instead of free().

Anyway, I wanted to hear your thoughts on this before I change more stuff. This change alone sufficiently fixes issues I've observed in return-values playing a Mapbase-based mod that uses VScripts (https://www.moddb.com/mods/sourceworld).


Does this PR close any issues?

PR Checklist

  • My PR follows all guidelines in the CONTRIBUTING.md file
  • My PR targets a develop branch OR targets another branch with a specific goal in mind

When a Vector or QAngle rvalue reference is returned from a VScript
function, the constructed ScriptVariant_t must not store the pointer to
it since it is, per convention, a temporary reference. Only do that for
lvalue-references, but do a copy when constructing from or assigning a
rvalue reference.
@z33ky
Copy link
Author

z33ky commented Sep 5, 2024

Also, specifically addressing #104 (comment)

Strings are not leaked, doing this for them crashes, so the current non-functional retval.Free() is actually dangerous here.

I didn't see any crashes calling GetName() and similar methods on different entities. Though perhaps other code changed to allow this now, or Linux/GCC is less prone to crashes here.

@samisalreadytaken
Copy link

samisalreadytaken commented Sep 6, 2024

That comment was poorly worded. It was more about being explicit that the only effect of Free() was on Vectors and that it doesn't cover HSCRIPT. Strings were never used after free because they don't set SV_FREE. Though superficially looking at it now, I don't see HSCRIPT leaks either; perhaps I was confusing it with another issue like (#166). I honestly don't remember, the original issue should've included more detail. Also that issue was opened when there was no return value freeing. If it is confirmed that HSCRIPT aren't leaked, then it can be closed as being fixed 3 years ago.

HSCRIPT variants never setting SV_FREE is the cause of #123 and possibly other differences to Valve's implementations of multiple languages.

strdup/delete mismatch is benign in Source because they all call the same global allocator functions. Is the strdup code path even ever hit at the moment? In a project wide grep, I don't see any ScriptVariant_t constructors with bCopy = true parameter except on ScriptRegisterConstantFromTemp, which only registers Vectors. It could be fixed though.

I'm curious what were the problematic functions that required this. Functions that return vec3_origin/vec3_invalid or members from CScriptNetPropManager::GetPropVectorArray now unnecessarily reallocate them with this change. What could be done to prevent this?

@z33ky
Copy link
Author

z33ky commented Sep 6, 2024

If it is confirmed that HSCRIPT aren't leaked, then it can be closed as being fixed 3 years ago.

I don't know about 'HSCRIPT, I didn't follow how these objects are managed and it looks more complex than VectorandQAngle(andchar*`).

Is the strdup() code path even ever hit at the moment?

I didn't even check that.
Though it doesn't seem to matter, especially if when it would happen, delete would still work fine.
Since you seem to be leaning at least slightly in the direction of "fixing" the pairing, I'll just insert a switch to handle the variants correctly. QAngle will still be deleted as Vector though, since it doesn't have it own field-type, but yeah, trivial destructor and all that.

I'm curious what were the problematic functions that required this.

Specifically I noticed it on CBasePlayer::EyeVectors(), but many, if not all, returned absurd values. Honestly I'm surprised this didn't cause issues on Windows, the stack-space the return-value pointer referred to probably gets reused for(/overwritten with) other stuff pretty quickly.

Functions that return vec3_origin/vec3_invalid or members from CScriptNetPropManager::GetPropVectorArray now unnecessarily reallocate them with this change. What could be done to prevent this?

That still uses the (const Vector &val, bool bCopy = false) constructor, since it's returning a const lvalue reference. So it will not be copied.
But as I pondered above, I'm not sure how much memory and performance this really saves in order to try and avoid copying and storing 8 additional bytes. All functions returning a Vector or QAngle by value will incur the cost of dynamic memory allocation and deallocation, though perhaps these are less common/used? I dunno. I'm fine keeping at as I currently have it in this PR, which fixes memory corruptions.
I suppose you could try and somehow store Vector/QAngle return values in a static (i.e. global) variable or reserving stack space in function_stub to use for it, since it will just be copied into VM memory (PushVariant()) and free'd after.
I could perhaps try and implement something along these lines - in a separate PR, that seems out of scope for this bug fix - though I'm not too happy about trying to muck with manual memory management in code I'm not familiar with. You can maybe see that I tried to keep the fix safe and simple.

@z33ky z33ky force-pushed the dangling-vscript-vector branch 2 times, most recently from f9d4f33 to 0551292 Compare September 6, 2024 21:31
@samisalreadytaken
Copy link

samisalreadytaken commented Sep 6, 2024

I think it makes more sense to change strdup into new char[] and memcpy instead of switching delete. Though new[] wouldn't match delete. Maybe malloc Vector?

Specifically I noticed it on CBasePlayer::EyeVectors()

EyeVectors, presumably from ScriptGetEyeForward, returns static Vector as is done everywhere else. I'm confused how returning static Vector references causes that.

That still uses the (const Vector &val, bool bCopy = false)

ScriptVariant_t constructors aren't called for function return values, assignment operators are. Though this seems to be a different issue.

const Vector& CBaseCombatCharacter::ScriptGetAttackSpread( HSCRIPT, HSCRIPT ) script binding is compiled as Vector const &, thus calls operator=( const Vector &vec ):

CMemberScriptBinding2<CBaseCombatCharacter *,Vector const & (__thiscall CBaseCombatCharacter::*)(HSCRIPT__ *,HSCRIPT__ *),Vector const &,HSCRIPT__ *,HSCRIPT__ *>::Call(void *, void *, ScriptVariant_t *, int, ScriptVariant_t *)

However, const Vector &CScriptNetPropManager::GetPropVectorArray( HSCRIPT, const char *, int ) is compiled as Vector, thus calls operator=( Vector &&vec ):

CMemberScriptBinding2<CScriptNetPropManager *,Vector (__thiscall CScriptNetPropManager::*)(HSCRIPT__ *,char const *),Vector,HSCRIPT__ *,char const *>::Call(void *, void *, ScriptVariant_t *, int, ScriptVariant_t *)

They both return vec3_origin/vec3_invalid as const Vector &, so what causes this difference? Note that I'm only testing on msvc.

Test calls: player.GetAttackSpread(null,null), NetProps.GetPropVector(null,"")

Stack allocating a Vector in function_stub sounds interesting, that could be the way to go.

@z33ky
Copy link
Author

z33ky commented Sep 7, 2024

I think it makes more sense to change strdup into new char[] and memcpy instead of switching delete. Though new[] wouldn't match delete. Maybe malloc Vector?

I could new Vector[1]. Or yeah, malloc(), which also maybe makes it more clear we're expecting to work with trivial types, or invoke constructor and destructor manually.

EyeVectors, presumably from ScriptGetEyeForward, returns static Vector as is done everywhere else. I'm confused how returning static Vector references causes that.

Oh sorry, it it Weapon_ShootPosition(). I replaced that with EyeVectors() in the vscript as my first attempt to fix this, which did indeed work fine, before I started to debug why Weapon_ShootPosition() does not work.

ScriptVariant_t constructors aren't called for function return values, assignment operators are. Though this seems to be a different issue.

I don't know what branch you've linked in #321 (comment), but there is different code in both master and develop, which does return a Vector by value for GetPropVector().

@samisalreadytaken
Copy link

I'm using # 260, but it's my mistake. I incorrectly put Vector as return type in the macro that defines non-array versions of the functions; the debugger stepping from GetPropVectorArray straight into script bindings caused my confusion. Ignore that part of my comment.

@z33ky
Copy link
Author

z33ky commented Sep 8, 2024

I think I have been chasing some ghost bugs due to corrupted object files or something.
Anyway, I'm not sure I have much time this week, but I will work on this. Stack allocation for the function_stub return value looks pretty doable.

z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 12, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 12, 2024

I've had a family emergency and some stuff got left by the wayside.
I've picked this one back up. I'm not sure if it really was ghost bugs, as memory management issues can have weird effects. It looks like I've managed to get it working stably, so now here's the stack-based (malloc()-free) temporary return-value storage for function_stub() plus a few minor fixes.

z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 12, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 13, 2024

Hm, I attempted to fix some sort of syntax error MSVC has with the placement new, but it still complains. gcc accepts both versions. I don't know what its problem is, nor how else I could attempt to fix it :/

z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 13, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
z33ky added a commit to z33ky/source-sdk-2013 that referenced this pull request Nov 13, 2024
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 13, 2024

Well, why would I expect new to always be a keyword as defined by the standard. It's a macro of course, silly me.
Builds are fixed.

@samisalreadytaken
Copy link

Vector is always float[3] and has a trivial constructor, I don't think placement new is needed at all. It can simply be a copy *pNewVector = *m_pVector

I built z33ky:dangling-vscript-vector with samisalreadytaken:vscript-debugger, and tested this code below on my Windows machine with and without commit 7efec77 "Optimize squirrel function_stub". It was measurably faster without it

sqdbg_prof_stop();
sqdbg_prof_start();

local player = player;

for ( local i = 1000; i--; )
{
	sqdbg_prof_begin("t1");
	player.ShootPosition();
	player.ShootPosition();
	player.ShootPosition();
	sqdbg_prof_end();
}

sqdbg_prof_print("t1");
// 7efec77 reverted
(sqdbg) prof | t1 : total 580.10 us, avg 580.68 ns, peak   3.50 us(1), hits 1000
(sqdbg) prof | t2 : total   5.52 ms, avg 552.17 ns, peak   2.10 us(5035), hits 10000
(sqdbg) prof | t3 : total 563.48 ms, avg 563.48 ns, peak  18.90 us(274844), hits 1000000
// with 7efec77
(sqdbg) prof | t1 : total 778.80 us, avg 779.58 ns, peak   5.00 us(1), hits 1000
(sqdbg) prof | t2 : total   7.83 ms, avg 783.51 ns, peak  41.20 us(1911), hits 10000
(sqdbg) prof | t3 : total 800.75 ms, avg 800.75 ns, peak 127.00 us(496971), hits 1000000

@z33ky
Copy link
Author

z33ky commented Nov 14, 2024

Vector is always float[3] and has a trivial constructor, I don't think placement new is needed at all. It can simply be a copy *pNewVector = *m_pVector

Conceptually yes (unless _DEBUG and VECTOR_PARANOIA is defined), albeit currently the default constructor is user-provided, so is not trivial.
From a cursory glance it seems gcc and MSVC will emit the same code either way. Assignment looks simpler, placement new is more technically correct.

I built z33ky:dangling-vscript-vector with samisalreadytaken:vscript-debugger, and tested this code below on my Windows machine with and without commit 7efec77

Perhaps another change in the commit is responsible? I don't have time to do more testing today; Instead of reverting the whole commit I manually inserted the intermediary malloc() (which was part of operator= before) to keep the rest of the changes in place.

diff --git a/sp/src/public/vscript/vscript_templates.h b/sp/src/public/vscript/vscript_templates.h
index 9f2054cc3..4edf38002 100644
--- a/sp/src/public/vscript/vscript_templates.h
+++ b/sp/src/public/vscript/vscript_templates.h
@@ -374,8 +374,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_vec ) Vector( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_vec; \
+			Vector *vec = (Vector*)malloc( sizeof( Vector ) ); \
+			new ( vec ) Vector( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = vec; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
@@ -394,8 +396,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_ang ) QAngle( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_ang; \
+			QAngle *ang = (QAngle*)malloc( sizeof( QAngle ) ); \
+			new ( ang ) QAngle( ((FUNC_TYPE)pFunction)( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = ang; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
@@ -452,8 +456,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_vec ) Vector( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_vec; \
+			Vector *vec = (Vector*)malloc( sizeof( Vector ) ); \
+			new ( vec ) Vector( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = vec; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
@@ -472,8 +478,10 @@ inline FUNCPTR_TYPE ScriptConvertFuncPtrFromVoid( void *p )
 			{ \
 				return false; \
 			} \
-			new ( &temporaryReturnStorage.m_ang ) QAngle( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
-			*pReturn = temporaryReturnStorage.m_ang; \
+			QAngle *ang = (QAngle*)malloc( sizeof( QAngle ) ); \
+			new ( ang ) QAngle( (((OBJECT_TYPE_PTR)(pContext))->*ScriptConvertFuncPtrFromVoid<FUNC_TYPE>(pFunction))( SCRIPT_BINDING_ARGS_##N ) ); \
+			*pReturn = ang; \
+			pReturn->m_flags |= SV_FREE; \
 			return true; \
 		} \
 	}; \
diff --git a/sp/src/vscript/vscript_squirrel.cpp b/sp/src/vscript/vscript_squirrel.cpp
index 47364e007..153dbe2a4 100644
--- a/sp/src/vscript/vscript_squirrel.cpp
+++ b/sp/src/vscript/vscript_squirrel.cpp
@@ -1428,7 +1428,6 @@ SQInteger function_stub(HSQUIRRELVM vm)

 	PushVariant(vm, retval);

-	Assert(!(retval.m_flags & SV_FREE));
 	retval.Free();

 	return pFunc->m_desc.m_ReturnType != FIELD_VOID;

Could you test this on your system? Also, just making sure, you are profiling a release-mode build?

My results look like

// plain
(sqdbg) prof | t1              : total 853.75 us, avg 854.60 ns, peak  10.50 us(664), hits 1000
(sqdbg) prof | t2              : total   8.76 ms, avg 875.61 ns, peak  46.00 us(7494), hits 10000
(sqdbg) prof | t3              : total  86.66 ms, avg 866.64 ns, peak  63.75 us(2333), hits 100000
// patched (+malloc)
(sqdbg) prof | t1              : total 933.50 us, avg 934.43 ns, peak  10.75 us(729), hits 1000
(sqdbg) prof | t2              : total   9.15 ms, avg 914.74 ns, peak  14.75 us(2284), hits 10000
(sqdbg) prof | t3              : total  90.77 ms, avg 907.76 ns, peak  16.25 us(47563), hits 100000

(I assume t2 and t3 are the same loop, just more iterations.)
For some reason there's much higher peaks in the "plain" version, though maybe it was just some inopportune OS scheduling (I ran the test multiple times, just didn't check the peak values until now). And the average doesn't change much admittedly, around 40ns per iteration... I'll profile some changes, like actually reverting 7efec77 (because why not), but perhaps it's just not really worth investigating and should just be removed then.

I'll test with fully reverting 7efec77 tomorrow. The only sensible explanation I can come up with would be that passing the temporaryReturnStorage for the call leads to worse register allocation and the small malloc() can be served really fast or something along those lines.

@samisalreadytaken
Copy link

That malloc patch is slower in the same way as yours. Even removing the unused temporaryReturnStorage parameter had some effect, albeit by very little.

I am of course testing on release with no (native) debugger attached, and t2, t3 are more iterations of the same loop.

// 7efec77 revert
(sqdbg) prof | t1 : total 576.70 us, avg 577.28 ns, peak   3.80 us(816), hits 1000
(sqdbg) prof | t2 : total   6.18 ms, avg 618.02 ns, peak   2.40 us(425), hits 10000
(sqdbg) prof | t3 : total 592.78 ms, avg 592.78 ns, peak  13.50 us(202631), hits 1000000

// with 7efec77
(sqdbg) prof | t1 : total 793.10 us, avg 793.89 ns, peak   3.70 us(1), hits 1000
(sqdbg) prof | t2 : total   7.95 ms, avg 795.54 ns, peak   2.10 us(4917), hits 10000
(sqdbg) prof | t3 : total 789.92 ms, avg 789.92 ns, peak  15.30 us(151586), hits 1000000

// + malloc patch
(sqdbg) prof | t1 : total 854.40 us, avg 855.26 ns, peak   4.80 us(567), hits 1000
(sqdbg) prof | t2 : total   8.26 ms, avg 826.32 ns, peak   2.40 us(8655), hits 10000
(sqdbg) prof | t3 : total 851.58 ms, avg 851.58 ns, peak  15.00 us(108205), hits 1000000

// + remove temporaryReturnStorage parameter
(sqdbg) prof | t1 : total 841.60 us, avg 842.44 ns, peak   4.50 us(1), hits 1000
(sqdbg) prof | t2 : total   8.04 ms, avg 804.10 ns, peak   1.60 us(2413), hits 10000
(sqdbg) prof | t3 : total 829.87 ms, avg 829.87 ns, peak  16.80 us(257791), hits 1000000

Type FIELD_HSCRIPT is removed, it never sets SV_FREE.

All dynamic memory is now free`d via free(). For strdup(), this would be
the conformant way in standard C++. It matters not in Source, since both
use the same global allocator of the engine, but it is tidier.
Vector and QAngle are now allocated via malloc() instead of new to
provide symmetry.
When assigning a null value in getVariant(), make sure a previous
SV_FREE is not carried over.

In SqurrelVM::ReleaseValue(), make sure SV_FREE is not kept after
Free()ing a value.
Instead of temporary allocating dynamic memory, which subsequently is
copied into another dynamically allocated piece of memory, provide a
temporary buffer on the stack.
The script value inserted into the VM still needs to allocate
dynamically though...

A few sites are adapted to not create a ScriptVariant_t from a temporary
Vector to avoid dynamic allocation there too.

cf. mapbase-source#321
@z33ky
Copy link
Author

z33ky commented Nov 15, 2024

I got it, it wins by cheating. If you do print(player.ShootPosition()), it returns null. I am missing 'm_type = FIELD_VECTOR in ScriptVariant_t::operator=(Vector&&) of 0c9622f, so it remains FIELD_VOID and the Vector won't get copied again.
Here's the new timings:

// revert (broken)
(sqdbg) prof | t1              : total 641.00 us, avg 641.64 ns, peak  16.25 us(112), hits 1000
(sqdbg) prof | t2              : total   6.39 ms, avg 639.04 ns, peak  43.50 us(2485), hits 10000
(sqdbg) prof | t3              : total 644.04 ms, avg 644.04 ns, peak  60.25 us(841426), hits 1000000
// revert + fix
(sqdbg) prof | t1              : total 953.00 us, avg 953.95 ns, peak  16.50 us(430), hits 1000
(sqdbg) prof | t2              : total   9.70 ms, avg 970.30 ns, peak  60.75 us(5565), hits 10000
(sqdbg) prof | t3              : total 953.01 ms, avg 953.01 ns, peak  58.25 us(250390), hits 1000000
// unreverted
(sqdbg) prof | t1              : total 967.50 us, avg 968.47 ns, peak  10.25 us(55), hits 1000
(sqdbg) prof | t2              : total   9.46 ms, avg 946.42 ns, peak  30.75 us(540), hits 10000
(sqdbg) prof | t3              : total 953.28 ms, avg 953.28 ns, peak  51.00 us(797629), hits 1000000

So it's not slower, but the speedup looks to be within statistical noise...
I fixed up the offending commit. No changes in the tip of the branches.

@samisalreadytaken
Copy link

Its impact was more noticable in my test

// 0123813 revert
(sqdbg) prof | t1 : total 816.60 us, avg 817.42 ns, peak   4.30 us(1), hits 1000
(sqdbg) prof | t2 : total   7.87 ms, avg 787.48 ns, peak   1.80 us(3691), hits 10000
(sqdbg) prof | t3 : total 808.68 ms, avg 808.68 ns, peak  16.80 us(32719), hits 1000000

// final
(sqdbg) prof | t1 : total 750.30 us, avg 751.05 ns, peak   3.80 us(1), hits 1000
(sqdbg) prof | t2 : total   7.60 ms, avg 760.57 ns, peak   2.70 us(6043), hits 10000
(sqdbg) prof | t3 : total 758.78 ms, avg 758.78 ns, peak  13.70 us(43884), hits 1000000

For reference, testing const ref return (EyePosition)

// 0123813 revert (const ref)
(sqdbg) prof | t1 : total 752.00 us, avg 752.75 ns, peak   3.70 us(1), hits 1000
(sqdbg) prof | t2 : total   7.68 ms, avg 767.72 ns, peak   3.70 us(666), hits 10000
(sqdbg) prof | t3 : total 737.06 ms, avg 737.06 ns, peak  20.70 us(825259), hits 1000000

// final (const ref)
(sqdbg) prof | t1 : total 726.70 us, avg 727.43 ns, peak   3.30 us(1), hits 1000
(sqdbg) prof | t2 : total   7.79 ms, avg 778.62 ns, peak   2.10 us(8384), hits 10000
(sqdbg) prof | t3 : total 759.02 ms, avg 759.02 ns, peak  15.00 us(169738), hits 1000000

Related, return value should only be pushed to script if there is a return value. Otherwise it's unnecessarily pushing null to stack all the time.

diff --git a/sp/src/vscript/vscript_squirrel.cpp b/sp/src/vscript/vscript_squirrel.cpp
index 47364e00..b2a361b1 100644
--- a/sp/src/vscript/vscript_squirrel.cpp
+++ b/sp/src/vscript/vscript_squirrel.cpp
@@ -1426,10 +1426,13 @@ SQInteger function_stub(HSQUIRRELVM vm)
                return sq_throwobject(vm);
        }

-       PushVariant(vm, retval);
+       if ( pFunc->m_desc.m_ReturnType != FIELD_VOID )
+       {
+               PushVariant(vm, retval);

-       Assert(!(retval.m_flags & SV_FREE));
-       retval.Free();
+               Assert(!(retval.m_flags & SV_FREE));
+               retval.Free();
+       }

        return pFunc->m_desc.m_ReturnType != FIELD_VOID;
 }

And profile of player.SetHealth(100)

// final (null ret)
(sqdbg) prof | t1 : total 579.10 us, avg 579.68 ns, peak   2.70 us(1), hits 1000
(sqdbg) prof | t2 : total   5.75 ms, avg 575.03 ns, peak   1.80 us(64), hits 10000
(sqdbg) prof | t3 : total 575.50 ms, avg 575.50 ns, peak  10.90 us(231745), hits 1000000

// final + check ret (null ret)
(sqdbg) prof | t1 : total 520.80 us, avg 521.32 ns, peak   2.20 us(1), hits 1000
(sqdbg) prof | t2 : total   5.24 ms, avg 524.08 ns, peak   1.60 us(8057), hits 10000
(sqdbg) prof | t3 : total 539.46 ms, avg 539.46 ns, peak  10.60 us(703043), hits 1000000

@z33ky
Copy link
Author

z33ky commented Nov 16, 2024

Its impact was more noticable in my test

Cool :)

Related, return value should only be pushed to script if there is a return value. Otherwise it's unnecessarily pushing null to stack all the time.

That's neat. I'll commit this too then.

One more question, I'm seeing the if (!sq_isnull(pSquirrelVM->lastError_)) path, which returns early without freeing retval (btw. my Assert()ion is wrong because strings will still be dynamically allocated and need to be Free()d), though it probably should, yeah?
Also in constructor_stub(), which has a similar check/path for lastError_ and should call pClassDesc->m_pfnDestruct(instance) (if m_pfnDestruct is not nullptr) before returning.

@samisalreadytaken
Copy link

samisalreadytaken commented Nov 16, 2024

Yes, retval should be freed on error as well since the error state doesn't change the return type and allocation. No place where RaiseException is used returns any freeable value though.

m_pfnConstruct does new T so it makes sense to delete it, but why would a constructor raise script exception (unless some hypothetical custom class has external dependencies that aren't met in its constructor)? I think the error check in the constructor can be removed and replaced with an assertion.

With the stack allocation of Vector now, when is retval ever going to need freeing, what strings will be allocated? Strings returned from bound functions are owned by the callees, neither ScriptVariant_t or ScriptBinding::Call reallocate them.

@z33ky
Copy link
Author

z33ky commented Nov 16, 2024

m_pfnConstruct does new T so it makes sense to delete it, but why would a constructor raise script exception (unless some hypothetical custom class has external dependencies that aren't met in its constructor)? I think the error check in the constructor can be removed and replaced with an assertion.

Sounds fair.

With the stack allocation of Vector now, when is retval ever going to need freeing, what strings will be allocated? Strings returned from bound functions are owned by the callees, neither ScriptVariant_t or ScriptBinding::Call reallocate them.

Oh you're right, nothing does. I incorrectly thought the operator= copies, but it doesn't. I am thinking of how getVariant() behaves. Which now makes me wonder if callers of getVariant() properly free it. For Execute(Hook)Function() and Get(Key)Value() the callee is obviously responsible for that (return value), but the {set,add,...}_stub()s look guilty.

@samisalreadytaken
Copy link

samisalreadytaken commented Nov 17, 2024

Those metamethods indeed leak. They are also completely broken, they get the instance itself again instead of the second parameter. Apparently no one tested or used them :P

CScriptQuaternionInstanceHelper needs to be rewritten as well.

diff --git a/sp/src/vscript/vscript_squirrel.cpp b/sp/src/vscript/vscript_squirrel.cpp
index 47364e00..fec6a6d1 100644
--- a/sp/src/vscript/vscript_squirrel.cpp
+++ b/sp/src/vscript/vscript_squirrel.cpp
@@ -1586,7 +1586,7 @@ SQInteger add_stub(HSQUIRRELVM vm)
        sq_getinstanceup(vm, 1, (SQUserPointer*)&classInstanceData, 0);

        ScriptVariant_t var;
-       getVariant( vm, 1, var );
+       getVariant( vm, -1, var );

        if (classInstanceData &&
                classInstanceData->instance &&
@@ -1611,7 +1611,7 @@ SQInteger sub_stub(HSQUIRRELVM vm)
        sq_getinstanceup(vm, 1, (SQUserPointer*)&classInstanceData, 0);

        ScriptVariant_t var;
-       getVariant( vm, 1, var );
+       getVariant( vm, -1, var );

        if (classInstanceData &&
                classInstanceData->instance &&
@@ -1636,7 +1636,7 @@ SQInteger mul_stub(HSQUIRRELVM vm)
        sq_getinstanceup(vm, 1, (SQUserPointer*)&classInstanceData, 0);

        ScriptVariant_t var;
-       getVariant( vm, 1, var );
+       getVariant( vm, -1, var );

        if (classInstanceData &&
                classInstanceData->instance &&
@@ -1661,7 +1661,7 @@ SQInteger div_stub(HSQUIRRELVM vm)
        sq_getinstanceup(vm, 1, (SQUserPointer*)&classInstanceData, 0);

        ScriptVariant_t var;
-       getVariant( vm, 1, var );
+       getVariant( vm, -1, var );

        if (classInstanceData &&
                classInstanceData->instance &&

@z33ky
Copy link
Author

z33ky commented Nov 17, 2024

CAnimEventTInstanceHelper::Set() also has a broken assignment to "options", which is a const char*. It will refer to the temporary string in the ScriptVariant_t. FWIW there is a DevWarning about the deprecation of the animevent_t metamethods.

script print(Quaternion()) seems to return uninitialized or invalid values, so something there seems broken, too.
I'm not quite sure how to fix CScriptQuaternionInstanceHelper::Add(). I think it's the only one of its kind, so I don't have a reference to go by. Currently it seems to try and return itself (like a += in C++), though it seems that does not happen properly, as printing it returns something like ((null, 0x568d04f0)) instead of Quaternion(...). I guess you can't just cast the pointer to HSCRIPT, as it's doing, and that's it.
It'd be better to change the signature of Add() & co. to bool Add(void *p, ScriptVariant_t &variant, ScriptVariant_t &result) (or use variant as in & out) so it doesn't need to awkwardly use a static ScriptVariant_t to return, though since basically nothing is using it, and what does is broken, maybe removing it is more appropriate?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants