Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soft-deprecate the use of implicit reliance on NULL-terminated strings. #99806

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Nov 28, 2024

"MVP" of addressing godotengine/godot-proposals#11249 and godotengine/godot-proposals#11258. The motivation for this direction are discussed there (and linked issues). This PR does not change assumptions about user-supplied strings; NULL termination checks are just offloaded to compile-time for static strings supplied from the codebase.

This change is toggled with the c++-define GODOT_SHOW_STR_DEPRECATIONS (off by default). For unowned string references, StrRange can be used, which avoids allocation overhead. This is already employed in sprintf and operator+ in this PR, because there are a lot of uses of these functions.

This change is as minimal as I could realistically get it to be: ustring has quite a lot of changes, but most other files work exactly as before. This changes when GODOT_SHOW_STR_DEPRECATIONS is toggled: Suddenly, the console explodes into a flurry of warnings, because strlen is used in a place where it may be inappropriate and wastes CPU cycles.

This PR should already slightly improves performance by letting callers pass their string[] as string[] with known lengths, rather than having it recomputed immediately. Every change we will make to get rid of another GODOT_SHOW_STR_DEPRECATIONS deprecation warning will improve the engine speed further. Every function we can make refactor to use StrRange instead of String will reduce allocations and improve engine speed.

Please, participate in lively discussion! I realize merging this is kind of a large ask, especially from a pretty fresh contributor like myself. I tried to make the change as unopinionated as possible, and focused on just improving speed and conditions for future PRs to improve speed further.

Still, I personally think this direction is unavoidable sooner or later, if we want to make the best use of clock cycles for the hard parts elsewhere. Also, i have heard it mentioned a few times that the Godot source code has a side-mission as a learning tool - making computation explicit to avoid accidental promotions and data copies is very important in clean and fast code :)

@Ivorforce Ivorforce requested review from a team as code owners November 28, 2024 19:40
@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch 15 times, most recently from 1b8e3da to 1cb5c1e Compare November 28, 2024 21:29
@Ivorforce Ivorforce requested a review from a team as a code owner November 28, 2024 21:29
@Mickeon Mickeon added this to the 4.x milestone Nov 28, 2024
@Mickeon
Copy link
Contributor

Mickeon commented Nov 28, 2024

This feels somewhat related to #84680 @Repiteo

@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch from 1cb5c1e to 9bc994c Compare November 28, 2024 21:33
@clayjohn
Copy link
Member

Sounds interesting! Can you share the results of your profiling so we have an idea of what kind of performance differential to expect?

@Ivorforce
Copy link
Contributor Author

Sounds interesting! Can you share the results of your profiling so we have an idea of what kind of performance differential to expect?

Will do once I get everything rolling again! It's still stuck between not compiling or causing segfaults, hopefully to resolve soon :)

@clayjohn clayjohn marked this pull request as draft November 28, 2024 22:56
@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch from da45a5d to af0de09 Compare December 1, 2024 16:03
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 1, 2024

Tests are finally passing (locally anyway)!

I managed to salvage this PR by doing less at once. The core is still the same - implicit conversions from char * to String are soft-deprecated (and implicit conversions from `CharString and Char16String to ptr), while implicit construction from string literal to String remains legal.

The strlen calls in the 'String from String Literal' constructors are resolved at compile time, as can be seen here: https://godbolt.org/z/aTv61TT1K. I first tried to use the len template parameter, but realized that some string literals are implicitly ended with NULL while others aren't. Calling strlen at compile time should be the safest route to go, and it shows because without it the tests weren't passing 😅

All in all, this PR should slightly speed up conversion from string literal to String, and allow us to move more toward explicitness and no NULL termination reliance in Strings. When / If it gets merged, I will likely add follow-up PRs to explicitize implicit conversions throughout the codebase.

I recommend merging #99816 and #99817 before this one because this change includes the former ones, making the changelog more granular.

Edit: One test is failing, I think that's just caches though?

@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch from af0de09 to be06698 Compare December 1, 2024 17:36
@Repiteo
Copy link
Contributor

Repiteo commented Dec 1, 2024

Holy shit, you have no idea how cathartic it is to see someone else take an interest in this topic. I'm 100% onboard with modernizing our strings to not rely on null-termination, especially considering \0 is a perfectly valid Unicode codepoint (for better or for worse)

@Repiteo Repiteo removed request for a team December 1, 2024 19:40
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 2, 2024

I made a synthetic performance benchmark: https://onlinegdb.com/rzGUA045Ry

This PR is not about performance, but I wanted to check anyway, since some string creators already automatically select the static length constructor with this PR. The speed gain is not very consistent (it's more consistent offline); unscientifically i would estimate it's about 9% for short length strings, 5% for medium length strings, and 4% for long length strings such as from canvas.glsl.gen.h.

@fire
Copy link
Member

fire commented Dec 2, 2024

I'll bump the build, but I don't think it's a cache thing.

drivers.windows.editor.x86_64.lib(feed_effects.windows.editor.x86_64.obj) : error LNK2019: unresolved external symbol "public: __cdecl String::String(char const *)" (??0String@@QEAA@PEBD@Z) referenced in function "public: __cdecl GLES3::FeedEffects::FeedEffects(void)" (??0FeedEffects@GLES3@@QEAA@XZ)
bin\godot.windows.editor.x86_64.exe : fatal error LNK1120: 1 unresolved externals

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 2, 2024

I'll bump the build, but I don't think it's a cache thing.

Thanks! But I don't think that helped. feed_effects.cpp wasn't recompiled in these builds, and since it's referencing a function I removed, I suspect it's failing because MSVC / scons fails to detect it needs to recompile the file in question.
In lack of a better alternative, I'll touch the module to force it to rebuild, and if it succeeds, I'll remove the unneeded change again.

Edit: Looks like i was right, the builds are passing with the cache for feed_effects wiped. Unfortunately, the cache resets and the file fails to build again if I remove the change. I'll keep the file touched for the build to pass. I don't think there's a way around keeping the comment in-place with a merge; otherwise we'll just get the same issue again with branches that do not have the changes from this PR yet and thus experience incompatibility across this cache. I'm in talks with SCons maintainers to look into this issue.

@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch from be06698 to bdee697 Compare December 2, 2024 12:37
@Ivorforce Ivorforce requested a review from a team as a code owner December 2, 2024 12:37
@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch 2 times, most recently from d919522 to 1a30408 Compare December 2, 2024 13:46
…icitly parse the given string to find its end. Add known-length constructors and explicit from_c_str static methods.
@Ivorforce Ivorforce force-pushed the string-deprecate-null-term branch from 1a30408 to 10e14e0 Compare December 2, 2024 13:48
@adamscott adamscott changed the title Soft-deprecate the use of implicit reliance on NULL-terminated strings. Soft-deprecate the use of implicit reliance on NULL-terminated strings. Dec 2, 2024
@adamscott
Copy link
Member

@Calinou We don't have a pure String benchmark test, haven't we?

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 3, 2024

Existing benchmarks are a good idea actually! I tested this branch against Godot 4.3 (old) and #99816 and got these results:

results-plot

Results are mostly beneffitting from #99816, since this PR is based on that one (and a few others that were merged since 4.3, like #92548). Compile-time string allocation is not accelerated from GDExtension (yet) since no interface for it is available. Still, there is some gain from those static strings too.

But yeah, again, this is not primarily a speed PR. Some static strings are likely to benefit, but mostly it's about being explicit with NULL termination, which should help us avoid performance issues and potential bugs.

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 7, 2024

Converting to draft, waiting for #100132 to be merged first, which is less opinionated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants