Optimize String with Small String Optimization (SSO) #11241
Comments
Remember that Strings in Godot internally use UTF-32 for its fixed width since 4.0, so the threshold of short string optimization would likely be 8 characters for 32 bytes. I'm surprised to see such a spike of string resize values at 24 and 25 too. Are these values in bytes or character counts? |
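The threshold arithmetic above can be checked with a short sketch. `inline_capacity` is a hypothetical helper, not an engine function, and the 32-byte buffer is only the example size from this thread. Whether a terminator slot would need to be reserved inline is an assumption shown as a second case.

```cpp
#include <cstddef>

// Hypothetical helper: number of code units that fit in an inline buffer of
// `buffer_bytes` bytes, optionally keeping one slot free for a terminator.
constexpr std::size_t inline_capacity(std::size_t buffer_bytes, std::size_t char_bytes, bool reserve_terminator) {
    return (reserve_terminator && buffer_bytes / char_bytes > 0)
            ? buffer_bytes / char_bytes - 1
            : buffer_bytes / char_bytes;
}

// UTF-32 code units are 4 bytes wide, so 32 bytes hold 8 of them.
static_assert(inline_capacity(32, sizeof(char32_t), false) == 8, "8 UTF-32 code units");
// If one slot is kept for a terminator, only 7 remain (assumption).
static_assert(inline_capacity(32, sizeof(char32_t), true) == 7, "7 plus terminator");
// For comparison, a 1-byte encoding would fit 32 code units.
static_assert(inline_capacity(32, sizeof(char), false) == 32, "32 one-byte code units");
```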
It is the value passed to Print is so slow I haven't tested real workloads; I will update the results if I find a way. |
I was interested in how string concatenation can lead to such performance problems (as discussed in godotengine/godot#77158), so I had a look at the relevant source code myself. One example is the use of There are a few other strange choices, for example the use of 3 different (COW, no less) No wonder string handling in Godot is so slow! I feel like if we want to make actual performance improvements to string related code, we should:
I think |
I agree with most of what @Ivorforce says, with one exception:
This would reduce the number of Also, a small issue with not using |
I think this is a good idea. In my profiling I found that StringBuffer was still slightly faster than using String directly, even for larger strings like what are used in the shader compiler. I suspect it is due to StringBuffer explicitly Po2 resizing the String ahead of time. When using String directly we do end up doing Po2 allocations since it uses Vector internally, but there is a lot of bookkeeping around it. By manually doing a Po2 resize, we skip that bookkeeping. |
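To illustrate the power-of-two (Po2) pre-sizing described above, here is a minimal sketch. It uses `std::u32string` as a stand-in and is not Godot's `StringBuffer`. `next_power_of_2` and `Po2Buffer` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>

// Round n up to the next power of two (returns 1 for n == 0).
// Assumes n is small enough that the result fits in std::size_t.
inline std::size_t next_power_of_2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Hypothetical append-only buffer that reserves capacity in power-of-two
// steps before appending, so most appends do not touch the allocator.
class Po2Buffer {
    std::u32string data;
    std::size_t reallocations = 0;

public:
    void append(const std::u32string &s) {
        const std::size_t needed = data.size() + s.size();
        if (needed > data.capacity()) {
            // Grow once to the next power of two instead of to the exact size.
            data.reserve(next_power_of_2(needed));
            reallocations++;
        }
        data.append(s);
    }

    std::size_t size() const { return data.size(); }
    std::size_t reallocation_count() const { return reallocations; }
};
```

With this growth policy the number of explicit reservations grows logarithmically with the final length, which is the effect the comment attributes to resizing ahead of time.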
I saw your tests, and was impressed by how much of a difference it made! I think it will be less pronounced with the giant ~20k char strings we're dealing with in the shader code though, since every reallocation thereafter would lead to ~20kb RAM alloc and dealloc. But I'm open to test!
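Luckily it is possible! Here's some code for that:
template <size_t len>
UnownedString(const char (&p_cstring)[len]) :
		// A constructor cannot return a value, so delegate to the
		// (pointer, length) constructor instead.
		// Note: for a string literal, len counts the trailing '\0'.
		UnownedString(p_cstring, len) {} |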
I've previously explained in the RocketChat, but this only applies for WebAssembly, since it uses a custom memory allocator (provided by Emscripten). Every platform that uses a platform-provided reallocator, or otherwise has access to memory mapping, will not actually deallocate the memory before reallocating. Instead, it will simply remap all the pages other than the first and last ones into new addresses, to create the illusion of a continuous span of memory addresses, even though it's only continuous in virtual address space. If a page size is 4kb (the default for Linux), then a ~20kb realloc is actually just 5 page moves, each being hardly more expensive than a function call. |
Very interesting! I definitely want to test this. But yes, I agree that means there's all the more reason to question the current setup of the code. |
Interesting idea, I think that should be paired with some stack-based string (e.g. Actually, I have thought about moving all short strings to the stack (and facing the risk of stack overflow :) ).
There are, actually tons of them. For example just take a look at |
Yeah, I can see how those uses would be intimidating to adjust, 'just' for a bit of String optimization. I do think it would be doable, but it would involve quite a bit of work. |
By the way, should we change the title? I think the discussion has gone far beyond SSO. |
We diverged a bit because it is an interesting topic, but I think it would be best to keep this proposal for |
I just realized - the best spot to implement this addition would probably be @YYF233333 Do you think it would be possible to repeat your experiment hooking into |
Makes sense to me, but for Edit: This time sizes are counted in bytes and truncated at 128 bytes (the last column represents all larger allocations). |
Thank you! |
Awesome. Looks like |
Describe the project you are working on
Godot Editor
Describe the problem or limitation you are having in your project
String usually spends a lot of time doing heap allocation (in resize). This can be easily detected by profiling editor workloads (add/remove node, undo/redo, etc.), since the editor uses String a lot.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
As discussed in #77158, Small String Optimization (SSO) improves string performance by eliminating heap allocations for short strings.
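A minimal sketch of the idea, assuming a 32-byte inline buffer of UTF-32 code units. `SmallString` and all of its members are hypothetical names and do not reflect Godot's `String` layout. Copy-on-write sharing is deliberately left out, so copying is disabled.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical SSO string: short contents live inside the object itself,
// and the heap is only used once the contents outgrow the inline buffer.
class SmallString {
public:
    static constexpr std::size_t INLINE_BYTES = 32; // Assumed threshold.
    static constexpr std::size_t INLINE_CAPACITY = INLINE_BYTES / sizeof(char32_t);

private:
    union {
        char32_t inline_data[INLINE_CAPACITY];
        char32_t *heap_data;
    };
    std::size_t length = 0;
    std::size_t heap_capacity = 0; // 0 means the inline buffer is in use.

public:
    SmallString() : inline_data{} {}
    ~SmallString() {
        if (is_heap()) {
            delete[] heap_data;
        }
    }
    // No COW or deep copy in this sketch.
    SmallString(const SmallString &) = delete;
    SmallString &operator=(const SmallString &) = delete;

    bool is_heap() const { return heap_capacity != 0; }
    std::size_t size() const { return length; }
    const char32_t *ptr() const { return is_heap() ? heap_data : inline_data; }
    std::size_t capacity() const {
        if (is_heap()) {
            return heap_capacity;
        }
        return INLINE_CAPACITY;
    }

    // Appends count code units from src. src must not alias this string.
    void append(const char32_t *src, std::size_t count) {
        if (count == 0) {
            return;
        }
        const std::size_t needed = length + count;
        if (needed > capacity()) {
            // Outgrew the current storage: move to a (larger) heap block.
            std::size_t new_capacity = capacity() * 2;
            if (new_capacity < needed) {
                new_capacity = needed;
            }
            char32_t *grown = new char32_t[new_capacity];
            std::memcpy(grown, ptr(), length * sizeof(char32_t));
            if (is_heap()) {
                delete[] heap_data;
            }
            heap_data = grown;
            heap_capacity = new_capacity;
        }
        char32_t *dst = is_heap() ? heap_data : inline_data;
        std::memcpy(dst + length, src, count * sizeof(char32_t));
        length = needed;
    }
};
```

In this sketch a string of up to 8 code units never allocates. A real implementation would likely pack the length and the inline/heap flag more tightly than two `std::size_t` fields, and would need to define how inline storage interacts with COW sharing.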
To show that we do have small strings, I hooked String::resize to get the distribution of string sizes. The results are shown below:
Collected from opening the project manager and opening an empty project. Note that the last column represents all sizes beyond 64.
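For readers who want to repeat a measurement like this, here is a minimal sketch of a size histogram that a resize hook could feed. `SizeHistogram` is a hypothetical name and this is not the instrumentation used for the results above. Atomic counters are used on the assumption that strings may be resized from several threads.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical instrumentation: sizes 0..Limit each get their own bucket,
// and every size beyond Limit is counted in one shared overflow bucket.
template <std::size_t Limit>
class SizeHistogram {
    std::array<std::atomic<unsigned long long>, Limit + 2> buckets{};

public:
    // Call this with the requested size from inside the hooked function.
    void record(std::size_t size) {
        const std::size_t index = size <= Limit ? size : Limit + 1;
        buckets[index].fetch_add(1, std::memory_order_relaxed);
    }

    // Number of recorded requests of exactly `size` (size must be <= Limit).
    unsigned long long count(std::size_t size) const {
        return buckets[size].load(std::memory_order_relaxed);
    }

    // Number of recorded requests larger than Limit.
    unsigned long long overflow() const {
        return buckets[Limit + 1].load(std::memory_order_relaxed);
    }
};
```

A `SizeHistogram<64>` matches the layout described above, where the last column collects everything beyond 64.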
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
Store strings shorter than a certain size (e.g. 32 bytes) directly in the String struct. Only allocate once the string grows beyond that size. Care must be taken to preserve the COW semantics.
If this enhancement will not be used often, can it be worked around with a few lines of script?
No, it is core.
Is there a reason why this should be core and not an add-on in the asset library?
It is core.