Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize StringBuilder by using LocalVector instead of Vector. #99775

Merged

Conversation

Ivorforce
Copy link
Contributor

@Ivorforce Ivorforce commented Nov 28, 2024

Simple, unopinionated optimization of StringBuilder that's invisible to its callers.
This PR optimizes:

  • calls like builder.append("test"), which no longer call strlen() because the size is known statically.
  • as_string is now better optimized, regarding the iteration.
  • LocalVector instead of Vector foregoes COW overhead (lifted from Optimize SceneTreeEditor::_update_node_tooltip() #97777).

This should slightly improve performance already, and should be easier to merge than #77158 because it does not modify any other parts of the code, and #97777 because it does not change assumptions about the strings.
I especially opted not to remove the explicit call to String(buffer, string_length) in the end because it does some error checking that should arguably stay - or at least, it should be the task of another PR to remove it.

I do think we should try optimizing strings further - especially with regards to large strings being checked for strlen quite repeatedly because of NULL termination - but this will be done in follow-up PRs. Especially, follow-up PRs will be able to call append(ptr, len) due to this PR if they already know the size.

@Ivorforce Ivorforce requested a review from a team as a code owner November 28, 2024 00:29
@Ivorforce Ivorforce force-pushed the string-builder-template-append branch 6 times, most recently from 5f2f32a to 4c5bfd0 Compare November 28, 2024 01:08
@clayjohn
Copy link
Member

After moving to LocalVectors, almost all of the cost of StringBuilder is from the call to String(buffer, string_length). In my test project that compiles 300 shaders I got roughly:
190 ms in master
90 ms using LocalVector instead of Vector
30 ms using String instead of raw buffer + reserving space in the local vectors

Given that, I suspect that this code will still be in the 80-90ms range, although testing is necessary to confirm of course. What error checks do you think are necessary?

@Ivorforce
Copy link
Contributor Author

Ivorforce commented Nov 28, 2024

What error checks do you think are necessary?

I had a pretty in-depth look at ustring.cpp yesterday (for some reason, this discussion sucked me in). copy_from (called from the constructor) runs 3 checks that are costly:

  1. Find the length of the string, clipped to p_clip_to if supplied.
  2. Run a dubious cast to uint8_t, possibly preventing autovectorization.
  3. Include an if branch on '\0\' in the critical loop, preventing autovectorization.

Number 2) is wholly unnecessary, as indicated by my test: onlinegdb test code. Casting rules of the C++ spec ensure this should not be different under any compiler, so we should be able to refactor the ternary with a simple static_cast<uint8_t>(p_cstr[i]). A short test indicates this check is optimized away by some compilers (e.g. clang) but not by others (MSVC).

This leaves number 1) and number 3).
Number 1) clips the string to the first '\0' character it finds.
Number 3) again checks for '\0', and if found, prints an error.

It is impossible to trigger the Number 3) check, because the 1) check will terminate the string early anyway. Though the reliance on NULL termination and repeated length checks is another discussion we should have.

To answer your question directly: After going in-depth, I don't think any of these are necessary, but whether or not they are should be discussion of other PRs.

@Ivorforce
Copy link
Contributor Author

I removed the static-string length parts. They weren't working anyway yet, and I've been moving to a more well-rounded approach to avoiding strlen calls. In any case, this should make the PR even less contentious to merge.

@Ivorforce Ivorforce changed the title Optimize StringBuilder append for static strings, and as_string(). Optimize StringBuilder by using LocalVector and improving the as_string assignment loop. Dec 7, 2024
Copy link
Member

@hpvb hpvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, straightforward improvement! Thanks!

@hpvb hpvb modified the milestones: 4.x, 4.4 Dec 7, 2024
Copy link
Contributor

@kiroxas kiroxas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the changes in the cpp file has any impact on performance. But changing to local Vector is definitely a good thing !

@Ivorforce Ivorforce force-pushed the string-builder-template-append branch from 77b1a06 to ebf4b31 Compare December 8, 2024 15:25
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Dec 8, 2024

Not sure the changes in the cpp file has any impact on performance. But changing to local Vector is definitely a good thing

You're right. I just did some more tests. I think something about using buffer pointers directly confuses the compiler. In some instances i even got worse results by removing the running index. In any case, I've just removed the .cpp changes, just to be agnostic. It should be a trivial merge now.

@Ivorforce Ivorforce changed the title Optimize StringBuilder by using LocalVector and improving the as_string assignment loop. Optimize StringBuilder by using LocalVector instead of Vector. Dec 8, 2024
@Ivorforce Ivorforce force-pushed the string-builder-template-append branch from ebf4b31 to 8df2dbe Compare December 8, 2024 15:39
@clayjohn
Copy link
Member

clayjohn commented Dec 9, 2024

Let's go ahead and merge this for now as it makes a huge difference and is totally safe.

However, I do think we can go ahead with the changes from #97777

I don't think the extra validation from calling String() is relied upon, so appending directly into a String will make this way faster (in my testing it made StringBuilder essentially as fast as StringBuffer for shader compilation). Additionally, the addition of the clear() method allows us to use thread_local versions of this for places where it will be called at run time to allow us to avoid any allocations at run time (after the first few calls of course).

To be clear. I think we should merge this and then evaluate the other improvements in follow up PRs (including potentially just replacing the current implementation with what we currently have in StringBuffer)

@Repiteo Repiteo merged commit 3ef8e83 into godotengine:master Dec 10, 2024
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Dec 10, 2024

Thanks!

@Ivorforce Ivorforce deleted the string-builder-template-append branch December 10, 2024 22:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants