-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace the use of StringBuilder with StringBuffer #77158
base: master
Are you sure you want to change the base?
Conversation
218f001
to
49f2d14
Compare
I tested out of interest in godbolt, and since c++17 it looks like you can omit the brackets, i.e. I think (master at least) is c++17, so that might be worth doing. This is called "class template type deduction". As P.S. Also for things like this we should ask @reduz as he likes to be involved in such decisions, and it can save some effort if he prefers different approach. 👍 Also as @SlugFiller pointed out, we should benchmark this with some longer / variable sized strings to make sure it's always a win as he was benchmarking with small strings (5 chars). |
core/string/string_buffer.h
Outdated
@@ -35,7 +35,7 @@ | |||
|
|||
template <int SHORT_BUFFER_SIZE = 64> | |||
class StringBuffer { | |||
char32_t short_buffer[SHORT_BUFFER_SIZE]; | |||
char32_t short_buffer[SHORT_BUFFER_SIZE] = {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the StringBuffer
class, but I would double check this is required.
This is kind of controversial - it might lead to needless extra processing if used in a bottleneck, if not optimized out. Profiling / benchmarking may help, maybe there's no cost in practice but it would be nice to show this first. 🤷♂️
Some people often do this kind of thing thinking it is somehow "safer", but this is an illusion. Providing you aren't reading from uninitialized data there's no problem. Personally if it's a small structure and not a bottleneck I'd be more likely to zero. If a bottleneck, no. If a large buffer no (because zeroing commits the pages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for review! I was unsure about this line myself so I am thankful that you pointed this out for me.
you are right about the observation -- this line felt "safe" to me since when I added a const variation of as_string
I was confused by how length of string are treated in a way.
I will work on this as soon as I have time available.
49f2d14
to
5db8aa1
Compare
I tried to measure performance difference between I'm not really sure there results are trustworthy, since I'm unfamiliar with the good basics of benchmarking. could be taken with a huge grain of salt, but these are the result that I observed; some info about data:
Debug Build
Release Build
|
Just to recap it seems like to know how to proceed we need to do more benchmarking, just in case Either using |
I am sincerely sorry for late response, I've been busy lately with my work. I feel like I should've asked someone to supervise my approach before doing this 😅 I tried my best to make accurate measurement, and see if I can improve performance here and there. The reason why To confirm my suspicion, I benchmarked both this was the result before the change: I updated this was the result after change in next, the performance difference of godot/core/string/string_buffer.h Lines 100 to 101 in 5c2295f
because this constantly dereferences pointer to see if it reached the end of string (instead of relying on the result of This is the compiler output of said line for release build: replacing line above with simple for-loop using the result of this was the result after change in This was the compiler output after the for-loop change, which is doing what I wanted it to do: |
5db8aa1
to
4b565e3
Compare
4b565e3
to
c4199d3
Compare
Great work! 👍 |
c4199d3
to
a0a689a
Compare
I am sorry but I just don't see how this can have better performance than StringBuilder, specially if some optimizations are made to StringBuilder that are pending (moving the internal vectors to a LocalVector with a reserve usage). The approach in StringBuilder pretty much does not allocate memory until the end and it does not do unnecesary memory copying. If anything, to me it should be the other way around (no idea why StringBuffer exists, which has a worse approach). |
@reduz You can try benchmarking. The current benchmarks show a massive difference. Also, even if there was a different yet more efficient approach, it could simply replace I could explain the theory of why |
We do try to keep a good explanation of the reasoning for PRs available from the PR, simply so it is easier for reviewers to come in and read all the relevant information rather than just effectively a "trust me, this is better". Afaik the original discussion took place on rocket chat, I'd forgotten the details myself as it was a while ago: |
a0a689a
to
15478fc
Compare
@reduz sorry for taking so long to respond! I wished to post a response sooner, but I was busy with my work. The StringBuffer performs better compared to StringBuilder because the concatenation is performed in a simple loop, allowing for the compiler to optimize. this comes with the cost of possibly allocating excess memory in some situations. I admit that I did not measure the amount of memory allocated for each case, only the speed in a controlled environment. I would like to measure the memory when I have time. Given that StringBuilder is used to generate the compiled shader code from the visual shader editor, I hope that this provides a better experience to people who like the feature, although I do think that the gain is insignificant aside from this particular case. If possible, I would love to get some advice on where to go with this PR. I think from the performance perspective this is better suited for general use, but I am not yet familiar with the codebase so I may be missing something important. |
Optimize how and what? |
@AThousandShips After re-reading my previous comment I realized that I did not provide any substance on my claim, I apologize. I will try to explain as much as I can I was thinking about the when appending I hope this answers the question, and let me know if I missed anything. |
15478fc
to
a4687e6
Compare
- Replaces the use of StringBuilder with StringBuffer. - updates StringBuffer to be more performant on certain cases.
a4687e6
to
78ac391
Compare
Come across this accidentally, it seems this PR has a simillar goal with #97777. Firstly I have a question:
Do you mean this? I think And not relevant but I think both of them can utilize Back to
The problem of So in summary, I think The inline cache itself is a good thing and maybe we can add that to |
It is annoying for me to keep having to explain it, but I'll do it one more time.
Where it goes wrong is in the "take a bunch" bit. It doesn't accept an iterator that returns the strings on the fly. Rather, the strings are added one by one, before being iterated at the generation step. And the way it "adds" a string is by inserting it into a The issue is that appending an element to a As a result, every single instruction of the accumulation step is overhead. By default, I honestly, swear to Carmack, thought this is obvious. But if it wasn't, then here, a full explanation as to why The only scenario By contrast, P.S. Making the StringBuilder str1;
StringBuilder str2;
str1 += 'prefix_';
str2 += 'different_';
if (strs.size() > 0) {
str1 += strs[0];
str2 += strs[0];
}
for (int i = 1; i < strs.size(); i++) {
str1 += '_';
str2 += '_';
str1 += strs[i];
str2 += strs[i];
}
str1 += '_suffix';
str2 += '_and_unique';
String res1 = str1;
String res2 = str2; // If static threadlocal buffers are used, this would be the same as res1, even though it's not supposed to be But this exact same solution could be applied to |
@YYF233333 Thanks for bringing attention to this. I vaguely remember the discussion from last year on the contributor chat about this, similar to here, the chat there was pretty divided on whether this change would or would not be faster. But notably, nobody tested the performance inside of Godot. The benchmarking done in this PR is helpful, but not conclusive since it is a synthetic benchmark. I profiled both the relevant code in this PR and the code in #97777 by running a scene that compiles 300 shaders in one frame. While #97777 is significantly faster than master (about 2x faster), this PR is faster than #97777 by about 3x. For compiling 300 GLES3 shaders in one frame master spends about 200 ms in StringBuilder, #97777 spends about 90 ms (almost all of that time is in Do you mind testing the same scene in #97777 with the code from this PR to compare performance? |
Firstly thanks for your detailed explanation @SlugFiller , I still have some point to discuss.
The problem is that iterator doesn't exist in the usecase, typically the string you want to concat are local variables/string literals that are distributed on stack. You need some bookkeeping, a builder, to gather them up. Bookkeeping do has overhead, so we use it only when simple
Sorry I forget we are still using
Problem is, The only usecase of So if the profile shows that |
That's expected and means probably the bookkeeping overhead is larger than resize overhead and we should use
I will if I have time. Edit: Test done. I cannot find performance difference between Rebase #97777 against latest master, merge this PR, and change only
P.S. This PR reminds me that I'm trapped by the old implementation. We should directly use a Again, I'm not against this class. I'm against its name and replacing all |
And my point was that the bookkeeping is equally expensive as
That, in and of itself, is not the issue. It merely contributes to the issue. The issue is that appending to the
You're suggesting replacing
It's not just the profile, although the profile is pretty conclusive. Logically, It's like replacing bubble sort with quicksort. There's no conceivable condition under which the prior would be preferable, other than maybe code simplicity. And |
Hi, I am sorry for the lack of activity on my end. I generally agree with the point made by @SlugFiller, and thank you for extremely thorough explanation.
I don't think we can use the
As far as I can tell, the I am honestly not sure why we are not doing the same validation check in either
I agree that the name I am struggling to understand the appeal of keeping This is another issue that I've been wondering about since I read the code of
|
...but adds an equal or larger amount of reallocations during the append phase, thereby removing any semblance of having an advantage or purpose.
I think there isn't. But in order for your example code to actually create garbage, you need to thrash the stack in between the If this was Rust or Swift, an escaping or linked lifetime guarantee could be required from the |
Realistically the end result of these conversations should be to remove either StringBuffer or StringBuilder, settle on an implementation of the remaining class, and ensure that class is named StringBuilder. godotengine/godot-proposals#11241 is interesting and may help many cases of String concatenation, but there is still value in a purpose-built class for concatenating strings that we know are valid in advance so we can avoid validating individual characters and avoid the bookkeeping that comes from resizing a Vector. |
I believe I have run enough tests to make a recommendation. I ran some of my own benchmarks against Pre-TestBefore starting, I patched
(I tested all 3 optimizations iteratively top to bottom, and each contributes to better scores). string_builder.h/**************************************************************************/
/* string_builder.h */
/**************************************************************************/
/* This file is part of: */
/* GODOT ENGINE */
/* https://godotengine.org */
/**************************************************************************/
/* Copyright (c) 2014-present Godot Engine contributors (see AUTHORS.md). */
/* Copyright (c) 2007-2014 Juan Linietsky, Ariel Manzur. */
/* */
/* Permission is hereby granted, free of charge, to any person obtaining */
/* a copy of this software and associated documentation files (the */
/* "Software"), to deal in the Software without restriction, including */
/* without limitation the rights to use, copy, modify, merge, publish, */
/* distribute, sublicense, and/or sell copies of the Software, and to */
/* permit persons to whom the Software is furnished to do so, subject to */
/* the following conditions: */
/* */
/* The above copyright notice and this permission notice shall be */
/* included in all copies or substantial portions of the Software. */
/* */
/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, */
/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF */
/* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. */
/* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY */
/* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, */
/* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE */
/* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */
/**************************************************************************/
#ifndef STRING_BUILDER_H
#define STRING_BUILDER_H
#include "core/string/ustring.h"
#include "core/templates/vector.h"
#include <core/templates/local_vector.h>
class StringBuilder {
struct Item {
String s;
const char *ptr;
size_t ptr_len;
};
uint32_t string_length = 0;
LocalVector<Item> strings;
public:
StringBuilder &append(const String &p_string);
StringBuilder &append(const char *p_cstring);
_FORCE_INLINE_ StringBuilder &operator+(const String &p_string) {
return append(p_string);
}
_FORCE_INLINE_ StringBuilder &operator+(const char *p_cstring) {
return append(p_cstring);
}
_FORCE_INLINE_ void operator+=(const String &p_string) {
append(p_string);
}
_FORCE_INLINE_ void operator+=(const char *p_cstring) {
append(p_cstring);
}
_FORCE_INLINE_ int num_strings_appended() const {
return strings.size();
}
_FORCE_INLINE_ uint32_t get_string_length() const {
return string_length;
}
String as_string() const;
_FORCE_INLINE_ operator String() const {
return as_string();
}
};
#endif // STRING_BUILDER_H
string_builder.cpp/**************************************************************************/
/* string_builder.cpp */
/**************************************************************************/
/* This file is part of: */
/* GODOT ENGINE */
/* https://godotengine.org */
/**************************************************************************/
/* Copyright (c) 2014-present Godot Engine contributors (see AUTHORS.md). */
/* Copyright (c) 2007-2014 Juan Linietsky, Ariel Manzur. */
/* */
/* Permission is hereby granted, free of charge, to any person obtaining */
/* a copy of this software and associated documentation files (the */
/* "Software"), to deal in the Software without restriction, including */
/* without limitation the rights to use, copy, modify, merge, publish, */
/* distribute, sublicense, and/or sell copies of the Software, and to */
/* permit persons to whom the Software is furnished to do so, subject to */
/* the following conditions: */
/* */
/* The above copyright notice and this permission notice shall be */
/* included in all copies or substantial portions of the Software. */
/* */
/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, */
/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF */
/* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. */
/* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY */
/* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, */
/* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE */
/* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */
/**************************************************************************/
#include "string_builder.h"
#include <string.h>
StringBuilder &StringBuilder::append(const String &p_string) {
if (p_string.is_empty()) {
return *this;
}
strings.push_back({p_string, nullptr, 0});
string_length += p_string.length();
return *this;
}
StringBuilder &StringBuilder::append(const char *p_cstring) {
size_t len = strlen(p_cstring);
if (len == 0) {
return *this;
}
strings.push_back({String(), p_cstring, len});
string_length += len;
return *this;
}
String StringBuilder::as_string() const {
if (string_length == 0) {
return "";
}
String string;
string.resize(string_length);
char32_t *buffer_ptr = string.ptrw();
for (uint32_t i = 0; i < strings.size(); i++) {
const auto& item = strings[i];
if (!item.ptr) {
// Godot string
const String &s = item.s;
memcpy(buffer_ptr, s.ptr(), s.length() * sizeof(char32_t));
buffer_ptr += s.length();
} else {
const char *s = item.ptr;
for (int32_t j = 0; j < item.ptr_len; ++j, ++buffer_ptr, ++s) {
*buffer_ptr = *s;
}
}
}
return string;
} The TestHere's the code I ran: test_main.cpp(templating should optimize away struct StrBuilder {
String x;
void append(const String& s) {
x += s;
}
String as_string() {
return x;
}
};
template <typename Builder>
void test(const String &s, int runs, int count) {
for (int r = 0; r < runs; ++r) {
Builder b;
for (int i = 0; i < count; ++i) {
b.append(s);
}
int l = b.as_string().length();
}
}
int test_main(int argc, char *argv[]) {
String s = String("A").repeat(1);
int runs = 100;
int count = 200;
test<StringBuilder>(s, runs, count);
auto t0 = std::chrono::high_resolution_clock::now();
test<StringBuilder>(s, runs, count);
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
test<StringBuffer<>>(s, runs, count);
t0 = std::chrono::high_resolution_clock::now();
test<StringBuffer<>>(s, runs, count);
t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
test<StrBuilder>(s, runs, count);
t0 = std::chrono::high_resolution_clock::now();
test<StrBuilder>(s, runs, count);
t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count() << "us\n";
} I ran the test with 4 configurations, differing in Indeed, this is exactly what we see:
I hope it's clear what the reason for this is:
DiscussionI think this is enough data to make a judgement call. I propose the following:
Additionally, here are follow-ups that are likely to further optimize string concatenation:
|
"Performs badly" is a misphrasing here. The difference is in microseconds. It's on the level of measurement error, and is roughly 20 times less than the difference in the many appends version. To be complete, you should probably test many appends long length, or at least increase the run count until the run time is measured in seconds, not in microseconds. Or you can flip the stop condition, and measure number of runs that can be completed before a second elapses. This should somewhat reduce measurement error, although overhead from repeatedly getting the current time might take over in the case of a very short run. At any rate, even if we take these measurements to be perfectly accurate, with no errors introduced from resolution limitation of the system clock, being able to earn 50us for the risk of losing 1000us is not a proper tradeoff. It does not justify the use of |
Hrm, you are right, unfortunately this does cast a shadow on my conclusions. But i agree with clay that if we went there, to try to totally replace StringBuilder, StringBuffer should be renamed to StringBuilder, instead of renaming references. Also, StringBuffer using a short string cache is not suitable for building strings, since the data has to be copied over later anyhow. This should be addressed before proposing the switch. |
Urgh, I just now realized that With that in the picture as well, I wonder why |
String has a lot more bookeeping steps, look at the number of instructions in order to get to the Po2 resize logic. StringBuffer does the resize up front because it knows its going to need to resize a lot. Basically you skip all the string and CoW logic |
Replaces
StringBuilder
withStringBuffer
across codebases, which seems to be present in a codebase for quite a while and had similar functionality asStringBuilder
, but with improved performance as measured by the benchmark done by @SlugFiller.It's mostly a full-on search-and-replace, but there are few modification to make it compile as well.