-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace List
with LocalVector
if used as simple buffer
#99936
base: master
Are you sure you want to change the base?
Conversation
This needs some checking to ensure these are safe uses, the swap is only valid if the container size is not changed while iterating, |
Also important to evaluate is the tradeoff between locality and reduced overhead in memory on one hand, and the cost of reallocation and moving of data on the other One of the big benefits of |
My bad, that's a queue. Double checked, no other resize during iterating.
All these containers gather value by If performance turns out to be problem, we can call |
We should be using reserve everywhere that we can. I.e. for places where we know the total number of elements, reserve exactly enough for the total number of elements, and where we only know a possible range, we should allocate enough to cover the range. |
That should be another PR I guess, we should check all |
I imagine lawnjelly's It allocates on the stack, avoiding a heap allocation. It would be useful if |
Just see #99328 and I think I have to clarify something with this performance tag. The first priority of this PR is not to improve performance (If these My major concern here is So this should be considered a codestyle stuff first, and I did some test to show it is not going to cause systematic regression. MicroBenchmark
| relative | ns/op | op/s | err% | total | benchmark
|---------:|--------------------:|--------------------:|--------:|----------:|:----------
| 100.0% | 121.56 | 8,226,243.55 | 0.1% | 0.01 | `List itemsize: 4 length: 1 sort: 0`
| 207.7% | 58.53 | 17,085,847.77 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 1 sort: 0`
| 100.0% | 484.70 | 2,063,120.98 | 0.2% | 0.01 | `List itemsize: 4 length: 7 sort: 0`
| 195.7% | 247.67 | 4,037,581.08 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 7 sort: 0`
| 100.0% | 3,865.53 | 258,696.92 | 0.2% | 0.05 | `List itemsize: 4 length: 63 sort: 0`
| 812.5% | 475.78 | 2,101,827.68 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 63 sort: 0`
| 100.0% | 34,327.83 | 29,130.88 | 1.1% | 0.41 | `List itemsize: 4 length: 511 sort: 0`
| 3,671.7% | 934.93 | 1,069,603.20 | 0.7% | 0.01 | `LocalVector itemsize: 4 length: 511 sort: 0`
| 100.0% | 254,799.90 | 3,924.65 | 0.2% | 3.05 | `List itemsize: 4 length: 4095 sort: 0`
| 8,035.9% | 3,170.79 | 315,378.95 | 1.0% | 0.04 | `LocalVector itemsize: 4 length: 4095 sort: 0`
| 100.0% | 128.63 | 7,774,033.51 | 0.2% | 0.01 | `List itemsize: 24 length: 1 sort: 0`
| 174.8% | 73.58 | 13,590,218.24 | 0.0% | 0.01 | `LocalVector itemsize: 24 length: 1 sort: 0`
| 100.0% | 508.46 | 1,966,727.22 | 0.2% | 0.01 | `List itemsize: 24 length: 7 sort: 0`
| 163.0% | 311.90 | 3,206,153.31 | 0.1% | 0.01 | `LocalVector itemsize: 24 length: 7 sort: 0`
| 100.0% | 4,053.12 | 246,723.21 | 0.3% | 0.05 | `List itemsize: 24 length: 63 sort: 0`
| 368.0% | 1,101.36 | 907,966.79 | 0.2% | 0.01 | `LocalVector itemsize: 24 length: 63 sort: 0`
| 100.0% | 32,916.59 | 30,379.81 | 0.2% | 0.39 | `List itemsize: 24 length: 511 sort: 0`
| 543.4% | 6,057.23 | 165,092.06 | 1.3% | 0.07 | `LocalVector itemsize: 24 length: 511 sort: 0`
| 100.0% | 284,490.54 | 3,515.06 | 0.2% | 3.40 | `List itemsize: 24 length: 4095 sort: 0`
| 680.3% | 41,816.05 | 23,914.26 | 0.5% | 0.50 | `LocalVector itemsize: 24 length: 4095 sort: 0`
| 100.0% | 121.97 | 8,198,987.54 | 0.1% | 0.01 | `List itemsize: 4 length: 1 sort: 1`
| 197.9% | 61.64 | 16,223,083.55 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 1 sort: 1`
| 100.0% | 561.59 | 1,780,657.10 | 0.6% | 0.01 | `List itemsize: 4 length: 7 sort: 1`
| 219.0% | 256.46 | 3,899,191.63 | 0.1% | 0.01 | `LocalVector itemsize: 4 length: 7 sort: 1`
| 100.0% | 4,212.65 | 237,380.50 | 0.4% | 0.05 | `List itemsize: 4 length: 63 sort: 1`
| 678.8% | 620.59 | 1,611,370.31 | 0.2% | 0.01 | `LocalVector itemsize: 4 length: 63 sort: 1`
| 100.0% | 38,429.21 | 26,021.87 | 0.2% | 0.46 | `List itemsize: 4 length: 511 sort: 1`
| 1,569.3% | 2,448.83 | 408,357.83 | 0.5% | 0.03 | `LocalVector itemsize: 4 length: 511 sort: 1`
| 100.0% | 309,527.59 | 3,230.73 | 0.2% | 3.70 | `List itemsize: 4 length: 4095 sort: 1`
| 1,573.5% | 19,671.17 | 50,835.82 | 0.3% | 0.23 | `LocalVector itemsize: 4 length: 4095 sort: 1`
| 100.0% | 128.69 | 7,770,760.97 | 1.3% | 0.01 | `List itemsize: 24 length: 1 sort: 1`
| 154.5% | 83.30 | 12,004,334.72 | 4.2% | 0.01 | `LocalVector itemsize: 24 length: 1 sort: 1`
| 100.0% | 738.00 | 1,355,013.55 | 1.8% | 0.01 | `List itemsize: 24 length: 7 sort: 1`
| 138.9% | 531.46 | 1,881,610.87 | 0.2% | 0.01 | `LocalVector itemsize: 24 length: 7 sort: 1`
| 100.0% | 6,569.49 | 152,218.86 | 0.1% | 0.08 | `List itemsize: 24 length: 63 sort: 1`
| 150.1% | 4,377.90 | 228,420.02 | 0.2% | 0.05 | `LocalVector itemsize: 24 length: 63 sort: 1`
| 100.0% | 69,331.06 | 14,423.55 | 0.2% | 0.83 | `List itemsize: 24 length: 511 sort: 1`
| 132.0% | 52,521.50 | 19,039.82 | 0.2% | 0.63 | `LocalVector itemsize: 24 length: 511 sort: 1`
| 100.0% | 724,062.43 | 1,381.10 | 0.1% | 8.65 | `List itemsize: 24 length: 4095 sort: 1`
| 123.5% | 586,161.22 | 1,706.02 | 0.1% | 7.01 | `LocalVector itemsize: 24 length: 4095 sort: 1` Source CodeJust try to run Using this nice small lib nanobench.h #define ANKERL_NANOBENCH_IMPLEMENT
#include "nanobench.h"
#include "core/templates/list.h"
#include "core/templates/local_vector.h"
#include <ostream>
#include <sstream>
#include <vector>
template<typename T>
class Test
{
T placehold;
List<T> list;
LocalVector<T> vector;
public:
void bench_list(const std::vector<T> &source, bool sort)
{
for (auto &item : source)
{
list.push_back(item);
}
if (sort)
{
list.sort();
}
for (auto &item : list)
{
placehold = item;
}
}
void bench_local_vector(const std::vector<T> &source, bool sort)
{
for (auto &item : source)
{
vector.push_back(item);
}
if (sort)
{
vector.sort();
}
for (auto &item : vector)
{
placehold = item;
}
}
static void bench(bool sort)
{
for (int i = 1; i < 4096 + 1; i *= 8)
{
int n = i == 1 ? 1 : i-1;
std::ostringstream name;
name << "itemsize: " << sizeof(T) << " length: " << n << " sort: " << sort;
std::vector<T> source(n);
auto bench = ankerl::nanobench::Bench()
.minEpochIterations(1000)
.relative(true);
bench.run("List " + name.str(), [&]{
Test test;
test.bench_list(source, sort);
});
bench.run("LocalVector " + name.str(), [&]{
Test test;
test.bench_local_vector(source, sort);
});
}
}
};
void mytest(void)
{
Test<int>::bench(false);
Test<Variant>::bench(false);
Test<int>::bench(true);
Test<Variant>::bench(true);
} |
Add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is, generally, a great idea, and pretty unlikely to cause performance regressions. Linked lists are rarely useful. In fact, I would advocate to rename List
to LinkedList
to avoid accidental use (e.g. by Java folks) because it is rarely more efficient than a linear LocalVector
. LocalVector
should be the go-to default, and the use of LinkedList
should be well-proven to be faster than LocalVector
in each specific case, documented next to the variable to justify its use.
That being said — this is a pretty big change, and I gotta say if I were the code maintainer I would be uncomfortable to merge it because of potential regressions. I would therefore advocate to split the PR and address each use separately, to make them small, comfortable and somewhat tested chunks. It's annoying, but it's safe.
@@ -1582,13 +1583,6 @@ void AnimationNodeStateMachine::remove_transition_by_index(const int p_transitio | |||
Transition tr = transitions[p_transition]; | |||
transitions.write[p_transition].transition->disconnect("advance_condition_changed", callable_mp(this, &AnimationNodeStateMachine::_tree_changed)); | |||
transitions.remove_at(p_transition); | |||
|
|||
Vector<String> path_from = String(tr.from).split("/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are removing this because it is unused? Shouldn't that be a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, especially when I see how fast codestyle PRs are merged :)
Yes, but it is already split from a bigger one :) Despite being somewhat scattered, these changes are essentially the same pattern. Safety should probably be ok, I double checked they follow the restriction on top, but I cannot prove that to you (unless you read the code). I'm waiting to see if #100251 can be merged efficiently. If that can not, I'd better give up. |
A byproduct when I go through
List
. These list are used as simple buffer to store value temporarily, which means:push_back
.List
.So I change them to
LocalVector
, as they do not depend on container implementation.There are other kinds of usages which can be replaced, but need some prepare first:
LocalVector
.pop_back()
method forLocalVector
.Queue
?