Replace `List` with `LocalVector` if used as simple buffer #99936

YYF233333 · 2024-12-02T18:31:47Z

A byproduct when I go through List. These list are used as simple buffer to store value temporarily, which means:

They are local variables.
They get value by push_back.
Values are consumed by iterating the List.

So I change them to LocalVector, as they do not depend on container implementation.

There are other kinds of usages which can be replaced, but need some prepare first:

Same as above, but iterate from tail to head. Need a reverse iterator for LocalVector.
Used as stack. Need a pop_back() method for LocalVector.
Used as queue. Shall we implement a dedicated Queue?

AThousandShips · 2024-12-03T11:12:33Z

This needs some checking to ensure these are safe uses, the swap is only valid if the container size is not changed while iterating, ~~the code in Node for example looks unsafe~~

AThousandShips · 2024-12-03T11:46:33Z

Also important to evaluate is the tradeoff between locality and reduced overhead in memory on one hand, and the cost of reallocation and moving of data on the other

One of the big benefits of List is that changing the size reallocates no memory, so it would depend on the data type stored and the cost of repeatedly reallocating data depending on how the size of the container changes and the overhead there

YYF233333 · 2024-12-03T12:36:32Z

This needs some checking to ensure these are safe uses, the swap is only valid if the container size is not changed while iterating, the code in Node for example looks unsafe

My bad, that's a queue. Double checked, no other resize during iterating.

One of the big benefits of List is that changing the size reallocates no memory, so it would depend on the data type stored and the cost of repeatedly reallocating data depending on how the size of the container changes and the overhead there

All these containers gather value by push_back, for List it is an alloc per call. I think alloc times matters more than alloc size.

If performance turns out to be problem, we can call reserve().

clayjohn · 2024-12-03T21:43:09Z

We should be using reserve everywhere that we can. I.e. for places where we know the total number of elements, reserve exactly enough for the total number of elements, and where we only know a possible range, we should allocate enough to cover the range.

YYF233333 · 2024-12-04T06:41:54Z

We should be using reserve everywhere that we can. I.e. for places where we know the total number of elements, reserve exactly enough for the total number of elements, and where we only know a possible range, we should allocate enough to cover the range.

That should be another PR I guess, we should check all LocalVector usage, not just these. Will implement that if have time. Or you just prefer refactor these first?

stuartcarnie · 2024-12-04T06:47:52Z

I imagine lawnjelly's FixedArray would be beneficial in some cases, if the size is known:

https://github.com/godotengine/godot/blob/71bbe3b00c40986859045e0397f6a3afe311c3e2/core/templates/fixed_array.h

It allocates on the stack, avoiding a heap allocation. It would be useful if VectorView accepted it as it would mean there are numerous cases where it could be used to create things like uniform sets without allocating on the heap.

YYF233333 · 2024-12-05T13:35:35Z

Just see #99328 and I think I have to clarify something with this performance tag.

The first priority of this PR is not to improve performance (If these List cause problem we should have found them long ago).

My major concern here is List is generally not considered to play well with modern processors (cache stuff). If you use it, you probably have some special needs (e.g. insert in the middle). If not, like in this PR, the use of List may confuse someone when they touch the code.

So this should be considered a codestyle stuff first, and I did some test to show it is not going to cause systematic regression.

MicroBenchmark

push_back one by one from a std::vector and iterating through to do some nonsense stuff. Also test if you sort before iterate because many code do this.

| relative |               ns/op |                op/s |    err% |     total | benchmark
|---------:|--------------------:|--------------------:|--------:|----------:|:----------
|   100.0% |              121.56 |        8,226,243.55 |    0.1% |      0.01 | `List        itemsize: 4 length: 1 sort: 0`
|   207.7% |               58.53 |       17,085,847.77 |    0.1% |      0.01 | `LocalVector itemsize: 4 length: 1 sort: 0`
|   100.0% |              484.70 |        2,063,120.98 |    0.2% |      0.01 | `List        itemsize: 4 length: 7 sort: 0`
|   195.7% |              247.67 |        4,037,581.08 |    0.1% |      0.01 | `LocalVector itemsize: 4 length: 7 sort: 0`
|   100.0% |            3,865.53 |          258,696.92 |    0.2% |      0.05 | `List        itemsize: 4 length: 63 sort: 0`
|   812.5% |              475.78 |        2,101,827.68 |    0.1% |      0.01 | `LocalVector itemsize: 4 length: 63 sort: 0`
|   100.0% |           34,327.83 |           29,130.88 |    1.1% |      0.41 | `List        itemsize: 4 length: 511 sort: 0`
| 3,671.7% |              934.93 |        1,069,603.20 |    0.7% |      0.01 | `LocalVector itemsize: 4 length: 511 sort: 0`
|   100.0% |          254,799.90 |            3,924.65 |    0.2% |      3.05 | `List        itemsize: 4 length: 4095 sort: 0`
| 8,035.9% |            3,170.79 |          315,378.95 |    1.0% |      0.04 | `LocalVector itemsize: 4 length: 4095 sort: 0`
|   100.0% |              128.63 |        7,774,033.51 |    0.2% |      0.01 | `List        itemsize: 24 length: 1 sort: 0`
|   174.8% |               73.58 |       13,590,218.24 |    0.0% |      0.01 | `LocalVector itemsize: 24 length: 1 sort: 0`
|   100.0% |              508.46 |        1,966,727.22 |    0.2% |      0.01 | `List        itemsize: 24 length: 7 sort: 0`
|   163.0% |              311.90 |        3,206,153.31 |    0.1% |      0.01 | `LocalVector itemsize: 24 length: 7 sort: 0`
|   100.0% |            4,053.12 |          246,723.21 |    0.3% |      0.05 | `List        itemsize: 24 length: 63 sort: 0`
|   368.0% |            1,101.36 |          907,966.79 |    0.2% |      0.01 | `LocalVector itemsize: 24 length: 63 sort: 0`
|   100.0% |           32,916.59 |           30,379.81 |    0.2% |      0.39 | `List        itemsize: 24 length: 511 sort: 0`
|   543.4% |            6,057.23 |          165,092.06 |    1.3% |      0.07 | `LocalVector itemsize: 24 length: 511 sort: 0`
|   100.0% |          284,490.54 |            3,515.06 |    0.2% |      3.40 | `List        itemsize: 24 length: 4095 sort: 0`
|   680.3% |           41,816.05 |           23,914.26 |    0.5% |      0.50 | `LocalVector itemsize: 24 length: 4095 sort: 0`
|   100.0% |              121.97 |        8,198,987.54 |    0.1% |      0.01 | `List        itemsize: 4 length: 1 sort: 1`
|   197.9% |               61.64 |       16,223,083.55 |    0.1% |      0.01 | `LocalVector itemsize: 4 length: 1 sort: 1`
|   100.0% |              561.59 |        1,780,657.10 |    0.6% |      0.01 | `List        itemsize: 4 length: 7 sort: 1`
|   219.0% |              256.46 |        3,899,191.63 |    0.1% |      0.01 | `LocalVector itemsize: 4 length: 7 sort: 1`
|   100.0% |            4,212.65 |          237,380.50 |    0.4% |      0.05 | `List        itemsize: 4 length: 63 sort: 1`
|   678.8% |              620.59 |        1,611,370.31 |    0.2% |      0.01 | `LocalVector itemsize: 4 length: 63 sort: 1`
|   100.0% |           38,429.21 |           26,021.87 |    0.2% |      0.46 | `List        itemsize: 4 length: 511 sort: 1`
| 1,569.3% |            2,448.83 |          408,357.83 |    0.5% |      0.03 | `LocalVector itemsize: 4 length: 511 sort: 1`
|   100.0% |          309,527.59 |            3,230.73 |    0.2% |      3.70 | `List        itemsize: 4 length: 4095 sort: 1`
| 1,573.5% |           19,671.17 |           50,835.82 |    0.3% |      0.23 | `LocalVector itemsize: 4 length: 4095 sort: 1`
|   100.0% |              128.69 |        7,770,760.97 |    1.3% |      0.01 | `List        itemsize: 24 length: 1 sort: 1`
|   154.5% |               83.30 |       12,004,334.72 |    4.2% |      0.01 | `LocalVector itemsize: 24 length: 1 sort: 1`
|   100.0% |              738.00 |        1,355,013.55 |    1.8% |      0.01 | `List        itemsize: 24 length: 7 sort: 1`
|   138.9% |              531.46 |        1,881,610.87 |    0.2% |      0.01 | `LocalVector itemsize: 24 length: 7 sort: 1`
|   100.0% |            6,569.49 |          152,218.86 |    0.1% |      0.08 | `List        itemsize: 24 length: 63 sort: 1`
|   150.1% |            4,377.90 |          228,420.02 |    0.2% |      0.05 | `LocalVector itemsize: 24 length: 63 sort: 1`
|   100.0% |           69,331.06 |           14,423.55 |    0.2% |      0.83 | `List        itemsize: 24 length: 511 sort: 1`
|   132.0% |           52,521.50 |           19,039.82 |    0.2% |      0.63 | `LocalVector itemsize: 24 length: 511 sort: 1`
|   100.0% |          724,062.43 |            1,381.10 |    0.1% |      8.65 | `List        itemsize: 24 length: 4095 sort: 1`
|   123.5% |          586,161.22 |            1,706.02 |    0.1% |      7.01 | `LocalVector itemsize: 24 length: 4095 sort: 1`

Source Code

Just try to run MyTest() in any godot environment. I hack test_main.cpp, thanks @Ivorforce for the idea.

Using this nice small lib nanobench.h

#define ANKERL_NANOBENCH_IMPLEMENT
#include "nanobench.h"
#include "core/templates/list.h"
#include "core/templates/local_vector.h"
#include <ostream>
#include <sstream>
#include <vector>

template<typename T>
class Test
{
    T placehold;
    List<T> list;
    LocalVector<T> vector;

public:
    void bench_list(const std::vector<T> &source, bool sort)
    {
        for (auto &item : source)
        {
            list.push_back(item);
        }
        if (sort)
        {
            list.sort();
        }
        for (auto &item : list)
        {
            placehold = item;
        }
    }

    void bench_local_vector(const std::vector<T> &source, bool sort)
    {
        for (auto &item : source)
        {
            vector.push_back(item);
        }
        if (sort)
        {
            vector.sort();
        }
        for (auto &item : vector)
        {
            placehold = item;
        }
    }

    static void bench(bool sort)
    {
        for (int i = 1; i < 4096 + 1; i *= 8)
        {
            int n = i == 1 ? 1 : i-1;
            std::ostringstream name;
            name << "itemsize: " << sizeof(T) << " length: " << n << " sort: " << sort;
            std::vector<T> source(n);
            auto bench = ankerl::nanobench::Bench()
                .minEpochIterations(1000)
                .relative(true);
            bench.run("List        " + name.str(), [&]{
                Test test;
                test.bench_list(source, sort);
            });
            bench.run("LocalVector " + name.str(), [&]{
                Test test;
                test.bench_local_vector(source, sort);
            });
        }
    }
};

void mytest(void)
{
    Test<int>::bench(false);
    Test<Variant>::bench(false);
    Test<int>::bench(true);
    Test<Variant>::bench(true);
}

YYF233333 · 2024-12-12T09:21:05Z

Add reserve to be aligned with #100251.

Ivorforce

I think this is, generally, a great idea, and pretty unlikely to cause performance regressions. Linked lists are rarely useful. In fact, I would advocate to rename List to LinkedList to avoid accidental use (e.g. by Java folks) because it is rarely more efficient than a linear LocalVector. LocalVector should be the go-to default, and the use of LinkedList should be well-proven to be faster than LocalVector in each specific case, documented next to the variable to justify its use.

That being said — this is a pretty big change, and I gotta say if I were the code maintainer I would be uncomfortable to merge it because of potential regressions. I would therefore advocate to split the PR and address each use separately, to make them small, comfortable and somewhat tested chunks. It's annoying, but it's safe.

Ivorforce · 2024-12-12T14:47:51Z

scene/animation/animation_node_state_machine.cpp

@@ -1582,13 +1583,6 @@ void AnimationNodeStateMachine::remove_transition_by_index(const int p_transitio
 	Transition tr = transitions[p_transition];
 	transitions.write[p_transition].transition->disconnect("advance_condition_changed", callable_mp(this, &AnimationNodeStateMachine::_tree_changed));
 	transitions.remove_at(p_transition);
-
-	Vector<String> path_from = String(tr.from).split("/");


You are removing this because it is unused? Shouldn't that be a separate PR?

Agree, especially when I see how fast codestyle PRs are merged :)

YYF233333 · 2024-12-12T16:30:35Z

That being said — this is a pretty big change, and I gotta say if I were the code maintainer I would be uncomfortable to merge it because of potential regressions. I would therefore advocate to split the PR and address each use separately, to make them small, comfortable and somewhat tested chunks. It's annoying, but it's safe.

Yes, but it is already split from a bigger one :) Despite being somewhat scattered, these changes are essentially the same pattern.

Safety should probably be ok, I double checked they follow the restriction on top, but I cannot prove that to you (unless you read the code).

I'm waiting to see if #100251 can be merged efficiently. If that can not, I'd better give up.

YYF233333 requested review from a team as code owners December 2, 2024 18:31

Calinou added enhancement topic:core performance labels Dec 2, 2024

Calinou added this to the 4.x milestone Dec 2, 2024

YYF233333 mentioned this pull request Dec 3, 2024

Add back() and pop_back() to LocalVector #99955

Open

YYF233333 force-pushed the list branch from fd2e63a to 1c0e09d Compare December 3, 2024 12:23

YYF233333 force-pushed the list branch from 1c0e09d to 87fb6ce Compare December 4, 2024 06:36

YYF233333 mentioned this pull request Dec 4, 2024

Add microbenchmark framework to godot to reduce the work of performance PR godotengine/godot-proposals#11285

Open

KoBeWi mentioned this pull request Dec 5, 2024

[Core] Use Vector for MethodInfo::arguments #91660

Open

YYF233333 mentioned this pull request Dec 10, 2024

Reserve space for LocalVector if size is known #100251

Open

YYF233333 added 2 commits December 12, 2024 16:23

Replace List with LocalVector if used as simple buffer.

d4d6589

Add reserve where appropriate.

66b66dd

YYF233333 force-pushed the list branch from 87fb6ce to 66b66dd Compare December 12, 2024 09:17

YYF233333 requested a review from a team as a code owner December 12, 2024 09:17

Ivorforce reviewed Dec 12, 2024

View reviewed changes

Repiteo removed request for a team December 12, 2024 15:07

YYF233333 mentioned this pull request Dec 12, 2024

Remove dead code in AnimationNodeStateMachine #100324

Merged

YYF233333 mentioned this pull request Dec 12, 2024

Use LocalVector instead of List as arg of Dictionary::get_key_list #100333

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace `List` with `LocalVector` if used as simple buffer #99936

Replace `List` with `LocalVector` if used as simple buffer #99936

YYF233333 commented Dec 2, 2024

AThousandShips commented Dec 3, 2024 •

edited

Loading

AThousandShips commented Dec 3, 2024

YYF233333 commented Dec 3, 2024 •

edited

Loading

clayjohn commented Dec 3, 2024

YYF233333 commented Dec 4, 2024

stuartcarnie commented Dec 4, 2024

YYF233333 commented Dec 5, 2024 •

edited

Loading

YYF233333 commented Dec 12, 2024

Ivorforce left a comment •

edited

Loading

Ivorforce Dec 12, 2024

YYF233333 Dec 12, 2024

YYF233333 commented Dec 12, 2024

Replace List with LocalVector if used as simple buffer #99936

Are you sure you want to change the base?

Replace List with LocalVector if used as simple buffer #99936

Conversation

YYF233333 commented Dec 2, 2024

AThousandShips commented Dec 3, 2024 • edited Loading

AThousandShips commented Dec 3, 2024

YYF233333 commented Dec 3, 2024 • edited Loading

clayjohn commented Dec 3, 2024

YYF233333 commented Dec 4, 2024

stuartcarnie commented Dec 4, 2024

YYF233333 commented Dec 5, 2024 • edited Loading

YYF233333 commented Dec 12, 2024

Ivorforce left a comment • edited Loading

Choose a reason for hiding this comment

Ivorforce Dec 12, 2024

Choose a reason for hiding this comment

YYF233333 Dec 12, 2024

Choose a reason for hiding this comment

YYF233333 commented Dec 12, 2024

Replace `List` with `LocalVector` if used as simple buffer #99936

Replace `List` with `LocalVector` if used as simple buffer #99936

AThousandShips commented Dec 3, 2024 •

edited

Loading

YYF233333 commented Dec 3, 2024 •

edited

Loading

YYF233333 commented Dec 5, 2024 •

edited

Loading

Ivorforce left a comment •

edited

Loading