More reduced allocations #7193
base: main
Conversation
Force-pushed from 6809980 to 5c3f996
Thanks for working on this. Some comments inline 🙃
s.Foreach(func(x *Term) {
	if !other.Contains(x) {
		r.Add(x)

terms := make([]*Term, 0, len(s.keys))
Overallocating is better than re-allocating, I suppose? 🤔
Not sure tbh, and the length can probably be tweaked. Rewriting these was not a huge allocation saver, but it did save some and performed consistently better across all tests, so I left it be.
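For illustration, this is roughly the shape of that rewrite expressed against OPA's public `ast.Set` API rather than the unexported `*set` internals the diff touches; `diff` is a made-up name, and `s.Len()` plays the role of `len(s.keys)` here. A sketch of the idea, not the PR's implementation:

```go
package sketch

import "github.com/open-policy-agent/opa/ast"

// diff builds s \ other with a single, possibly oversized, allocation:
// the result slice is sized for the worst case (nothing filtered out),
// so the backing array never has to grow mid-loop.
func diff(s, other ast.Set) ast.Set {
	terms := make([]*ast.Term, 0, s.Len())
	for _, x := range s.Slice() {
		if !other.Contains(x) {
			terms = append(terms, x)
		}
	}
	return ast.NewSet(terms...)
}
```

The trade-off is exactly the one raised above: sizing for the worst case avoids re-allocation at the cost of holding a bit more memory when most elements end up filtered out.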
topdown/walk.go (Outdated)
if filter == nil || filter.Len() == 0 {
	if path == nil {
		path = ast.NewArray()
	}

	if err := iter(ast.ArrayTerm(ast.NewTerm(path.Copy()), input)); err != nil {
		// TODO: why does this not work?
TODO: let's find an answer before merging this 😉

I think it doesn't work because `defer` doesn't go well with CPS. `return iter(...)` will mean that the defer runs when everything is evaluated successfully.
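To make the timing concrete, here is a minimal sketch of the interaction, assuming a hypothetical `pairPool` and `walkPooled` that are not the PR's actual code:

```go
package sketch

import (
	"sync"

	"github.com/open-policy-agent/opa/ast"
)

// Hypothetical pool of terms, used only for this sketch.
var pairPool = sync.Pool{
	New: func() any { return &ast.Term{} },
}

// walkPooled hands a pooled term to the continuation. Because iter drives
// the rest of the evaluation (CPS), the deferred Put only runs after every
// downstream step has completed, not when this frame is done with the term.
func walkPooled(input *ast.Term, iter func(*ast.Term) error) error {
	pair := pairPool.Get().(*ast.Term)
	defer pairPool.Put(pair) // fires only once iter, and all it calls, has returned

	pair.Value = input.Value
	return iter(pair)
}
```

Whether that late Put is merely suboptimal or actually unsafe depends on whether anything downstream still holds a reference to the pooled term when it is returned.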
> let's find an answer before merging this

Naturally :) I'll see if I can get something to work without `defer` ... but note that this actually does work in the walkNoPath version... the few added unit tests on this confirm as much, and even more convincingly, `regal lint` works, as we're quite dependent on that.
> I think it doesn't work because `defer` doesn't go well with CPS. `return iter(...)` will mean that the defer runs when everything is evaluated successfully.
This seems to align with what some println's tell me. However, I'm not sure why this would be an issue. Even if all path/value pairs aren't returned to the pool until after the whole walk is done, that would still be a substantial improvement over having to create all new items the next time walk is invoked. BUT the YAML tests show some really odd behavior in the printed path/values vs the expected ones! And yet, now that I try to run these changes using `regal lint bundle`, everything actually works.. and we use walk (even with path) quite extensively... so I'm really, really clueless as to what's going on right now 😅
@srenatus I have now pushed my latest changes, which return items to the pool "manually" instead of via defer. I also added a pool for terms, as they too can be reduced, and all in all this is another 4 million allocations down if successful. The OPA YAML tests related to walk are failing, while Regal just keeps going. I don't even know where to start 😄 But perhaps you'll come up with some ideas. I figured I'd push this so that we're on the same page.
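For reference, the "manual" variant being described would look roughly like this; `pool` and `visitChildren` are made-up names for the sketch, and the comment spells out the usual sync.Pool caveat, which may or may not be what the failing YAML tests are tripping over:

```go
package sketch

import (
	"sync"

	"github.com/open-policy-agent/opa/ast"
)

// Hypothetical pool, as in the earlier sketch.
var pool = sync.Pool{
	New: func() any { return &ast.Term{} },
}

// visitChildren returns each pooled term right after the visit that used it,
// instead of deferring all Puts to the end of the walk. This is only safe if
// visit (and everything it calls) does not keep a reference to t; otherwise
// the retained value will be overwritten the next time the term is reused.
func visitChildren(children []*ast.Term, visit func(*ast.Term) error) error {
	for _, child := range children {
		t := pool.Get().(*ast.Term)
		t.Value = child.Value

		err := visit(t)
		pool.Put(t) // manual return: available again for the next child
		if err != nil {
			return err
		}
	}
	return nil
}
```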
Force-pushed from d6102d8 to 409425a
The failing test is "unrelated to this PR", which is a bit of a bold statement given that the test was added by this very PR. But that test fails even on current OPA, and it seems like we have some ordering issue in the Wasm implementation of
termPool = sync.Pool{
	New: func() any {
		return ast.NewTerm(ast.Boolean(false))
I think I'd be less surprised about something like `return &ast.Term{}` -- why bother with a Value if it's not supposed to have one in the first place?
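A sketch of that suggestion (assumed, not necessarily what this PR ends up doing): since every consumer overwrites `Value` after `Get`, the `New` function can hand out a zero-value term and skip the placeholder `ast.Boolean(false)` allocation entirely.

```go
package sketch

import (
	"sync"

	"github.com/open-policy-agent/opa/ast"
)

// termPool as suggested: New hands out a zero-value *ast.Term. Callers are
// expected to set Value themselves right after Get, so there is no need to
// allocate a placeholder Boolean value up front.
var termPool = sync.Pool{
	New: func() any {
		return &ast.Term{}
	},
}
```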
Force-pushed from 2a7b437 to 49d325c
For the first time, down under 100 million allocations when running `regal lint bundle` 🎈

**main**

```
BenchmarkLintAllEnabled-10    1    2538350916 ns/op    6182626816 B/op    108424249 allocs/op
```

**pr**

```
BenchmarkLintAllEnabled-10    1    2282894416 ns/op    5310032744 B/op    93674054 allocs/op
```

But there's more to it than just the number of allocations:

```
➜ hyperfine -i --warmup 1 'regal lint bundle' 'regal-new lint bundle'
Benchmark 1: regal lint bundle
  Time (mean ± σ):     2.822 s ±  0.055 s    [User: 19.299 s, System: 0.603 s]
  Range (min … max):   2.743 s …  2.961 s    10 runs

Benchmark 2: regal-new lint bundle
  Time (mean ± σ):     2.373 s ±  0.040 s    [User: 15.940 s, System: 0.575 s]
  Range (min … max):   2.315 s …  2.435 s    10 runs

Summary
  regal-new lint bundle ran
    1.19 ± 0.03 times faster than regal lint bundle
```

Most notable changes:

- Reuse `trieTraversalResult` in indexing, as these were expensive and short-lived. This had the most dramatic impact on the number of reduced allocations of all the changes here (see the sketch below).
- Optimize `*set`, `*object` and `*Array` operations to minimize allocations by using "primitive" form iteration instead of the function literal counterparts internally, and by only resetting the sort guard when needed.
- The new `Array.Equal` implementation does not remove any allocations, as the old implementation didn't allocate either. It does, however, perform much better when the compared arrays are not equal.

Signed-off-by: Anders Eknert <[email protected]>
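The `trieTraversalResult` reuse in the first bullet above follows a general pool-and-reset pattern. Below is a rough sketch of that pattern only; `traversalResult`, its fields, and the helper functions are placeholders for illustration, not OPA's actual indexing types.

```go
package sketch

import "sync"

// traversalResult stands in for a short-lived result object that is
// expensive to allocate fresh on every traversal.
type traversalResult struct {
	ordering  []int
	unordered map[int][]int
}

var resultPool = sync.Pool{
	New: func() any {
		return &traversalResult{unordered: map[int][]int{}}
	},
}

func getResult() *traversalResult {
	return resultPool.Get().(*traversalResult)
}

// putResult resets the object in place, keeping the slice and map capacity
// that earlier traversals already paid for, then hands it back to the pool.
func putResult(r *traversalResult) {
	r.ordering = r.ordering[:0]
	for k := range r.unordered {
		delete(r.unordered, k)
	}
	resultPool.Put(r)
}
```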
Force-pushed from 49d325c to 4206ed7