Performance: Modify bounds detection to inheritance when clipping is disabled #365

Merged
merged 8 commits into from
Sep 3, 2024

wouterlucas
Contributor

@wouterlucas wouterlucas commented Aug 30, 2024

This PR changes the CoreNode update behavior, including how bounds detection works. Instead of every node creating its own local strictBound and preloadBound, a node now inherits these bounds from its parent unless clipping is enabled.

Why?

Bounds detection added a significant CPU cost to the rendering loop of the L3 Renderer. Low-end devices struggled to keep up: the CPU was overloaded and could not provide new render instructions to the GPU in time.

Bounds detection is needed to ensure we only draw what is required on screen (in the viewport) and do not render nodes that are outside of the viewport's bounds.
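
The check itself amounts to a rectangle overlap test. A minimal sketch, assuming bounds are stored as axis-aligned rectangles; the `Bound` type, the `classify` helper, and the three state names are illustrative assumptions, not necessarily the renderer's actual API:

```typescript
// Hypothetical axis-aligned rectangle; the field names are assumptions.
interface Bound {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

type RenderState = 'inViewport' | 'inBounds' | 'outOfBounds';

// True when the two rectangles overlap (touching edges do not count).
function intersects(a: Bound, b: Bound): boolean {
  return a.x1 < b.x2 && a.x2 > b.x1 && a.y1 < b.y2 && a.y2 > b.y1;
}

// Classify a node's world-space rectangle against the strict (visible)
// bound and the larger preload bound that surrounds it.
function classify(node: Bound, strict: Bound, preload: Bound): RenderState {
  if (intersects(node, strict)) return 'inViewport';
  if (intersects(node, preload)) return 'inBounds';
  return 'outOfBounds';
}
```

Only nodes classified as `inViewport` would need to be drawn; `inBounds` nodes are close enough to be worth preparing, and `outOfBounds` nodes can be skipped.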

What changed?

Previously, every node would calculate its strictBound and preloadBound on every update(), based on its world position. This is quite expensive and most of the time not needed, unless clipping is enabled on a particular node.

This changes:

  • strictBound and preloadBound are inherited from the parent; if the parent has no bounds, the stage (viewport) bounds are used.
  • If a node has clipping enabled, only that node calculates its own strictBound and preloadBound, which its children then inherit.
  • This is calculated once for all children.
  • Nothing that isn't needed is processed while a node is out of bounds.
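
The inheritance rule can be sketched roughly as follows. This is a simplified illustration, not the actual CoreNode code: `SketchNode`, `Bounds`, `childBounds`, and the fixed `preloadMargin` are hypothetical names, and the sketch ignores details such as intersecting a clipping node's bounds with the bounds it inherited.

```typescript
interface Bound { x1: number; y1: number; x2: number; y2: number; }
interface Bounds { strict: Bound; preload: Bound; }

// Hypothetical stand-in for a render node; not the real CoreNode.
class SketchNode {
  parent: SketchNode | null = null;
  children: SketchNode[] = [];
  clipping = false;
  // Bounds this node is tested against (shared by reference when inherited).
  bounds: Bounds | null = null;
  // Bounds handed down to this node's children.
  childBounds: Bounds | null = null;

  constructor(public rect: Bound) {}

  add(child: SketchNode): SketchNode {
    child.parent = this;
    this.children.push(child);
    return child;
  }

  update(stage: Bounds, preloadMargin: number): void {
    // Inherit from the parent; fall back to the stage bounds at the root.
    const inherited: Bounds = this.parent?.childBounds ?? stage;
    this.bounds = inherited;

    if (this.clipping) {
      // Only a clipping node computes new bounds, once, for all children.
      const r = this.rect;
      this.childBounds = {
        strict: r,
        preload: {
          x1: r.x1 - preloadMargin,
          y1: r.y1 - preloadMargin,
          x2: r.x2 + preloadMargin,
          y2: r.y2 + preloadMargin,
        },
      };
    } else {
      // No clipping: pass the inherited bounds through untouched.
      this.childBounds = inherited;
    }

    for (const child of this.children) child.update(stage, preloadMargin);
  }
}
```

In this sketch a non-clipping subtree allocates nothing per node: every descendant points at the same `Bounds` object, which is where the saving over per-node recalculation would come from.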

Unrelated to bounds but performance changes that I ran into:

  • Only run the clipping calculation if clipping is enabled; previously it was executed on every global transform update (== expensive)
  • Minor conditional checks and small performance gains
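
The first point is essentially a guard around the clipping work. A rough sketch of the idea, with hypothetical names (`ClipSketchNode`, `onTransformUpdate`, `clipCalculations`) and the assumption that a non-clipping node simply reuses its parent's clipping rectangle:

```typescript
interface ClipRect { x: number; y: number; width: number; height: number; }

// Intersection of two rectangles; width/height clamp to 0 when disjoint.
function intersectRects(a: ClipRect, b: ClipRect): ClipRect {
  const x = Math.max(a.x, b.x);
  const y = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  return { x, y, width: Math.max(0, right - x), height: Math.max(0, bottom - y) };
}

// Hypothetical node; not the real CoreNode.
class ClipSketchNode {
  clipping = false;
  clipRect: ClipRect | null = null;
  clipCalculations = 0; // counter kept only to make the saving visible

  constructor(
    public x: number,
    public y: number,
    public width: number,
    public height: number,
  ) {}

  // Called whenever the node's global transform changes.
  onTransformUpdate(parentClip: ClipRect | null): void {
    if (!this.clipping) {
      // Cheap path: reuse the parent's rectangle, no calculation at all.
      this.clipRect = parentClip;
      return;
    }
    // Expensive path, taken only by nodes that actually clip.
    this.clipCalculations++;
    const own: ClipRect = { x: this.x, y: this.y, width: this.width, height: this.height };
    this.clipRect = parentClip ? intersectRects(own, parentClip) : own;
  }
}
```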

Test results

Prior to the PR I ran two tests, a CoreNode.update() throughput test (using #364) and a stress benchmark with bounds for FPS measurements.

Tested on a Ryzen 7 6800H / 3070 Windows 11 machine using Chrome Version 128.0.6613.113 (Official Build) (64-bit) with 20x CPU slowdown

Baseline

FPS

index.ts:292 Average FPS: 30.33
index.ts:293 Median FPS: 31
index.ts:294 P01 FPS: 20
index.ts:295 P05 FPS: 25
index.ts:296 P25 FPS: 29
index.ts:297 Std Dev FPS: 2.8001964216818784
index.ts:298 Num samples: 100
index.ts:299 ---------------------------------

Throughput

┌───┬───────────┬─────────┬───────────────────┬────────┬─────────┐
│   │ Task Name │ ops/sec │ Average Time (ns) │ Margin │ Samples │
├───┼───────────┼─────────┼───────────────────┼────────┼─────────┤
│ 0 │ update    │ 126     │ 7899612.499999996 │ ±2.44% │ 64      │
└───┴───────────┴─────────┴───────────────────┴────────┴─────────┘

These changes

FPS

index.ts:292 Average FPS: 40.73
index.ts:293 Median FPS: 41
index.ts:294 P01 FPS: 33
index.ts:295 P05 FPS: 35
index.ts:296 P25 FPS: 39
index.ts:297 Std Dev FPS: 3.3700296734598636
index.ts:298 Num samples: 100
index.ts:299 ---------------------------------

Throughput

┌───┬───────────┬───────────┬───────────────────┬────────┬─────────┐
│   │ Task Name │ ops/sec   │ Average Time (ns) │ Margin │ Samples │
├───┼───────────┼───────────┼───────────────────┼────────┼─────────┤
│ 0 │ update    │ 4,295,328 │ 232.8110762198926 │ ±1.28% │ 2147665 │
└───┴───────────┴───────────┴───────────────────┴────────┴─────────┘

A gain of roughly 10 FPS at 20x slowdown (average 30.33 to 40.73), and update throughput goes from 126 ops/sec to about 4.3 million ops/sec.
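
As a cross-check, the ops/sec column follows from the reported average time per operation (one second is 1e9 ns). The snippet below only redoes that arithmetic with the numbers from the tables above; the helper and variable names are illustrative:

```typescript
const NS_PER_SECOND = 1e9;

// Operations per second implied by an average time per operation in ns.
function opsPerSecond(averageTimeNs: number): number {
  return NS_PER_SECOND / averageTimeNs;
}

const baselineOps = opsPerSecond(7899612.499999996); // baseline table
const patchedOps = opsPerSecond(232.8110762198926); // "These changes" table

// Relative improvements derived from the reported figures.
const throughputRatio = patchedOps / baselineOps;
const fpsGain = 40.73 - 30.33;
const fpsGainPercent = (fpsGain / 30.33) * 100;

console.log(`baseline: ${baselineOps.toFixed(1)} ops/sec`);
console.log(`patched:  ${Math.round(patchedOps).toLocaleString('en-US')} ops/sec`);
console.log(`throughput ratio: ~${Math.round(throughputRatio)}x`);
console.log(`FPS gain: ${fpsGain.toFixed(2)} (${fpsGainPercent.toFixed(1)}%)`);
```

The average times work out to roughly 126.6 and 4.3 million ops/sec, consistent with the ops/sec columns, and the average FPS gain is about 34% over the baseline.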

@wouterlucas wouterlucas marked this pull request as ready for review August 30, 2024 18:51
@philippe-wm

Can you clarify one thing: if I have a "page" component with many rails, I still want each rail to activate when it is within the preload bounds. I like the bounds inheritance idea, but then we need to be able to opt in to the local calculations.

@wouterlucas
Contributor Author

The moment you turn on clipping for that rail, bounds will be calculated for that point of the branch and all its children.

@erikhaandrikman
Contributor

I removed some duplicate code and improved the stage boundary reuse condition. Other than that, LGTM :)

@wouterlucas wouterlucas added this pull request to the merge queue Sep 3, 2024
Merged via the queue into main with commit b85cb48 Sep 3, 2024
2 checks passed
@wouterlucas wouterlucas deleted the perf/strictbound_parent branch September 3, 2024 21:31

4 participants