Performance: Modify bounds detection to inheritance when clipping is disabled #365
This PR changes the CoreNode update behavior, including how bounds detection works. Instead of creating local strictBoundaries and preLoad boundaries, the boundaries are inherited from the parent unless clipping is enabled.
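The inheritance rule described above can be sketched roughly as follows. This is a minimal illustration, not the renderer's actual code: the `Bound` and `SketchNode` types, the `resolveBounds`, `expand` and `intersect` helpers, the preload margin parameter, and the choice to intersect a clipping node's rectangle with the inherited bound are all assumptions made for the example.

```typescript
// Hypothetical types for illustration; not the actual CoreNode API.
interface Bound {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

interface SketchNode {
  parent: SketchNode | null;
  clipping: boolean;
  worldRect: Bound; // world-space rectangle of the node
  strictBound?: Bound; // bound this node's children will inherit
  preloadBound?: Bound; // enlarged bound this node's children will inherit
}

// Grow a bound by a margin on every side (assumed preload behavior).
function expand(b: Bound, margin: number): Bound {
  return {
    x1: b.x1 - margin,
    y1: b.y1 - margin,
    x2: b.x2 + margin,
    y2: b.y2 + margin,
  };
}

// Overlapping region of two bounds (assumes they overlap).
function intersect(a: Bound, b: Bound): Bound {
  return {
    x1: Math.max(a.x1, b.x1),
    y1: Math.max(a.y1, b.y1),
    x2: Math.min(a.x2, b.x2),
    y2: Math.min(a.y2, b.y2),
  };
}

function resolveBounds(
  node: SketchNode,
  stageBound: Bound,
  preloadMargin: number,
): void {
  // Fall back to the stage bounds when the parent has none.
  const inheritedStrict = node.parent?.strictBound ?? stageBound;
  const inheritedPreload =
    node.parent?.preloadBound ?? expand(stageBound, preloadMargin);

  if (node.clipping) {
    // Only clipping nodes pay for computing fresh bounds.
    node.strictBound = intersect(inheritedStrict, node.worldRect);
    node.preloadBound = expand(node.strictBound, preloadMargin);
  } else {
    // Cheap path: share the inherited objects by reference.
    node.strictBound = inheritedStrict;
    node.preloadBound = inheritedPreload;
  }
}
```

In this sketch the non-clipping path does no arithmetic at all, which is where the saving per `update()` would come from.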
Why?
Bounds detection added significant CPU overhead to the rendering loop of the L3 Renderer. Low-end devices struggled to keep up: the CPU was overloaded and could not deliver new render instructions to the GPU in time.
Bounds detection is needed to ensure we only draw what is required on screen / in the viewport and do not render nodes that are outside of the viewport's bounds.
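As a rough illustration of what such a check involves, the sketch below classifies a node's world-space rectangle against a strict bound and a larger preload bound. The `Bound` type, the `overlaps` and `classify` helpers, and the three state names are hypothetical and chosen for the example; the renderer's real implementation may differ.

```typescript
// Hypothetical types and names for illustration only.
interface Bound {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

type RenderState = "inViewport" | "inBounds" | "outOfBounds";

// True when the two axis-aligned rectangles share any area.
function overlaps(a: Bound, b: Bound): boolean {
  return a.x1 < b.x2 && a.x2 > b.x1 && a.y1 < b.y2 && a.y2 > b.y1;
}

// Classify a node rectangle: visible, close enough to preload, or skippable.
function classify(
  rect: Bound,
  strictBound: Bound,
  preloadBound: Bound,
): RenderState {
  if (overlaps(rect, strictBound)) return "inViewport";
  if (overlaps(rect, preloadBound)) return "inBounds";
  return "outOfBounds";
}
```

The check itself is only a handful of comparisons per node; the expensive part in this picture is producing the bounds to compare against.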
What changed?
Previously, every node would on every `update()` calculate what its `strictBound` and `preloadBounds` were, based on its world position. This is quite expensive to do and most of the time not needed unless clipping is enabled on a particular node.
This changes:
- `strictBounds` and `preloadBounds` are inherited from the parent; if the parent has no bounds, the viewport stage bounds are used.
- A node with clipping enabled calculates a new `strictBound` and `preloadBound` for its own children.
Unrelated to bounds but performance changes that I ran into:
Test results
Prior to the PR I ran two tests: a `CoreNode.update()` throughput test (using #364) and a stress benchmark with bounds for FPS measurements.
Tested on a Ryzen 7 6800H / 3070 Win11 machine using Chrome Version 128.0.6613.113 (Official Build) (64-bit) at 20x slowdown.
Baseline
FPS
index.ts:292 Average FPS: 30.33
index.ts:293 Median FPS: 31
index.ts:294 P01 FPS: 20
index.ts:295 P05 FPS: 25
index.ts:296 P25 FPS: 29
index.ts:297 Std Dev FPS: 2.8001964216818784
index.ts:298 Num samples: 100
index.ts:299 ---------------------------------
Throughput
These changes
FPS
index.ts:292 Average FPS: 40.73
index.ts:293 Median FPS: 41
index.ts:294 P01 FPS: 33
index.ts:295 P05 FPS: 35
index.ts:296 P25 FPS: 39
index.ts:297 Std Dev FPS: 3.3700296734598636
index.ts:298 Num samples: 100
index.ts:299 ---------------------------------
Throughput
A gain of roughly 10 FPS at 20x slowdown (average 30.33 to 40.73), with throughput going from 126 to 4k ops.
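For reference, the relative gains implied by these figures can be worked out as below. The sketch uses only the averages and throughput numbers reported above; treating "4k" as exactly 4000 is an assumption, so the throughput factor is approximate.

```typescript
// Relative change from a baseline value, as a fraction.
function relativeGain(before: number, after: number): number {
  return (after - before) / before;
}

// Average FPS values taken from the two benchmark runs above.
const baselineAvgFps = 30.33;
const changedAvgFps = 40.73;

// Throughput figures from the summary; "4k" is assumed to mean 4000.
const baselineOps = 126;
const changedOps = 4000;

const fpsDelta = changedAvgFps - baselineAvgFps;
const fpsGainPct = relativeGain(baselineAvgFps, changedAvgFps) * 100;
const throughputFactor = changedOps / baselineOps;

console.log(`FPS delta: ${fpsDelta.toFixed(2)}`);
console.log(`FPS gain: ${fpsGainPct.toFixed(1)}%`);
console.log(`Throughput factor: ${throughputFactor.toFixed(1)}x`);
```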