Skip to content

Conversation

Beilinson
Copy link
Contributor

@Beilinson Beilinson commented Oct 10, 2025

Description

What I did: loaded the basic Google Photorealistic sandcastle and profiled me just moving around slowly

Performance Profile:
I ordered by Self Time descending, and saw that this Cesium3DTilesetStatistics.clone takes about 8% of total frame time.
Image

Why this change:
The performance profile showed that this object destructure was taking up the main bulk of the time of this function.

I audited the code, and saw that while the statistics object is copied frame-to-frame, this property specifically is only ever used by the new statistics object, and passing it by reference kept all the same behavior in prod. Changing it a Map is also a minor performance improvement.

Additionally, I noticed that the actual amount of calls per frame to Cesium3DTilesetTraversal.updateVisibility was higher than the reported number of visited tiles, because some tiles are being updated as part of a check by their parents and are never added to the tileset traversal queue, so never marked as visited. This is rectified by explicitly visiting the tile in updateVisibility.

Current Reported Visited:

image

After this change:

image

Testing plan

Compare inspector results between main sandcastle and here, texture sizes should be the same. Visited tiles should be bigger.

main

local

Author checklist

  • I have submitted a Contributor License Agreement
  • I have added my name to CONTRIBUTORS.md
  • I have updated CHANGES.md with a short summary of my change
  • I have added or updated unit tests to ensure consistent code coverage
  • I have updated the inline documentation, and included code examples where relevant
  • I have performed a self-review of my code

Copy link

Thank you for the pull request, @Beilinson!

✅ We can confirm we have a CLA on file for you.

@Beilinson Beilinson changed the title 3D tileset statistics performance 3D tileset statistics improvements Oct 10, 2025
@Beilinson Beilinson changed the title 3D tileset statistics improvements Cesium3DTilesetStatistics improvements Oct 10, 2025
@Beilinson
Copy link
Contributor Author

I can fix the failing tests, but only after confirmation that the updated visitation numbers are desired. Correctness is based on what visited means:

  1. Currently, whether a tile was added to the traversal queue
  2. As of this PR, whether a tile was actually processed (either as part of the traversal queue or as part of a check by a parent of their children)

@Beilinson Beilinson mentioned this pull request Oct 10, 2025
6 tasks
@lilleyse
Copy link
Contributor

@Beilinson nice performance improvement!

Just a drive by comment, I haven't reviewed the code

I can fix the failing tests, but only after confirmation that the updated visitation numbers are desired. Correctness is based on what visited means:

  1. Currently, whether a tile was added to the traversal queue
  2. As of this PR, whether a tile was actually processed (either as part of the traversal queue or as part of a check by a parent of their children)

I would stay with option 1 - a tile is visited when it is added to the traversal queue.

@Beilinson
Copy link
Contributor Author

Thanks @lilleyse!

I would stay with option 1 - a tile is visited when it is added to the traversal queue.

Thats fair. I worry this is minor misdirection as the count is much smaller than the true number of updated tile. Someone attempting to improve the performance of either of the traversals would be incorrect to base his performance assumptions off of that.

Copy link
Contributor

@jjspace jjspace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this a little hard to actually reproduce in the devtools profiler. Eventually I made this small sandcastle which automates starting and stopping the profiler around running a camera sequence.

I do think this is an improvement but it's also not one of the biggest performance impacts from what I can see. I often see the time way lower than shown in the initial post on my machine.

@Beilinson
Copy link
Contributor Author

That's fair, just wanted to clear up that I tested this using 20x cpu slow down to simulate "high resolution" profiling since the regular performance sample rate can easily miss something like this

@Beilinson
Copy link
Contributor Author

Just ran your sandcastle with 6x cpu slowdown, getting the same as reported in my original description:

image

@Beilinson
Copy link
Contributor Author

Beilinson commented Oct 14, 2025

TIL: Apparently you can set an environment variable $CPUPROFILE_FREQUENCY to control the chrome performance profiling sample rate. Default is 100. Setting this to something like 1000 and after restarting chrome I now reproduce the same results as above without any cpu slowdown:

image

@javagl
Copy link
Contributor

javagl commented Oct 14, 2025

Given that the texturesReferenceCounterById that this PR is about was introduced by me, I'd like to mention that I'd usually prefer a Map over a {}.

Usually LinkedHashMap, FWIW...

I cannot really profoundly justify why I didn't use a Map back then. It's probably something vague, like "All the other code is using {}'s, so that's apparently how we rolll 🤷‍♂️". But the fact that most of the code uses {}'s is likely just an aspect of this being legacy code. And as one aspect of ~"trying to improve things" (even if certain improvements are happening faaar tooooo slooooowwwwlllyyyyy), I'd see this PR as a tiny improvement along these lines: It is used like a Map. Let's make it a Map.

(The fact that there are probably hundreds of other places that could be changed from {} to Map and the fact that it doesn't magically double the frame rate shouldn't prevent a clear improvement from being integrated...)

@Beilinson
Copy link
Contributor Author

(The fact that there are probably hundreds of other places that could be changed from {} to Map and the fact that it doesn't magically double the frame rate shouldn't prevent a clear improvement from being integrated...)

I actually did some local testing by converting as many {} hash maps that I could find to a Proxy which doubles all get/set/delete operations done on the {} to a Map, and compared the performance between them. I'll hopefully get to polishing those results up tomorrow and open an issue which will give real insights into which places in the code would benefit most from getting converted to a Map, and the expected performance gains from that (spoiler, in first place is AssociativeArray)

@javagl
Copy link
Contributor

javagl commented Oct 14, 2025

The performance is one thing.

When you talk about a Proxy, then I wonder: If this is a real Proxy, wouldn't this distort the performance results considerably? It would be nice to have some automated process/tooling (maybe just IDE support) for this sort of change. Then one could take two fixed commits, once with the current state, and once with all (appropriate) objects replaced with maps. And then it would be nice if there was some sensible benchmark with which we could objectively compare the performance between these two states.

However, I assume that the performance would generally not become worse. That remains to be confirmed, but from what I've heard, a Map is generally faster for the real Map-like usage patterns.

So regardless of the performance, I think that the clarity of the code is really important. First of all, I don't like the untypedness of JavaScript to begin with. The fact that you can just takeSomeObject.andAssignArbitraryProperties is dangerous for the maintainability in a large codebase. On top of that, is is practically impossible to find out what a line of code like object.x = 42; is doing. (Yeah, that sounds strange, but you never know whether that's a 'setter' - so... just do a full-text-search for x, right?) For the specific case of the Map-like usage pattern, there now are the Object.keys and Object.values functions, but the whole codebase is still littered with these old something.hasOwnProperty(property) calls.


BTW:

spoiler, in first place is AssociativeArray

It was exactly two weeks ago (October 1st ) when I asked in our internal chat...

Any reason to not "replace all" of https://github.com/CesiumGS/cesium/blob/ecf0050a7b185fc9dd4a6b85ab6e0453ec89f349/packages/engine/Source/Core/AssociativeArray.js with Map?


The bottom line: Map where a Map is due 🙂

@ggetz
Copy link
Contributor

ggetz commented Oct 14, 2025

Also a drive by comment: I don't think at this point there's reason not to move to Map other than the time to, like @javagl said, make sure where we do swap them in is the right compromise between performance and clarity.

For AssocitaiveArray specifically, it is part of the public API so, if we decide to remove it, we should properly deprecate first.

Though there's no reason to prevent this PR from going in with just this one Map update as it has a demonstrated benefit.

@ggetz
Copy link
Contributor

ggetz commented Oct 14, 2025

The fact that you can just takeSomeObject.andAssignArbitraryProperties is dangerous for the maintainability in a large codebase.

And I believe doing so also can have performance implications if "dictionary mode" is triggered.

@jjhembd
Copy link
Contributor

jjhembd commented Oct 14, 2025

real insights into which places in the code would benefit most from getting converted to a Map, and the expected performance gains from that (spoiler, in first place is AssociativeArray)

@Beilinson I've had a long-running note to myself to get rid of AssociativeArray. My only reasoning was that it was an unnecessary duplication of Map. But if you also have performance data to justify the change, that's even better!

As @ggetz mentioned, we do have to communicate any change clearly, since it's public. Maybe replacing AssociativeArray should be a separate issue/PR by itself.

@Beilinson
Copy link
Contributor Author

Happy to get conversation (re)started about this! AssociativeArray specifically should definitely be a separate issue (I know my org has code that uses the public API surface for example viewer.entities which is an AssociativeArray). I'd love to continue this in the separate issue I'll open.

Is this PR ready to merge then?

result.texturesReferenceCounterById = {
...statistics.texturesReferenceCounterById,
};
result.texturesReferenceCounterById = statistics.texturesReferenceCounterById;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems Github ate this comment and didn't post it with my original review. This is my main remaining concern with this PR that it's no longer a full clone

What do you think about cloning the map here instead of just passing by reference? In some limited testing it seemed like it was still a big improvement over the spread ... but also stays more true to the meaning of clone by actually cloning?

Suggested change
result.texturesReferenceCounterById = statistics.texturesReferenceCounterById;
result.texturesReferenceCounterById = new Map(statistics.texturesReferenceCounterById);

Copy link
Contributor Author

@Beilinson Beilinson Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just compared all three options using your sandbox and CPUPROFILE_FREQUENCY=1000, the results:

Before this pr:
image

Clone with new Map (this suggestion):
image

No deep clone (my version):
Don't have anything to show here because it doesn't show up on the performance sample

This change makes it no longer a "deep clone", it still is a "shallow clone", and I dont think the deep behavior is needed for a completely internal part of this class which has zero effect on any part of the external runtime or the end result when comparing statistics before/after rendering. IMO, even if its only 4% of total cpu time, thats still 4% that has no added value on the outcome of the statistics counting.

This element seems to only be used by the new version (the clone) and never the old object, the actual texturesByteLength which this is attribute is used to update is copied over.

For context, at 4% this unnecessary clone takes more time than Matrix4.multiplyTransformation or binding the vertices to webgl:

image

4% rendertime isn't critical, but I think if many of these small percentage improvements can be made that will make a big long-term difference.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjspace You mentioned the cloning aspect in the now-closed PR at #12968 (comment) (but that referred to the credits, so I'm not sure if this is what you meant...?)

I'm also in strong favor of avoiding surprises: clone should clone (no shallow copy with unpredictable side-effects).

But... I had a short look at where Cesium3DTilesetStatistics.clone is actually called and how it is used.

One call is in update (every call is in some update 😆 ). It's not really documented what's happening there. But it seems to fill some tileset._statisticsPerPass array. This array, in turn, does not seem to be used anywhere.

The other call is in raiseLoadProgressEvent. There, it fills some tileset._statisticsLast. But after that, it only seems to care about the numberOfPendingRequests and numberOfTilesProcessing there (and certainly not about that map with the texture IDs).

Sooo... unless I'm overlooking something, the places that are using that clone function are doing a lot of unnecessary work to begin with, and on top of that, none of them seems to depend on that Map. I'm sure there is some room for improvement.


One point that might be relevant for real peformance comparisons: What's in that Map after all? The performance will to some extent depend on the size of that map. On the other hand: When there are 10 textures, then the map is small, and when there are 1000 textures, then there is other costly stuff - so I'd expect this to not be a bottleneck either way.

Copy link
Contributor Author

@Beilinson Beilinson Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javagl The map gets pretty big (hundreds of entries) of [string-number] pairs.

I checked the instances of the statistics cloning and saw the same as you reported. Either way, the clone is used more as a "Lets get a snapshot of what the statistics currently looks like", wherein the map is not useful.

_statisticsPerPass is used in the specs, and also in the Cesium3DTilesInspectorViewModel.getStatistics. Here all the internal properties are needed except the Map again, because these act as snapshots of what the statistics looked like while processing the pick/render/whatever pass, and aren't actively used for calculating anything.

Ideas:

  1. Make this a separate function (.snapshot or open to other suggestions) that only passes around these values without copying the map. It would return an object that doesn't have the functions to increment/decrement the values, representing that its a static snapshot object that shouldn't be worked on directly.
  2. Explicitly copy over only what is needed, so _statisticsLast would become an object containing only numberOfPendingRequests and numberOTilesProcessing, and the passes would copy over everything except the map. I think the snapshot model is cleaner.
    thoughts @javagl @jjspace ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed an example of what the snapshot would look like, note that the passes and last statistics now are no longer full statistics objects, so they dont contain functions to increment/decrement counts or any other logic, just container objects. They also don't contain the map.

Copy link
Contributor

@javagl javagl Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, but a few random comments for now:

  • When something is only used in tests, it's a hint that it may be removed
  • I didn't have the Cesium3DTilesInspectorViewModel on the radar (usually only searching in ./packages/engine/Source...)
  • The whole structure and role of the statistics raises a bunch of questions
    • Top-level ones: Which of them are purely informative (to be shown in a UI), and which of them are crucially influencing the behavior of the whole application? I think that the _statisticsPerPass could/should carry a comment saying that it's only for the UI. And for me, the inlined "block comments", // Rendering statistics, // Loading statistics ... etc. are screaming that there should be RenderingStatistics and LoadingStatistics sub-structures.

The 'snapshot' approach looks conceptually clean for me, but implies new structures (first and foremost that new type, ...Snapshot) and several changes that may warrant some explaining (maybe even just /** A snapshot is a shallow copy ... */ or so).

If this was only about the raiseLoadProgressEvent, I could imagine that storing both relevant fields explicitly, as
tileset._lastNumberOfPendingRequests and _lastNumberOTilesProcessing could be an option a well. (Yes, this "litters" the tileset even more, but ... now there's that statisticsLast which carries mostly unused stuff, so both are not ideal anyhow...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about statisticsLast (although future logic may want to use the other attributes for certain logic maybe).

But yes it's both used for specs (which in my opinion would preferably be something hidden by a debug flag) but also for the inspector view to debug either the pick/render pass (from what I saw nearly every member other than the map is used).

I would prefer not to stretch the structural changes in this PR much more than I have already, but let me know what you think is best, just comment the areas for future reference or something different

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, no strong opinion (I assume others will chime in here), but

  • Change to Map: Good
  • Shallow copy instead of clone? Confusing
  • The changes from the last commit ("snapshot") go a bit too far and raise too many questions for me

A compromise:

Iff the performance benefit is really worth it, maybe just renaming the clone to shallowCopy could be reasonable? (It's an undocumented/internal function after all.. )

Copy link
Contributor Author

@Beilinson Beilinson Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im fine with that, shallowClone + comment explaining that the clones should not be worked on directly sounds good to me.

I could also set the map to undefined so if in the future someone does try to use one of the clones it shoulderror out.

thoughts @jjspace /and or others?

@jjspace
Copy link
Contributor

jjspace commented Oct 14, 2025

Is this PR ready to merge then?

We definitely got a little off track. All good discussion but it can be moved into that issue you just opened. I agree with @ggetz that there's no reason to hold up this specific switch to Map. I do still have one concern, posted above, that got lost earlier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants