vsg::Intersector wasted time #1588
Replies: 2 comments 1 reply
-
This proposal isn't viable as something deeper in the scenegraph might care about arrays set higher up, so we need to proactively track everything in case something else cares. The best alternative I've come up with so far is to use a pool for this vector so when we pop an array state from the stack and push a new one so we can traverse a node's sibling, we can reuse the existing allocation. In theory, it's a pretty elegant and obvious way to make things faster, but actually wiring it up is proving un-beautiful so far. |
Beta Was this translation helpful? Give feedback.
-
|
I've added an extra bullet point about
This would be:
I suppose the only practical way to make this particular aspect faster would be to optimise the VSG allocator itself, e.g. by adding a thread-local FILO free list that memory is released to first before going back into the mutex-protected free memory so code that repeatedly allocated and deallocated same-sized objects wouldn't need to lock a mutex to do so or scan over a bunch of differently-sized memory blocks to find a right-sized one. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I've been working on making
vsg::LineSegmentIntersectorfaster for a client to allow faster terrain height queries for 3D Tiles data, and have been profiling a test app that does a large number of queries to see where time is spent. The VSG part is available at https://github.com/AnyOldName3/VulkanSceneGraph/tree/fast-intersection-queries, and the test app is available at https://github.com/AnyOldName3/vsgExamples/tree/terrain-height-example. Parts of the implementation are still temporary, so there's no pull request yet.I've already made things much faster by creating an optional acceleration structure for
vsg::VertexDrawso instead of testing for intersection with every triangle, only the ones which have a high chance of being near a line are tested, and I've reduced the amount of time spent allocating intersector members by making it possible to reset an intersector and use it to query a new line and/or scenegraph.However, there's still a significant amount of time spent on things that I can't optimise without breaking changes of one kind or another, so before I jumped in, I thought I better mention these here.
Nested allocations
Even if we reuse an intersector, so don't have to redo the direct allocations for intersections at a similar scenegraph depth, we've got to destroy and recreate the storage that some intersector members own.
arrayStateStackholds multipleArrayStateinstances, which contain:localToWorldStackandworldToLocalStack- as far as I can tell, these are only there so that the intersectors can intersect billboarded geometry, but we could get the same result by making these stacks part ofvsg::Intersectorand passing in the top nodes when asking the array state for the vertex buffer. I'm wondering if there's any reason not to make this change? Also, these stacks aren't set byvsg::ComputeBounds, which probably means that things with custom array state give incorrect bounds.verticesandproxy vertices. These aren't a big problem as they share ownership of storage, and cloning array state just copies pointers.arrays, avsg::DataList. Maybe this can be allocated in place if it's changed to just be a singleref_ptr<Data>for the basicArrayStateclass, tworef_ptr<Data>fields forTranslationArrayStateetc. so we only need to track the (fixed number of) attributes we care about.intersectionscan hold multipleIntersectioninstances, which contain:nodePath, which is important when an app wants to know the node path, so can't be avoided unless there's some way to opt out.arrays, still aDataList. I don't see the point as it can be computed from the node path by applications that need it without making applications that don't have to pay the cost of it.indexRatios, which could be astd::array<IndexRatio, 3>if we only cared about triangles. Vulkan only directly supports triangles (and other topologies made of triangles), except when tesselation is used, in which case, you can define patches with arbitrarily many control points (before turning them into triangles in the shader). I guess we could optimise the common case while letting complicated tesselation-based things still do what they wanted if the field became astd::variant<std::array<IndexRatio, 3>, std::vector<IndexRatio>>so triangles work in-place, but higher-order patches can use the heap.For applications that only care about the first intersection, all of these could be made into non-nested allocations by just overwriting a single Intersection instance.
arrayStateStackandintersectionsboth holdref_ptrs instead of directly holding the objects, so there's a mandatory allocator invocation each time anything's added to these containers even if growing the containers avoids allocation. I'm not seeing an obvious reason forIntersectionto inherit fromvsg::Object, and if it didn't, there'd be no need to have the layer of indirection, butArrayStateis used elsewhere, and we don't know the specific runtime type anyway, so it has to be a heap-allocated pointer (until such a time as C++ gains the ability to use something like C's variable-length arrars to stack-allocate an object of runtime-knowable type).Secondary intersections
A lot of applications will only care about the closest intersection. It wastes time to compute intersections we don't care about and sometimes we can know we don't care because a node's bounds showed the whole thing was beyond the best-known intersection so far. Also, as mentioned above, it means we need to redo any allocations an intersection instance owns.
The particular use case I've been tasked with optimising isn't troubled by multiple intersections due to the shape of the geometry and direction of the line segments, but a considerable amount of time is spent allocating storage for intersection members, even after switching to reusing intersectors so the storage for the intersection instances themselves only needs allocating once.
Other stuff
At the moment, the most expensive part of intersector traversal (other than the intersection maths and cache misses) is all of the mandatory allocations, so there's nothing else worth optimising yet. It just wouldn't make a meaningful difference until the bigger problems are made smaller.
Beta Was this translation helpful? Give feedback.
All reactions