Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instancer Capsule #5453

Merged
merged 18 commits into from
Nov 10, 2023

Conversation

danieldresser-ie
Copy link
Contributor

Based on #3998, the concept for this is that when an Instancer is set to "encapsulateInstanceGroups", it should be able to add special cases to communicate more directly with the render, improving performance.

My goal for today was to clear away enough of the construction sawdust for it to not be totally unreasonable for John to do a first pass review before his vacation.

Known issues:

  • I haven't yet supported prototype contexts properly. This will be implemented using a local hash map keyed by prototype context in InstancerCapsule::render to create a vector of prototype objects, which will be passed to the command, along with a list of indices for which one to use for each instance.
  • Many TODOs
  • No tests

Main questions:

Was I right to remove prototypeChildNamesPlug? I think the answer is definitely yes, though this is currently causing some small performance regressions ... I don't think they're actually important, but I will probably investigate further, just because it's deeply bothering that I don't understand why some of the tests have gotten slower ( testEngineDataPerf has gotten obviously slower for obvious reasons, but is not an important case in practice ). If there is going to be any plug that holds something that needs to be global to the Instancer, but doesn't get initialized when EngineData is initialized, then rather than prototypeChildNamesPlug, what would be nice to store separately is the computation of m_prototypeIndexRemap and m_instancesForPrototype which occur here:


... main time when this work would happen unnecessarily is when there are several transform segments, but the topology does not change within the shutter. Then rather than building separate mappings for each EngineData, you could just build a mapping for the first EngineData, and then compare the topology of each subsequent EngineData, and if they're equal, just use the mapping from the first one. Currently, in order to compare if they're identical, you'd have to get the EngineData for each sample, triggering all the mapping builds. Probably not worth actually doing, but I might try to benchmark the cost we're currently paying at least.

Where should InstancerCapsule live? Currently, EngineData is only forward-declared in Instancer.h ( and I think this is a good thing ), which means that in order to use EngineData, InstancerCapsule must be defined in Instancer.cpp. To achieve this, I've put InstancerCapsule inside Instancer, though I think things might be clearer if it was outside, and just declared as a friend in order to access what it needs. Would it be acceptable to have both GafferScene::Instancer and GafferScene::InstancerCapsule both defined in Instancer.cpp?

I guess those are my specific questions, but also just general feedback on how things are structured ... and how I should go about testing this ... it feels like there are a huge number of special cases.

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel! This is all feeling pretty good to me, and its pleasing that we're not really having to restructure anything particularly radically to achieve it.

Was I right to remove prototypeChildNamesPlug? I think the answer is definitely yes,

I concur.

though this is currently causing some small performance regressions ... I don't think they're actually important

I agree here too. I think losing a tiny bit of performance for the non-encapsulated case in return for huge wins for the encapsulated case is a good trade-off. The non-encapsulated case isn't the one people choose when performance is their number one priority.

but I will probably investigate further, just because it's deeply bothering that I don't understand why some of the tests have gotten slower

Seems worth a quick look, but the way you've structured things currently seems pretty clean and simple, and I'd be reluctant to make that much more complex just to claw back a few percent in the case where the user has opted out of the higher performance approach.

and how I should go about testing this

I'd definitely recommend looking at using CapturingRenderer, as it provides a pretty direct way of checking that you're getting what you want. We could perhaps also give CapturingRenderer a custom constructor argument to get it to pretend to be the Arnold renderer, so we could test the instancer command/object.

src/GafferSceneModule/HierarchyBinding.cpp Outdated Show resolved Hide resolved
include/GafferScene/Instancer.h Outdated Show resolved Hide resolved
include/GafferScene/Instancer.h Outdated Show resolved Hide resolved
include/GafferScene/Capsule.h Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/IECoreArnold/Renderer.cpp Outdated Show resolved Hide resolved
src/IECoreArnold/Renderer.cpp Outdated Show resolved Hide resolved
src/IECoreArnold/Renderer.cpp Outdated Show resolved Hide resolved
@danieldresser-ie danieldresser-ie changed the base branch from main to 1.3_maintenance October 13, 2023 06:02
@danieldresser-ie danieldresser-ie marked this pull request as ready for review October 13, 2023 06:03
@danieldresser-ie
Copy link
Contributor Author

Well, this has taken way too long, but I'm finally feeling like I might have gotten some decent test coverage and hunted down all the weird corner case bugs from before.

The code I would look at with the most suspicion are:

  • the handling of time when evaulating different thing ( the way that shutter works for nested Capsules is a bit weird )
  • the way that the instancer is now passed to Arnold as a CompoundObject ( I moved this into ArnoldRendererBase at the last minute, and I'm not sure it all makes sense )

I did do a quick investigation to figure out why without the Arnold specific code path, ArnoldRendererTest.testInstancerEncapsulatePerf is showing a 20% slowdown ( though with the special code path it goes 30X faster ). Turns out, the slowdown is because it is now completely single threaded - it is taking 20% more time on one thread than the previous code took while fully loading 16 threads. Comparing single thread evaluation, it is 10X faster without the Arnold specific code, and 150X faster with the special case.

We probably should be able to add some multithreading here to improve performance when the render translation is bottlenecked on a single instancer.

@danieldresser-ie
Copy link
Contributor Author

The tests aren't running because I had originally targetted main which resulted in a merge conflict. I retargeted to 1.3, but now can't see any way to run the tests - all the relevant ones passed locally however.

@danieldresser-ie danieldresser-ie force-pushed the instancerCapsule branch 3 times, most recently from 73d2f0d to ea4ba88 Compare October 21, 2023 02:46
@johnhaddon
Copy link
Member

Thanks Daniel, I've been through and checked and resolved all the comments from my previous review - thanks for addessing them. I haven't rereviewed the latest code as it looks like there are a few problems with it :

CI is showing an Instancer-related crash in the Linux debug build, and a compilation error in the Windows build :

testAttributes (GafferSceneTest.InstancerTest.InstancerTest) ... ERROR
testBoundHashIsStable (GafferSceneTest.InstancerTest.InstancerTest) ... ok
testChildNamesPerf (GafferSceneTest.InstancerTest.InstancerTest) ... skipped 'Categories excluded by `performance`'
testContextChangedAndGIL (GafferSceneTest.InstancerTest.InstancerTest) ... ok
testContextSet (GafferSceneTest.InstancerTest.InstancerTest) ... *** Segmentation fault

I also ran into a crash when doing a quick Arnold performance test here. Hopefully you can reproduce it with the file below and gaffer stats fileName.gfr -scene ArnoldRender.task :

import Gaffer
import GafferArnold
import GafferDispatch
import GafferScene
import IECore
import imath

Gaffer.Metadata.registerValue( parent, "serialiser:milestoneVersion", 1, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:majorVersion", 3, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:minorVersion", 5, persistent=False )
Gaffer.Metadata.registerValue( parent, "serialiser:patchVersion", 0, persistent=False )

__children = {}

__children["Plane"] = GafferScene.Plane( "Plane" )
parent.addChild( __children["Plane"] )
__children["Plane"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Sphere"] = GafferScene.Sphere( "Sphere" )
parent.addChild( __children["Sphere"] )
__children["Sphere"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Instancer"] = GafferScene.Instancer( "Instancer" )
parent.addChild( __children["Instancer"] )
__children["Instancer"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["PathFilter"] = GafferScene.PathFilter( "PathFilter" )
parent.addChild( __children["PathFilter"] )
__children["PathFilter"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["ArnoldRender"] = GafferArnold.ArnoldRender( "ArnoldRender" )
parent.addChild( __children["ArnoldRender"] )
__children["ArnoldRender"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["StandardOptions"] = GafferScene.StandardOptions( "StandardOptions" )
parent.addChild( __children["StandardOptions"] )
__children["StandardOptions"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Parent"] = GafferScene.Parent( "Parent" )
parent.addChild( __children["Parent"] )
__children["Parent"]["children"].addChild( GafferScene.ScenePlug( "child1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Parent"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Camera"] = GafferScene.Camera( "Camera" )
parent.addChild( __children["Camera"] )
__children["Camera"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"] = GafferScene.Outputs( "Outputs" )
parent.addChild( __children["Outputs"] )
__children["Outputs"]["outputs"].addChild( Gaffer.ValuePlug( "output1", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"].addChild( Gaffer.StringPlug( "name", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"].addChild( Gaffer.BoolPlug( "active", defaultValue = True, flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"].addChild( Gaffer.StringPlug( "fileName", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"].addChild( Gaffer.StringPlug( "type", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"].addChild( Gaffer.StringPlug( "data", defaultValue = '', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"].addChild( Gaffer.CompoundDataPlug( "parameters", flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Outputs"]["outputs"]["output1"]["parameters"].addChild( Gaffer.NameValuePlug( "driverType", Gaffer.StringPlug( "value", defaultValue = 'ClientDisplayDriver', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "driverType", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["Outputs"]["outputs"]["output1"]["parameters"].addChild( Gaffer.NameValuePlug( "displayHost", Gaffer.StringPlug( "value", defaultValue = 'localhost', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "displayHost", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["Outputs"]["outputs"]["output1"]["parameters"].addChild( Gaffer.NameValuePlug( "displayPort", Gaffer.StringPlug( "value", defaultValue = '${image:catalogue:port}', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "displayPort", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["Outputs"]["outputs"]["output1"]["parameters"].addChild( Gaffer.NameValuePlug( "remoteDisplayType", Gaffer.StringPlug( "value", defaultValue = 'GafferImage::GafferDisplayDriver', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "remoteDisplayType", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["Outputs"]["outputs"]["output1"]["parameters"].addChild( Gaffer.NameValuePlug( "quantize", Gaffer.IntVectorDataPlug( "value", defaultValue = IECore.IntVectorData( [ 0, 0, 0, 0 ] ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "quantize", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["Outputs"]["outputs"]["output1"]["parameters"].addChild( Gaffer.NameValuePlug( "catalogue:imageName", Gaffer.StringPlug( "value", defaultValue = 'Image', flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ), "catalogue_imageName", Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic ) )
__children["Outputs"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["ArnoldOptions"] = GafferArnold.ArnoldOptions( "ArnoldOptions" )
parent.addChild( __children["ArnoldOptions"] )
__children["ArnoldOptions"].addChild( Gaffer.V2fPlug( "__uiPosition", defaultValue = imath.V2f( 0, 0 ), flags = Gaffer.Plug.Flags.Default | Gaffer.Plug.Flags.Dynamic, ) )
__children["Plane"]["divisions"].setValue( imath.V2i( 1000, 1000 ) )
__children["Plane"]["__uiPosition"].setValue( imath.V2f( 6.95000029, 16.2000008 ) )
__children["Sphere"]["__uiPosition"].setValue( imath.V2f( 20.4499989, 16.2000008 ) )
__children["Instancer"]["in"].setInput( __children["Plane"]["out"] )
__children["Instancer"]["filter"].setInput( __children["PathFilter"]["out"] )
__children["Instancer"]["prototypes"].setInput( __children["Sphere"]["out"] )
__children["Instancer"]["prototypeIndex"].setValue( 'prototypeIndex' )
__children["Instancer"]["orientation"].setValue( 'orientation' )
__children["Instancer"]["scale"].setValue( 'scale' )
__children["Instancer"]["attributes"].setValue( 'P' )
__children["Instancer"]["attributePrefix"].setValue( 'user:' )
__children["Instancer"]["encapsulateInstanceGroups"].setValue( True )
__children["Instancer"]["__uiPosition"].setValue( imath.V2f( 8.45088482, 8.03593826 ) )
__children["PathFilter"]["paths"].setValue( IECore.StringVectorData( [ '/plane' ] ) )
__children["PathFilter"]["__uiPosition"].setValue( imath.V2f( 33.4499969, 14.1185303 ) )
__children["ArnoldRender"]["in"].setInput( __children["Outputs"]["out"] )
__children["ArnoldRender"]["fileName"].setValue( '${project:rootDirectory}/asses/${script:name}/${script:name}.####.ass' )
__children["ArnoldRender"]["__uiPosition"].setValue( imath.V2f( 8.64999771, -38.0818558 ) )
__children["StandardOptions"]["in"].setInput( __children["Parent"]["out"] )
__children["StandardOptions"]["options"]["renderCamera"]["value"].setValue( '/camera' )
__children["StandardOptions"]["options"]["renderCamera"]["enabled"].setValue( True )
__children["StandardOptions"]["__uiPosition"].setValue( imath.V2f( 10.1499977, -11.8537388 ) )
__children["Parent"]["in"].setInput( __children["Instancer"]["out"] )
__children["Parent"]["parent"].setValue( '/' )
__children["Parent"]["children"][0].setInput( __children["Camera"]["out"] )
__children["Parent"]["__uiPosition"].setValue( imath.V2f( 10.1499977, -1.95374155 ) )
__children["Camera"]["transform"]["translate"].setValue( imath.V3f( 0, 0, -5.35361481 ) )
__children["Camera"]["__uiPosition"].setValue( imath.V2f( 50.0499954, 2.94625902 ) )
__children["Outputs"]["in"].setInput( __children["ArnoldOptions"]["out"] )
__children["Outputs"]["outputs"]["output1"]["name"].setValue( 'Interactive/Beauty' )
__children["Outputs"]["outputs"]["output1"]["fileName"].setValue( 'beauty' )
__children["Outputs"]["outputs"]["output1"]["type"].setValue( 'ieDisplay' )
__children["Outputs"]["outputs"]["output1"]["data"].setValue( 'rgba' )
__children["Outputs"]["__uiPosition"].setValue( imath.V2f( 10.1499977, -29.9178028 ) )
__children["ArnoldOptions"]["in"].setInput( __children["StandardOptions"]["out"] )
__children["ArnoldOptions"]["options"]["consoleInfo"]["value"].setValue( True )
__children["ArnoldOptions"]["options"]["consoleInfo"]["enabled"].setValue( True )
__children["ArnoldOptions"]["options"]["consoleStats"]["value"].setValue( True )
__children["ArnoldOptions"]["options"]["consoleStats"]["enabled"].setValue( True )
__children["ArnoldOptions"]["options"]["statisticsFileName"]["value"].setValue( '/tmp/test,stats' )
__children["ArnoldOptions"]["__uiPosition"].setValue( imath.V2f( 10.1499977, -21.7537403 ) )


del __children

The output I get is this :

WARNING : setParameter : Unsupported data type "RunTimeTyped" for name "user:P"WARNINGWARNING | signal caught: SIGSEGV -- Invalid memory reference

While taking care of all that, could you squash in the fixups and reverts so I can review in the form we'll merge it (I think I've forgotten anything I might have known from the previous review anyway) please?

Cheers...
John

@danieldresser-ie
Copy link
Contributor Author

Fixed one silly mistake and squashed. Tests are all passing locally, lets see what CI thinks ...

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel! As discussed in a side channel just now, I'm posting my half-complete review to give you a few simple things to deal with today, and a few performance measurements from my end to consider. I'll return to review the trickier stuff in Instancer.cpp tomorrow, hopefully with a clearer head.
Cheers...
John

python/GafferArnoldTest/ArnoldRenderTest.py Outdated Show resolved Hide resolved
python/GafferArnoldTest/ArnoldRenderTest.py Outdated Show resolved Hide resolved
python/GafferArnoldTest/ArnoldRenderTest.py Outdated Show resolved Hide resolved
python/GafferArnoldTest/ArnoldRenderTest.py Outdated Show resolved Hide resolved
python/GafferArnoldTest/ArnoldRenderTest.py Show resolved Hide resolved
python/GafferSceneTest/InstancerTest.py Outdated Show resolved Hide resolved
python/GafferSceneTest/InstancerTest.py Show resolved Hide resolved
python/GafferSceneTest/InstancerTest.py Show resolved Hide resolved
python/GafferSceneTest/InstancerTest.py Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
@danieldresser-ie danieldresser-ie force-pushed the instancerCapsule branch 2 times, most recently from f640e63 to e77ebb6 Compare October 25, 2023 05:51
@danieldresser-ie
Copy link
Contributor Author

danieldresser-ie commented Oct 25, 2023

I think I've addressed all feedback. The one other significant change here - since you said you hadn't really looked at the changes to Instancer.cpp much, I've gone back and split the commit to pretend that the changes to remove __prototypeChildNames happened before InstancerCapsule was added. This change was a pre-requisite because for InstancerCapsule, we wanted the EngineData to contain all the information needed, rather than having InstancerCapsule need to evaluate multiple plugs.

This should make it easier to review, and hopefully helps clarify why the performance for the non-encapsulated case might have changed ( if it's improved for you, that's great ).

The remaining big concern is why we're not seeing the performance improvement scale properly with threads on your computer.

@johnhaddon
Copy link
Member

I've gone back and split the commit to pretend that the changes to remove __prototypeChildNames happened before InstancerCapsule was added.

That's great, thanks! When I first saw that huge commit I was wondering if it could be split like that to make it more digestible.

johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Oct 26, 2023
This knocks around 30% off the runtime of `ArnoldRenderTest.testInstancerEncapsulatePerf` in PR GafferHQ#5453, testing with 64 threads.
johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Oct 26, 2023
This knocks around 30% off the runtime of `ArnoldRenderTest.testInstancerEncapsulatePerf` in PR GafferHQ#5453, testing with 64 threads.
danieldresser-ie pushed a commit to danieldresser-ie/gaffer that referenced this pull request Oct 27, 2023
This knocks around 30% off the runtime of `ArnoldRenderTest.testInstancerEncapsulatePerf` in PR GafferHQ#5453, testing with 64 threads.
Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers Daniel - comments for remaining commits inline. Note that I haven't commented on your last commit as you indicated it was hacky work-in-progress.

src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
python/GafferSceneUI/InstancerUI.py Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
danieldresser-ie pushed a commit to danieldresser-ie/gaffer that referenced this pull request Nov 1, 2023
This knocks around 30% off the runtime of `ArnoldRenderTest.testInstancerEncapsulatePerf` in PR GafferHQ#5453, testing with 64 threads.
@danieldresser-ie danieldresser-ie force-pushed the instancerCapsule branch 2 times, most recently from 9860216 to 543f020 Compare November 1, 2023 05:40
@danieldresser-ie
Copy link
Contributor Author

OK, I think I've addressed all feedback - it is a bit of a mixture of separate commits and squashing in place ... some of the comments were about it not making sense how things were spread through the commits, so it wouldn't have helped much to add more fix commits to the picture ... and in the course of this rearranging, I've actually altered the history so that "omitDuplicateIds" comes before "Optimize encapsulated" ... I had to squash them together to address some feedback, and when splitting them, the other order made more sense.

If you want me to, I can go through and annotate which comment is fixed where, because this is a bit more confusing currently than sometimes.

The main commit that would probably be good to have you look at now is the last one, "Parallelize prototype creation when encapsulated". This requires some locking that is more complicated than I would have hoped, but I don't see a way to make it simpler without adding more contention. It is definitely useful, producing a 3X speedup on testInstancerManyPrototypesEncapsulatePerf.

The remaining thing I haven't fixed is all the bugs introduced by outputting objects directly instead of capsules.

@danieldresser-ie
Copy link
Contributor Author

I've fixed the most obvious issue with skipping internal capsules in favour of naked objects when the prototype has no children. I think there are still issues with light linking and such, but I'll talk to you tomorrow about what the requirements are here.

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel! Hopefully this is the final round of comments - mostly trivialities, but hopefully I've spotted a couple of more worthwhile things as well.

As I mentioned offline yesterday, I'm seeing really very substantial IPR startup improvements with this in production scenes, so it has most definitely been worth the effort. Other than the copy() overhead I mentioned, the main remaining stuff in a profile was AiNode() and a bunch of our ArnoldRenderer internals - I think we can probably squeeze out some more improvements from that once this is merged.

I've also done a bit of interactive testing with side-by-side encapsulated and non-encapsulated copies of an instancer in IPR today, and so far everything has behaved perfectly (once I'd made the fix in #5532). I'm going to keep prodding at that a bit more, but it's looking like we're there now.

Thanks!
John

@@ -2528,82 +2529,112 @@ void Instancer::PrototypeScope::setPrototype( const EngineData *engine, const Sc
namespace
{

struct Prototype
// It shouldn't be necessary for this to be refcounted - but LRUCache is set up to make it impossible
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up against something similar to this the other day, and although I decided not to dice with it, I think it might actually be possible to use Prototype * as the value type, and then use a RemovalCallback to do delete value. Let's not worry about it until after we get this merged, but it might be worth playing with to see if it offers any benefit afterwards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ( thought I ) had some spare time today, so I took a swing at this. It resulted in a 20% speedup in testInstancerManyPrototypesEncapsulatePerf ... this surprised me a bit - I had expected it to be more significant in testInstancerFewPrototypesEncapsulatePerf, since I was expecting the overhead in my code to play a greater part when the LRUCache just has one thing in it, and isn't working that hard. These results suggest that there's some overhead inside the LRUCache ... are we doing reference counting when the hash map resizes? I would have hoped that modern C++ could prevent that with an std::move? Anyway, that might merit some investigation - but I think this faster code feels a bit cleaner to me anyway? Happy to omit this commit if it raises some uncertainty for you though that would prevent merging this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this explains what you're seeing, but I'd expect the LRUCache to perform better when queries are spread across a number of different keys. Internally it uses a binned map and mutex on each entry, so it sees much less contention if different threads are asking for different things than if they're all fighting over the one thing. Perhaps that fighting-over-one-thing drowns out any potential benefit from removing the ref-counting?

src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
python/GafferSceneTest/InstancerTest.py Outdated Show resolved Hide resolved
python/GafferSceneTest/InstancerTest.py Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
Comment on lines +173 to +175
boost::python::objects::copy_class_object(
type_id<Capsule>(), Instancer::instancerCapsuleTypeInfo()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Selection Mask" menu in the viewer is broken now, always showing the highlighted "non default" icon, and refusing to tick the Capsules item. I suspect it's something to do with this, but haven't verified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll leave it to you to sort this one out, since you know the _SelectionMaskPlugValueWidget.

The cause of the break is pretty understandable. _SelectionMaskPlugValueWidget needs to find the types inheriting from a base class which are actually going to be rendered ... and it approximates this with _leafTypes, which returns any derived types which do not themselves have derived types. From a theoretical perspective, I think this is simply wrong ... just because something has a derived type doesn't mean it can't also be an object that gets rendered.

_SelectionMaskPlugValueWidget then compares sets of leaf types, but in the case of the hardcoded menu options, it compares leaf types to the hardcoded type ( rather than converting the hardcoded type to leaf types before comparing ). If this was fixed, then selecting the menu option would work ... but it would fail to actually mask Capsules that aren't InstancerCapsules.

So the trigger isn't the python binding stuff, but just the IE_CORE_DECLAREEXTENSIONOBJECT( InstancerCapsule, GafferScene::InstancerCapsuleTypeId, GafferScene::Capsule ); which declares something deriving from Capsule ... I don't think there's anything wrong with this though.

What I think a fix would look like is:

  • rather than _leafTypes, it should use something like _descendantTypes, which includes a type itself and all descendants whether or not they are leaves. This would include some types which actually abstract rather than concrete renderable things ... I wouldn't think that would actually screw it up, but I might be missing something
  • rather than checked = currentTypes.issuperset( types ) it feels like it might be more consistent to do checked = currentTypes.issuperset( _descendantTypes( types ) ) so that both sides of the comparison have been processed through _descendantTypes ... though I guess switching from _leafTypes to _descendantTypes means this isn't strictly necessary

Anyway, I can do the work if this sounds like the right approach, but I figured I check in before trying to change the meanings of some stuff I don't fully understand.

@danieldresser-ie danieldresser-ie force-pushed the instancerCapsule branch 2 times, most recently from 4b1ca2c to 59ea28b Compare November 9, 2023 08:05
@danieldresser-ie
Copy link
Contributor Author

4 more fixup commits for you to take a look at. Should be just about there!

Copy link
Member

@johnhaddon johnhaddon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Daniel. Comments inline - I think we have a memory leak now. We can chat about this in the meeting today, but this has gone on long enough and we're close enough that I'm feeling inclined to drop "Use raw pointer" and "Switch to renderInstance" in favour of getting something out without further iteration and testing. Then we can follow up with additional improvements once we've actually got some results in people's hands.

src/GafferScene/Instancer.cpp Outdated Show resolved Hide resolved
src/GafferScene/Instancer.cpp Show resolved Hide resolved
@@ -2736,7 +2731,15 @@ void Instancer::InstancerCapsule::render( IECoreScenePreview::Renderer *renderer
!hasAttributes
);
},
limits<size_t>::max() // Never evict, even if prototypes are all unique

std::numeric_limits<size_t>::max(), // Never evict, even if prototypes are all unique
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we should have used numeric_limits in the original commit as well? What is limits actually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not even sure what limits was finding, other than that it must have been a large integer, since the tests passed. I only fixed here because it was a last minute fix to get CI going, and I was planning to squash anyway. I've now fixed it in the correct place.

@@ -2612,6 +2612,76 @@ struct Prototype
}
}

// This internal interface for rendering an instance from a prototype is mostly pretty clean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like small_vector might let us clean this up a bit? I suppose the other option would be thread_local for our scratch space?

template< typename F >
inline void renderInstance(
const std::string &name, const std::vector<float> &transformSampleTimes,
std::vector<M44f> &instanceTransforms, F &&attributeProcessor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it doesn't matter all that much since this is all internal, but this attributeProcessor feels a bit awkward. I wonder if we could use the same "scratch space" idea from instanceTransforms so that we pass CompoundObject &instanceAttributes and then just insert into it from m_attributes? At least then we only have one idiom for avoiding allocations? I don't think it will be as beneficial as it could be in this case as CompoundObject is using a map internally, but I'd assume that if it was using unordered_map we might avoid allocating at all after the first go round?

@danieldresser-ie
Copy link
Contributor Author

OK, I've dropped the two troublesome commits, and added a hack for the selection mask thing. Hopefully that gets us to a reasonable place for merging.

@johnhaddon
Copy link
Member

Hopefully that gets us to a reasonable place for merging.

Thanks! I've tweaked the selection mask workaround to avoid adding the instancerCapsuleTypeId() public method, and just used typeIdFromTypeName() to get it instead. The less API surface area the better, especially for stuff we'd want to remove...

Merging!

@johnhaddon johnhaddon merged commit 3db4256 into GafferHQ:1.3_maintenance Nov 10, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants