Improve ValuePlug::getValue() performance #5362

Conversation
Those are some very nice wins.
I should follow the example of d608cb0 and inline the getValue() methods for the other plug types, but the question there is "where to put the inlined methods?". We already have .inl files for things like TypedPlug, but they're only intended for inclusion from select .cpp files. I think they should probably be renamed with some other suffix, but what?
The wording here was initially ambiguous to me, but looking at it: the standard convention is that `*.inl` is included in the corresponding `.h` (about 40 of the 50 current examples). So the outliers are the `.inl` files that *aren't* intended to be included in the `.h`, and they're what needs renaming? I guess the closest to our other conventions would be to call it `Private/TypedPlug.inl`, but maybe it would be confusing to have both `TypedPlug.inl` and `Private/TypedPlug.inl`?
);
IECore::msg( IECore::Msg::Error, "ValuePlug::getObjectValue", error );
throw IECore::Exception( error );
}
It's been quite a while, and I don't clearly remember all this stuff, but it sounds like this warning was put here quite explicitly to help debug something nasty. Removing it doesn't seem related to this PR - but maybe you're aware of some reason why this logic no longer applies?
I traced this back to the original PR, but it still wasn't completely clear to me why this was done in the first place. One thing was clear: there was a segfault in the unexpected `value == nullptr` case that was being fixed in addition to the message being added. I've kept that fix using the ternary in the format call. The other thing that was clear in the original PR was that the cause of the `value == nullptr` has been fixed elsewhere and has its own test coverage (at the TaskMutex level). So I don't think this message is any more likely to be useful than a belt-and-braces message anywhere else we throw an exception, and since we don't do that as a matter of course, I don't really see a reason to keep it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I don't feel too strongly about it.
Yes. The convention is that…
…ionScopes

This means calling `ValuePlug::dirty()` before the scope closes, so that the hash cache is invalidated before the compute is performed. This was first requested way back in 2017 in issue GafferHQ#1971, where the use case was edit/compute loops in the PythonEditor. It has become a high priority more recently though, as it turned out we were actually interleaving edits and computes in the TransformTools, and noticed incorrect results while working on the LightTool in GafferHQ#5218. The problem was as follows, and is demonstrated in `testMultipleSelectionWithEditScope`:

1. We push a dirty propagation scope using UndoScope.
2. We create a transform edit for `/cube1`, which dirties some plugs, incrementing their dirty count.
3. We create a transform edit for `/cube`, but as part of doing that we need to compute the _current_ transform. This "bakes" that transform into the (hash) cache, keyed by the current dirty count.
4. We propagate dirtiness for this new transform edit, but don't increment the dirty count again, because we've already visited those plugs in step 2.
5. After closing the scope, we query the transform for `/cube` and end up with the cached value from step 3. This is incorrect, because it isn't accounting for the edits made subsequently.

The problem is demonstrated even more simply by `ComputeNodeTest.testInterleavedEditsAndComputes`. There is an overhead to calling `flushDirtyPropagationScope()` for each `getValue()` call, and it is measurable in `ValuePlugTest.testCacheOverhead()`. But taking into account the performance improvements made in GafferHQ#5362, we still come out with an 18% reduction in runtime even with the overhead.

Fixes GafferHQ#1971
I think there's another reference counting win that might be a little bit more faffy, but could be worthwhile. In…
Since it's a fairly special case, I guess another option would be, rather than using a new file extension, to just call it `TypedPlugImplementation.h`. But I don't mind `.impl` if that's what you want. And yeah, skipping reference counting for static values does sound interesting.
Force-pushed from 52528a9 to c978f7f
Force-pushed from c978f7f to 6c5e585
LGTM, aside from two typos.
SConstruct (Outdated)

@@ -1090,10 +1090,10 @@ libraries = {

	"GafferSceneUI" : {
		"envAppends" : {
-			"LIBS" : [ "Gaffer", "GafferUI", "GafferImage", "GafferImageUI", "GafferScene", "Iex$IMATH_LIB_SUFFIX", "IECoreGL$CORTEX_LIB_SUFFIX", "IECoreImage$CORTEX_LIB_SUFFIX", "IECoreScene$CORTEX_LIB_SUFFIX" ],
+			"LIBS" : [ "Gaffer", "GafferUI", "GafferImage", "GafferImageUI", "GafferScene", "GafferImage", "Iex$IMATH_LIB_SUFFIX", "IECoreGL$CORTEX_LIB_SUFFIX", "IECoreImage$CORTEX_LIB_SUFFIX", "IECoreScene$CORTEX_LIB_SUFFIX" ],
"GafferImage" was already listed here - looks like it's here twice now?
Fixed in 0945ec9.
template<>
IECore::MurmurHash GafferImage::AtomicFormatPlug::hash() const;

} // namespace Gaffer
GitHub is now listing a missing newline at EOF here - I don't think we should be doing that, though the whitespace checker isn't complaining about it.
Also fixed in 0945ec9.
Force-pushed from 6c5e585 to c7e4e53
This produces the following reductions in runtime:
- ValuePlugTest.testCacheOverhead : 7%
- ValuePlugTest.testContentionForOneItem : 25%

The common case is that a plug has no input, so it's worth avoiding the overhead of calling `typeId()`. This knocks ~2% off the runtime of `ValuePlugTest.testCacheOverhead`.

This produces the following reductions in runtime:
- ValuePlugTest.testCacheOverhead : 5%
- ValuePlugTest.testContentionForOneItem : 13%

This produces the following reductions in runtime:
- ValuePlugTest.testCacheOverhead : 7%
- ValuePlugTest.testContentionForOneItem : 20%

This produces the following reductions in runtime:
- ValuePlugTest.testCacheOverhead : 5%
- ValuePlugTest.testContentionForOneItem : 1%
We use `.inl` files to store inline implementation that we want the compiler to see, but which we don't want to clutter up the header with, so the headers are more human-readable. These files don't match that pattern - they're to be included in a single `.cpp` file and used to explicitly instantiate the template.
This also means declaring AtomicFormatPlug's specialisations publicly, so that we don't get the publicly visible default `getValue()` implementation being compiled instead.

And the part I don't understand: we need to link `libGafferImage` to far more libraries now, to avoid undefined symbols related to AtomicFormatPlug with MSVC. It seems that the mere existence of the inlined `getValue()` - even though it is superseded by the specialisation - is enough to make MSVC greedily do a bunch of implicit template instantiations that it wasn't doing before, and which ultimately depend on `GafferImage::Format`.
Force-pushed from c7e4e53 to 0945ec9
I'll soon be opening a PR to fix #1971, but it comes at a small runtime cost that shows up as a few percent overhead in `ValuePlugTest.testCacheOverhead`. This PR provides some substantial performance improvements targeted at that test, so that overall we will still come out well on top. I'm seeing the following runtime reductions:

I did also investigate reducing the number of thread-local lookups by smooshing a bunch of disparate thread locals into one, as we had discussed. While it had a small benefit performance-wise, it was a mess from a maintenance perspective, and didn't produce anything near the improvements from this PR. I don't intend to revisit that until we're scraping the bottom of the barrel.