
Improve ValuePlug::getValue() performance #5362

Merged — 8 commits, Jul 11, 2023

Conversation

johnhaddon (Member)

I'll soon be opening a PR to fix #1971, but it comes at a small runtime cost that shows up as a few percent overhead in ValuePlugTest.testCacheOverhead. This PR provides some substantial performance improvements targeted at that test so that overall we will still come out well on top. I'm seeing the following runtime reductions :

  • ValuePlugTest.testCacheOverhead : ~23%
  • ValuePlugTest.testContentionForOneItem : ~50%

I did also investigate reducing the number of thread-local lookups by smooshing a bunch of disparate thread locals into one, as we had discussed. While it had a small benefit performance-wise it was a mess from a maintenance perspective, and didn't produce anything near the improvements from this PR. I don't intend to revisit that until we're scraping the bottom of the barrel.

I should follow the example of d608cb0 and inline the getValue() methods for the other plug types, but the question there is "where to put the inlined methods?". We already have .inl files for things like TypedPlug, but they're only intended for inclusion from select .cpp files. I think they should probably be renamed with some other suffix, but what?

@johnhaddon self-assigned this Jun 27, 2023
@danieldresser-ie (Contributor) left a comment

Those are some very nice wins.

> I should follow the example of d608cb0 and inline the getValue() methods for the other plug types, but the question there is "where to put the inlined methods?". We already have .inl files for things like TypedPlug, but they're only intended for inclusion from select .cpp files. I think they should probably be renamed with some other suffix, but what?

The wording here was initially ambiguous to me, but looking at it: the standard convention is that *.inl is included in the corresponding .h ( about 40 of 50 current examples ). So, the outliers are .inl that aren't intended to be included in the .h, and they're what needs renaming? I guess the closest to our other conventions would be to call it Private/TypedPlug.inl, but maybe it would be confusing to have both TypedPlug.inl and Private/TypedPlug.inl?

```cpp
	);
	IECore::msg( IECore::Msg::Error, "ValuePlug::getObjectValue", error );
	throw IECore::Exception( error );
}
```
danieldresser-ie (Contributor)

It's been quite a while, and I don't clearly remember all this stuff - but it sounds like this warning was put here quite explicitly to help debug something nasty ... removing it doesn't seem related to this PR, but maybe you're aware of some reason why this logic no longer applies?

johnhaddon (Member Author)

I traced this back to the original PR, but it still wasn't completely clear to me why this was done in the first place. One thing was clear - there was a segfault in the unexpected value == nullptr case that was being fixed in addition to the message being added. I've kept that using the ternary in the format call. The other thing that was clear in the original PR was that the cause of the value == nullptr has been fixed elsewhere and has its own test coverage (at the TaskMutex level). So I don't think this message is any more likely to be useful than a belt-and-braces message anywhere else we throw an exception, and since we don't do that as a matter of course, I don't really see a reason to keep it here.

danieldresser-ie (Contributor)

OK, I don't feel too strongly about it.

@johnhaddon (Member Author)

> So, the outliers are .inl that aren't intended to be included in the .h, and they're what needs renaming? I guess the closest to our other conventions would be to call it Private/TypedPlug.inl, but maybe it would be confusing to have both TypedPlug.inl and Private/TypedPlug.inl?

Yes. The convention is that .inl contains stuff that we want to be inlineable, but don't want to clutter up the .h with. But the TypedPlug.inl isn't used like that - it's just included in a single compilation unit (TypedPlug.cpp etc) where we instantiate the template explicitly. It's not really private, because the reason it's in a header rather than just the .cpp is so that other modules can instantiate the template for their own types (for instance, GafferImage::FormatPlug). So I think maybe a different extension might make sense - I'm not sure there's any established convention for this though - we could use .impl maybe?

johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Jun 28, 2023
…ionScopes

This means calling `ValuePlug::dirty()` before the scope closes, so that the hash cache is invalidated before the compute is performed.

This was first requested way back in 2017 in issue GafferHQ#1971, where the use case was edit/compute loops in the PythonEditor. It has become a high priority more recently though, as it turned out we were actually interleaving edits and computes in the TransformTools, and noticed incorrect results while working on the LightTool in GafferHQ#5218. The problem was as follows, and is demonstrated in `testMultipleSelectionWithEditScope` :

1. We push a dirty propagation scope using UndoScope.
2. We create a transform edit for `/cube1`, which dirties some plugs, incrementing their dirty count.
3. We create a transform edit for `/cube`, but as part of doing that we need to compute the _current_ transform. This "bakes" that transform into the (hash) cache, keyed by the current dirty count.
4. We propagate dirtiness for this new transform edit, but don't increment the dirty count again, because we've already visited those plugs in step 2.
5. After closing the scope, we query the transform for `/cube` and end up with the cached value from step 3. This is incorrect, because it isn't accounting for the edits made subsequently.

The problem is demonstrated even more simply by `ComputeNodeTest.testInterleavedEditsAndComputes`.

There is an overhead to calling `flushDirtyPropagationScope()` for each `getValue()` call, and it is measurable in `ValuePlugTest.testCacheOverhead()`. But taking into account the performance improvements made in GafferHQ#5362, we still come out with an 18% reduction in runtime even with the overhead.

Fixes GafferHQ#1971
johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Jun 28, 2023
@johnhaddon (Member Author)

> Those are some very nice wins.

I think there's another reference counting win that might be a little bit more faffy, but could be worthwhile. In getValue() and getObjectValue() we need to own a reference to values that come out of the compute cache, because they could be evicted at any moment. But static values that come from ValuePlug::m_value are guaranteed to stay alive as long as we need them to, so we could completely avoid incrementing their reference count. We'd need getObjectValue() to return some sort of const T *, T::ConstPtr pair, with the second item possibly being null, but I think the wins are probably worth it. I'm thinking cases where lots of threads are computing the hash on a node that largely has static input values (shader networks especially). Might be worth a look at some point?

@danieldresser-ie (Contributor)

Since it's a fairly special case, I guess another option would be rather than using a new file extension, to just call it TypedPlugImplementation.h

But I don't mind ".impl" if that's what you want.

And yeah, skipping reference counting for static values does sound interesting.

@johnhaddon (Member Author) commented Jul 3, 2023

> I guess another option would be rather than using a new file extension, to just call it TypedPlugImplementation.h

That's much better - that's what I've gone with in 83f78d8, allowing me to inline getValue() in 6c5e585.

@danieldresser-ie (Contributor) left a comment

LGTM, aside from two typos.

SConstruct (Outdated)

```diff
@@ -1090,10 +1090,10 @@ libraries = {

 	"GafferSceneUI" : {
 		"envAppends" : {
-			"LIBS" : [ "Gaffer", "GafferUI", "GafferImage", "GafferImageUI", "GafferScene", "Iex$IMATH_LIB_SUFFIX", "IECoreGL$CORTEX_LIB_SUFFIX", "IECoreImage$CORTEX_LIB_SUFFIX", "IECoreScene$CORTEX_LIB_SUFFIX" ],
+			"LIBS" : [ "Gaffer", "GafferUI", "GafferImage", "GafferImageUI", "GafferScene", "GafferImage", "Iex$IMATH_LIB_SUFFIX", "IECoreGL$CORTEX_LIB_SUFFIX", "IECoreImage$CORTEX_LIB_SUFFIX", "IECoreScene$CORTEX_LIB_SUFFIX" ],
```
danieldresser-ie (Contributor)

"GafferImage" was already listed here - looks like it's here twice now?

johnhaddon (Member Author), Jul 11, 2023

Fixed in 0945ec9.

```cpp
template<>
IECore::MurmurHash GafferImage::AtomicFormatPlug::hash() const;

} // namespace Gaffer
```
danieldresser-ie (Contributor)

GitHub is now flagging a missing newline at EOF here - I don't think we should be doing that, though the whitespace checker isn't complaining about it.

johnhaddon (Member Author), Jul 11, 2023

Also fixed in 0945ec9.

This produces the following reductions in runtime :

- ValuePlugTest.testCacheOverhead : 7%
- ValuePlugTest.testContentionForOneItem : 25%

The common case is that a plug has no input, so it's worth avoiding the overhead of calling `typeId()`. This knocks ~2% off the runtime of `ValuePlugTest.testCacheOverhead`.

This produces the following reductions in runtime :

- ValuePlugTest.testCacheOverhead : 5%
- ValuePlugTest.testContentionForOneItem : 13%

This produces the following reductions in runtime :

- ValuePlugTest.testCacheOverhead : 7%
- ValuePlugTest.testContentionForOneItem : 20%

This produces the following reductions in runtime :

- ValuePlugTest.testCacheOverhead : 5%
- ValuePlugTest.testContentionForOneItem : 1%

We use `.inl` files to store inline implementation that we want the compiler to see, but which we don't want to clutter up the header with, so the headers are more human-readable. These files don't match that pattern - they're to be included in a single `.cpp` file and used to explicitly instantiate the template.

This also means declaring AtomicFormatPlug's specialisations publicly so that we don't get the publicly visible default `getValue()` implementation being compiled instead. And the part I don't understand : we need to link `libGafferImage` to far more libraries now, to avoid undefined symbols related to AtomicFormatPlug with MSVC. It seems that the mere existence of the inlined `getValue()` - even though it is superseded by the specialisation - is enough to make MSVC greedily do a bunch of implicit template instantiations that it wasn't doing before, and which ultimately depend on `GafferImage::Format`.
@johnhaddon merged commit 8e909c8 into GafferHQ:main Jul 11, 2023
4 checks passed
@johnhaddon deleted the getValueOptimisationPR branch July 11, 2023 09:56
johnhaddon added a commit to johnhaddon/gaffer that referenced this pull request Aug 15, 2023
Successfully merging this pull request may close these issues: Can't interleave multiple edits/computes within an UndoContext (#1971)