Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Switch : Fix context used when managing internal connection
This came to light due to a CI failure on `main` : ``` Error: FAIL: testInvalidColumns (GafferImageUITest.CatalogueUITest.CatalogueUITest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/__w/gaffer/gaffer/build/python/GafferImageUITest/CatalogueUITest.py", line 58, in tearDown GafferUITest.TestCase.tearDown( self ) File "/__w/gaffer/gaffer/build/python/GafferUITest/TestCase.py", line 88, in tearDown GafferTest.TestCase.tearDown( self ) File "/__w/gaffer/gaffer/build/python/GafferTest/TestCase.py", line 119, in tearDown self.fail( f"Unexpected message : {message}" ) AssertionError: Unexpected message : ERROR : Emitting signal : Unknown error ``` Investigation showed that the "Unknown error" was in fact an `IECore::Cancelled` exception, and that the signal in question was `plugInputChangedSignal()` on a Switch node inside a Text node inside an InternalImage node inside a Catalogue node. The Catalogue node was being destroyed, and as the Text node was torn down, its internal connections were modified and `Switch::inputChanged()` was called on the internal Switch node. This was all happening in a background thread whose identity I havent ascertained, and where the current context was cancelled. The destruction was actually being triggered from `IECorePython::WrapperGarbageCollector::collect()`, which is why I believe the problem only occurred on `main` - collection needs to occur at precisely the right time for the problem to show, and is highly sensitive to how many wrappers have been created previously. So what was the problem exactly (other than that garbage collection is a horribly unpredictable compromise we'd rather avoid)? In `Switch::inputChanged()` we make a shortcutting internal connection across the Switch if we can determine that the index is not context-sensitive, which we check using `PlugAlgo::dependsOnCompute()`. When the index isn't context sensitive, we call `indexPlug()->getValue()` to get its current value and then make the appropriate connection. But in this case, the index plug (which is an IntPlug) is driven by a BoolPlug, which while still not context-sensitive, _will_ require type conversion work in `getValue()`. That work is done by ValuePlug inside a Process, and the Process checks for cancellation as a matter of course. It's a bit hard to know exactly where to point the finger of blame here. It's arguable that we don't need to wrap the type conversion in a Process. But we currently do because we think it's possible that some conversions might be expensive and therefore warrant monitoring (and caching). It's also arguable that `dependsOnCompute()` should return `True` when a type conversion is involved, and that what we really should mean by "Compute" is "something involving a Process". But that's hard to explain, and would defeat the Switch's internal optimisation in cases like this where it is completely valid. Instead I've blamed the poor old `Switch` node, and fixed it by scoping a default context in `updateInternalConnection()`. This is much more limited in scope than other potential fixes, and therefore much less likely to have unintended side effects. But I think the _real_ culprit is the garbage collection. It's occuring at times completely out of our control, disposing of nodes on threads we don't want them disposed on, and emitting signals on threads we don't want them emitted on. But fixing that is far far _far_ harder.
- Loading branch information