Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cycles Renderer : Defer creation of session until last point possible #5711

Merged
merged 20 commits into from
Mar 8, 2024

Conversation

johnhaddon
Copy link
Member

This is an alternate take on #5101, whereby we never recreate the Cycles session, even if rendering hasn't started yet. Like #5101, this fixes hangs on Windows and problems rendering background lights in batch mode. But unlike #5101, it guarantees that we can't accidentally create Cycles objects containing pointers to the wrong scene or session - which as Alex notes can at the very least cause assertion failures in debug builds (and which I suspect could also cause crashes in production builds).

My goal was to guarantee that we never ever recreated a session, while also not requiring any changes to the Renderer API or clients. I didn't quite succeed in the latter - see a47586d and 6c3bc4a, but it's close enough to be practical I think. I also noted an approach that might improve things in the future in e9d1f52.

Although things aren't quite as simple as I'd have liked, in conjunction with #5696 this does leave us with ~300 lines less in Renderer.cpp and with ~500 lines more in RendererTest.py, which I think demonstrates that we're going in a reasonable direction overall.

@murraystevenson
Copy link
Contributor

Thanks! I'm still working my way through this but from some initial interactive testing I've noticed that background lights aren't updating as expected. Toggling the enabled state of a background_light node while IPR is running doesn't update the render, likewise adjusting background light parameters such as color or exposure doesn't affect the render until you dirty the background by toggling another cycles:background option on a CyclesOptions node.

A background_shader assigned via a CyclesBackground node is also not working here, with a CyclesRenderer::option : Unknown option "cycles:background:shader". warning produced.

@johnhaddon
Copy link
Member Author

I've noticed that background lights aren't updating as expected.

Sorry, I should have caught that myself. I've pushed cef19fb and 09c46fe to take care of background lights and background shaders respectively, including some further test coverage.

@murraystevenson
Copy link
Contributor

Thanks for the fixes! Aside from the Optix shader update error you mentioned offline, this is looking pretty good to me to go for wider testing in a 1.4 beta.

It's been a challenge while testing to separate regressions from existing Cycles issues, but it seems like everything I've spotted has turned out to be an existing issue. I'll include my notes here for the record, but we should split things out into separate issues.

  • The background light can be randomly lost while performing interactive edits to shaders on other objects in SVM mode, but not in OSL mode. This isn't a regression from this PR.

  • Edits to background ray visibility options now work! Edits to ray visibility parameters on the background light itself aren't handled correctly, but that isn't a regression from this PR.

  • Motion blur appears to be working again in this PR, but not perfectly. Motion blurred objects can randomly lose their shader during interactive edits.

  • We're not handling the sampling_pattern option correctly. It should be set as an enum not a string, and our presets no longer match current Cycles, but that isn't a regression from this PR.

This PR also appears to fix a session shutdown crash that can occur when stopping in-progress Optix/CUDA renders that I've noticed in 1.4.0.0b2.

#0  0x00007fa6857d29a8 in pthread_sigmask@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x00007fa619980e1e in ?? () from /opt/arnold-7.2.5.1/bin/libai.so
#2  0x00007fa619980bf1 in ?? () from /opt/arnold-7.2.5.1/bin/libai.so
#3  <signal handler called>
#4  0x00007fa61503b044 in ?? () from /lib64/libcuda.so
#5  0x00007fa614fc5370 in ?? () from /lib64/libcuda.so
#6  0x00007fa614fc6a6b in ?? () from /lib64/libcuda.so
#7  0x00007fa614fc6b80 in ?? () from /lib64/libcuda.so
#8  0x00007fa615080f33 in ?? () from /lib64/libcuda.so
#9  0x00007fa5e6c4c497 in ccl::CUDADevice::~CUDADevice() () from /home/murray/Downloads/gaffer-1.4.0.0b2-linux-gcc11/lib/libGafferCycles.so
#10 0x00007fa5e6c5930f in ccl::OptiXDevice::~OptiXDevice() () from /home/murray/Downloads/gaffer-1.4.0.0b2-linux-gcc11/lib/libGafferCycles.so
#11 0x00007fa5e67f7ab9 in ccl::Session::~Session() () from /home/murray/Downloads/gaffer-1.4.0.0b2-linux-gcc11/lib/libGafferCycles.so
#12 0x00007fa5e67726a8 in (anonymous namespace)::CyclesRenderer::~CyclesRenderer() ()
   from /home/murray/Downloads/gaffer-1.4.0.0b2-linux-gcc11/lib/libGafferCycles.so
#13 0x00007fa5e6772db9 in (anonymous namespace)::CyclesRenderer::~CyclesRenderer() ()
   from /home/murray/Downloads/gaffer-1.4.0.0b2-linux-gcc11/lib/libGafferCycles.so
#14 0x00007fa650884956 in GafferScene::InteractiveRender::stop() () from /home/murray/Downloads/gaffer-1.4.0.0b2-linux-gcc11/lib/libGafferScene.so

This helps the CyclesRenderer out, since it can't handle edits to session options after creating the session, and it needs the session to create attributes.
@johnhaddon
Copy link
Member Author

johnhaddon commented Mar 7, 2024

Aside from the Optix shader update error you mentioned offline, this is looking pretty good to me

I think I've finally got to the bottom of that. Bisecting placed the blame onhttps://github.com//pull/5711/commits/5573c5e55607fbce429ac5f364cfd71a1122817d, and for a long time I couldn't figure out how that was responsible. But after looking for all the ways I could have broken something, it actually turned out that I'd accidentally fixed another thing 😄. The original code was attempting to default sceneParams.hair_shape to CURVE_THICK, and I had diligently copied that default along through all my refactoring without questioning it. But the original code had actually failed to set the initial value to the default it used elsewhere, so we were actually getting CURVE_RIBBON. One of the effects of 8b9f0f324490869fda6414eadae0ff0191244aff is to unify everything so there is only one source for the default and initial values, and voila, the bug was fixed.

Except that Optix doesn't like rebuilding the BVH for the stoats whiskers with CURVE_THICK, hence the fixed default led to the errors we were observing. This can be demonstrated completely independently of this PR by using a CyclesOptions node to explicity choose CURVE_THICK in 1.4.0.0b1. So I think it's pretty conclusive that this is a pre-existing problem, and likely not a Gaffer one either.

CURVE_THICK seems like the wrong default anyway, and doesn't match the default in Cycles or Blender. So in f1a4672 I've changed the defaults in Renderer.cpp and CyclesOptions.cpp to be CURVE_RIBBONS. I think that's the right choice regardless of the bug, and it also makes it less likely that someone will hit the bug, and more likely that if they do they will realise the cause, since they need to explicitly choose the buggy mode.

I've also squashed in the previous fixups and fixed the merge conflicts, so this should be good to go.

This is a concession to help the Cycles backend accomodate the needs of RenderController and SceneGadget, which both find it easier to create a default camera before outputting all the options. Although it is pleasing to see that it also simplifies the vast majority of calls to `camera()` in our unit tests.
As mentioned in the comment, this was once a 100% reliable repro for a crash, and I don't know why it no longer is. Hence I want to keep it around as insurance.
Of the four states provided by `RenderState` we only actually care about two - we're either rendering or we're not.
This was an attempt to deal with a fundamental mismatch between the Cycles and Renderer APIs : you can't create a `ccl::session` without already having `ccl::SessionParams` (threads, shading system, device etc), but the values for those will arrive via `Renderer::option()` _after_ construction of the renderer. And unfortunately the Renderer constructor is the most natural place  to construct the `ccl::Session`.

Our strategy was to create a default session in the renderer constructor, and then store options into `m_sessionParams`, and then in `reset()` attempt to replace the original session with a new one constructed from `m_sessionParams`. This was made tricky by the fact that the session also owns the scene, and we needed to transplant all the old scene objects into the new scene. This almost worked, but had several flaws :

- On Windows, we were getting deadlocks due to deleting the scene lock while holding it. This could have been fixed separately while keeping the reset concept alive - see GafferHQ#5101.
- The Cycles objects we transplanted had pointers to the old scene, and this would trigger assertions when freeing them in debug builds. I suspect this was also the cause of the crash exposed by `InteractiveCyclesRenderTest.testSVMRenderWithCPU`.
- More generally, it was working against the Cycles API, and could be broken in any number of ways by future Cycles changes.

Just removing the `reset()` code isn't the end of the story though : it means that the options for devices, threads etc are now completely non-functioning.  This results in test failures for `RendererTest.testCustomAOV` and `RendererTest.testOSLInSVMShadingSystem`. We'll deal with that in future commits : the purpose of this commit is to document why the old approach wasn't working, and clear the decks a little in preparation for a new approach.
This makes the ownership clear. Also document the call to `cancel()` in the destructor, because its necessity is non-obvious given that `~Session` calls that anyway. And replace `function_bind` with `std::bind()` - `function_bind` is just a Cycles macro that evaluates to that anyway.
And replace with functions to get the defaults whenever we want them. This does add a tiny bit of overhead to `option()` but I doubt it's measurable, and `option()` isn't performance-critical anyway. In return we get to remove state from Renderer, and set ourselves up a bit better for the upcoming session-reset-refactor.
The call to `film->tag_update()` was commented out in d78e359, a "Cycles 2.93 API update" commit containing a huge number of other changes. It looks like the `Film::tag_update()` method was removed in blender/cycles@588eb94. And the reason for that seems to be that `Film::tag_update()` was no longer necessary because it could track it's own dirtiness manually using the new modified flags managed by the Socket API.

But what's less clear is why we should call `Integrator::tag_update()` if the film has changed. I can't see any reason for that. In the absence of clear evidence for existence, we can't keep code like this hanging around forever, so I'm tearing off the band-aid now. Added a unit tests that seems to show that this is fine.
- Put related things in a section together.
- Don't loop over all lights in the scene unless we're actually going to use what we find.
This reintroduces support for `cycles:device` and other session-affecting options, with the constraint that all such options much be provided before the first call to `object()` or `attributes()`. Due to earlier commits, all important clients are now meeting those constraints.

Generally, the approach of storing from all options and pulling from them in the relevant spots seems a bit easier to reason about than the old approach of having `option()` push into various things at times that might not be convenient. But there is one definite downside - `RendererTest.testUnknownOptions` is now failing, because we're not spotting that some options aren't handled at all. This will be fixed in the next commit - I've omitted it from this one in an attempt to keep the changes as bite-sized as possible.

The new test case for the background light in batch renders is courtesy of Alex, and shows that this also fixes GafferHQ#5234.
We no longer need to do a render to force the options to be flushed to the session.
We weren't hitting this code path before because we weren't setting the shading system before converting shaders. It's a nice side-effect of the session refactoring that the intended warning message actually gets emitted.
This lets us reintroduce warnings for unused options, avoid repetition of warnings for scene/session edits, avoid work when options haven't changed at all, and simplify the special-case for cryptomatte.
If we made this modification to the Renderer API, the Cycles backend would have more options in dealing with the options-before-session but session-before-attributes chicken-and-egg problem. If `Renderer::attributes()` was called before the session was acquired, it could have just returned a special `PendingAttributes` object that stored the source CompoundObject. Then later in `Renderer::editAttributes()` after the session has been made, it could have transformed PendingAttributes into the final CyclesAttributes form.

This would potentially also have benefits in the 3Delight backend, meaning that DelightObject didn't need to store a DelightContext.
We were failing to update the background shader because it was all hidden behind `m_optionsChanged`, and no options had changed. `updateOptions()` is now concerned _only_ with processing options, and the rest is dealt with in `updateBackground()`.
There were multiple problems here :

- We weren't adding the shaders to the scene via `updateSceneObjects()` before using them in `updateBackground()`, and this could trigger Cycles crashes. Longer term I think `updateSceneObjects()` has to go - we should just be adding nodes to the scene as we create them, so they are always where they are expected to be.
- We weren't resetting the `Option.modified` flag after processing the option, so we were warning that it hadn't been used.
- There was a pre-existing bug where we couldn't revert to a previously used shader, because Cycles wouldn't realise that it needed to update. This is worked around by tagging the `ccl::ShaderManager` for update manually.
- We were always updating `m_backgroundShader`, even when the option hadn't changed. This was probably harmless before, but less so now it is accompanied by a forced update on the ShaderManager, so we now only do anything when the option has changed.
It's not at all clear why this was here. It was introduced by 3cc31f2 - "One giant commit update to the latest Cycles". I can't see any reason it would be necessary, and the latest test additions for background lights and shaders suggest it isn't.
Ribbons are the more appropriate default because they are more performant.

It also seems that there is a Cycles/Optix bug which causes the following error when rebuilding a "thick" curve, at least as converted by Gaffer from the whiskers on the stoat in the ALab scene :

```
OPTIX_ERROR_INVALID_VALUE in optixAccelComputeMemoryUsage(context, &options, &build_input, 1, &sizes) (/dependencies-8.0.0a7-source/Cycles/working/cycles-4.0.2/src/device/optix/device_impl.cpp:1049)
OPTIX_ERROR_INVALID_VALUE in optixAccelBuild(context, __null, &options, &build_input, 1, temp_mem.device_pointer, sizes.tempSizeInBytes, out_data.device_pointer, sizes.outputSizeInBytes, &out_handle, use_fast_trace_bvh ? &compacted_size_prop : __null, use_fast_trace_bvh ? 1 : 0) (/dependencies-8.0.0a7-source/Cycles/working/cycles-4.0.2/src/device/optix/device_impl.cpp:1080)
```

This had not been noticed before now because of another bug that was inadvertently fixed by the recent session management refactor : the initial value used by `Renderer.cpp` did not match the default value advertised by CyclesOptions. So unless you explicitly opted out, you were getting "ribbon" curves anyway. This is another benefit of the new approach to building session/scene params from options - we now initialise every parameter explicitly ourselves, with the default only being specified in one place.
@johnhaddon
Copy link
Member Author

We're not handling the sampling_pattern option correctly.

I noticed this too. I wonder if we should be handling it at all? I can't see it exposed anywhere in Blender...

@murraystevenson
Copy link
Contributor

We're not handling the sampling_pattern option correctly.

I noticed this too. I wonder if we should be handling it at all? I can't see it exposed anywhere in Blender...

Ah, yes... Blender hides it behind the Experimental -> Cycles Debug preference, which sounds like fair justification for us to remove it from CyclesOptions. Blender also defaults this option to Tabulated Sobol so we should probably do the same.

https://github.com/blender/blender/blob/90ef46baa1bcacc1a886ac5ac4493fc46eaac7ce/intern/cycles/blender/addon/properties.py#L432

@murraystevenson murraystevenson merged commit a515001 into GafferHQ:main Mar 8, 2024
5 checks passed
@johnhaddon johnhaddon deleted the cyclesSessionRefactor branch March 15, 2024 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants