
Material system: Implement GPU occlusion culling #1180

Merged — 1 commit merged into DaemonEngine:master on Aug 11, 2024

Conversation

@VReaperV (Contributor) commented Jun 4, 2024

Implement occlusion culling in compute shaders using a Hierarchical Depth-Buffer.

The depth buffer is processed before the post-processing pass, reducing it mip level by mip level until a 1x1 resolution is reached.
The result is then used to determine whether a surface that passed frustum culling is hidden behind opaque surfaces. This is done by computing a screen-space AABB from the surface's bounding sphere and comparing its depth to the depth stored in the appropriate level of the depth-reduction image. The level is chosen based on how much screen space the AABB covers.
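
A minimal GLSL sketch of that test, assuming a hypothetical depth-reduction sampler named u_DepthMap and a standard 0-near/1-far depth convention; the actual uniforms and clamping in cull_cp.glsl may differ:

uniform sampler2D u_DepthMap; // depth-reduction (hi-z) image with a full mip chain

bool SurfaceOccluded( vec4 boundingBox, float surfaceMinDepth ) {
	// boundingBox holds the screen-space AABB as ( min.xy, max.xy ) in pixels.
	vec2 extent = boundingBox.zw - boundingBox.xy;

	// Pick the mip level where the AABB spans roughly one texel, so a small
	// fixed number of fetches covers the whole box.
	int level = max( int( ceil( log2( max( extent.x, extent.y ) ) ) ), 0 );

	// Sample the 2x2 footprint around the AABB centre and keep the farthest depth.
	ivec2 coords = ivec2( ( boundingBox.xy + boundingBox.zw ) * 0.5 ) >> level;
	float occluderDepth = 0.0;
	for ( int y = 0; y <= 1; y++ ) {
		for ( int x = 0; x <= 1; x++ ) {
			occluderDepth = max( occluderDepth, texelFetch( u_DepthMap, coords + ivec2( x, y ), level ).r );
		}
	}

	// Occluded only if the surface's nearest point is behind the farthest occluder.
	return surfaceMinDepth > occluderDepth;
}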

@VReaperV (Contributor Author)

The sphere projection function is apparently giving incorrect results, since I'm fairly sure the world-to-view-space conversion is correct. I'm also not seeing much difference in frame time between this change and frustum culling only: the frame time won by issuing fewer draws is cancelled out by the hi-z pass. It could be improved a bit by doing the depth-buffer downsampling in a single compute dispatch, but that might not really be worth it. On the other hand, the hi-z buffer could also be used for some other effects.

@VReaperV (Contributor Author)

Well, I'm seeing some improvements in certain cases, especially with high graphics settings. The whole thing (the indirect draws + cull compute) also isn't taking much frame time by itself anyway.

@VReaperV (Contributor Author)

So the lower framerate that illwieckz was getting in #1137 might have to do with something else. Ideally we could also use better synchronization, but I'm not sure that would be possible.

@VReaperV (Contributor Author)

Rasterizing the bounding boxes with colour/depth writes disabled to determine object visibility might possibly be faster. The depth pre-pass is very fast everywhere I've measured it here, so this should be fast too. It does, however, mean that object visibility would lag behind by a frame, but that's already the case with this hi-z algorithm.

It would also require writes to a buffer from every rasterized fragment that passes the depth test, which is not ideal. A custom rasterizer in a compute shader that instead writes visible object IDs once per triangle, or per batch of triangles, might be better.
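
For illustration, a minimal fragment-shader sketch of that bounding-box approach; not part of this PR, and the buffer name, binding and in_ObjectID input are hypothetical:

layout(early_fragment_tests) in; // run the depth test before this shader executes

layout(std430, binding = 0) buffer visibilitySSBO {
	uint visible[]; // one flag per object, cleared to 0 at the start of the frame
};

flat in uint in_ObjectID; // passed through from the bounding-box vertex shader

void main() {
	// Any bounding-box fragment that survives the depth test marks the whole
	// object as visible; colour writes stay disabled for this pass.
	visible[in_ObjectID] = 1u;
}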

@VReaperV (Contributor Author)

There are a few more things that could be improved independently of occlusion culling, and that might make this work faster in every case anyway:

  1. Use batches of e.g. 64 triangles instead of specifying whole surfaces, which would allow dropping the processSurfaces_cp shader and doing the writes in the cull_cp shader instead.
  2. That would also allow creating an index buffer instead of an indirect buffer, for the most part. There would still be one draw per material, but that draw would contain all the triangles that weren't culled.
  3. With triangle batches, culling at the batch level might speed things up even more, e.g. with cone culling (see the sketch after this list).
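
For the third point, a minimal GLSL sketch of per-batch cone culling as it is commonly formulated for small triangle batches; the TriangleBatch struct and its fields are hypothetical, not code from this PR:

struct TriangleBatch {
	vec3 center;      // bounding-sphere centre of the batch
	float radius;     // bounding-sphere radius
	vec3 coneAxis;    // average facing direction of the batch's triangles
	float coneCutoff; // cosine of the normal cone's half-angle
};

bool BatchBackfaceCulled( TriangleBatch batch, vec3 cameraPos ) {
	// All triangles face away from the camera when the view vector lies inside
	// the cone opposite the normal cone; the sphere radius keeps the test
	// conservative for triangles offset from the batch centre.
	vec3 toBatch = batch.center - cameraPos;
	return dot( toBatch, batch.coneAxis ) >= batch.coneCutoff * length( toBatch ) + batch.radius;
}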

@VReaperV (Contributor Author)

Another potential advantage of a compute rasterizer is assigning light sources to a tile/cluster buffer, though with low light counts it might not make much of a difference.

@VReaperV (Contributor Author)

Doing hardware rasterization of BBoxes and writing to a buffer from the fragment shader with an early discard doesn't work well for some reason.

@VReaperV (Contributor Author)

I believe this fixes the incorrect projection. However, there is still some incorrect culling, which might be due to incorrect depth downsampling with even/odd dimensions; I haven't checked yet.

@VReaperV (Contributor Author)

This mostly works now; however, there's still an error in the AABB depth projection, and some things need to be un-hardcoded.

@VReaperV VReaperV force-pushed the gpu-occlusion-culling branch 4 times, most recently from e76f9b5 to 418997a on July 27, 2024 12:19
@VReaperV VReaperV marked this pull request as ready for review July 27, 2024 12:20
@VReaperV VReaperV force-pushed the gpu-occlusion-culling branch 2 times, most recently from 12b74aa to bac765b on July 27, 2024 13:20
@VReaperV VReaperV changed the title WIP: Material system: Implement GPU occlusion culling Material system: Implement GPU occlusion culling Jul 27, 2024
src/engine/renderer/Material.cpp (review comment: fixed, resolved)
@VReaperV (Contributor Author)

Added some comments and made r_lockPVS 1 work with occlusion culling.

@VReaperV (Contributor Author) commented Jul 29, 2024

Rebased.

@VReaperV (Contributor Author)

Cleaned up a bit, and added a toggle through r_gpuOcclusionCulling.

@illwieckz (Member) previously approved these changes on Jul 30, 2024 and left a comment


I'm not sure I understand everything, but overall it looks good to me, and I confirm it fixes many bugs.

I would appreciate some better spacing:

(screenshot: 20240730-215050-000 spaces)

I also noticed that some new GLSL files use spaces or seem to mix tabs and spaces; I would prefer tabs.

@illwieckz illwieckz added the T-Improvement (Improvement for an existing feature) and A-Renderer labels Jul 30, 2024
@illwieckz (Member)

Lol I wanted to approve #1137 instead… 🤦‍♀️️

@slipher (Member) commented Aug 2, 2024

This works by creating a depth-reduction image from the depth buffer, then using it on the next frame.

How can data generated on the previous frame be valid? Everything could be in a different position or viewed from a different angle.

@VReaperV (Contributor Author) commented Aug 2, 2024

How can data generated on the previous frame be valid? Everything could be in a different position or viewed from a different angle.

That's a mistake in the PR description; what is meant is that the results of the culling are used on the next frame, i.e. double-buffered (the depth buffer itself is from the current frame).

@VReaperV (Contributor Author) commented Aug 2, 2024

Fixed the PR description.

src/engine/renderer/tr_image.cpp (review comment: outdated, resolved)
src/engine/renderer/gl_shader.h (review comment: resolved)
src/engine/renderer/glsl_source/cull_cp.glsl (review comment: outdated, resolved)
surfaceCoords *= ( boundingBox.xy + boundingBox.zw ) * 0.5;
const ivec2 surfaceCoordsFloor = ivec2( clamp( surfaceCoords.x, 0.0, float( u_ViewWidth >> level ) ), clamp( surfaceCoords.y, 0.0, float( u_ViewHeight >> level ) ) );
ivec4 depthCoords = ivec4( surfaceCoordsFloor.xy,
surfaceCoordsFloor.x + ( surfaceCoords.x - surfaceCoordsFloor.x >= 0.5 ? 1 : -1 ),
Member


The object won't always occupy 2 pixels in both dimensions, right? It could have some oblong shape and be 1 pixel wide in the x axis and 2 in the y axis for example. I understand this is not a bug as the worst thing that could happen here is a surface not being culled, but I wonder if there isn't another way to do the calculation.

@VReaperV (Contributor Author)


That can happen sometimes, yes. I tried wrapping the texel fetches in if statements so that only the first one is always done and the later ones are only done if the new coordinates don't match any of the previous ones, but some surfaces then became invisible when they shouldn't have. Not sure why, as I couldn't spot the issue. This can be improved later, though.

@VReaperV (Contributor Author) commented Aug 6, 2024


There's also an extension that would allow doing this with just a single regular texture sample, but that extension isn't widely available enough to be the only option.

@VReaperV (Contributor Author) commented Aug 6, 2024


Ah wait, it wasn't working properly because I accidentally used min instead of max. Even with that fixed, though, the variant seems to be a tiny bit slower (by <0.1 ms). Consecutive texel fetches with the same coordinates will likely just hit the texture cache anyway.
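
For reference, a sketch of the variant discussed here, reusing the depthCoords and level values from the snippet above but with a hypothetical u_DepthMap sampler: later fetches are skipped when their coordinates duplicate an earlier one, and the results are combined with max().

// Note: in GLSL, != on ivec2 compares the vectors as a whole and yields a bool.
float occluderDepth = texelFetch( u_DepthMap, depthCoords.xy, level ).r;
if ( depthCoords.zy != depthCoords.xy ) {
	occluderDepth = max( occluderDepth, texelFetch( u_DepthMap, depthCoords.zy, level ).r );
}
if ( depthCoords.xw != depthCoords.xy ) {
	occluderDepth = max( occluderDepth, texelFetch( u_DepthMap, depthCoords.xw, level ).r );
}
if ( depthCoords.zw != depthCoords.xy && depthCoords.zw != depthCoords.zy && depthCoords.zw != depthCoords.xw ) {
	occluderDepth = max( occluderDepth, texelFetch( u_DepthMap, depthCoords.zw, level ).r );
}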

src/engine/renderer/Material.h (review comment: outdated, resolved)
src/engine/renderer/Material.cpp (review comment: outdated, resolved)
src/engine/renderer/Material.cpp (review comment: resolved)
src/engine/renderer/Material.cpp (review comment: outdated, resolved)
@VReaperV VReaperV force-pushed the gpu-occlusion-culling branch 3 times, most recently from 94b0116 to 164072d on August 6, 2024 21:24
@VReaperV (Contributor Author) commented Aug 6, 2024

I also noticed some new GLSL files use spaces or seem to mix tabs and spaces, I would prefer tabs.

Ah, yep, that should be fixed now.

@VReaperV (Contributor Author) commented Aug 6, 2024

I've added a portal readback as well: now, instead of just calling R_MirrorViewBySurface() for every portal surface after sorting them by distance each frame, the cull_cp shader culls portal surfaces in addition to regular ones, and the CPU reads the results back on the next frame and uses them to determine which portals might be visible.
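
A minimal sketch of what the shader side of that readback could look like, assuming a hypothetical portalSurfacesSSBO layout with one distance per portal; the actual buffer in cull_cp.glsl may be organised differently:

layout(std430, binding = 1) buffer portalSurfacesSSBO {
	float portalDistances[]; // one entry per portal surface, read back by the CPU next frame
};

void WritePortalResult( uint portalID, bool culled, float distanceToCamera ) {
	// -1 marks a culled portal; otherwise store the distance so the CPU can
	// decide which portal views might be visible.
	portalDistances[portalID] = culled ? -1.0 : distanceToCamera;
}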

src/engine/renderer/Material.cpp (review comment: dismissed, resolved)
@VReaperV (Contributor Author) commented Aug 6, 2024

Adding R_SyncRenderThread() to GenerateWorldMaterials() also seems to have fixed crashes with r_smp 1, so I have disabled that restriction.

@slipher (Member) left a comment


Could we get a comment somewhere with an overview of how the double buffering works and what is the benefit of it? Like is there an async API for the compute shaders? If world surfaces are double-buffered but models are not, how do we keep them in sync when rendering?

struct PortalView {
	uint32_t count;
	drawSurf_t* drawSurf;
	uint32_t views[MAX_VIEWS];
Member


What do these represent? Portals directly visible from this one?

@VReaperV (Contributor Author)


Yep. Some of it can probably be made simpler.

src/engine/renderer/glsl_source/cull_cp.glsl (review comment: resolved)
src/engine/renderer/tr_main.cpp (review comment: resolved)
@VReaperV (Contributor Author) commented Aug 7, 2024

Could we get a comment somewhere with an overview of how the double buffering works and what is the benefit of it? Like is there an async API for the compute shaders? If world surfaces are double-buffered but models are not, how do we keep them in sync when rendering?

There's this graph from #1137, which holds true here as well:
(graph omitted)
The benefit is not introducing stalls with extra synchronisation, so the compute shaders can keep executing alongside the regular pipeline (including swapBuffers()). Compute shaders are asynchronous by nature, so nothing extra needs to be done for that.
Double-buffering only applies to which surfaces are rendered, not how they are rendered. And since world and model surfaces are culled in completely different ways, synchronisation between them does not make sense.

@slipher (Member) commented Aug 9, 2024

LGTM, but I would still like to understand the double buffering thing. I guess I never really understood the answer to this previous conversation:

How can data generated on the previous frame be valid? Everything could be in a different position or viewed from a different angle.

That's a mistake in the PR description; what is meant is that the results of the culling are used on the next frame, i.e. double-buffered (the depth buffer itself is from the current frame).

I don't get how surfaces can be double buffered without also buffering everything else that may be rendered (particle effects, 2D drawing, etc.). If you used the surface visibility from an earlier time then the player could have moved to a point that makes some of them newly visible.

Add occlusion culling on the GPU to the material system. This works by creating a depth-reduction image from the depth buffer and using it to cull surfaces, with the culling results consumed on the next frame. The bounding sphere of each surface is projected into a screen-space AABB, and its depth is tested against the depth of 4 texels in the depth image at the appropriate mip level, determined by the size of the AABB on screen. This doesn't include portal views, because they use the stencil buffer, so doing depth reduction and checking against the depth image there would be complicated and would likely require some sort of visibility buffer.

Also added portal visibility readback: cull_cp will now cull portal surfaces (if there are any) in addition to regular surfaces, and the CPU will read the distance data (-1 if culled) on the next frame to better determine potentially visible views.
@VReaperV (Contributor Author)

(rebased)

@VReaperV (Contributor Author)

If you used the surface visibility from an earlier time then the player could have moved to a point that makes some of them newly visible.

Technically yes, but in practice the fact that it's only one frame of difference makes that more or less impossible, even when moving around fast. So far I haven't seen any discrepancies when using the engine with this change.

It's also helped by the fact that the cull uses the surfaces' bounding spheres (either directly or by projecting an AABB from them), so the culling is somewhat coarse.

@VReaperV VReaperV merged commit 4f3468d into DaemonEngine:master Aug 11, 2024
9 checks passed
@VReaperV VReaperV deleted the gpu-occlusion-culling branch August 11, 2024 19:36
Labels
A-Renderer, T-Improvement (Improvement for an existing feature)
Projects
Status: Done
Development
No issues linked.

3 participants