Software renderer redesign part 3 #258

Merged
merged 255 commits into from
Jul 14, 2024

Conversation

afritz1
Owner

@afritz1 afritz1 commented Apr 7, 2024

This has several optimizations and bug fixes for the new renderer. Tl;dr:

  • Voxel visibility optimization via quadtree (VoxelVisibilityChunk)
  • Entity visibility optimization via bounding box tests (EntityVisibilityChunk)
  • Sky visibility optimization (SkyVisibilityManager)
  • Changed triangle clipping from world space to clip space
  • Calculate world space position in rasterizer by unprojecting clip space point
  • Render sky much closer to camera and w/o depth test to avoid far clip plane issue
  • Several rasterizer optimizations (binning, pixel shader variants via template parameters, constexpr, __restrict to get compiler-generated vector instructions)
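As a rough illustration of the clip space clipping bullet above, here is a minimal sketch of clipping one triangle against the near plane only. It assumes an OpenGL-style convention where a point is inside when `z >= -w`; the `Float4` type and function names are hypothetical and not taken from the renderer, which may use a different convention and clip against more planes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical clip-space point type (not the project's own vector class).
struct Float4 { float x, y, z, w; };

// Linear interpolation between two clip-space points.
inline Float4 lerp(const Float4 &a, const Float4 &b, float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t };
}

// Signed distance to the near plane. Assumption: inside when z >= -w.
inline float nearPlaneDistance(const Float4 &p)
{
    return p.z + p.w;
}

// Sutherland-Hodgman clip of one triangle against the near plane.
// Returns a convex polygon with 0, 3, or 4 vertices.
inline std::vector<Float4> clipTriangleToNearPlane(const std::array<Float4, 3> &tri)
{
    std::vector<Float4> result;
    for (std::size_t i = 0; i < tri.size(); i++)
    {
        const Float4 &cur = tri[i];
        const Float4 &next = tri[(i + 1) % tri.size()];
        const float curDist = nearPlaneDistance(cur);
        const float nextDist = nearPlaneDistance(next);
        const bool curInside = curDist >= 0.0f;
        const bool nextInside = nextDist >= 0.0f;

        if (curInside)
        {
            result.push_back(cur);
        }

        // The edge crosses the plane, so emit the intersection point.
        if (curInside != nextInside)
        {
            const float t = curDist / (curDist - nextDist);
            result.push_back(lerp(cur, next, t));
        }
    }

    return result;
}
```

Because the interpolation happens before the perspective divide, the new vertices stay linear in clip space, which is the usual reason for clipping there instead of in world space.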

I added multi-threading to the rasterizer today but performance still scales poorly, likely due to thread synchronization for draw calls, and each thread's workload not being big enough. Ideally FPS will be at a playable level for all PCs before this is merged. Just creating now for visibility.

Also added default voxel transform ID for memory savings since a lot of voxels don't have special transformations.
Also use default voxel transform buffer ID in various places. Need to do entity transforms next.
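The idea of a shared default transform can be sketched roughly as follows. All names here (`Transform`, `TransformPool`, `VoxelInstance`) are hypothetical stand-ins, and the real renderer's buffer IDs are likely allocated differently; the point is only that ordinary voxels reference one shared entry instead of each owning a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical transform payload; a real one would hold full matrices.
struct Transform { float tx, ty, tz; };

// Hypothetical pool in which ID 0 is always the shared identity transform.
class TransformPool
{
public:
    static constexpr int DefaultID = 0;

    TransformPool()
    {
        this->transforms.push_back(Transform{ 0.0f, 0.0f, 0.0f });
    }

    // Allocates a dedicated transform for a voxel that needs one (e.g. a moving door).
    int alloc(const Transform &transform)
    {
        this->transforms.push_back(transform);
        return static_cast<int>(this->transforms.size()) - 1;
    }

    const Transform &get(int id) const
    {
        assert(id >= 0 && static_cast<std::size_t>(id) < this->transforms.size());
        return this->transforms[static_cast<std::size_t>(id)];
    }

    std::size_t size() const { return this->transforms.size(); }
private:
    std::vector<Transform> transforms;
};

// Voxels start out pointing at the shared default and only get their own ID on demand.
struct VoxelInstance
{
    int transformID = TransformPool::DefaultID;
};
```

With this layout, a chunk full of plain walls costs one transform entry in total rather than one per voxel.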
Need to have a tree level populate all its child nodes with the same value if all visible or all not visible.
Doesn't crash which is a good sign. Would like to eventually make the broadcast function iterative instead of recursive.
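For reference, a minimal sketch of what broadcasting a visibility value down a quadtree can look like, in both recursive and iterative form. It assumes a complete quadtree stored as one array per level, where child `c` (0-3) of node `n` lives at index `n * 4 + c` on the next level. That layout and every name below are assumptions for illustration, not the layout VoxelVisibilityChunk actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Visibility { Outside, Partial, Inside };

// Complete quadtree with one array per level; level L holds 4^L nodes.
// Assumed layout: child c (0-3) of node n on level L is node n * 4 + c on level L + 1.
struct VisibilityQuadtree
{
    std::vector<std::vector<Visibility>> levels;

    explicit VisibilityQuadtree(int levelCount)
    {
        assert(levelCount > 0);
        std::size_t nodeCount = 1;
        for (int i = 0; i < levelCount; i++)
        {
            this->levels.emplace_back(nodeCount, Visibility::Partial);
            nodeCount *= 4;
        }
    }
};

// Recursive form: write the node, then visit each of its four children.
inline void broadcastRecursive(VisibilityQuadtree &tree, std::size_t level, std::size_t nodeIndex, Visibility value)
{
    tree.levels[level][nodeIndex] = value;
    if (level + 1 == tree.levels.size())
    {
        return; // Leaf level reached.
    }

    for (std::size_t child = 0; child < 4; child++)
    {
        broadcastRecursive(tree, level + 1, nodeIndex * 4 + child, value);
    }
}

// Iterative form: with this layout, the descendants of a node occupy one
// contiguous index range per level, and that range grows 4x each level down.
inline void broadcastIterative(VisibilityQuadtree &tree, std::size_t level, std::size_t nodeIndex, Visibility value)
{
    std::size_t begin = nodeIndex;
    std::size_t end = nodeIndex + 1;
    for (std::size_t i = level; i < tree.levels.size(); i++)
    {
        for (std::size_t n = begin; n < end; n++)
        {
            tree.levels[i][n] = value;
        }

        begin *= 4;
        end *= 4;
    }
}
```

The iterative version only works this simply because of the assumed index layout; a row-major grid per level would need a 2D range per level instead.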
Needed to differentiate between a node's 0-3 child index and its 0-to-end index among all nodes on a tree level. My quadtree debug image is starting to show something resembling a bird's eye view, but still glitchy.
To be used with quadtree node look-up.
This completely fixes the quadtree visibility calculation as far as I can tell.
Putting the voxel and entity rendering code in separate files for ease of understanding.
afritz1 added 6 commits March 23, 2024 12:26
Don't need to when there is a fully-enclosing sky mesh.
Spent a lot of time debugging inclusive and exclusive bin pixels. Each rasterizer bin is also like 14 MB which is huge and causes debug builds to chug whenever changing the resolution scale. I think it's bloated because of the sky being a special case high-density mesh.
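The inclusive/exclusive issue is easy to get wrong, so here is a small sketch of one consistent convention: pixel ranges are half-open (`[start, end)`) and bin ranges are inclusive on both ends. The `BinRange` struct and `getBinRange` function are hypothetical and only illustrate the off-by-one handling along one axis.

```cpp
#include <cassert>

// Inclusive range of bin indices along one axis.
struct BinRange
{
    int first;
    int last;
};

// Maps a half-open pixel range [pixelStart, pixelEnd) to the bins it touches.
// The last touched pixel is pixelEnd - 1, so that is what gets divided.
inline BinRange getBinRange(int pixelStart, int pixelEnd, int binSize)
{
    assert(binSize > 0);
    assert(pixelStart >= 0);
    assert(pixelEnd > pixelStart);

    BinRange range;
    range.first = pixelStart / binSize;
    range.last = (pixelEnd - 1) / binSize;
    return range;
}
```

For example, with 64-pixel bins, a triangle covering pixels `[0, 64)` touches only bin 0, while `[0, 65)` also touches bin 1. Dividing `pixelEnd` instead of `pixelEnd - 1` would wrongly report bin 1 in the first case.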
This should be easier to multi-thread. Not sure about the possibility that rasterizer threads will have to synchronize after every 1-8 draw calls.
Tried several things before it worked without deadlocking. Performance seems heavily bottlenecked on g_totalDepthTests accumulation.
Fixes a huge performance issue with threads fighting over it.
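One common way to stop threads fighting over a shared counter is to give each worker its own counter and sum them after the workers finish. The sketch below shows that pattern; it is not necessarily the fix used in this PR, and `WorkerStats`, `countDepthTests`, and the 64-byte cache line size are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One counter per worker, aligned so two workers do not share a cache line.
// 64 bytes is an assumed cache line size.
struct alignas(64) WorkerStats
{
    std::uint64_t depthTests = 0;
};

// Runs workerCount threads that each perform testsPerWorker stand-in depth tests
// and returns the combined count.
inline std::uint64_t countDepthTests(int workerCount, int testsPerWorker)
{
    assert(workerCount > 0);
    assert(testsPerWorker >= 0);

    std::vector<WorkerStats> stats(static_cast<std::size_t>(workerCount));
    std::vector<std::thread> threads;
    for (int i = 0; i < workerCount; i++)
    {
        threads.emplace_back([&stats, i, testsPerWorker]()
        {
            // Accumulate in a local variable and publish once at the end,
            // so the hot loop never touches memory other threads write to.
            std::uint64_t localCount = 0;
            for (int t = 0; t < testsPerWorker; t++)
            {
                localCount++; // Stand-in for an actual depth comparison.
            }

            stats[static_cast<std::size_t>(i)].depthTests = localCount;
        });
    }

    for (std::thread &thread : threads)
    {
        thread.join();
    }

    // Safe to read without synchronization because every worker has been joined.
    std::uint64_t total = 0;
    for (const WorkerStats &workerStats : stats)
    {
        total += workerStats.depthTests;
    }

    return total;
}
```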
These were being freed due to a bad alloc exception in the SoftwareRenderer before their manager had a chance to init().
@afritz1 afritz1 added this to the 0.15.0 milestone Apr 7, 2024
@afritz1 afritz1 self-assigned this Apr 7, 2024
afritz1 added 20 commits April 20, 2024 11:47
Eventually want bin dimensions to vary with frame buffer resolution for better thread balancing.
It was making things too hard to understand while designing for multi-threading. Slight performance loss but will try to make it up later.
This reduces each bin from 14 MB to about 300 KB.
Black screen currently because workers aren't doing anything.
Need to map each range of triangle indices to the worker's draw call index somehow.
Still need a way of iterating over each draw call and its rasterizer triangles.
Rendering is working again but multi-threaded is still slower than single-threaded, and deadlocks sometimes. Frustrating.
Still getting deadlocks, need to fix the condition variables etc.
Still dealing with an occasional deadlock, though I think the problem is with workers that get 0 draw calls and are not waiting properly.
Caused by not properly setting all g_workers conditions.
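To make the failure mode concrete, here is a minimal sketch of a per-frame handshake in which every worker is signaled and waited on, including workers that receive 0 draw calls. The `Worker` struct, its flags, and `runFrames` are hypothetical; the actual `g_workers` state and the exact fix in this PR may differ.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-worker state guarded by its own mutex.
struct Worker
{
    std::mutex mutex;
    std::condition_variable cv;
    bool hasFrame = false;   // Set by the dispatcher each frame.
    bool isDone = false;     // Set by the worker when the frame is finished.
    bool shouldExit = false; // Set by the dispatcher at shutdown.
    int drawCallCount = 0;
    long processedTotal = 0;
};

inline void workerLoop(Worker &worker)
{
    std::unique_lock<std::mutex> lock(worker.mutex);
    while (true)
    {
        // The predicate form of wait() handles spurious and missed wake-ups.
        worker.cv.wait(lock, [&worker]() { return worker.hasFrame || worker.shouldExit; });
        if (worker.shouldExit)
        {
            return;
        }

        worker.processedTotal += worker.drawCallCount; // Stand-in for rasterizing.
        worker.hasFrame = false;
        worker.isDone = true; // Reported even when drawCallCount is 0.
        worker.cv.notify_all();
    }
}

// Runs frameCount frames, giving worker i drawCallsPerWorker[i] draw calls per frame.
// Returns the total number of draw calls processed.
inline long runFrames(const std::vector<int> &drawCallsPerWorker, int frameCount)
{
    std::vector<Worker> workers(drawCallsPerWorker.size());
    std::vector<std::thread> threads;
    for (Worker &worker : workers)
    {
        threads.emplace_back(workerLoop, std::ref(worker));
    }

    for (int frame = 0; frame < frameCount; frame++)
    {
        // Signal every worker, not just the ones with work to do.
        for (std::size_t i = 0; i < workers.size(); i++)
        {
            Worker &worker = workers[i];
            {
                std::lock_guard<std::mutex> lock(worker.mutex);
                worker.drawCallCount = drawCallsPerWorker[i];
                worker.isDone = false;
                worker.hasFrame = true;
            }
            worker.cv.notify_all();
        }

        // Wait for every worker before starting the next frame.
        for (Worker &worker : workers)
        {
            std::unique_lock<std::mutex> lock(worker.mutex);
            worker.cv.wait(lock, [&worker]() { return worker.isDone; });
        }
    }

    for (Worker &worker : workers)
    {
        {
            std::lock_guard<std::mutex> lock(worker.mutex);
            worker.shouldExit = true;
        }
        worker.cv.notify_all();
    }

    for (std::thread &thread : threads)
    {
        thread.join();
    }

    long total = 0;
    for (const Worker &worker : workers)
    {
        total += worker.processedTotal;
    }

    return total;
}
```

If the dispatcher skipped workers with 0 draw calls when setting `hasFrame` but still waited on their `isDone` flag, the wait would never return, which matches the kind of deadlock described above.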
@afritz1 afritz1 merged commit e8dd0bb into main Jul 14, 2024
1 check passed
@afritz1 afritz1 deleted the sw-renderer-redesign-part-3 branch July 14, 2024 20:06
1 participant