Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vulkan: improve staging data performance by using scratch buffers per frame. #3295

Merged
merged 8 commits into from
Jun 21, 2024

Conversation

mcourteaux
Copy link
Contributor

@mcourteaux mcourteaux commented May 21, 2024

Opening this PR to have a discussion on the work I did to improve performance of streaming data in the Vulkan backend. Will put as draft for now.

Summary

Staging scratch buffers are created at init to use for transfering data to the device. These scratch buffers are thus allocated once and reused for the lifetime of the renderer. This prevents allocating and freeing buffers for this purpose all the time (likely every several per frame). When the scratch buffer is full, the old procedure of simply creating a buffer for this purpose is used. This functionality is implemented in the auxiliary function allocFromScratchStagingBuffer, which will return a struct with info on which buffer to use and at what offset, and if this was a custom-made buffer that needs to be freed afterwards.

Minor changes:

  • bug fix for the assert detecting when the scratch buffer is out of free space.
  • add a lot more profiling scopes to get better insight in where the time is spent.

Added config:

  • BGFX_CONFIG_PER_FRAME_SCRATCH_STAGING_BUFFER_SIZE: 32 MiB
  • BGFX_CONFIG_MAX_STAGING_SIZE_FOR_SCRACH_BUFFER: 4 MiB

Related issues:

Result

Overall, I see a 3x speedup of simple bgfx::frame() times in the profiling data. UpdateDynamicVertexBuffer has a speedup of 30x almost! CommandQueueVK::finish() almost 8x faster.

A few before/after profiling screenshots for regular SilverNode usage (simply navigating UI and rerendering UI every frame with vg-renderer): Yellow is with this PR using scratch buffers, red is before.

  • UpdateDynamicVertexBuffer: (not allocating temporary memory each time)
    image

  • CommandQueueVK::finish(): (this time not freeing the temporary memory anymore)
    finish

  • Exec commands pre
    Exec commands pre

  • ContextVK::submit
    ContextVK::submit

  • bgfx::frame():
    frame

These zones were profiled thanks to additional profiled scopes (also in this PR).
Now most of the runtime of the vulkan backend is actually timed if the profiler is enabled:
image

@bkaradzic
Copy link
Owner

bkaradzic commented Jun 2, 2024

Can you test a little bit examples with your changes? One of first examples I tried with your branch crashed...

../../../src/renderer_vk.cpp  (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp (7976): BGFX Allocate command buffer error: vkWaitForFences failed -4: VK_ERROR_DEVICE_LOST.
../../../src/renderer_vk.cpp (4303): ASSERT VK_SUCCESS == vkresult -> m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (4303): BGFX FATAL 0x00000000: m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST

@mcourteaux
Copy link
Contributor Author

Aha, I don't get any crashes, but I do get the warning if I run the examples in Debug mode indeed. I get a loop that looks like this:

../../../src/renderer_vk.cpp (7549): BGFX vkWaitForFences( device , 1 , &m_backBufferFence[m_backBufferColorIdx] , 1U , (18446744073709551615UL) )
../../../src/renderer_vk.cpp (3806): BGFX vkAllocateDescriptorSets(m_device, &dsai, &descriptorSet)
../../../src/renderer_vk.cpp (3806): BGFX vkAllocateDescriptorSets(m_device, &dsai, &descriptorSet)
../../../src/renderer_vk.cpp (4731): BGFX vkFlushMappedMemoryRanges(device, 1, &range)
../../../src/renderer_vk.cpp (4731): BGFX vkFlushMappedMemoryRanges(device, 1, &range)
../../../src/renderer_vk.cpp (8044): BGFX vkEndCommandBuffer(m_activeCommandBuffer)
../../../src/renderer_vk.cpp (8049): BGFX vkResetFences(device, 1, &m_completedFence)
../../../src/renderer_vk.cpp (8067): BGFX vkQueueSubmit(m_queue, 1, &si, m_completedFence)
../../../src/renderer_vk.cpp (4303): BGFX m_cmd.alloc(&m_commandBuffer)
../../../src/renderer_vk.cpp (683): BGFX ---E-           Semaphore, Validation, 0: Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xb3ee8b0000000070, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR():  Semaphore must not have any pending operations. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE it must not have any uncompleted signal or wait operations pending (https://vulkan.lunarg.com/doc/view/1.3.280.0/linux/1.3-extensions/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)

I'll investigate later. I don't have any time today.

@mcourteaux
Copy link
Contributor Author

@bkaradzic This semaphore bug on vkAcquireNextImageKHR is also on master. I guess we will need to fix that one first before being able to continue testing this. You can compile bgfx in debug with examples, and you'll see the issue when launching example-01-cubesDebug.

@bkaradzic
Copy link
Owner

@mcourteaux I don't see any issue with master... I run regularly thru examples.

@mcourteaux
Copy link
Contributor Author

I'm writing up an issue with my analysis. I think I'm wrapping my head around this issue, and I think I'll have a fix soonish as well. Will link the issue once I post it.

@bkaradzic
Copy link
Owner

Make sure you update your drivers before writing that analysis... It's unlikely you're first to discover something so trivial by running 01-cubes example. @pezcode Please, could you test on your end master?

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 4, 2024

@bkaradzic #3302 I posted the issue, but I might keep updating it for a while, but I think it's detailed enough to accurately communicate the problem. I am on latest NVIDIA driver. The issue at hand is a race condition.

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 4, 2024

@bkaradzic Does the crash you were seeing with my PR disappear if you change kick() on the last line of ContextRendererVK::submit() to kick(true)?

@bkaradzic
Copy link
Owner

Nope...

../../../src/renderer_vk.cpp (6013): BGFX Texture   1: BGRA8 (requested: BGRA8), 1024x1024x1 RT[ ], BO[ ], CW[ ].
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
wb: signal: trace/breakpoint trap (core dumped)

/.../bgfx-vk-staging|vulkan-staging-buffer-arena⚡* > git diff
diff --git a/src/renderer_vk.cpp b/src/renderer_vk.cpp
index e7dc43495..945fd9dfb 100644
--- a/src/renderer_vk.cpp
+++ b/src/renderer_vk.cpp
@@ -9334,7 +9334,7 @@ VK_DESTROY
                        }
                }
 
-               kick();
+               kick(true);
        }
 
 } /* namespace vk */ } // namespace bgfx
/.../bgfx-vk-staging|vulkan-staging-buffer-arena⚡* > 

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 4, 2024

Aha, right, that's because the fence is already waited for. For a quick and dirty test, you could delete the failing vkWaitForFences on line 8073, because it is always already waited for in case of kick(true).

Fences can be waited for twice if I read the docs well.

@mcourteaux
Copy link
Contributor Author

Update: I have found two bugs with this:

  1. Texture data does not meet alignment requirements when using the scratch staging buffer.
  2. Potential Synchronization issue described in Potential fix for the synchronization issue. Needs review. #3303 (comment)

Will fix, and let know.

@mcourteaux
Copy link
Contributor Author

@d-s-h Can you test this branch, to see if it helps with your performance issue described in #3225? If you do, please try to use the bgfx Debug version. Also note that if you have bgfx callbacks configured, you must implement the traceVargs method to actually print something; otherwise no errors or warnings will be visible.

@mcourteaux mcourteaux force-pushed the vulkan-staging-buffer-arena branch from eb428c5 to fddfcf6 Compare June 6, 2024 13:01
@mcourteaux
Copy link
Contributor Author

@bkaradzic Would you like to test this again, now that you have updated to the latest Vulkan SDK, and the sync bug is fixed? Hopefully it works, or at least you might get some new error messages. Still testing it here without issues.

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 12, 2024

@bkaradzic Update on this? Did your crashes disappear after we fixed the semaphore bug; and I fixed the texture alignment bug?

@bkaradzic
Copy link
Owner

bkaradzic commented Jun 12, 2024

With your latest changes (from your branch):

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x799f384bcdd0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST

@mcourteaux
Copy link
Contributor Author

Thanks a lot for this update! I'll look into it. It's sort of annoying that we can't know for sure when it's ready... I don't get these validation errors. Perhaps we ask @pezcode to review the code once it at least works on your machine too.

@mcourteaux
Copy link
Contributor Author

@bkaradzic I pushed a change. For me the examples still work. The alignment requirement for copying into an image buffer says that it must be a multiple of the "texel block size". Before, I was using the "bytes per pixel" value as alignment. Now I'm using bimg::getBlockInfo()::blockSize as alignment. Does that sound right to you?

If this still doesn't fix the issue, can you tell me which example you are running? I tried example-08-update which deals with a whole lot of texture format, but I wasn't getting this issue.

I updated my Vulkan SDK and I'm getting another validation error on example-08-update, but this issue also exists on master currently:

../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (2). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (2). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (1). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (1). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (2). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (2). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (1). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (1). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (2). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (2). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (1). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (1). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)

@bkaradzic
Copy link
Owner

With latest (I'm running 01-cubes):

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x7e3c104bcf90, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST

@mcourteaux
Copy link
Contributor Author

I'm terribly sorry. I was compiling with make config=release64 and then running the debug example. When you do something so obviously silly for more than a few days, you don't even question it.

I found a bug with my code, for which I'm also sorry. I fixed this bug too now. This is getting ridiculous (I'm really sorry for wasting so much time of yours). However I did test it several times with additional robustness tests and it seems to correctly align the allocations into the scratch buffer for me. The bug I found was dumb, but as far as I can tell, it wouldn't explain the misalignment.

Q: Am I misunderstanding what bx::strideAlign does? I'm using it to round up the current offset in the buffer to the desired alignment.

@bkaradzic
Copy link
Owner

Tested again, now 01, 04, 09 crash everything in between works.

bx::strideAlign returns next aligned offset by stride. Templated version finds next offset that's aligned by _stride and with Min template argument.

@bkaradzic
Copy link
Owner

All crashes look like this:

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x79b8bc2fb050, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST

@mcourteaux
Copy link
Contributor Author

@bkaradzic I took the time to do some additional debugging with Valgrind. I found a few minor issues, which I fixed. Regarding the crash, I added a bunch of BX_ASSERTs and some printing. Could you please generate a crash, and share the entire output of one of the crashing programs? Feel free to also debug yourself, as really did what I could. With some luck, you'll hit one of the asserts I added, which might make debugging easier.

@bkaradzic
Copy link
Owner

Still looks the same...

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x7ef70c4bd030, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 13, 2024

Yes, but I didn't change anything, besides adding printing and asserts. Can you share the full output?

@mcourteaux
Copy link
Contributor Author

So, the 326 the validation layer reports for you should be impossible, as I use strideAlign to round that up to the next multiple of the blockSize for the given TextureFormat, which is supposed to be 4 (as reported by both my added print statements, and the validation layer).

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 13, 2024

@bkaradzic To be precise, here I added asserts and prints:

bgfx/src/renderer_vk.cpp

Lines 6496 to 6511 in 52e3814

const bimg::ImageBlockInfo &bi = bimg::getBlockInfo(bimg::TextureFormat::Enum(m_textureFormat));
for (uint32_t ii = 0; ii < _bufferImageCopyCount; ++ii)
{
bx::printf(" -- BufferImageCopy[%u].bufferOffset = %u (align = %u, texture format=%s)\n",
ii, _bufferImageCopy[ii].bufferOffset, bi.blockSize, bimg::getName(bimg::TextureFormat::Enum(m_textureFormat)));
BX_ASSERT(_bufferImageCopy[ii].bufferOffset % bi.blockSize == 0, "Buffer image copy offset (%u) is not aligned correctly (%u)",
_bufferImageCopy[ii].bufferOffset, bi.blockSize);
}
vkCmdCopyBufferToImage(
_commandBuffer
, _stagingBuffer
, m_textureImage
, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL
, _bufferImageCopyCount
, _bufferImageCopy
);

This is the only vkCmdCopyBufferToImage in the entire file. The validation layer is complaining that the bufferOffset is not correctly aligned to the desired texel block size.

You should at least see those prints appear as well.

@bkaradzic
Copy link
Owner

I'm testing with 01-cubes, it doesn't hit this code.

@mcourteaux
Copy link
Contributor Author

Just to be sure: you are not making any weird mistake where you are not running the code you were compiling (like I was doing)?

Can you construct a stacktrace from where the validation layer reports this? I'm flabbergasted. A breakpoint on line 671 for example would catch this. (For gdb, this would be: break -source renderer_vk.cpp -line 671).

@bkaradzic
Copy link
Owner

Rebuilt whole thing and it actually works now... Stale code.

@mcourteaux
Copy link
Contributor Author

Oh wauw, that makes sense! I'll clean up the PR and get rid of some of the excessive printing. Would this be a bug in GENie or the dependency detection system?

@bkaradzic
Copy link
Owner

Not sure what happened, makefile should not have issues...

@mcourteaux
Copy link
Contributor Author

@bkaradzic @pezcode Ready for review/merge as far as I'm concerned.

@mcourteaux mcourteaux marked this pull request as ready for review June 13, 2024 21:45
@bkaradzic
Copy link
Owner

47-pixelformats

../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp (8017): BGFX Allocate command buffer error: vkWaitForFences failed -4: VK_ERROR_DEVICE_LOST.
../../../src/renderer_vk.cpp(4309): ASSERT VK_SUCCESS == vkresult -> m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (4309): BGFX FATAL 0x00000000: m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST

@mcourteaux
Copy link
Contributor Author

mcourteaux commented Jun 14, 2024

Oh good catch! I reproduced this one. I was unaware of the conversion process. I was aligning to the texture format of the original format instead of the format that bimg converts the texture to. Is fixed now. Please be aware of the bug I reported in #3311. If you suffer this issue with example-08, you can try this patch:

diff --git a/examples/08-update/update.cpp b/examples/08-update/update.cpp
index 109efb0ff..200c92e20 100644
--- a/examples/08-update/update.cpp
+++ b/examples/08-update/update.cpp
@@ -173,18 +173,11 @@ bgfx::TextureHandle loadTextureWithUpdate(const char* _filePath, uint64_t _flags
                                        , NULL
                                        );
 
-                               const bimg::ImageBlockInfo& blockInfo = getBlockInfo(imageContainer->m_format);
-                               const uint32_t blockWidth  = blockInfo.blockWidth;
-                               const uint32_t blockHeight = blockInfo.blockHeight;
-
                                uint32_t width  = imageContainer->m_width;
                                uint32_t height = imageContainer->m_height;
 
                                for (uint8_t lod = 0, num = imageContainer->m_numMips; lod < num; ++lod)
                                {
-                                       width  = bx::max(blockWidth,  width);
-                                       height = bx::max(blockHeight, height);
-
                                        bimg::ImageMip mip;
 
                                        if (bimg::imageGetRawData(*imageContainer, 0, lod, imageContainer->m_data, imageContainer->m_size, mip))

Side question: BX_ASSERT and friends macro doesn't deal well with conditions that contain the modulo operator: the modulo disappears as it's treated as a format specifier. In:

https://github.com/bkaradzic/bx/blob/0ec634e8fdf8c810f9911c686a8158088ae25379/include/bx/macros.h#L285-L292

bx::assertFunction(bx::Location::current(), "ASSERT " #_condition " -> " _format, ##__VA_ARGS__)

could be turned into:

bx::assertFunction(bx::Location::current(), "ASSERT %s -> " _format, #_condition, ##__VA_ARGS__)

For now, I replaced the % operator with bx::uint32_mod() to avoid this printing issue.

src/bgfx_p.h Outdated
@@ -2168,6 +2171,9 @@ namespace bgfx
bx::memSet(m_occlusion, 0xff, sizeof(m_occlusion) );

m_perfStats.viewStats = m_viewStats;

bx::printf(" -- Clearing m_renderItemBind -- \n");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove printfs...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bkaradzic bkaradzic merged commit 1109f3c into bkaradzic:master Jun 21, 2024
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants