-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vulkan: improve staging data performance by using scratch buffers per frame. #3295
Vulkan: improve staging data performance by using scratch buffers per frame. #3295
Conversation
Can you test a little bit examples with your changes? One of first examples I tried with your branch crashed...
|
Aha, I don't get any crashes, but I do get the warning if I run the examples in Debug mode indeed. I get a loop that looks like this:
I'll investigate later. I don't have any time today. |
@bkaradzic This semaphore bug on |
@mcourteaux I don't see any issue with master... I run regularly thru examples. |
I'm writing up an issue with my analysis. I think I'm wrapping my head around this issue, and I think I'll have a fix soonish as well. Will link the issue once I post it. |
Make sure you update your drivers before writing that analysis... It's unlikely you're first to discover something so trivial by running 01-cubes example. @pezcode Please, could you test on your end master? |
@bkaradzic #3302 I posted the issue, but I might keep updating it for a while, but I think it's detailed enough to accurately communicate the problem. I am on latest NVIDIA driver. The issue at hand is a race condition. |
@bkaradzic Does the crash you were seeing with my PR disappear if you change |
Nope...
|
Fences can be waited for twice if I read the docs well. |
Update: I have found two bugs with this:
Will fix, and let know. |
@d-s-h Can you test this branch, to see if it helps with your performance issue described in #3225? If you do, please try to use the bgfx Debug version. Also note that if you have bgfx callbacks configured, you must implement the |
eb428c5
to
fddfcf6
Compare
@bkaradzic Would you like to test this again, now that you have updated to the latest Vulkan SDK, and the sync bug is fixed? Hopefully it works, or at least you might get some new error messages. Still testing it here without issues. |
@bkaradzic Update on this? Did your crashes disappear after we fixed the semaphore bug; and I fixed the texture alignment bug? |
With your latest changes (from your branch):
|
Thanks a lot for this update! I'll look into it. It's sort of annoying that we can't know for sure when it's ready... I don't get these validation errors. Perhaps we ask @pezcode to review the code once it at least works on your machine too. |
@bkaradzic I pushed a change. For me the examples still work. The alignment requirement for copying into an image buffer says that it must be a multiple of the "texel block size". Before, I was using the "bytes per pixel" value as alignment. Now I'm using If this still doesn't fix the issue, can you tell me which example you are running? I tried I updated my Vulkan SDK and I'm getting another validation error on
|
With latest (I'm running 01-cubes):
|
I'm terribly sorry. I was compiling with I found a bug with my code, for which I'm also sorry. I fixed this bug too now. This is getting ridiculous (I'm really sorry for wasting so much time of yours). However I did test it several times with additional robustness tests and it seems to correctly align the allocations into the scratch buffer for me. The bug I found was dumb, but as far as I can tell, it wouldn't explain the misalignment. Q: Am I misunderstanding what |
Tested again, now 01, 04, 09 crash everything in between works.
|
All crashes look like this:
|
@bkaradzic I took the time to do some additional debugging with Valgrind. I found a few minor issues, which I fixed. Regarding the crash, I added a bunch of |
Still looks the same...
|
Yes, but I didn't change anything, besides adding printing and asserts. Can you share the full output? |
So, the |
@bkaradzic To be precise, here I added asserts and prints: Lines 6496 to 6511 in 52e3814
This is the only You should at least see those prints appear as well. |
I'm testing with 01-cubes, it doesn't hit this code. |
Just to be sure: you are not making any weird mistake where you are not running the code you were compiling (like I was doing)? Can you construct a stacktrace from where the validation layer reports this? I'm flabbergasted. A breakpoint on line 671 for example would catch this. (For |
Rebuilt whole thing and it actually works now... Stale code. |
Oh wauw, that makes sense! I'll clean up the PR and get rid of some of the excessive printing. Would this be a bug in GENie or the dependency detection system? |
Not sure what happened, makefile should not have issues... |
@bkaradzic @pezcode Ready for review/merge as far as I'm concerned. |
47-pixelformats
|
Oh good catch! I reproduced this one. I was unaware of the conversion process. I was aligning to the texture format of the original format instead of the format that diff --git a/examples/08-update/update.cpp b/examples/08-update/update.cpp
index 109efb0ff..200c92e20 100644
--- a/examples/08-update/update.cpp
+++ b/examples/08-update/update.cpp
@@ -173,18 +173,11 @@ bgfx::TextureHandle loadTextureWithUpdate(const char* _filePath, uint64_t _flags
, NULL
);
- const bimg::ImageBlockInfo& blockInfo = getBlockInfo(imageContainer->m_format);
- const uint32_t blockWidth = blockInfo.blockWidth;
- const uint32_t blockHeight = blockInfo.blockHeight;
-
uint32_t width = imageContainer->m_width;
uint32_t height = imageContainer->m_height;
for (uint8_t lod = 0, num = imageContainer->m_numMips; lod < num; ++lod)
{
- width = bx::max(blockWidth, width);
- height = bx::max(blockHeight, height);
-
bimg::ImageMip mip;
if (bimg::imageGetRawData(*imageContainer, 0, lod, imageContainer->m_data, imageContainer->m_size, mip)) Side question: bx::assertFunction(bx::Location::current(), "ASSERT " #_condition " -> " _format, ##__VA_ARGS__) could be turned into: bx::assertFunction(bx::Location::current(), "ASSERT %s -> " _format, #_condition, ##__VA_ARGS__) For now, I replaced the |
src/bgfx_p.h
Outdated
@@ -2168,6 +2171,9 @@ namespace bgfx | |||
bx::memSet(m_occlusion, 0xff, sizeof(m_occlusion) ); | |||
|
|||
m_perfStats.viewStats = m_viewStats; | |||
|
|||
bx::printf(" -- Clearing m_renderItemBind -- \n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove printfs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Opening this PR to have a discussion on the work I did to improve performance of streaming data in the Vulkan backend. Will put as draft for now.
Summary
Staging scratch buffers are created at init to use for transfering data to the device. These scratch buffers are thus allocated once and reused for the lifetime of the renderer. This prevents allocating and freeing buffers for this purpose all the time (likely every several per frame). When the scratch buffer is full, the old procedure of simply creating a buffer for this purpose is used. This functionality is implemented in the auxiliary function
allocFromScratchStagingBuffer
, which will return a struct with info on which buffer to use and at what offset, and if this was a custom-made buffer that needs to be freed afterwards.Minor changes:
Added config:
Related issues:
Result
Overall, I see a 3x speedup of simple
bgfx::frame()
times in the profiling data.UpdateDynamicVertexBuffer
has a speedup of 30x almost!CommandQueueVK::finish()
almost 8x faster.A few before/after profiling screenshots for regular SilverNode usage (simply navigating UI and rerendering UI every frame with vg-renderer): Yellow is with this PR using scratch buffers, red is before.
UpdateDynamicVertexBuffer
: (not allocating temporary memory each time)CommandQueueVK::finish()
: (this time not freeing the temporary memory anymore)Exec commands pre
ContextVK::submit
bgfx::frame():
These zones were profiled thanks to additional profiled scopes (also in this PR).
Now most of the runtime of the vulkan backend is actually timed if the profiler is enabled: