Linux de-profile and msvc compilation fix #5967
- By default, uses sequentially-consistent std::atomic_compare_exchange_weak instead of hand-rolled asm
- USE_CHRONO is approx. 4.3x faster on AMD Ryzen 7 5800X than the hand-rolled rdtsc implementation (50ns per-call overhead vs. 216ns per-call overhead)
- Performance on aarch64 is unknown at this time
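For readers unfamiliar with what a chrono-based timestamp path looks like, here is a minimal sketch. It is not the code from this PR: `now_ns` and `per_call_overhead_ns` are hypothetical names chosen for illustration, and the sketch assumes `std::chrono::steady_clock` as the clock source, which the description above does not confirm.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: current monotonic time in nanoseconds.
// Assumes std::chrono::steady_clock; the PR's USE_CHRONO path may differ.
inline std::uint64_t now_ns() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Hypothetical helper: rough average cost of one timestamp call,
// measured by timing `calls` back-to-back invocations.
inline double per_call_overhead_ns(int calls) {
    volatile std::uint64_t sink = 0; // keeps the loop from being optimized away
    const std::uint64_t start = now_ns();
    for (int i = 0; i < calls; ++i) {
        sink = now_ns();
    }
    const std::uint64_t end = now_ns();
    (void)sink;
    return static_cast<double>(end - start) / static_cast<double>(calls);
}
```

A loop like this only gives a ballpark figure; results depend heavily on the platform's clock implementation, which is consistent with aarch64 performance being listed as unknown.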
Continuing from #5943 (comment), I've pushed the fix which enables multithreaded profiler builds on non-x86 platforms to this branch as well. Interestingly enough, when using std::atomics rather than the hand-rolled compare-and-swap implementation, the per-call overhead becomes quite a bit lower - up to 5x lower on my machine, in fact. I'd appreciate a quick check to ensure that the CAS-lock path has a properly zero-initialized |
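As a point of reference for reviewing the CAS-lock path, the following is a minimal sketch of a compare-and-swap lock built on `std::atomic_compare_exchange_weak` with an explicitly zero-initialized state. `CasLock` and its members are hypothetical names for illustration only and are not the profiler's actual types.

```cpp
#include <atomic>

// Hypothetical lock type for illustration; not the project's profiler lock.
struct CasLock {
    // Explicitly zero-initialized: 0 = unlocked, 1 = locked.
    std::atomic<int> state{0};

    void lock() {
        int expected = 0;
        // The weak form may fail spuriously, so it is retried in a loop.
        // Without explicit orderings this is sequentially consistent.
        while (!std::atomic_compare_exchange_weak(&state, &expected, 1)) {
            expected = 0; // on failure, `expected` holds the observed value
        }
    }

    // Single attempt; returns true only if the lock was acquired.
    bool try_lock() {
        int expected = 0;
        return std::atomic_compare_exchange_strong(&state, &expected, 1);
    }

    void unlock() { state.store(0); }
};
```

The point of the `{0}` initializer is that the first `lock()` call compares against a known value; if the state started out indeterminate, the loop could spin forever on a lock nobody holds.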
Looks good on my end.
On Windows with On RaspberryPi though it works with both and produces a good profile 👍 massive improvement there. I'll look into the CAS |
I cannot see anything logically wrong with the CAS The Windows crash when using zone profiling appears to be some kind of double-delete, maybe, it's really hard to debug even with blocks of I think for now we take the win of what we've got and tackle that Windows issue in its own PR |
If you have time left today, feel free to zip up a trace (both .json and .html) from the running game on the Pi and send it my way (per the original comment that sparked all this). |
At risk of being "that guy", this is why I build with profiler enabled in the debug configuration - makes it much easier to find crash bugs. |
rpi5-profiler.zip |
This tells me everything I need to know - we're 100% fully GPU-bound on RPi5. (Specifically, I can only assume that this is serializing on a map-write GL call to update the light buffer used in 95% of drawcalls in the previous frame.) An individual physics tick takes ~3ms, which is surprisingly Not Bad for that hardware. Other than the GPU catch-up stall, rendering takes about 2.5ms to record command buffers, and another 3.75-4ms to upload those command buffers as OpenGL API calls. (Pending further insight, I'm going to say that means the software command list is Working As Intended.) I'm actually not sure if there are any external GPU profilers for Raspberry Pi 5. We certainly don't have any in-process GPU profiling, and while it's possible to hack in to our existing profiler implementation, we might be better served by optionally using Tracy as our profiler backend. We'll certainly need to make a few alterations to our rendering architecture to allow embedding timing regions into the draw stream. |
Moving further discussion back to #5700 where it probably should have been posted in the first place 😄 |
Ah ha! This feels slightly hilarious, but the issue with the Zone profiling on Windows is a double delete and I could not figure out where it was coming from or why because it made no sense. However this fixes it...
I think it's because the |
I'll give this a try this afternoon! |
Damn, I had to make some other tweaks to make this build on RPi, and I suspect it would be the same under x64 Linux builds
Damn, disappointingly this causes a SIGSEGV which I'd have to look into later |
This is a nasty one - the HashTable implementation in |
Two minor changes: