Feature: Performance Tests #240
Comments
As @tcoopman pointed out in another thread - it's important to have targeted scenarios and to be able to validate with measurements that performance has changed. Performance is complex, and in some cases our assumptions about what is fast may not actually be correct, so having as much tooling as we can to verify it will be helpful.
I'm wondering if 2 levels of performance tests are necessary/useful:
- Low-level benchmarks, done with something like Core_bench, testing performance-critical functions - for example, creating the styles.
- Higher-level stress tests, something like a Game of Life whose performance we track over time.

There should be some infrastructure for running both kinds of tests (the benchmarks should be easy enough to add to a CI pipeline, although I'm not sure how reproducible the results would be). The higher-level tests are probably more work.
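For the low-level tier, a micro-benchmark could look something like the sketch below. It uses only the stdlib to stay dependency-free (Core_bench would add warm-up, outlier rejection, and allocation statistics on top of this); `create_style` is a hypothetical stand-in for a style-construction hot path, not an actual Revery function:

```ocaml
(* Minimal micro-benchmark sketch: average CPU time per call over many
   iterations. Sys.opaque_identity prevents the compiler from optimizing
   the measured call away. *)
let time_per_call ~iterations f =
  let start = Sys.time () in
  for _ = 1 to iterations do
    ignore (Sys.opaque_identity (f ()))
  done;
  (Sys.time () -. start) /. float_of_int iterations

(* Hypothetical stand-in for a performance-critical function,
   e.g. constructing a style record. *)
let create_style () = (255, 0, 0, `Margin 4)

let () =
  let s = time_per_call ~iterations:100_000 create_style in
  Printf.printf "create_style: %.9f s/call\n" s
```

A CI job could run such benchmarks and flag large regressions, though as noted above, absolute timings on shared CI machines may not be very reproducible.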
Commenting here because I don't want to open a new issue and this is sort of relevant. I went back and tried the examples using
Looks like each individual piece of work sums up to about 7ms, with the other 90ms coming from elsewhere. I can't imagine such a large amount coming from GC, especially after tuning, since OCaml usually has a lightning-fast GC. Note that everything is smooth at the default resolution; performance only explodes once I maximize (and it's obviously more severe than normal here because we're printing to stdout). @bryphe Any tips on how to profile Revery? Would love to dive into this and figure out what's causing the big slowdown, but I have no idea how to get an
Thanks for your help investigating @OhadRau ! My suspicion is that in your case, somehow it's using a software rendering mode instead of actually leveraging the GPU. The 97.51 ms is indeed way too long for a single frame render - that's barely 10 FPS. Most of the work the GPU does is actually in this
If the bottleneck is either a software rendering mode or some other GPU/driver issue, this is where we'd hit a performance problem. If it happens that the glfwSwapBuffers is fast, then we can look elsewhere. Another potential culprit could be the
If there are still no answers, there are a few other tools here that we could look at using for actual profiling: https://www.khronos.org/opengl/wiki/Debugging_Tools Also - @OhadRau - did you try building + running outside of WSL, using Windows
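To confirm where the frame time is going, one option is to bracket the suspect call with timestamps. A minimal sketch (the `Glfw.swapBuffers` binding name in the commented usage line is an assumption about the bindings, and `Sys.time` reports CPU time - real frame timing should use a wall clock such as `Unix.gettimeofday`):

```ocaml
(* Sketch: time a single call and print the elapsed milliseconds, so a
   slow buffer swap (or any other suspect) shows up directly in the logs. *)
let time_span label f =
  let start = Sys.time () in
  let result = f () in
  Printf.printf "%s: %.3f ms\n" label ((Sys.time () -. start) *. 1000.);
  result

(* Hypothetical usage around the suspected bottleneck:
   let () = time_span "glfwSwapBuffers" (fun () -> Glfw.swapBuffers window) *)
let () = ignore (time_span "dummy work" (fun () -> List.init 1000 (fun i -> i * i)))
```

Comparing the measured swap time against the ~7ms of accounted-for work would tell us whether the missing ~90ms lives in the swap or elsewhere.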
Looks like your intuition was correct about swapping buffers. It was spending about 9ms/frame before maximizing and ~80ms/frame after when rendering with native OpenGL disabled in VcXsrv. I attempted to turn it on, but that actually brought the numbers up to ~11ms vs. ~130ms (not sure if it was actually utilizing OpenGL here or not). FWIW I'm using an iGPU rather than discrete graphics, not sure if that has any effect. Haven't yet tried running natively on Windows because I haven't gotten all the libraries and tools set up yet. If anyone has a binary precompiled I could run that, otherwise I'll try to get everything installed and see if it works any better.
Performance is a feature!
It's important that we can target scenarios, and guarantee - build-over-build - that we aren't regressing performance. This is always very challenging to hook up and handle in CI - even on the same machine, variations in environment state can impact timing measurements, so we should look to more deterministic methods to validate our performance.
One of the biggest enemies of real-time performance in a language like Reason / OCaml is the garbage collector - a surprise major collection can easily knock out a frame or more! Testing the GC behavior was difficult (impossible) in Electron-based apps, at least in my experience.
Luckily - the garbage collector is deterministic in Reason / OCaml - so we can actually create 'benchmarks' and record the minor/major allocations, and verify that code that we put in these benchmarks never causes more allocations (and validate our assumptions that performance fixes are actually fixing performance).
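A minimal sketch of such an allocation 'snapshot' test, using the stdlib `Gc` counters (the workload and the budget value below are hypothetical placeholders for a real benchmarked code path and its recorded baseline):

```ocaml
(* Because OCaml's GC counters are deterministic for deterministic code,
   we can record how many minor-heap words a code path allocates and
   assert that the number never grows build-over-build. *)
let minor_words_of f =
  let before = Gc.minor_words () in
  ignore (Sys.opaque_identity (f ()));
  Gc.minor_words () -. before

let () =
  (* Hypothetical workload standing in for a real code path under test *)
  let words = minor_words_of (fun () -> List.init 100 (fun i -> i)) in
  Printf.printf "minor words: %.0f\n" words;
  (* Snapshot assertion: fail if allocations regress past the recorded
     baseline (the 1_000. budget here is illustrative). *)
  assert (words <= 1_000.)
```

Unlike wall-clock timings, these counts are stable across machines, so they can gate CI without flakiness.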
In addition - we could instrument computationally heavy code paths with Performance 'counters' - and validate that these counters never increase (again, with a snapshot-style test).
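The counter idea could be sketched as a small module like the one below; the counter name "layout" and the instrumented function are hypothetical examples, not existing Revery code:

```ocaml
(* Sketch of snapshot-style performance counters: heavy code paths bump
   a named counter, and a test compares the totals against a recorded
   baseline, failing if any counter increases. *)
module Counters = struct
  let table : (string, int ref) Hashtbl.t = Hashtbl.create 16

  let bump name =
    match Hashtbl.find_opt table name with
    | Some r -> incr r
    | None -> Hashtbl.add table name (ref 1)

  let get name =
    match Hashtbl.find_opt table name with
    | Some r -> !r
    | None -> 0
end

(* Hypothetical instrumented code path *)
let layout_node () = Counters.bump "layout"

let () =
  layout_node ();
  layout_node ();
  (* Snapshot assertion: this scenario should perform exactly 2 layouts *)
  assert (Counters.get "layout" = 2);
  print_endline "counters ok"
```

Like the allocation counts, these totals are deterministic, so they sidestep the timing-noise problem on CI.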
This is tracking creating infrastructure for us to create performance tests. It should be as easy to run these as `esy bench`, and easy to author and add new scenarios to put under test.