
Feature: Performance Tests #240

Open
bryphe opened this issue Jan 22, 2019 · 5 comments
Labels
A-infrastructure Area: Project infrastructure, build system, CI, website etc. · enhancement New feature or request · performance

Comments

bryphe (Member) commented Jan 22, 2019

Performance is a feature!

It's important that we can target scenarios, and guarantee - build-over-build - that we aren't regressing performance. This is always very challenging to hook up and handle in CI - even on the same machine, variations in environment state can impact timing measurements, so we should look to more deterministic methods to validate our performance.

One of the biggest enemies of real-time performance in a language like Reason / OCaml is the garbage collector - a surprise major collection can easily knock out a frame or more! Testing GC behavior was difficult (effectively impossible) in Electron-based apps, at least in my experience.

Luckily - the garbage collector is deterministic in Reason / OCaml - so we can create 'benchmarks' that record the minor/major allocations, and verify that the code under test never allocates more than the recorded snapshot (and validate our assumption that performance fixes are actually fixing performance).
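As a stdlib-only sketch of what such an allocation-snapshot check could look like (the allocations helper below is hypothetical, not an existing Revery API - it just reads Gc.quick_stat before and after running a function):

```ocaml
(* Hypothetical helper: measure minor-heap words allocated by [f].
   [Gc.quick_stat] is cheap and does not trigger a collection, and
   OCaml's allocation behaviour is deterministic for a fixed input,
   so the returned count can be asserted build-over-build. *)
let allocations f =
  let before = Gc.quick_stat () in
  (* [Sys.opaque_identity] keeps the compiler from optimizing the work away. *)
  let result = Sys.opaque_identity (f ()) in
  let after = Gc.quick_stat () in
  (result, after.Gc.minor_words -. before.Gc.minor_words)

let () =
  (* Building a 1000-element list allocates a fixed number of minor words. *)
  let _list, minor_words = allocations (fun () -> List.init 1000 (fun i -> i * i)) in
  assert (minor_words > 0.)
```

A snapshot test would record minor_words once and then assert it never grows on later builds.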

In addition - we could instrument computationally heavy code paths with Performance 'counters' - and validate that these counters never increase (again, with a snapshot-style test).
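For illustration, such a counter could be as simple as the hypothetical module below (not an existing Revery API) - hot code paths bump a named counter, and a snapshot test asserts the totals:

```ocaml
(* Hypothetical performance-counter module: increments are recorded by name,
   and a snapshot-style test asserts the totals never increase. *)
module Counter = struct
  let counters : (string, int ref) Hashtbl.t = Hashtbl.create 16

  let increment name =
    match Hashtbl.find_opt counters name with
    | Some r -> incr r
    | None -> Hashtbl.add counters name (ref 1)

  let get name =
    match Hashtbl.find_opt counters name with
    | Some r -> !r
    | None -> 0
end

let () =
  Counter.increment "layout";
  Counter.increment "layout";
  assert (Counter.get "layout" = 2)
```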

This issue tracks creating the infrastructure for performance tests. Running them should be as easy as esy bench, and authoring and adding new scenarios under test should be easy too.

@bryphe bryphe added enhancement New feature or request performance labels Jan 22, 2019
bryphe (Member, Author) commented Jan 22, 2019

As @tcoopman pointed out in another thread - it's important to have targeted scenarios and be able to validate with measurements that performance has changed. Performance is complex and in some cases our assumptions about what is fast may not actually be correct, so having as much tooling as we can to verify it will be helpful.

tcoopman (Contributor) commented:

I'm wondering if 2 levels of performance tests are necessary/useful:

  1. Low level function benchmarks
  2. High level application stress tests

The low level benchmarks could be done with something like Core bench and test some performance critical functions, like for example creating the styles.
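As a stdlib-only sketch of the shape such a low-level benchmark might take (a real setup would likely use Core_bench as suggested; the bench helper here is hypothetical):

```ocaml
(* Hypothetical micro-benchmark helper: run [f] in a loop and report the
   mean time per iteration. Core_bench would add warm-up, statistics, and
   allocation reporting on top of this basic idea. *)
let bench ~name ~iterations f =
  let start = Sys.time () in
  for _ = 1 to iterations do
    (* [Sys.opaque_identity] prevents the result from being optimized away. *)
    ignore (Sys.opaque_identity (f ()))
  done;
  let per_iter_us = (Sys.time () -. start) *. 1e6 /. float_of_int iterations in
  Printf.printf "%s: %.3f us/iteration\n" name per_iter_us

let () =
  bench ~name:"square_list" ~iterations:1_000 (fun () ->
      List.init 100 (fun i -> i * i))
```

CI could then compare the reported numbers against a stored baseline to flag regressions.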

The higher level stress tests could be something like a Game Of Life that we try to track the performance for.

There should be some infrastructure for running both kinds of tests (the benchmarks should be easy enough to add to a CI pipeline - although I'm not sure how reproducible the results would be). The higher-level tests are probably more work.
Beyond just running them, we should also have something in place to detect regressions from commit to commit. So there should be some graphs/tables with historical data or something?

OhadRau (Collaborator) commented Jan 25, 2019

Commenting here because I don't want to open a new issue and this is sort of relevant. I went back and tried the examples using REVERY_DEBUG=1 and got some interesting results:

-[BEGIN: renderWindows]
--[BEGIN: reconcile]
---[BEGIN: RenderedElement.update]
---[END: RenderedElement.update] Time: 0.632000000001ms Memory: | minor: 16589 | major: 0 | promoted: 0 |
---[BEGIN: RenderedElement.flushPendingUpdates]
---[END: RenderedElement.flushPendingUpdates] Time: 0.442ms Memory: | minor: 11746 | major: 0 | promoted: 0 |
---[BEGIN: RenderedElement.executeHostViewEffects]
---[END: RenderedElement.executeHostViewEffects] Time: 0.480000000001ms Memory: | minor: 18000 | major: 0 | promoted: 0 |
---[BEGIN: RenderedElement.executePendingEffects]
---[END: RenderedElement.executePendingEffects] Time: 0.422ms Memory: | minor: 1205 | major: 0 | promoted: 0 |
--[END: reconcile] Time: 4.095ms Memory: | minor: 48175 | major: 0 | promoted: 0 |
--[BEGIN: layout]
--[END: layout] Time: 0.621000000001ms Memory: | minor: 10099 | major: 0 | promoted: 0 |
--[BEGIN: recalculate]
--[END: recalculate] Time: 0.468ms Memory: | minor: 7084 | major: 0 | promoted: 0 |
--[BEGIN: flush]
--[END: flush] Time: 0.401ms Memory: | minor: 23 | major: 0 | promoted: 0 |
--[BEGIN: draw]
--[END: draw] Time: 1.615ms Memory: | minor: 19822 | major: 0 | promoted: 0 |
-[END: renderWindows] Time: 97.51ms Memory: | minor: 86606 | major: 0 | promoted: 0 |

Looks like each individual piece of work sums to about 7ms, with the other 90ms coming from elsewhere. I can't imagine such a large amount coming from the GC, especially after tuning, since OCaml usually has a lightning-fast GC. Note that everything is smooth at the default resolution; performance only explodes once I maximize the window (and it's obviously more severe than normal here because we're printing to stdout). @bryphe Any tips on how to profile Revery? I'd love to dive in and figure out what's causing the big slowdown, but I have no idea how to get an esy build to work with profiling enabled.

bryphe (Member, Author) commented Jan 25, 2019

Thanks for your help investigating @OhadRau !

My suspicion is that in your case - somehow it's using a software rendering mode instead of actually leveraging the GPU. 97.51ms is indeed way too long for a single frame render - that's barely 10 FPS.

Most of the work the GPU does is actually in this glfwSwapBuffers call in Window.re. We should add instrumentation around that, too:

  Performance.bench("glfwSwapBuffers", () => {
      Glfw.glfwSwapBuffers(w.glfwWindow);
  });

If the bottleneck is either a software rendering mode or some other GPU/driver issue, this is where we'd hit it. If it happens that glfwSwapBuffers is fast, then we can look elsewhere. Another potential culprit could be glfwMakeContextCurrent.

If there are still no answers - there's a few other tools here that we could look at using for actual profiling: https://www.khronos.org/opengl/wiki/Debugging_Tools

Also - @OhadRau - did you try building and running outside of WSL, using Windows cmd.exe? That would be a very helpful data point. Without extra configuration, it seems the default OpenGL libraries in WSL will not pass through to Windows and will fall back to software emulation.

OhadRau (Collaborator) commented Jan 25, 2019

Looks like your intuition about swapping buffers was correct. It was spending about 9ms/frame before maximizing and ~80ms/frame after, when rendering with native OpenGL disabled in VcXsrv. I attempted to turn it on, but that actually brought the numbers up to ~11ms vs. ~130ms (not sure whether it was actually utilizing OpenGL here or not). FWIW I'm using an iGPU rather than discrete graphics; not sure if that has any effect.

Haven't yet tried running natively on Windows because I haven't gotten all the libraries and tools set up yet. If anyone has a binary precompiled I could run that, otherwise I'll try to get everything installed and see if it works any better.

@glennsl glennsl added the A-infrastructure Area: Project infrastructure, build system, Ci, website etc. label Nov 26, 2019