
Add an implementation of Stack Blur as a juce::ImageEffectFilter #1049

Open · ImJimmi wants to merge 2 commits into develop from feature/StackBlur

Conversation

@ImJimmi (Contributor) commented Apr 4, 2022

Adds an implementation of the Stack Blur algorithm as described here: https://observablehq.com/@jobleonard/mario-klingemans-stackblur
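For anyone unfamiliar with the juce::ImageEffectFilter interface, here's a minimal sketch of how a blur implemented this way plugs into a component. The class name, constructor, and radius parameter below are illustrative placeholders rather than this PR's actual code, and the blur body itself is elided; only the ImageEffectFilter / setComponentEffect hooks are real JUCE API.

```cpp
#include <juce_gui_basics/juce_gui_basics.h>

// Hypothetical effect class, shown only to illustrate the ImageEffectFilter hooks.
class StackBlurEffect : public juce::ImageEffectFilter
{
public:
    explicit StackBlurEffect (int radiusInPixels) : radius (radiusInPixels) {}

    void applyEffect (juce::Image& sourceImage, juce::Graphics& destContext,
                      float scaleFactor, float alpha) override
    {
        // The actual stack-blur passes over sourceImage would go here,
        // with the radius scaled by scaleFactor.
        juce::ignoreUnused (radius, scaleFactor);

        destContext.setOpacity (alpha);
        destContext.drawImageAt (sourceImage, 0, 0); // placeholder: draws the image unblurred
    }

private:
    int radius;
};

// Usage: attach the effect to any component and its rendered output is filtered on every repaint.
//     myComponent.setComponentEffect (&blurEffect);
```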

Stack Blur is a significantly faster blurring algorithm than the existing Gaussian blur, especially at higher blur radii. Here's a graph showing the render time in milliseconds at increasing blur radii for Gaussian Blur, multi-threaded Stack Blur, and single-threaded Stack Blur:

[Graph: render time in milliseconds (logarithmic Y-axis) vs. blur radius for Gaussian Blur, multi-threaded Stack Blur, and single-threaded Stack Blur]

Note how the Y-axis is logarithmic. Even without the use of the thread pool, Stack Blur is around 7x faster than Gaussian Blur at a blur radius of 25px. With the thread pool, Stack Blur is around 38x faster than Gaussian Blur.

To look at it another way, the maximum framerate you'd get from the Gaussian blur at a 25px blur radius would be ~1.3FPS (roughly 770ms per frame). With Stack Blur using a thread pool, also at a 25px blur radius, you could achieve ~48FPS (roughly 21ms per frame).


Stack Blur (top) also gives a much "smoother" blur than Gaussian (bottom), which tends to 'smudge' elements; this is especially noticeable at the edges of images:

[Image: side-by-side comparison of Stack Blur (top) and Gaussian blur (bottom)]


This PR also adds a new Blur Demo to the Demo Runner example project. The demo shows the differences between the two available blur techniques in JUCE and their respective render times, with a slider to adjust the blur radius.

[Screenshot: the new Blur Demo in the DemoRunner]


This is an extension of a previous PR made here: #934. However, this version is written from scratch in a more JUCEy way.

This work was initially inspired by this thread on the forums: https://forum.juce.com/t/faster-blur-glassmorphism-ui/43086

@ImJimmi force-pushed the feature/StackBlur branch from 8ae769f to a544085 on April 4, 2022 at 15:22
@sudara commented Jun 9, 2022

@ImJimmi I'm pretty excited by the work you've done here. I have my own implementation of stack blur, but I never did the profiling.

With Stack Blur using a thread pool, also at a 25px blur radius, you could achieve ~48FPS.

I'm wondering what size image this is on? It would be nice to know if blurring X amount of pixels is possible at > 60fps...

Also, did you run into any further optimization ideas? Web browsers do this very efficiently, but they are GPU-based.

Ran into this cool thing too, though: https://developer.chrome.com/blog/animated-blur/

@ImJimmi (Contributor, Author) commented Jun 15, 2022

@sudara

I'm wondering what size image this is on? It would be nice to know if blurring X amount of pixels is possible at > 60fps...

Those benchmarks were with the new demo I added to the DemoRunner using whatever size the component is - I think about 640x480 based on the screenshot above.

60fps would be easily achievable on a smaller component, say a drop-down menu. But even with the best blurring algorithm, blurring a large image on a CPU is not going to be particularly performant.

Also, did you run into any further optimization ideas? Web browsers do this very efficiently, but they are GPU-based.

The two biggest performance gains (other than the Stack Blur algorithm itself) came from:

  • Using a thread-pool which allows for multiple chunks of the image to be processed in parallel - I've @aceaudio to thank for that idea! (There's a rough sketch of this approach just after this list.)
  • Keeping dynamic allocations to a minimum. Initially I was only dynamically allocating the minimum amount of memory required, using a juce::Array; however, that meant allocating that memory on the heap for every row and column of the image. By limiting the blur radius and then always allocating the maximum amount of memory required on the stack, you get much better performance.
    Oh, looks like I changed that again in the final version... that might be something to look at again.
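For illustration, here's a rough sketch of the thread-pool idea from the first bullet (this is not the PR's actual code; the per-band blur work is elided and the function name is made up). The image is split into horizontal bands, each band is handed to a juce::ThreadPool job, and we wait for every band to finish before moving on to the vertical pass:

```cpp
#include <juce_gui_basics/juce_gui_basics.h>
#include <atomic>

static void blurRowsInParallel (juce::Image& image, int radius)
{
    juce::ThreadPool pool;                                  // one thread per CPU core by default

    const int height      = image.getHeight();
    const int numThreads  = juce::jmax (1, pool.getNumThreads());
    const int rowsPerBand = (height + numThreads - 1) / numThreads;
    const int numBands    = (height + rowsPerBand - 1) / rowsPerBand;

    std::atomic<int> bandsRemaining { numBands };
    juce::WaitableEvent allBandsDone;

    for (int startRow = 0; startRow < height; startRow += rowsPerBand)
    {
        const int endRow = juce::jmin (height, startRow + rowsPerBand);

        pool.addJob ([&, startRow, endRow]
        {
            // The horizontal stack-blur pass for rows [startRow, endRow) would go here.
            // Each band touches a disjoint set of rows, so no locking is needed.
            juce::ignoreUnused (image, radius, startRow, endRow);

            if (--bandsRemaining == 0)
                allBandsDone.signal();
        });
    }

    allBandsDone.wait();   // block until every band has been processed

    // A second, similar pass over vertical bands (columns) completes the 2D blur.
}
```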

The only other thing I did look at was trying to align the data so you're always accessing it in order. Reading from a container by iterating one index at a time is much quicker than stepping over large chunks, but that's exactly what you have to do to process each column of the image: you read one pixel, then jump NUM_PIXELS_WIDE elements forward in memory to get the pixel below it. So I tried to align the memory beforehand so you're always reading one pixel after the next; however, that was actually slower in the end, since the overhead of aligning the data was greater than the inefficiency of reading the non-aligned data.
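To make that concrete, here's a small sketch using juce::Image::BitmapData (the per-pixel work is a stand-in; only the traversal pattern is the point). In the horizontal pass, neighbouring pixels sit pixelStride bytes apart; in the vertical pass they sit a whole lineStride (one image row) apart, which is what makes the column pass cache-unfriendly:

```cpp
#include <juce_gui_basics/juce_gui_basics.h>

static void illustrateAccessPatterns (juce::Image& image)
{
    juce::Image::BitmapData pixels (image, juce::Image::BitmapData::readWrite);

    auto touchPixel = [] (juce::uint8* p)
    {
        // Stand-in for the real per-pixel blur work.
        auto value = *p;
        *p = value;
    };

    // Horizontal pass: neighbouring pixels in a row are pixelStride bytes apart,
    // so reads are effectively contiguous and cache-friendly.
    for (int y = 0; y < pixels.height; ++y)
    {
        auto* row = pixels.getLinePointer (y);

        for (int x = 0; x < pixels.width; ++x)
            touchPixel (row + x * pixels.pixelStride);
    }

    // Vertical pass: neighbouring pixels in a column are lineStride bytes apart
    // (an entire image row), so every read jumps far ahead in memory.
    for (int x = 0; x < pixels.width; ++x)
        for (int y = 0; y < pixels.height; ++y)
            touchPixel (pixels.getPixelPointer (x, y));
}
```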

@sudara commented Jun 15, 2022

Those benchmarks were with the new demo I added to the DemoRunner using whatever size the component is - I think about 640x480 based on the screenshot above.

Ok, great! Just wanted to confirm that the benchmarks were from that!

Using a thread-pool which allows for multiple chunks of the image to be processed in parallel

This is pretty interesting, wouldn't have thought of it!

The only other thing I did look at was trying to align the data so you're always accessing it in order.

Yeah, I was also wondering if there was an opportunity for matrix multiplication here. But without the ability to run that on a GPU... maybe no gainz possible.

The only other thing I've noticed that's impactful on my end is providing a tie-in to JUCE's components.

For example, I use stack blur for drop shadows, and there are all sorts of optimization techniques, like having the shadow in its own container, cached by setBufferedToImage, etc. To some degree, this sort of optimization could be under the hood (avoiding re-calculating the same blur over and over during painting).
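As a sketch of that under-the-hood caching idea (the class and function names below are made up for illustration): re-run the expensive blur only when the thing being shadowed actually changes, and let every other repaint reuse the cached image:

```cpp
#include <juce_gui_basics/juce_gui_basics.h>

class CachedBlurredShadow
{
public:
    const juce::Image& getShadowFor (juce::Rectangle<int> bounds)
    {
        if (bounds != cachedBounds || ! cachedShadow.isValid())
        {
            cachedBounds = bounds;
            cachedShadow = renderShadow (bounds);   // expensive blur, done rarely
        }

        return cachedShadow;                        // cheap on every other repaint
    }

private:
    static juce::Image renderShadow (juce::Rectangle<int> bounds)
    {
        // Placeholder: draw the shadowed shape into this image and blur it here.
        return juce::Image (juce::Image::ARGB,
                            juce::jmax (1, bounds.getWidth()),
                            juce::jmax (1, bounds.getHeight()),
                            true);
    }

    juce::Rectangle<int> cachedBounds;
    juce::Image cachedShadow;
};
```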
