
Unoptimized gamma correction shader math in crt-pi #35

Open
battaglia01 opened this issue Oct 8, 2017 · 4 comments

Comments

@battaglia01

There are a few unoptimized lines of code in the gamma correction part of crt-pi.glsl, which is linked here for reference:
https://github.com/libretro/glsl-shaders/blob/master/crt/shaders/crt-pi.glsl

Gamma correction has been noted as a potential source of slowdown, both in the code and in this thread. However, the math here is quite unoptimized, which is likely what's causing the slowdown.

Gamma correction is done on lines 190-208, reproduced here for reference:

#if defined(SCANLINES)
#if defined(GAMMA)
#if defined(FAKE_GAMMA)
		colour = colour * colour;
#else
		colour = pow(colour, vec3(INPUT_GAMMA));
#endif
#endif
		scanLineWeight *= BLOOM_FACTOR;
		colour *= scanLineWeight;

#if defined(GAMMA)
#if defined(FAKE_GAMMA)
		colour = sqrt(colour);
#else
		colour = pow(colour, vec3(1.0/OUTPUT_GAMMA));
#endif
#endif
#endif

If we assume SCANLINES, GAMMA and FAKE_GAMMA are all defined, the above reduces to the following:

		colour = colour * colour;
		scanLineWeight *= BLOOM_FACTOR;
		colour *= scanLineWeight;
		colour = sqrt(colour);

Is there a reason it's being done like this? All of that is equivalent to

		colour *= sqrt(scanLineWeight * BLOOM_FACTOR);

This saves one multiplication and three assignments per loop! We avoid the unnecessary squaring and subsequent square rooting of colour, and we also don't need to update scanLineWeight as it's never used again in this scope.
I don't know how much the assignments matter or whether they're optimized out anyway, but fighting with the emulator over memory accesses has been noted as one of the major causes of slowdown, so it's worth bringing up...
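
To make this concrete, here's a rough sketch of how the whole block could be restructured while keeping all three preprocessor paths. Untested; it assumes colour is non-negative, that scanLineWeight is a scalar (as it appears to be in crt-pi.glsl), and that scanLineWeight is never read again afterwards:

#if defined(SCANLINES)
#if defined(GAMMA)
#if defined(FAKE_GAMMA)
		// sqrt(colour * colour * w) == colour * sqrt(w) for non-negative colour
		colour *= sqrt(scanLineWeight * BLOOM_FACTOR);
#else
		colour = pow(colour, vec3(INPUT_GAMMA));
		colour *= scanLineWeight * BLOOM_FACTOR;
		colour = pow(colour, vec3(1.0 / OUTPUT_GAMMA));
#endif
#else
		colour *= scanLineWeight * BLOOM_FACTOR;
#endif
#endif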

There's a similar (but slightly trickier) thing you can do with the true gamma correction, not just FAKE_GAMMA, but I'll start here for now to see if I'm on the right wavelength...
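
For reference, a rough sketch of the true-gamma version, relying on the same kind of identity: pow(x * w, 1.0/OUTPUT_GAMMA) == pow(x, 1.0/OUTPUT_GAMMA) * pow(w, 1.0/OUTPUT_GAMMA) for non-negative values. Again untested, and assumes scanLineWeight is a scalar:

		// pow(pow(colour, INPUT_GAMMA) * w, 1.0/OUTPUT_GAMMA)
		//   == pow(colour, INPUT_GAMMA/OUTPUT_GAMMA) * pow(w, 1.0/OUTPUT_GAMMA)
		colour = pow(colour, vec3(INPUT_GAMMA / OUTPUT_GAMMA))
		         * pow(scanLineWeight * BLOOM_FACTOR, 1.0 / OUTPUT_GAMMA);

That trades two vec3 pow() calls for one vec3 pow() plus one scalar pow().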

@hizzlekizzle
Collaborator

Yeah, probably just done that way for code clarity. It'd be worth looking at the assembly to see how much of a difference it makes.

@battaglia01
Author

battaglia01 commented Oct 10, 2017 via email

@hizzlekizzle
Collaborator

That's a good question. I've used fxc.exe for HLSL shaders, but there doesn't seem to be anything as universally easy to use for GLSL, which probably shouldn't surprise me...

However, it seems this Radeon GPU Analyzer from AMD may be able to do it:
https://github.com/GPUOpen-Tools/RGA/releases

@metallic77
Contributor

metallic77 commented May 18, 2023

It gains about 15-20 fps this way in my test: 668 fps after, 650 fps before, which is a 2-3% difference.
