This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

MSAA in GLES2 slow and breaks Fixed Foveated Rendering #70

Closed
NeoSpark314 opened this issue Nov 9, 2019 · 12 comments

@NeoSpark314
Collaborator

With the recent change in godotengine/godot#33444, turning on MSAA means that fixed foveated rendering (FFR) no longer works as a performance optimization.
It still produces the FFR artifacts but not the performance benefit, because Godot first renders the scene to an internal framebuffer at full resolution and then draws a full-screen quad to display it.
This means a costly additional framebuffer copy is performed (similar to the problem for GLES3 in #45).

The reason for the change was a fix in how MSAA works on desktop for GLES2 and to align the implementation to how godot handles MSAA.
The issue here for Oculus Mobile is at the moment only for information as it cannot be resolved inside the plugin. There were some potential solutions discussed with @BastiaanOlij but these would either be very hacky (by falling back only for android on the old behaviour for external textures) or require a breaking change in the ARVR api.
Another solution, suggested by @akien-mga, was to introduce an additional option called something like external MSAA in the render settings and then not use the current Godot MSAA setting. This should also disable any post-processing and sounds like a good solution to the problem.

Below are some performance measurements of the current master (in the demo scene, the current state needs roughly GPU level 3-4 to run with 4x MSAA, while the old version was around GPU level 2-3):

MSAA turned off:
quest_master_msaaOff

4x MSAA:
quest_master_msaa4x

For reference here is a capture of the now reverted (hacky) fix from godotengine/godot@a3ac7a9

quest_oldfix_msaa4x

NeoSpark314 added a commit to NeoSpark314/godot that referenced this issue Nov 9, 2019
This change avoids any post processing in the case an external texture
is used and it also renders directly to the provided external texture.
This addresses GodotVR/godot_oculus_mobile#70
and ignores the changes in godotengine#33444
for the android case.
@m4gr3d
Collaborator

m4gr3d commented Nov 9, 2019

@BastiaanOlij @akien-mga this is too high a cost for a clean implementation. Unless the external MSAA option is quick to add, can we revert to the 'hacky' fix for now on Android?

@m4gr3d
Collaborator

m4gr3d commented Nov 9, 2019

@NeoSpark314 for a straight comparison, what does the performance look like when Lat is set to 0?

@NeoSpark314
Collaborator Author

NeoSpark314 commented Nov 9, 2019

I just reran the test with my test branch and Lat 0; the difference is not as significant as in my first test, so I need to check whether I made a mistake somewhere.
Current master reports GPU 2 / 77%:

11-09 15:14:19.153   989  1076 I VrApi   : FPS=72,Prd=45ms,Tear=0,Early=71,Stale=0,VSnc=1,Lat=0,Fov=0,CPU4/GPU=1/2,1344/414MHz,OC=FF,TA=0/0/E0,SP=N/N/F,Mem=1554MHz,Free=1220MB,PSM=0,PLS=0,Temp=27.0C/0.0C,TW=2.68ms,App=9.46ms,GD=0.00ms,CPU&GPU=10.51ms,LCnt=1,GPU%=0.77,CPU%=0.06(W0.10)

While my test branch is at: GPU 2 / 72%

11-09 15:16:00.561  6225  6285 I VrApi   : FPS=72,Prd=45ms,Tear=0,Early=72,Stale=0,VSnc=1,Lat=0,Fov=0,CPU4/GPU=1/2,1344/414MHz,OC=FF,TA=0/0/E0,SP=N/N/F,Mem=1554MHz,Free=1216MB,PSM=0,PLS=0,Temp=28.0C/0.0C,TW=2.69ms,App=8.73ms,GD=0.00ms,CPU&GPU=9.85ms,LCnt=1,GPU%=0.72,CPU%=0.05(W0.09)

@NeoSpark314
Collaborator Author

I have not found a mistake in my measurement above so far. I made a RenderDoc trace, and on the master branch the full-screen quad is rendered (but it is suspiciously cheap; maybe no copy is performed in this setup).

But the other problem is still very easy to reproduce with Fixed Foveated Rendering (Fov). I made a Spatial Material with 3 textures:

Master Branch Fov 0:

11-09 15:56:48.880  2459  2527 I VrApi   : FPS=68,Prd=31ms,Tear=0,Early=0,Stale=72,VSnc=1,Lat=0,Fov=0,CPU4/GPU=4/4,2304/670MHz,OC=FF,TA=0/0/E0,SP=N/N/F,Mem=1295MHz,Free=1230MB,PSM=0,PLS=0,Temp=34.0C/0.0C,TW=1.96ms,App=13.72ms,GD=0.00ms,CPU&GPU=21.65ms,LCnt=1,GPU%=1.00,CPU%=0.02(W0.05)

Master Branch Fov 4:

11-09 15:55:57.580 32471 32536 I VrApi   : FPS=66,Prd=31ms,Tear=0,Early=0,Stale=72,VSnc=1,Lat=0,Fov=4,CPU4/GPU=4/4,2304/670MHz,OC=FF,TA=0/0/E0,SP=N/N/F,Mem=1554MHz,Free=1233MB,PSM=0,PLS=0,Temp=34.0C/0.0C,TW=1.95ms,App=15.23ms,GD=0.00ms,CPU&GPU=22.17ms,LCnt=1,GPU%=0.98,CPU%=0.03(W0.05)

Hack Branch Fov 0:

11-09 15:49:59.186 21900 21966 I VrApi   : FPS=65,Prd=31ms,Tear=0,Early=0,Stale=72,VSnc=1,Lat=0,Fov=0,CPU4/GPU=4/4,2304/670MHz,OC=FF,TA=0/0/E0,SP=N/N/F,Mem=1804MHz,Free=1246MB,PSM=0,PLS=0,Temp=33.0C/0.0C,TW=3.12ms,App=15.31ms,GD=0.00ms,CPU&GPU=22.71ms,LCnt=1,GPU%=1.00,CPU%=0.02(W0.05)

Hack Branch Fov 4:

11-09 15:54:27.988 30050 30113 I VrApi   : FPS=72,Prd=31ms,Tear=0,Early=0,Stale=0,VSnc=1,Lat=0,Fov=4,CPU4/GPU=1/2,1344/414MHz,OC=FF,TA=0/0/E0,SP=N/N/F,Mem=1554MHz,Free=1242MB,PSM=0,PLS=0,Temp=34.0C/0.0C,TW=2.72ms,App=9.84ms,GD=0.00ms,CPU&GPU=11.00ms,LCnt=1,GPU%=0.80,CPU%=0.10(W0.17)

And here are the screenshots in the same order:

master_fov0

master_fov4

hack_fov0

hack_fov4
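
To compare the four FFR runs above side by side, the VrApi stats lines can be split into key/value pairs. The following is only a minimal sketch: the field layout is inferred from the log lines quoted in this thread rather than from an official specification, the helper names `parse_vrapi` and `summarize` are hypothetical, and the log lines are abbreviated to the fields that are used.

```python
# Log lines copied from this thread, abbreviated to the fields used below.
LOGS = {
    "master_fov0": "I VrApi   : FPS=68,Stale=72,Fov=0,CPU4/GPU=4/4,2304/670MHz,App=13.72ms,GPU%=1.00",
    "master_fov4": "I VrApi   : FPS=66,Stale=72,Fov=4,CPU4/GPU=4/4,2304/670MHz,App=15.23ms,GPU%=0.98",
    "hack_fov0": "I VrApi   : FPS=65,Stale=72,Fov=0,CPU4/GPU=4/4,2304/670MHz,App=15.31ms,GPU%=1.00",
    "hack_fov4": "I VrApi   : FPS=72,Stale=0,Fov=4,CPU4/GPU=1/2,1344/414MHz,App=9.84ms,GPU%=0.80",
}


def parse_vrapi(line):
    """Split a VrApi stats line into a dict of KEY -> raw string value."""
    # Drop everything up to and including the "VrApi   : " tag.
    _, _, stats = line.partition("VrApi")
    stats = stats.lstrip(" :")
    fields = {}
    for item in stats.split(","):
        key, sep, value = item.partition("=")
        if sep:  # items without "=" (e.g. "2304/670MHz") are skipped
            fields[key.strip()] = value.strip()
    return fields


def summarize(line):
    """Extract the numbers discussed in this thread from one stats line."""
    f = parse_vrapi(line)
    return {
        "fps": int(f["FPS"]),
        "stale": int(f["Stale"]),
        "fov": int(f["Fov"]),
        "gpu_level": int(f["CPU4/GPU"].split("/")[1]),
        "app_ms": float(f["App"][:-2]),  # strip the trailing "ms"
        "gpu_util": float(f["GPU%"]),
    }


for name, line in LOGS.items():
    print(name, summarize(line))
```

In these numbers, only the hack branch shows a drop in App time and GPU level when Fov goes from 0 to 4; on master the two runs stay at GPU level 4 with stale frames.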

@NeoSpark314
Collaborator Author

A general question I'm wondering about now: is GPU utilization actually the right performance indicator for the bandwidth cost of a full-screen copy (and the supposed 2.5 ms it costs, as mentioned here: https://www.youtube.com/watch?v=CQxkE_56xMU&feature=youtu.be&t=2426)?

@m4gr3d
Collaborator

m4gr3d commented Nov 9, 2019

@BastiaanOlij @akien-mga The fact that it breaks FFR is IMO reason enough to revert PR #33444.

@NeoSpark314 According to this ovrmetrics blog post, GPU utilization is useful, but app GPU time might be a better metric for measuring the performance impact.

@NeoSpark314
Collaborator Author

In this case my benchmark from above (the one with Lat 0) reports App=9.46ms vs App=8.73ms.
That is still not the 2.5 ms claimed in the video, but it is a slightly larger relative increase than the GPU utilization shows. I might try the Snapdragon Profiler later, but I can't get it to run on my Linux machine.
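
The relative comparison can be worked out from the two Lat=0 log lines. This is a rough sketch that uses the test branch as the baseline (an assumption made here for illustration) and takes the 72 Hz frame budget from the FPS=72 value in the logs:

```python
# App GPU time (ms) and GPU utilization from the two Lat=0 log lines.
master_app_ms, test_app_ms = 9.46, 8.73
master_gpu, test_gpu = 0.77, 0.72


def relative_increase(new, old):
    """Fractional increase of `new` over the baseline `old`."""
    return (new - old) / old


app_delta_ms = master_app_ms - test_app_ms
app_rel = relative_increase(master_app_ms, test_app_ms)
gpu_rel = relative_increase(master_gpu, test_gpu)

# Time available per frame at the 72 Hz refresh rate seen in the logs.
frame_budget_ms = 1000.0 / 72

print(f"App time delta: {app_delta_ms:.2f} ms ({app_rel:.1%})")
print(f"GPU utilization delta: {gpu_rel:.1%}")
print(f"Frame budget at 72 Hz: {frame_budget_ms:.2f} ms")
```

Under these assumptions the App time differs by about 0.73 ms (roughly 8.4%), while GPU utilization differs by roughly 6.9%, out of a frame budget of about 13.89 ms.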

@BastiaanOlij
Member

@m4gr3d, the problem is that it's more than just a hack: MSAA was simply broken by the introduction of external textures. We can't leave broken features in place just because there is one use case where it half worked, especially since Quest users are only a small group of the total Android users that may be affected by the hack.

For me the ideal solution is one where we enhance the plugin system to provide a properly set up FBO, instead of just the texture, when the plugins supply the render targets, and in doing so completely disable both the pre- and post-processing done in Godot. But that is a 4.0 enhancement, imho best combined with other changes we want to make to the GDNative XR Interface.

An in-between solution for 3.2 would be to just add a switch to enable MSAA on the external texture's FBO and make sure we don't use Godot's own MSAA solution (as that just creates buffers that remain unused and results in even more hacky code in the rendering loop to ignore Godot's MSAA when we're using an external texture). I also vote to make that a setting on the viewport instead of a project setting. That way this feature can also be used to optimize desktop VR (the Oculus Rift S will benefit from this improvement too) while we can still use MSAA for anything rendered to screen if we choose to.

It will also allow us to evaluate the impact on GLES3 and start thinking about how we want to solve this properly in Godot 4.0 when Vulkan support is done. Quest, Rift S and OpenXR will all need a solid way to provide the render target Godot renders into, replacing Godot's own internal render target.

@BastiaanOlij
Member

Ok, this is my take on solving this:

godotengine/godot#33518

@NeoSpark314 already tested it out; it would be good to get some more feedback.

@m4gr3d
Collaborator

m4gr3d commented Nov 11, 2019

@BastiaanOlij I've tested the PR, and it works as expected!

@BastiaanOlij
Member

Sweet! Let's hope it gets merged soon :)

@m4gr3d
Collaborator

m4gr3d commented Nov 11, 2019

Fixed by godotengine/godot#33518!

@m4gr3d m4gr3d closed this as completed Nov 11, 2019