CUDA accelerated PSNR #1175

gedoensmax · 2023-03-20T18:16:22Z

The speedup we see on GPU compared to CPU is significant (303 vs. 205 FPS in the 1080p runs below), and it scales well to higher resolutions.
When used with FFmpeg this matters even more, because it also avoids an otherwise required PCIe copy when using the hardware decoders. When I find more time I will do the same for SSIM, but that is a little more work.

./libvmaf/build/tools/vmaf --reference ../data/reference_1080p_yuv420p.yuv --distorted ../data/distorted_1080p_yuv420p.yuv --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 -o res/test_gpu.json --json --feature psnr_cuda
>>> VMAF version f52a8d72
>>> 128 frames ⠋⠉ 303.32 FPS
>>> vmaf_v0.6.1: 99.867883

./libvmaf/build/tools/vmaf --reference ../data/reference_1080p_yuv420p.yuv --distorted ../data/distorted_1080p_yuv420p.yuv --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 -o res/test.json --json --feature psnr
>>> VMAF version f52a8d72
>>> 128 frames ⠋⠉ 204.50 FPS
>>> vmaf_v0.6.1: 99.867883
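
As a quick sanity check on the two runs above, the relative throughput can be worked out from the reported FPS values. This is only arithmetic on the numbers printed by the tool. Both runs also compute the default vmaf_v0.6.1 model, so the ratio reflects the whole run rather than PSNR in isolation. The variable names are illustrative.

```python
# FPS values copied from the two runs above (1080p, 128 frames).
fps_psnr_cuda = 303.32  # run with --feature psnr_cuda
fps_psnr_cpu = 204.50   # run with --feature psnr

# Relative throughput of the CUDA run versus the CPU run.
speedup = fps_psnr_cuda / fps_psnr_cpu

# Average wall-clock time per frame in milliseconds for each run.
ms_per_frame_cuda = 1000.0 / fps_psnr_cuda
ms_per_frame_cpu = 1000.0 / fps_psnr_cpu

print(f"speedup: {speedup:.2f}x")
print(f"per frame: {ms_per_frame_cuda:.2f} ms (CUDA) vs {ms_per_frame_cpu:.2f} ms (CPU)")
```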

gedoensmax · 2023-03-20T18:17:51Z

Oh, this will also be useful for ffmpeg: a colleague of mine has been experimenting with 8K footage and saw that there is no GPU-accelerated PSNR in ffmpeg as of now (at least not to our knowledge).

gedoensmax · 2023-03-20T18:18:41Z

Based on #1174

BlueSwordM · 2023-04-02T07:19:04Z

This looks interesting, but this doesn't have a lot of value considering it's still PSNR at the end of the day.

Instead, I believe some focus should be on GPU-accelerating much more powerful metrics like butteraugli and ssimulacra2:
https://github.com/cloudinary/ssimulacra2

gedoensmax · 2023-04-02T07:56:39Z

The motivation behind this is to not hold CUDA VMAF back because of PSNR. If the video is decoded with hardware acceleration, it is already in GPU memory and would have to be downloaded to the CPU just to calculate PSNR.
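
For context, PSNR is a simple per-sample reduction, which is why a device-to-host copy can easily cost more than the metric itself. The sketch below is the textbook definition for a single plane, written in plain Python for readability. It is not the code from this PR, the function name `psnr` is made up for illustration, and libvmaf's own extractor may differ in details such as how identical frames are handled.

```python
import math

def psnr(ref, dis, bitdepth=8):
    """Textbook PSNR in dB between two equally sized sample sequences.

    Illustrative helper only; not the libvmaf or CUDA implementation.
    """
    if len(ref) != len(dis) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    peak = (1 << bitdepth) - 1  # 255 for 8-bit samples
    # Sum of squared differences over all samples (the reduction step).
    sse = sum((r - d) ** 2 for r, d in zip(ref, dis))
    if sse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    mse = sse / len(ref)
    return 10.0 * math.log10(peak * peak / mse)

# Every sample off by one gives an MSE of 1.
print(psnr([0, 0, 0, 0], [1, 1, 1, 1]))
```

On a GPU the same sum of squared differences would typically be computed as a parallel reduction over the plane, so only a single accumulated value needs to leave device memory.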

gedoensmax · 2023-07-17T14:27:08Z

@kylophone could you give this a review/test ?

kylophone · 2023-07-31T16:58:51Z

I tested this and there was a speed regression for vmaf only with raw inputs, likely due to the chroma copy.

gedoensmax · 2023-07-31T17:01:11Z

Yes, that can be true; in ffmpeg that should not happen. Can you put any numbers behind that speed regression?

gedoensmax · 2023-08-18T09:06:52Z

@kylophone any update on this? As said, the big benefit comes from using this with ffmpeg: GPU decode + GPU filter. If PSNR has to be calculated on the CPU, the GPU data has to be downloaded, which stalls processing considerably.

gedoensmax · 2023-10-26T23:36:55Z

@kylophone Do you see the speed regression in the standalone tool as a blocker? In ffmpeg this would not lead to a regression, because the copy is either avoided by using HW decode or overlapped with the kernels, which the standalone tool cannot do (blocking fread in the main thread).

gedoensmax added 5 commits August 18, 2023 11:12

psnr CUDA implementation

88f01b2

pick feature extractor by feature

09274bf

simplified event handling

11f0382

fix launch dimensions dwt

2716d73

minor cleanup

4e889f9

gedoensmax force-pushed the psnr branch from e61667f to 4e889f9 Compare August 18, 2023 09:21
