CUDA accelerated PSNR #1175

Open: wants to merge 5 commits into base: master


gedoensmax
Contributor

The GPU speedup over the CPU is significant (about 303 FPS versus 205 FPS at 1080p in the runs below), and it scales well to higher resolutions.
This is especially important when used with FFmpeg, because it also avoids the PCIe copy that is otherwise needed when using the hardware decoders. When I find more time I will do the same for SSIM, but that is a little more work.

./libvmaf/build/tools/vmaf --reference ../data/reference_1080p_yuv420p.yuv --distorted ../data/distorted_1080p_yuv420p.yuv --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 -o res/test_gpu.json --json --feature psnr_cuda
>>> VMAF version f52a8d72
>>> 128 frames ⠋⠉ 303.32 FPS
>>>  vmaf_v0.6.1: 99.867883

./libvmaf/build/tools/vmaf --reference ../data/reference_1080p_yuv420p.yuv --distorted ../data/distorted_1080p_yuv420p.yuv --width 1920 --height 1080 --pixel_format 420 --bitdepth 8 -o res/test.json --json --feature psnr
>>> VMAF version f52a8d72
>>> 128 frames ⠋⠉ 204.50 FPS
>>> vmaf_v0.6.1: 99.867883

@gedoensmax
Contributor Author

This will also benefit FFmpeg: a colleague of mine has been experimenting with 8K footage and found that FFmpeg currently has no GPU-accelerated PSNR (at least not to our knowledge).

@gedoensmax
Contributor Author

Based on #1174

@BlueSwordM

This looks interesting, but it doesn't have a lot of value considering it's still PSNR at the end of the day.

Instead, I believe some focus should go to GPU-accelerating much more powerful metrics such as butteraugli and ssimulacra2:
https://github.com/cloudinary/ssimulacra2

@gedoensmax
Contributor Author

The motivation behind this is to not hold CUDA VMAF back because of PSNR. If the video is decoded with hardware acceleration, it is already in GPU memory and would have to be downloaded to the CPU just to calculate PSNR.

@gedoensmax
Contributor Author

@kylophone could you review and test this?

@kylophone
Collaborator

I tested this and there was a speed regression for vmaf only with raw inputs, likely due to the chroma copy.

@gedoensmax
Contributor Author

Yes, that may be true, but it should not happen in FFmpeg. Can you put any numbers behind that speed regression?

@gedoensmax
Contributor Author

@kylophone any update on this? As mentioned, the big benefit comes from using this with FFmpeg: GPU decode + GPU filter. If PSNR has to be calculated on the CPU, the GPU data has to be downloaded first, which stalls processing considerably.

@gedoensmax
Contributor Author

@kylophone Do you see the speed regression in the standalone tool as a blocker? In FFmpeg this would not lead to a regression, because the frames either come from HW decode or the copy overlaps with the kernels, which the standalone tool cannot do (blocking fread in the main thread).
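
As a rough illustration of why the blocking read matters, here is a simplified two-stage timing model in C. It is a sketch under stated assumptions, not a measurement and not code from this PR: the function names and any per-frame costs passed in are hypothetical, and it assumes every frame has the same read/upload time and the same kernel time.

```c
#include <stddef.h>

/* Total time when each frame is read (and uploaded) and only then
 * processed, one frame after another, as with a blocking read. */
static double serial_time(size_t frames, double read_ms, double kernel_ms)
{
    return (double)frames * (read_ms + kernel_ms);
}

/* Total time when the read/upload of frame i+1 overlaps the kernels
 * of frame i. After the first read, the slower stage sets the pace,
 * and the last frame still has to finish its kernels. */
static double overlapped_time(size_t frames, double read_ms, double kernel_ms)
{
    if (frames == 0)
        return 0.0;
    double slower = read_ms > kernel_ms ? read_ms : kernel_ms;
    return read_ms + (double)(frames - 1) * slower + kernel_ms;
}
```

With hypothetical costs of 2 ms per read and 3 ms of kernels over 128 frames, the serial path takes 640 ms and the overlapped path 386 ms. In this model, any extra copy added to the serial path shows up in full, while in the overlapped path it is hidden as long as the copy stage stays faster than the kernels.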
