Evaluate feasibility of SVT-AV1 #134

Open
zeridon opened this issue Apr 18, 2019 · 6 comments

@zeridon

zeridon commented Apr 18, 2019

Today a write-up from Netflix about SVT-AV1 popped up in my feed.

Review whether it is feasible for use:

https://medium.com/netflix-techblog/introducing-svt-av1-a-scalable-open-source-av1-framework-c726cce3103a
https://github.com/OpenVisualCloud/SVT-AV1/

@abitrolly

There are at least three mainstream AV1 encoders right now (source), and the SVT-AV1 repository has moved.

@yoe
Member

yoe commented May 20, 2023

For reference:

I am planning to experiment with AV1 encodes for DebConf23 this year. Depending on the results, this might be enough to switch FOSDEM over, too.

I will report back when there's more to report.

@yoe
Member

yoe commented Oct 27, 2023

FWIW, AV1 happened for DebConf23, and as a result it will likely happen for FOSDEM 2024 as well (with the same or similar settings).

This requires significant CPU time, but I'll talk to the server people to figure out what we can do.

yoe closed this as completed Oct 27, 2023
@yoe
Member

yoe commented Oct 27, 2023

Whoops, wrong button.

As an aside, we also figured out how to do AV1 live streams (with DASH), and I think it should be possible to do this for FOSDEM as well. You'll want to experiment to make sure, though.

yoe reopened this Oct 27, 2023
@yoe
Member

yoe commented Oct 27, 2023

By request, here's a bit of what needs to be done for AV1 at FOSDEM:

  • For postprocessing, use the most recent versions of Media::Convert and SReview, which already support AV1. A fairly recent version of ffmpeg also needs to be available, but not the one in Debian stable, because its ffprobe misprobes the file timings (sigh). Perhaps we can use a self-compiled ffprobe together with the ffmpeg from stable, or some such. Otherwise, issues will ensue.
  • For live streaming, two things are necessary:
    • Run ffmpeg somewhere with an AV1 codec and DASH output (it supports that)
    • Serve the files that ffmpeg outputs over HTTPS to users
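Before relying on a given ffmpeg build, it can help to confirm that it actually ships the two encoders the command below uses. The following is a minimal Python sketch, assuming the usual layout of `ffmpeg -hide_banner -encoders` output (a capability legend ending in a `------` line, then one encoder per line with its name in the second column); the helper names `parse_encoder_names` and `missing_encoders` are hypothetical. It does not detect the ffprobe timing problem mentioned above.

```python
from __future__ import annotations

import shutil
import subprocess

# Encoders used by the example live-streaming command (video, audio).
REQUIRED_ENCODERS = ("libsvtav1", "libopus")


def parse_encoder_names(encoders_output: str) -> set[str]:
    """Extract encoder names from `ffmpeg -hide_banner -encoders` output.

    Assumes a legend that ends with a line of dashes, followed by lines such as
    ' V....D libsvtav1   SVT-AV1(...) encoder (codec av1)'.
    """
    names: set[str] = set()
    in_list = False
    for line in encoders_output.splitlines():
        if not in_list:
            # Skip the capability legend until the separator line.
            in_list = line.strip().startswith("---")
            continue
        fields = line.split()
        if len(fields) >= 2:
            names.add(fields[1])  # second column is the encoder name
    return names


def missing_encoders(ffmpeg: str = "ffmpeg") -> list[str]:
    """Return the required encoders that this ffmpeg binary does not provide."""
    if shutil.which(ffmpeg) is None:
        return list(REQUIRED_ENCODERS)  # no binary at all: everything is missing
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    found = parse_encoder_names(result.stdout)
    return [name for name in REQUIRED_ENCODERS if name not in found]
```

Calling `missing_encoders()` (or `missing_encoders("/path/to/self-built/ffmpeg")`) returns an empty list when both encoders are present.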

For reference, the ffmpeg command line in the DebConf video team's Ansible repository could, depending on template values, end up being something like this:

ffmpeg -i rtmp://localhost/$app/$name -async 1 -vsync -1 -map 0 -map 0 -map 0:v -c:v libsvtav1 -preset 8 -c:a libopus -crf:v:0 23 -crf:v:1 23 -s:v:1 640x360 -s:v:2 320x180 -b:v:2 192k -maxrate:v:2 256k -b:a:0 128k -b:a:1 64k -adaptation_sets "id=0,streams=v id=1,streams=a" -f dash /path/to/dash/$name/stream.mpd
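When launching this from a script rather than from a shell, passing the arguments as a list avoids quoting pitfalls; note in particular that the -adaptation_sets value is a single argument containing a space. A minimal Python sketch, where the hypothetical helper `dash_av1_command` stands in for the template values $app and $name and keeps the placeholder output path:

```python
import shlex


def dash_av1_command(app: str, name: str, out_dir: str = "/path/to/dash") -> list[str]:
    """Return the example ffmpeg invocation as an argument list (no shell needed)."""
    return [
        "ffmpeg",
        "-i", f"rtmp://localhost/{app}/{name}",
        "-async", "1", "-vsync", "-1",
        # Three selections from input 0: all streams, all streams, video only.
        "-map", "0", "-map", "0", "-map", "0:v",
        "-c:v", "libsvtav1", "-preset", "8",
        "-c:a", "libopus",
        "-crf:v:0", "23", "-crf:v:1", "23",
        "-s:v:1", "640x360", "-s:v:2", "320x180",
        "-b:v:2", "192k", "-maxrate:v:2", "256k",
        "-b:a:0", "128k", "-b:a:1", "64k",
        # One argument, with a space separating the two adaptation set definitions.
        "-adaptation_sets", "id=0,streams=v id=1,streams=a",
        "-f", "dash", f"{out_dir}/{name}/stream.mpd",
    ]


if __name__ == "__main__":
    # "live" and "room1" are made-up example values.
    print(shlex.join(dash_av1_command("live", "room1")))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`; `shlex.join` is only used to print a shell-quoted form for logs.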

A bit of an explanation of that:

  • -map 0 -map 0 -map 0:v: create three video and two audio streams. Every time you use -map, you select streams from an input file (inputs are numbered from zero; we only have one here, so we select that one every time). If you don't add a stream specifier, you select all of that input's streams (here, video and audio); if you do, you only select the matching streams (either a number for the Nth stream, or a stream type specifier, here 'v' for video). Since we specify "all streams" twice, we get audio and video twice; then we specify "only video" one more time, so we end up with 3 video and 2 audio streams.
  • -c:v libsvtav1: use the "libsvtav1" encoder for all output video streams.
  • -preset 8: set the quality/speed tradeoff of the SVT-AV1 encoder to 8. It can be a value between 0 and 13; higher values encode faster but at lower quality.
  • -c:a libopus: use the "libopus" encoder for all output audio streams.
  • -crf:v:0 23: set the CRF value for the first video stream to 23.
  • -crf:v:1 23: set the CRF value for the second video stream to 23.
  • -s:v:1 640x360: scale the second video stream to 640x360 pixels.
  • -s:v:2 320x180: scale the third video stream to 320x180 pixels.
  • -b:v:2 192k -maxrate:v:2 256k: set the video bitrate target and limit for the third video stream. This makes ffmpeg aim for 192 kbit/s, and try really, really hard never to go over 256 kbit/s.
  • -b:a:0 128k: set the bitrate target for the first audio stream to 128 kbit/s.
  • -b:a:1 64k: set the bitrate target for the second audio stream to 64 kbit/s.
  • -adaptation_sets "id=0,streams=v id=1,streams=a": tell ffmpeg that you want a DASH manifest with one adaptation set for video ("id=0,streams=v") and one for audio ("id=1,streams=a"). An "adaptation set" is the part of the manifest that lists streams claimed to be interchangeable apart from their bit rate (and, for video, resolution), so the user's player can switch between them transparently at the end of a segment, for example if its buffer is getting too low.
  • -f dash /path/to/dash/$name/stream.mpd: tell ffmpeg to write a DASH manifest at the given path, with all the segment files in the same directory. It is this directory that needs to be served over HTTPS.
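The stream bookkeeping described in the -map bullet can be checked with a toy model. This Python sketch is not ffmpeg's actual mapping logic: it only handles the two map forms used here ("0" for every stream of input 0, "0:<type>" for one stream type), assumes the input carries exactly one video and one audio stream, and the function names are hypothetical. It shows which output stream each per-stream specifier such as v:2 or a:1 ends up addressing.

```python
from __future__ import annotations


def apply_maps(input_streams: list[str], maps: list[str]) -> list[str]:
    """Return the types of the output streams created by a sequence of -map options.

    input_streams lists the stream types of input file 0 in order, e.g. ["v", "a"].
    Only "0" (all streams) and "0:<type>" (streams of one type) are modelled.
    """
    outputs = []
    for m in maps:
        file_index, _, wanted = m.partition(":")
        if file_index != "0":
            raise ValueError(f"this sketch only models input 0, got {m!r}")
        # An empty type selects every stream; otherwise keep only matching ones.
        outputs.extend(t for t in input_streams if wanted in ("", t))
    return outputs


def per_type_specifiers(output_types: list[str]) -> list[str]:
    """Label each output stream with its per-type specifier, e.g. "v:2"."""
    seen: dict[str, int] = {}
    labels = []
    for t in output_types:
        labels.append(f"{t}:{seen.get(t, 0)}")
        seen[t] = seen.get(t, 0) + 1
    return labels


outputs = apply_maps(["v", "a"], ["0", "0", "0:v"])
print(outputs)                       # ['v', 'a', 'v', 'a', 'v']
print(per_type_specifiers(outputs))  # ['v:0', 'a:0', 'v:1', 'a:1', 'v:2']
```

So -s:v:1 applies to the video stream from the second `-map 0`, while -s:v:2, -b:v:2, and -maxrate:v:2 apply to the stream from `-map 0:v`.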

I ran the example command line on my laptop (an "11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz", according to /proc/cpuinfo) and managed to livestream 720p video this way with no issues; in our tests, the same machine even had enough CPU left over to also encode H.264 for the HLS stream.

Obviously, different machines have different capabilities; if the CPU used at FOSDEM is not powerful enough, raising the -preset value somewhat is advisable. One could also consider reducing the number of encoded streams, but since the other two are scaled down, their CPU requirements are a fraction of the unscaled one's, so in my experience that wasn't really worth it.
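To put a number on "a fraction of the unscaled one": assuming a 1280x720 source (the 720p case from the laptop test), the two scaled streams carry a quarter and a sixteenth of the source's pixels per frame. Pixel count is only a rough proxy for encoding cost, so treat this Python sketch as back-of-the-envelope arithmetic rather than a CPU estimate.

```python
from __future__ import annotations

from fractions import Fraction

SOURCE = (1280, 720)  # assumption: a 720p source, as in the laptop test
RENDITIONS = {"v:0": SOURCE, "v:1": (640, 360), "v:2": (320, 180)}


def pixel_share(size: tuple[int, int], source: tuple[int, int] = SOURCE) -> Fraction:
    """Pixels per frame of `size`, relative to the source frame."""
    return Fraction(size[0] * size[1], source[0] * source[1])


for spec, size in RENDITIONS.items():
    print(spec, size, pixel_share(size))
# v:0 (1280, 720) 1
# v:1 (640, 360) 1/4
# v:2 (320, 180) 1/16

total = sum(pixel_share(size) for size in RENDITIONS.values())
print(total, float(total))  # 21/16 1.3125
```

In other words, the two extra streams add roughly 31% more pixels to encode on top of the full-size one. Because the -s values are absolute sizes, a 1080p source would make them even cheaper relatively: 1/9 and 1/36 of the source's pixels.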

Note: we don't specify -crf:v:2, and we also don't specify -b:v:0, -b:v:1, -maxrate:v:0, or -maxrate:v:1. This is not an accident and it is not an oversight. If you use -crf, you put the encoder in "constant quality" mode, in which it tries to keep the video quality constant while the bit rate varies. If you use -b (and optionally -maxrate), you put it in a target-bitrate mode, in which it tries to hit the given video bitrate while the quality varies. This is a tradeoff on which reasonable people can disagree.
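One way to keep the per-stream choice between the two modes explicit is to describe the video ladder as data and generate the options from it. The Python sketch below does that for the ladder in the example command; the `VideoRendition` class and `rate_control_args` function are hypothetical, and refusing to combine a CRF value with a bitrate on the same stream is a design choice of the sketch that mirrors the note above, not an ffmpeg rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VideoRendition:
    size: str | None = None     # e.g. "640x360"; None keeps the source size
    crf: int | None = None      # constant-quality mode
    bitrate: str | None = None  # target-bitrate mode, e.g. "192k"
    maxrate: str | None = None  # optional ceiling, only used together with bitrate


def rate_control_args(ladder: list[VideoRendition]) -> list[str]:
    """Build per-stream -s/-crf/-b/-maxrate options for video streams v:0, v:1, ..."""
    args: list[str] = []
    for i, r in enumerate(ladder):
        if (r.crf is None) == (r.bitrate is None):
            raise ValueError(f"v:{i}: set exactly one of crf or bitrate")
        if r.maxrate is not None and r.bitrate is None:
            raise ValueError(f"v:{i}: maxrate requires bitrate")
        if r.size is not None:
            args += [f"-s:v:{i}", r.size]
        if r.crf is not None:
            args += [f"-crf:v:{i}", str(r.crf)]
        else:
            args += [f"-b:v:{i}", r.bitrate]
            if r.maxrate is not None:
                args += [f"-maxrate:v:{i}", r.maxrate]
    return args


ladder = [
    VideoRendition(crf=23),
    VideoRendition(size="640x360", crf=23),
    VideoRendition(size="320x180", bitrate="192k", maxrate="256k"),
]
print(rate_control_args(ladder))
# ['-crf:v:0', '23', '-s:v:1', '640x360', '-crf:v:1', '23',
#  '-s:v:2', '320x180', '-b:v:2', '192k', '-maxrate:v:2', '256k']
```

The options come out in a different order than in the command line above, but ffmpeg matches them by their stream specifiers, so the set of options is the same.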

So what this all means is:

  • We create three video streams: one unscaled, encoded at preset 8 and CRF 23; one scaled to 640x360 (half the width and height of a 720p source), encoded at preset 8 and CRF 23; and finally one scaled to 320x180 (a quarter of the width and height), encoded at preset 8 with a target bitrate of 192k.
  • We create two audio streams: one at 128k, one at 64k.
  • The end user's media player can dynamically switch between the three video streams, and likewise between the 128k and 64k audio streams.
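To see how this structure reaches the player, one can list the adaptation sets in a manifest together with the representations each one offers. The Python sketch below uses the standard library's ElementTree and the DASH MPD namespace; `SAMPLE_MPD` is a hand-written, heavily simplified manifest with made-up representation ids, not actual ffmpeg output (a real one also carries codecs, bandwidth, and segment information).

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hand-written, simplified manifest; ids and attributes are illustrative only.
SAMPLE_MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic">
  <Period>
    <AdaptationSet id="0" contentType="video">
      <Representation id="v0"/>
      <Representation id="v1" width="640" height="360"/>
      <Representation id="v2" width="320" height="180"/>
    </AdaptationSet>
    <AdaptationSet id="1" contentType="audio">
      <Representation id="a0"/>
      <Representation id="a1"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def adaptation_sets(mpd_xml: str) -> dict[str, list[str]]:
    """Map each AdaptationSet id to the ids of the Representations it offers."""
    root = ET.fromstring(mpd_xml)
    return {
        aset.get("id"): [rep.get("id") for rep in aset.iterfind("mpd:Representation", NS)]
        for aset in root.iterfind("mpd:Period/mpd:AdaptationSet", NS)
    }


print(adaptation_sets(SAMPLE_MPD))
# {'0': ['v0', 'v1', 'v2'], '1': ['a0', 'a1']}
```

A player picks one representation from each set and may switch only within a set: between the three video streams, or between the two audio streams.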

@yoe
Member

yoe commented Feb 5, 2024

Everything appearing on video.fosdem.org is transcoded to AV1 for 2024. Since this doesn't need to happen in real time, it uses preset 6 rather than preset 8, and the resulting files average about 10 times smaller than the MP4 ones (which are unmodified copies of the live stream).

Preset 8 will make the files somewhat larger, but not by a whole order of magnitude, so I think AV1 for the live streams is definitely worth looking into for 2025.
