Guide: How to create an ultra quality HEVC video pipeline with hardware cost of less than $500 #1585
-
To give an example of how this can be used: We use this setup to stream productions and parties of the "demoscene", a UNESCO-recognized international art community producing music, graphics, code and "demos", which are self-running realtime animations on various platforms, including old home computers: https://en.wikipedia.org/wiki/Demoscene Based on the Mini-PC transcoder setup described above, I removed the internals of old HP servers and replaced them with 7 of those Mini-PCs. This way I am able to transcode up to seven streams in parallel. Here is what this hacked-together hardware looks like in real life: https://www.youtube.com/watch?v=1YmTKNaN07M You can see the result of the project here. However, outside of those "demoparty" events we only stream demos. For this, we have pre-encoded the demos as an HEVC MPEG transport stream, but you'll get the idea:
-
Here is the script I used to prepare the recordings of the demos to be live-streamed:
As a parameter, you pass a directory containing videos. As you can see, any aspect ratio will do. Right now we re-encode to a fixed 60 fps, as we are still experimenting with dynamic FPS changes in a stream. Here is the script we use to then stream those prepared demos in random order:
The reason why there are two ffmpeg instances piping to each other is that a single ffmpeg instance would reconnect SRT for every video, causing gaps and buffering in the stream. By piping, all videos become a single MPEG transport stream. As some videos may cause errors, the script above is wrapped in another script:
This way, even if the ffmpeg pipe loop above aborts, everything reconnects and keeps streaming.
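The original scripts are not reproduced here. As a rough illustration of the idea only (not the scripts actually used), the following Python sketch feeds prepared files one after another into the stdin of a single long-running ffmpeg that holds the SRT connection, and restarts the whole pipe if that sender dies. The function names, the `*.ts` file extension, the 5-second retry delay and the SRT URL are all assumptions made for this example; it also assumes an ffmpeg build with SRT support.

```python
import random
import subprocess
import time
from pathlib import Path


def feeder_cmd(video: Path) -> list[str]:
    """ffmpeg call that remuxes one prepared file to MPEG-TS on stdout."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-re",                # read the input at its native frame rate
        "-i", str(video),
        "-c", "copy",         # no re-encode, the files are already HEVC
        "-f", "mpegts", "pipe:1",
    ]


def sender_cmd(srt_url: str) -> list[str]:
    """ffmpeg call that reads one endless MPEG-TS from stdin and sends it via SRT."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-f", "mpegts", "-i", "pipe:0",
        "-c", "copy",
        "-f", "mpegts", srt_url,
    ]


def stream_forever(video_dir: str, srt_url: str, retry_s: float = 5.0) -> None:
    """Outer loop: rebuild the whole pipe whenever the SRT sender exits."""
    while True:
        sender = subprocess.Popen(sender_cmd(srt_url), stdin=subprocess.PIPE)
        while sender.poll() is None:
            videos = list(Path(video_dir).glob("*.ts"))  # assumed extension
            if not videos:
                break
            random.shuffle(videos)
            for video in videos:
                if sender.poll() is not None:
                    break  # SRT side is gone, rebuild the pipe
                # A broken file only ends this feeder; the SRT session stays up.
                subprocess.run(feeder_cmd(video), stdout=sender.stdin, check=False)
        sender.kill()
        sender.wait()
        time.sleep(retry_s)
```

Calling `stream_forever("/path/to/prepared", "srt://...")` with your own directory and signed SRT URL would start the loop. Whether the receiving side tolerates the timestamp jumps between files depends on your setup, so treat this as a starting point for experiments.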
-
Your work is absolutely amazing!
-
Thank you. Not as amazing as OME, though :) The one thing I sadly can't share, because it is proprietary code, is the backend server. It checks which Edge servers are alive and what load they have, and then uses GeoIP to select the five servers with the lowest load that are also geographically nearest to the user. So we have load balancing based on both location and load, and broken Edge servers are automatically removed from the set. The same goes for channels/streams: The backend checks which streams are available right now, and updates the UI accordingly.
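Since the backend itself is not public, here is only a guess at how such a selection could work, as a minimal Python sketch. It assumes the user's coordinates have already been obtained from a GeoIP lookup, and it uses a two-step rule that is my own assumption: take the nearest `pool` live servers, then return the `count` least loaded of those. The `Edge` class, `pick_edges` and all field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    name: str
    lat: float
    lon: float
    load: float   # 0.0 (idle) .. 1.0 (saturated), as reported by the edge
    alive: bool   # result of the last health check


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def pick_edges(edges: list[Edge], user_lat: float, user_lon: float,
               count: int = 5, pool: int = 10) -> list[Edge]:
    """Drop dead edges, keep the `pool` nearest, return the `count` least loaded."""
    alive = [e for e in edges if e.alive]
    nearest = sorted(
        alive, key=lambda e: distance_km(user_lat, user_lon, e.lat, e.lon)
    )[:pool]
    return sorted(nearest, key=lambda e: e.load)[:count]
```

With this rule a nearby but busy server still loses to a slightly more distant idle one, while a server on another continent is never picked as long as enough closer ones are alive.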
-
Updated the guide with some purchase links, screenshots and an Edge install script.
-
(Please note: This guide is a work in progress. I plan to add some screenshots later. Feel free to ask about sections that should be extended.)
Guide: How to create an ultra quality HEVC video pipeline with hardware cost of less than $500
The basics - how to get h.265 HEVC done with cheap and energy-efficient hardware
Most live streaming today is done with the rather outdated h.264 video standard. This is because h.264 is well-established, encoders for it are cheap and common, and the patent situation is somewhat settled.
However, it's outdated. H.265 (also called HEVC) can achieve the same visual quality as h.264 at half the bitrate. Or, the other way round, massively better visual quality at the same bitrate as an h.264 stream. So: We want HEVC.
But how? Doing HEVC encoding in software requires massive CPU power. You either need a very fast and expensive CPU that will be burning 100 watts for the encode, or an NVIDIA graphics card, which is expensive. And when encoding on the CPU in software, you are formally also violating the HEVC patent portfolio. This is not good. Far too expensive.
But what many people don't know: Most Intel CPUs already include an h.265 encoder in hardware. And Intel is paying the patent license fees for it.
When it comes to streaming, Intel Quicksync (QSV) is underrated and often overlooked. What even many people experienced with streaming don't know: The same identical QSV engine is present in all Intel CPUs. It's the same in a $500 Core as in a $25 Atom CPU: same quality, same speed. And unlike other hardware encoders, such as those you can sometimes find in cheap ARM CPUs, the Intel QSV hardware encoder is actually of pretty high quality.
Hardware selection
A really nice CPU for this mission is the Intel Celeron N5105. It is used in a huge number of Chinese Mini PCs, at prices starting at around $150 including RAM and disk.
Recently, Mini PCs with its successor, the Intel N100, became available. That CPU is about 30% faster than the N5105, and a bit more energy efficient.
There is also the N5095. The QSV engine is the same, but the integrated graphics are slower than on the N5105. Saving the $5 or so compared to an N5105-based model therefore does not make sense.
On the encoder PC, we will be running Windows 10 or 11 and a recent version of OBS. If you want to invest the extra $50 or so, for the Windows encoder go with the faster N100, as it will allow you to add some effects or plug-ins to your OBS without running out of CPU power.
Make sure NOT to buy one that has a J6426 CPU. Those are totally outdated and too slow. There are also versions with the N6005. It clocks a little faster than the N5105, but gets much hotter. Not worth it.
When looking for Mini PCs you will find that some are marketed as "Firewall Router" and are passively cooled, while others are even smaller, but have active cooling. What to choose? For the transcoder (origin) it does not matter much, provided that it will be placed in a room with reasonable ambient temperature, or even a server rack with air flow.
For the encoder, you often want total silence so you don't have any background fan noise in your broadcasts. The challenge here is that with two video inputs, transcoding etc., your system will run at very high load for hours. The tiny fans of those Mini PCs can be quite loud. On the other hand, the passively cooled "firewall router" enclosures can become so hot that you can seriously burn your fingers on them. In my case, I therefore decided on a mod: I took a passively cooled "firewall router" and mounted a big 92mm 5V USB fan on top of it. I invested the $15 for a Noctua NF-A9 5V fan. Noctua is famous for creating the most silent high-quality fans available on this planet. You can't hear the NF-A9, but even after hours of broadcasting, it keeps the enclosure temperature below 50 °C, safe to touch.
To summarize, here are my minimal recommendations on the hardware:
Most of the devices sold as "firewall appliance" have multiple 2.5Gbit Ethernet ports, using Intel i226-V Ethernet controllers. That's a good networking chip. The older version i225 has some bugs and should be avoided.
For reference, here is the OBS encoder PC I have selected and assembled. Links go to Amazon Germany.
https://www.amazon.de/gp/product/B0CPLPX78C/
Added fan:
https://www.amazon.de/gp/product/B07DXVZ9B2/
Added fan grill:
https://www.amazon.de/gp/product/B0B2135Y9W
Result:
And here is the passively cooled Transcoder/Origin PC I bought from Aliexpress. I can recommend that vendor "Topton":
https://www.aliexpress.com/item/1005004011402622.html
(You want Model A or Model B N5105, not Model C as that has fins that are too small to get proper cooling)
Optional: HDMI Capture inputs
If you want your Encoder to have HDMI input (for cameras or for streaming the output of a PC), you can get a cheap encoder dongle using the MacroSilicon MS2130 chip. Please note that other cheap "FullHD" or "4k" capture sticks typically are crap, as they are only using USB2.0, which means the chip will already heavily compress the video before it has even reached your encoder. So: Make sure the stick uses the MS2130 chip, which really sends uncompressed 1080p YUV video over USB3. Here is a vendor I have bought from:
https://www.aliexpress.com/item/1005004912019386.html
Yes, it really is only $14.
Note: When streaming for a long time, these can get hot, too. I recommend not plugging them directly into your encoder PC. Instead buy a short 25cm USB3 extension cable, and plug the capture stick in there. If you have done the fan mod described above, you can then attach the capture stick on top of that fan grill for perfect cooling.
An Intel N5105 Mini-PC will be fast enough to handle two of those capture devices in OBS, while still having some CPU left for effects. An Intel N100 will be about 30% faster, including the GPU being a bit faster, and will allow you to add a third capturing source to be mixed.
When using a USB hub, keep in mind that each capture stick takes up a large part of the 5 GBit/s that USB3.0 (also known as USB 3.2 Gen 1) offers. You therefore should not connect multiple capture sticks to a USB hub! Instead use a USB hub for keyboard, mouse and potentially USB audio interfaces, and make sure the capture sticks are connected directly to the USB3 ports of the Mini PC.
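A rough back-of-the-envelope check of the hub warning, as a small Python sketch. It assumes the stick delivers 1080p60 at 16 bits per pixel (YUY2, 4:2:2), which is my assumption about the pixel format, and it uses the fact that USB 3.2 Gen 1 loses 20% of its 5 GBit/s line rate to 8b/10b encoding before any protocol overhead.

```python
def raw_video_gbit_s(width: int, height: int, fps: int, bits_per_pixel: int = 16) -> float:
    """Bandwidth of uncompressed video in GBit/s (16 bpp = YUY2, an assumption)."""
    return width * height * fps * bits_per_pixel / 1e9


USB3_GEN1_LINE_RATE = 5.0                           # GBit/s on the wire
USB3_GEN1_PAYLOAD = USB3_GEN1_LINE_RATE * 8 / 10    # 4.0 GBit/s after 8b/10b encoding

one_stick = raw_video_gbit_s(1920, 1080, 60)
print(f"one stick:  {one_stick:.2f} GBit/s")
print(f"two sticks: {2 * one_stick:.2f} GBit/s of at most {USB3_GEN1_PAYLOAD:.1f} GBit/s")
```

Under these assumptions one stick needs about 2 GBit/s, and two sticks together already reach the theoretical payload limit of a single USB3 port, so two sticks behind one hub leave no headroom.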
Summary: Money
Assuming you want to have two HDMI capture inputs, the total cost of hardware for the project will be around $450.
The Encoder
For the Encoder, we will be running Windows. As you know, Windows these days is full of crapware. On the encoder, you don't want to suddenly get a popup asking you to install Edge, or have the system reboot without asking you. I therefore recommend installing "Microsoft Windows 10 IoT Enterprise 2021 LTSC". That's a very long name that translates into "Windows 10 without all the crapware, and with 10 years of security updates". Even better is the "Tiny10" distribution, where someone has created a very minimal Windows 10 image with really all crapware removed, using a fraction of the RAM of a normal Windows install. Legally that's a grey area. If you don't want this grey area, install a normal Windows 10, and then run some decrapifier scripts to remove all kinds of crap and spyware. In any case, make sure to disable the online virus scanner and the tamper protection (otherwise the virus scanner will be turned back on after the next reboot). This is because otherwise Windows will be scanning your outgoing video stream for "viruses", eating precious CPU time.
The SRT streaming protocol
Most streaming setups today use the RTMP streaming protocol. That protocol is outdated, and does not support HEVC. Being TCP-based, it is also not at all happy with unstable Internet connectivity. Instead we are going to use the SRT protocol, which is modern, supports HEVC and has automatic error correction integrated (lost packets are retransmitted) that works well even if your Internet connection is crappy, for example because you are using LTE/5G or Starlink.
Note: SRT needs to know the expected end-to-end latency, including the time to reconstruct missing data. The default latency appears to have been designed for LANs, not Internet connections. OBS is not setting a sane default either. Because of this, people trying the SRT protocol for the first time often give up on it in frustration. To set the end-to-end latency, use about 5 times the ping latency you have under load between your encoder and your origin server. If you are on an LTE connection with your origin server being in the same country, assume 100ms real latency, and tell SRT to optimize for 500ms. If you don't need "hard realtime": More is better. To make matters even more confusing, the latency parameter for SRT is not given in milliseconds, but in MICROseconds. I fell for that multiple times.
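The rule of thumb above can be written down as a tiny Python sketch, mainly to make the unit conversion explicit. The helper names, host, port and `streamid` value are placeholders made up for this example; your real URL comes from the signed policy generator.

```python
def srt_latency_us(ping_ms: float, factor: float = 5.0) -> int:
    """About 5x the ping under load, converted from milliseconds to MICROseconds."""
    return int(round(ping_ms * factor * 1000))


def with_latency(url: str, latency_us: int) -> str:
    """Append the latency parameter to an SRT URL that may already have a query."""
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}latency={latency_us}"


# LTE example from the text: 100 ms real latency gives a 500 ms target.
lte_latency = srt_latency_us(100)   # 500000
print(with_latency("srt://origin.example:9999?streamid=placeholder", lte_latency))
```

A value like `latency=500` would therefore mean half a millisecond, not half a second, which is exactly the trap described above.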
So, your SRT URL generated with "simple_signed_policy_url_generator.sh", optimizing for an end-to-end-latency of 1 second and the authentication being valid for 30 days should look like this:
So, on the Windows Encoder PC, do a standard OBS install. Configure OBS to use 1920x1080 at a fixed 60 FPS. Set the streaming protocol to custom, and insert the SRT URL generated by OME's signed_policy_url_generator.sh. If you are using the OBS setup wizard, you cannot click next without entering something into the "key" field, which OME does not use or need. Just enter a space there. In the output section, select "Quicksync HEVC". Select constant bitrate (CBR), set the bitrate to 10000 KBit/s, set the keyframe interval to 1 s, and disable B-frames.
Settings you need to do on the Output tab:
Make sure to set the keyframe interval to 1 s, and Latency to normal. Profile "main" will give far better quality than "baseline". The target usage should be "TU1", because it's a hardware encoder and you won't save anything using worse quality. If you want lower latency you can change the Latency option, but I would recommend against it.
On the stream page, add your SRT url generated with signed_policy_url_generator.sh. Don't forget the latency parameter!
The transcoder / OME Origin
The transcoder ("origin" in OME speak) will be running on Linux. Sadly the driver situation when it comes to QSV under Linux is pretty bad, and it took me a while to find something that works. I'd recommend taking the cheaper N5095 for this. I tried a lot of Linux distributions, but had a hard time getting QSV to fully work. In the end it turned out that Arch Linux is the best choice. This is because Arch Linux has working packages for the GPU, a matching ffmpeg build, and an always up-to-date OvenMediaEngine package.
At the time of this writing, OME 0.16.5 still has a typo bug that prevents the QSV HEVC decoder from being opened. Because of this I had to build OME from source. By the time you are reading this the bug will hopefully be fixed, so for now I'm skipping instructions on how to build OME under Arch Linux. Those would be longer, because that platform is not supported by OME's configure scripts and some fixes in the build scripts are required. But if you need to do it, just open decoder_hevc_qsv.c and replace the string "h265_qsv" with "hevc_qsv" (OME bug #1558).
The origin will be getting HEVC encoded video via the SRT protocol, in our example at 1920x1080 with 60 fps 10 MBit/s. It will then do the following transcodes:
You can change the bitrates as you wish; this does not change the QSV system load. However, it is important to understand that with these resolutions you are using up 80% of QSV's resources. The QSV at 1080p is able to do roughly 250fps of decode/encode tasks in parallel. This means that with one 60fps decode of the incoming stream, we cannot have four 1080p transcode outputs (60 + 4 x 60 = 300fps). However, having two 1080p and two 720p encodes works.
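The budget argument can be checked with a crude model, sketched here in Python. It assumes that the cost of a stream scales with its pixel count relative to 1080p and that decoding costs as much as encoding. Both are simplifying assumptions of mine, so the model is more pessimistic than the measured load (it lands at roughly 93% for the working configuration rather than 80%).

```python
BUDGET_FPS = 250.0            # rough 1080p decode/encode capacity quoted in the text
REF_PIXELS = 1920 * 1080


def cost(width: int, height: int, fps: float = 60.0) -> float:
    """Cost of one stream in '1080p frames per second' (pixel-proportional assumption)."""
    return fps * (width * height) / REF_PIXELS


def total_load(outputs: list[tuple[int, int]], decode: tuple[int, int] = (1920, 1080)) -> float:
    """One decode of the incoming stream plus one encode per output."""
    return cost(*decode) + sum(cost(w, h) for w, h in outputs)


four_full_hd = total_load([(1920, 1080)] * 4)
mixed = total_load([(1920, 1080)] * 2 + [(1280, 720)] * 2)

for label, load in (("4x 1080p", four_full_hd), ("2x 1080p + 2x 720p", mixed)):
    verdict = "fits" if load <= BUDGET_FPS else "too much"
    print(f"{label}: {load:.0f} of {BUDGET_FPS:.0f} fps -> {verdict}")
```

Even with this pessimistic model the conclusion is the same: four 1080p outputs exceed the budget, while two 1080p plus two 720p outputs stay below it.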
From the end-user perspective that means that if their browser or phone supports h.265, they will get the stream either at 10, 6 or 2 MBit/s. If we have to fall back to h.264, they can only pick between 6 and 2 MBit/s.
That's not so bad: Almost all Android phones can decode HEVC, as can iPhones. On the desktop, Chrome and some other Chromium-based browsers like Vivaldi and Brave include an HEVC decoder. For Edge, the user would need to buy the $0.99 HEVC extension from the Microsoft Store, or just use Edge to download Chrome once, to then uninstall Edge. The only browser that does not support HEVC decoding at all is Firefox, as the Mozilla Foundation is not willing to pay the patent mafia. Outside of the browser world, the stand-alone players VLC and MPC support HEVC, too. Long story short: In my experience 90% of your viewers will be able to enjoy the HEVC stream.
On the transcoder you can monitor the CPU usage using the "htop" tool. As you can see we have quite some load with those transcodes, but the load is stable, and this works correctly over weeks.
For GPU load, use the "intel_gpu_top" tool. Again, as you can see we have high load, but the transcoder configuration listed above works just fine:
Using the "iptraf-ng" tool I am also monitoring the bandwidth used. The "Incoming rate" are the streams coming from the encoders to us. The "Outgoing rate" is the traffic going to the Edge servers:
Edge servers
But let's complete our streaming pipeline. So far we have the encoder PC, and the transcoder aka OME Origin. Now we need some edge servers that will actually deliver the stream to the viewers. Please note: The bandwidth the Origin has to send to each edge is the sum of the bypass plus all transcodes. In our example configuration above, that is 10+6+2+6+2 = 26 MBit/s that an origin needs to send to an Edge. If you only have one or two Edge servers, it may be possible to run your origin transcoder at home or in the office, provided it has fast fiber Internet and a fixed public IP available. Otherwise it needs to go to a data center.
In my experience, in addition to doing the transcodes, an N5105-based "firewall router" Mini-PC with an Intel Ethernet NIC will be able to serve 10 Edge servers easily (sending out 260 MBit/s). Personally, as Edge servers I use a bunch of cheap rented virtual servers (running Rocky Linux 8). The Edges don't have much to do, so they don't need much CPU or RAM. All they need is good Internet connectivity. If your cheap vservers turn out to be able to send out 400 MBit/s in a stable way, then with a typical mix of the stream bandwidths viewers use, 10 Edge servers in our setup will be good to serve 500 concurrent viewers.
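The numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the dictionary keys are made-up labels, and the average of 8 MBit/s per viewer is not stated in the text but implied by it (10 Edges x 400 MBit/s spread over 500 viewers).

```python
# Bitrates in MBit/s that the origin sends to every Edge: the bypass plus all transcodes.
LADDER_MBIT = {"hevc_bypass": 10, "hevc_6": 6, "hevc_2": 2, "h264_6": 6, "h264_2": 2}

PER_EDGE_MBIT = sum(LADDER_MBIT.values())   # 26


def origin_egress_mbit(edge_count: int) -> int:
    """Upstream bandwidth the origin needs for a given number of Edges."""
    return edge_count * PER_EDGE_MBIT


def max_viewers(edge_count: int, edge_uplink_mbit: float, avg_viewer_mbit: float) -> int:
    """Concurrent viewers the Edges can serve at an assumed average viewer bitrate."""
    return int(edge_count * edge_uplink_mbit // avg_viewer_mbit)


print(origin_egress_mbit(10))        # 260 MBit/s out of the origin
print(max_viewers(10, 400, 8.0))     # 500 viewers at an assumed 8 MBit/s average
```

If most of your audience picks the 10 MBit/s stream, the same ten Edges serve fewer viewers (400 in that case), so it is worth measuring the real mix.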
For the Edge servers I use Rocky Linux 8, which is a free version of Red Hat Enterprise Linux (RHEL8), just like CentOS used to be.
Here is how to install OME for the edges:
Client side player
So, onto the final piece of the puzzle: The client-side video player.
Sadly OvenPlayer currently does not support HEVC detection. On my web page, I have therefore added a simple detection script to find out if the device is able to play HEVC. Depending on the result I redirect to a page that provides either the HEVC or the h.264 HLS playlists. Like this:
On hevc.html I then integrate OvenPlayer using the HLS HEVC playlist, and on avc.html using the HLS h.264 playlist.