
wip: 16bit shader conversions #1581

Draft · wants to merge 50 commits into master

Conversation

@Julusian (Member) commented Oct 7, 2024

This is something I started late last year but haven't had the motivation to finish, so I am pushing it here in case someone wants to use it as inspiration or to copy pieces from.

The work here was focused on SDR 16bit compositing. At some point that would have evolved to HDR, but that hasn't been considered yet. The intention was to get lossless SDR 10bit YUV through the system, rather than the slightly lossy flow we have today.

The basic design was, on the producer side, to replace the point where we tell OpenGL to copy a buffer into a texture with an OpenGL compute shader. This would allow us to do YUV->RGB conversion, and even to unpack certain common packed formats, such as the DeckLink yuv10 packing. This was not implemented yet.
It would also mean that the existing colour format handling code could be removed from the composite shader.
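To make that concrete, here is a minimal sketch of what such an upload shader could look like. This is illustrative only: the names, bindings and matrix coefficients are my assumptions (this code is not in the branch), and the format-specific unpack step is left as a placeholder.

```cpp
// A minimal sketch of the producer-side idea, assuming a packed 10-bit
// YUV source in an SSBO and an RGBA16F destination texture. All names
// and bindings here are hypothetical.
static const char* upload_compute_src = R"glsl(
    #version 430
    layout(local_size_x = 8, local_size_y = 8) in;
    layout(std430, binding = 0) readonly buffer src_buffer { uint words[]; };
    layout(rgba16f, binding = 0) uniform writeonly image2D dst;

    void main()
    {
        ivec2 pos = ivec2(gl_GlobalInvocationID.xy);
        // Format-specific unpack of the sample at `pos` goes here
        // (e.g. the DeckLink yuv10 packing). Placeholder value:
        vec3 ycbcr = vec3(0.0625, 0.5, 0.5);
        // Limited-range BT.709 YCbCr -> RGB (approximate coefficients),
        // done once here so the composite shader no longer needs to.
        vec3 rgb = mat3(1.1644,  1.1644, 1.1644,
                        0.0,    -0.2132, 2.1124,
                        1.7927, -0.5329, 0.0)
                   * (ycbcr - vec3(0.0625, 0.5, 0.5));
        imageStore(dst, pos, vec4(rgb, 1.0));
    }
)glsl";
```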

The hope was that doing the conversion here (where OpenGL is likely already copying the buffer and rearranging bytes) would have minimal cost in memory and GPU power. I was trying to avoid doing this on the CPU, as in my experience the CPU is typically under higher pressure (decoding video and deinterlacing). Compute shaders are supported in our current minimum OpenGL version.

On the consumer side, the intention was to do something similar, using a compute shader to do the final copy from the composited texture into the buffer that is copied into CPU memory.
The intention is that the key_only and subregion options in the DeckLink consumer would move into this converter, so that only the subregion needs to be converted and downloaded from the GPU, while at the same time allowing other consumers to support the same flows with very little additional code.
This does carry a risk of doing more downloads from the GPU than before, but I don't expect that to be a bottleneck for anyone. Some googling suggests that PCIe has separate upload and download bandwidth, and we are much more likely to hit the limits of upload before download.
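As a rough illustration of that consumer-side flow (hypothetical names; the actual code in the branch differs), the host side could look something like this:

```cpp
// Hypothetical host-side flow for the download path: pack the composited
// texture into a v210-layout SSBO on the GPU, then map that (smaller)
// buffer back to the CPU instead of downloading the full RGBA texture.
glUseProgram(v210_pack_program);
glBindImageTexture(0, composited_texture, 0, GL_FALSE, 0, GL_READ_ONLY, GL_RGBA16F);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, v210_buffer);
// One invocation per 6-pixel group; 48 pixels per workgroup assuming
// local_size_x = 8 in the pack shader.
glDispatchCompute((width + 47) / 48, height, 1);
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

glBindBuffer(GL_SHADER_STORAGE_BUFFER, v210_buffer);
auto* packed = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, v210_size, GL_MAP_READ_BIT);
// ... hand `packed` to the DeckLink driver, then unmap ...
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
```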

Whether the bandwidth cost will be greater than CPU conversions is easily calculable.
For 2 DeckLinks doing 10bit YUV, that equates to roughly 2x21bpp, which is less than downloading a single frame as 16bit RGBA (64bpp); this would be the situation for a channel doing fill+key. Adding any more outputs would use more bandwidth.
And perhaps more importantly, for progressive channels those buffers could be handed straight to the DeckLink driver without a copy, which would relieve CPU memory pressure compared to a CPU conversion, as well as saving cycles.
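For reference, the back-of-envelope arithmetic behind that claim (my numbers, assuming v210's packing of 6 pixels into 16 bytes):

```cpp
// v210 packs 6 pixels into 4 x 32-bit words = 128 bits, i.e. ~21.3 bpp.
constexpr double v210_bpp   = 128.0 / 6.0; // ~21.33
constexpr double rgba16_bpp = 4 * 16.0;    // 64
// Two v210 downloads (fill + key) still cost less download bandwidth
// than one 16-bit RGBA download of the same frame:
static_assert(2 * v210_bpp < rgba16_bpp, "~42.7 bpp < 64 bpp");
```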

And by using compute shaders for all downloads, we can use a texture format that GLSL prefers, i.e. a float based format instead of int.
Another aim was to change the compositing to linear RGB and handle sRGB in the conversion shaders. That has not been investigated at all.

The consumer portion is fairly complete, with a working (but not verified for accuracy) DeckLink v210 implementation.
To support this, each consumer is passed a frame_converter when it is constructed, which it can use to convert the const_frame into whatever format it prefers. As part of this, the intention is to remove the 8bit RGBA buffer from const_frame, so that it also has to be fetched through the frame_converter; this has not been done in this POC, to avoid breaking every consumer.
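Roughly, the shape I have in mind for that interface is something like the following (hypothetical signatures based on the description above, not the exact code in the branch):

```cpp
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical sketch of the frame_converter handed to each consumer.
// The real types in the branch differ; this only illustrates the flow.
struct frame_conversion_format
{
    enum class pixel_format { bgra8, rgba16, v210 } format = pixel_format::bgra8;
    bool key_only = false;
    // Optional subregion so only the needed pixels are converted and
    // downloaded from the GPU. (0,0,0,0) means the full frame.
    int region_x = 0, region_y = 0, region_w = 0, region_h = 0;
};

class frame_converter
{
  public:
    virtual ~frame_converter() = default;

    // Convert a composited frame into the consumer's preferred format,
    // returning the downloaded bytes once the GPU work completes.
    virtual std::shared_future<std::vector<std::uint8_t>>
    convert_frame(const class const_frame& frame, const frame_conversion_format& format) = 0;
};
```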

As for the current status: it is possible to play high bit depth ffmpeg clips, or 16bit PNGs, and output them as GPU generated yuv10 out of a DeckLink. The DeckLink consumer doesn't support key+fill when fed yuv10 frames, but this can be achieved with a second port set to key-only using the sync-group added previously. (I wanted to explore using the 3D API to support key+fill on the 4K Extreme cards.)

A lot of things are hardcoded in testing setups, as this didn't progress beyond a POC.

@Julusian (Member, Author) commented:

@niklaspandersson I'm curious what your thoughts (and those of anyone else at nxtedition who might care) are on this approach, or whether you disagree with any of my assumptions or my preference for GPU work. The code is a bit of a mess, so don't look at it in too much detail, and it needs a rebase following the merging of your HDR work.
I am considering picking some of this up, but want to make sure the design/approach won't get complained about later.
It needs some thought about handling HDR, but I suspect that would simply be another thing for the conversion shaders to consider, depending on the format (or perhaps a different set of conversion shaders for HDR?).

Long term, the conversion shaders may want to handle converting between SDR and HDR, but from what I have read about tone-mapping, that doesn't sound fun.
Doing SDR to HDR conversion for producers may be needed to allow non-HDR producers (html, or clips) to work, but it sounds like that can be done with simple maths rather than tone mapping.


On an unrelated note, I had a quick dig into CEF following their new implementation of shared-texture support. That looks a lot more likely to be stable, so it might be worth bringing back.
It also looks like it might be possible to modify CEF to change the pixel format of the textures to 16bit. I don't know if it will composite internally at that depth, but I would hope so. I didn't think to check about HDR, but it must be possible to enable that somehow in Chromium too (it's been supported in Chrome for years).
