I know this is external but I have nowhere else to post it. I would like to request that Microsoft (most notably the Windows Imaging Component team) urgently review how it handles white level scaling for HDR images before it becomes too late. With the 24H2 update, WIC added support for HDR AVIF images, but the white level scaling is wrong (or at least missing essential metadata), and I hope it can be addressed before it becomes too much of a mess to ever sort out.
Historically, for SDR images, WIC has left it up to the application to apply any color profile stored in the image. With formats like HEIC and AVIF, however, it now performs this conversion during the decode process. There is nothing wrong with this per se (other than a certain loss of control, especially when it comes to transcoding images). For HDR AVIF images it produces a 64-bit floating-point pixel format which we are left to assume is in the scRGB color space (although no color space information at all is provided). But most problematic of all, it gives us no indication of what pixel value represents diffuse white.

In my opinion, the only sensible value for this is (1,1,1). Indeed, the JPEG XR specification (https://www.itu.int/rec/T-REC-T.832-201906-I, page 181) says: 'The scRGB perfect diffuse white point is specified by all three colour channels set to a value of 1.0.' Based on exporting a 32-bit floating-point TIFF from Lightroom, (1,1,1) is used there to represent diffuse white as well. Using this value makes perfect sense: we can seamlessly combine HDR and SDR images after applying a color transform to map them to a common space (the reference scRGB color profile maps (1,1,1) to an XYZ luminance of 1, matching what happens for an SDR image with, e.g., an sRGB profile). If we want to export the image to SDR, or show it on an SDR display, we need to know the value representing diffuse white; using (1,1,1) means no extra scaling is required, non-HDR-aware software will display the image reasonably well, and if we can rely on the assumption that floating point (1,1,1) = diffuse white, then no additional metadata is required at all.
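To make this concrete, here is a minimal standalone sketch (plain C++, not WIC API code) of why the (1,1,1) = diffuse white convention makes SDR/HDR compositing seamless: decoding an 8-bit sRGB pixel to linear scRGB maps sRGB white (255,255,255) to exactly (1,1,1), so SDR content lands on the same diffuse white as HDR content with no further scaling.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Inverse sRGB transfer function (IEC 61966-2-1): encoded [0,255] -> linear.
static float SrgbToLinear(uint8_t v)
{
    const float c = v / 255.0f;
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

int main()
{
    // sRGB white decodes to linear (1.000, 1.000, 1.000), the same value
    // that, in my view, should represent diffuse white in decoded HDR images.
    std::printf("sRGB (255,255,255) -> linear (%.3f, %.3f, %.3f)\n",
                SrgbToLinear(255), SrgbToLinear(255), SrgbToLinear(255));
    return 0;
}
```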
However, it seems that Microsoft has stuck with the mindset that a floating-point pixel value of (1,1,1) represents a fixed luminance of 80 nits, going against the quote from the JPEG XR specification, and this approach causes many, many problems. I am not 100% sure this is the exact mindset, but one thing I do know is that a PQ-encoded HDR AVIF decoded by WIC is treated like this. ITU-R BT.2408 (https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BT.2408-7-2023-PDF-E.pdf) defines HDR reference white/diffuse white/graphics white as 203 nits (equivalently, about 58% of the way along the PQ curve), and that is definitely not being mapped to (1,1,1) - from testing, it looks more like 80 nits is being mapped to (1,1,1). This means that if we want to combine with any SDR graphics, apply effects (many of which work best in the range [0,1]), convert to SDR, etc., we are doing so with the wrong white level (given that we have no way of knowing what pixel value represents diffuse white, other than by assuming it is (1,1,1)).
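The two numbers above are easy to verify. A small sketch using the public SMPTE ST 2084 constants shows that 203 nits sits at about 58% of the PQ signal range, while a fixed 80-nits-per-(1,1,1) scRGB convention decodes 203-nit reference white to 203/80 = 2.5375 rather than anything close to 1:

```cpp
#include <cmath>
#include <cstdio>

// PQ inverse EOTF (SMPTE ST 2084): absolute luminance in nits -> signal [0,1].
static double PqInverseEotf(double nits)
{
    const double m1 = 2610.0 / 16384.0;
    const double m2 = 2523.0 / 4096.0 * 128.0;
    const double c1 = 3424.0 / 4096.0;
    const double c2 = 2413.0 / 4096.0 * 32.0;
    const double c3 = 2392.0 / 4096.0 * 32.0;
    const double y  = std::pow(nits / 10000.0, m1);
    return std::pow((c1 + c2 * y) / (1.0 + c3 * y), m2);
}

int main()
{
    std::printf("203 nits on the PQ curve: %.1f%% of signal range\n",
                PqInverseEotf(203.0) * 100.0);  // ~58.1%
    std::printf("203 nits under an 80-nit scRGB convention: %.4f\n",
                203.0 / 80.0);                  // 2.5375, not ~1.0
    return 0;
}
```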
In my opinion, by far the best solution to this problem is to treat floating-point (1,1,1) as diffuse white. When it comes to display, I think it's OK that floating-point (1,1,1) represents the fixed luminance of 80 nits, since we are able to obtain the SDR white level for the display. An HDR-capable viewer can then apply a scaling of (display SDR white / 80) to map (1,1,1) in the image to the SDR white of the display, as sketched below. This is a behavioral change from the idea that seems to be pushed in some of Microsoft's docs, which is that HDR images represent a fixed display luminance; but in my opinion, this doesn't really make sense - if someone in a very bright room creates an image and sends it to someone in a very dark room, they are unlikely to want the display to use the same brightness when viewing it. Mapping between SDR white levels (which can be specified by the user in their display settings) overcomes this.
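A sketch of that viewer-side scaling, assuming decoded pixels are linear scRGB with (1,1,1) as diffuse white (the SDR white level is passed in as a plain parameter here; on Windows it could be queried from, e.g., AdvancedColorInfo.SdrWhiteLevelInNits):

```cpp
#include <cstddef>

// scRGB reference: (1,1,1) corresponds to 80 nits.
constexpr float kScRgbWhiteNits = 80.0f;

// Scale a linear scRGB buffer so that image diffuse white (1,1,1) lands on
// the display's configured SDR white level instead of a fixed 80 nits.
void MapDiffuseWhiteToDisplaySdrWhite(float* rgb, std::size_t pixelCount,
                                      float displaySdrWhiteNits)
{
    const float scale = displaySdrWhiteNits / kScRgbWhiteNits;
    for (std::size_t i = 0; i < pixelCount * 3; ++i)  // 3 channels per pixel
        rgb[i] *= scale;
}
```

For example, with the display's SDR white level set to 240 nits, the scale is 240/80 = 3.0, so image (1,1,1) is shown at 240 nits, exactly where SDR white sits on that display.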
The alternative (inferior) solution would be to ensure that HDR images decoded by WIC to a floating-point pixel format carry metadata indicating the pixel value that represents SDR white. This could be a single value (in the AVIF case it would be 203/80 = 2.5375), or, if you really want, you could specify the luminance of (1,1,1) and the diffuse white point in nits as 80 and 203 respectively (although these absolute values don't really have much meaning).
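For completeness, here is how an application might consume such metadata (the sdrWhitePixelValue field is hypothetical - no such WIC metadata exists today). Dividing by it renormalizes the decoded buffer so that diffuse white sits at (1,1,1):

```cpp
#include <cstddef>

// Renormalize a decoded floating-point buffer using the (hypothetical)
// metadata value for SDR white, e.g. 203/80 = 2.5375 for a PQ AVIF.
void NormalizeDiffuseWhiteToOne(float* rgb, std::size_t pixelCount,
                                float sdrWhitePixelValue)
{
    for (std::size_t i = 0; i < pixelCount * 3; ++i)
        rgb[i] /= sdrWhitePixelValue;
}
```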
This also has implications for HDR wallpaper. It is unclear to me what pixel value should represent diffuse white in these images.
In general, I think perhaps some of the thinking from Microsoft has come from gaming, which has the unique property that HDR content is produced and consumed on the same display. Images, by contrast, are commonly shared and edited between people on many different devices, and we need to maintain that possibility with HDR images, especially regarding editing. I do not know a lot about HDR video, but I believe the HDR video standards have probably focused on optimizing for display on TVs, not on editing in the way that people edit images.
I know this is perhaps slightly off-topic for Project Reunion, but hopefully this gets seen by the relevant team!