Use libjpeg-turbo for all Lossless JPEG bit depths #105

SimonSegerblomRex · 2024-06-24T13:24:37Z

Enabled by the solution to libjpeg-turbo/libjpeg-turbo#768
(planned for inclusion in libjpeg-turbo release 3.1.0).

There are still some Lossless JPEG encoded images that libjpeg-turbo refuses to decode; see the discussions in:

* libjpeg-turbo/libjpeg-turbo#586
* libjpeg-turbo/libjpeg-turbo#765

Note to self while testing with a local copy of libjpeg-turbo:
Put this in the customize_build function used by setup.py:

libjpeg_turbo_path = <path to libjpeg-turbo>
EXTENSIONS['jpeg8']['sources'] = []
EXTENSIONS['jpeg8']['include_dirs'] = [libjpeg_turbo_path + "/src"]  # moved to src in the dev branch
EXTENSIONS['jpeg8']['library_dirs'] = [libjpeg_turbo_path]

and make sure to set

export LD_LIBRARY_PATH=<libjpeg_turbo_path>

before running any Python script that imports imagecodecs.

SimonSegerblomRex · 2024-06-24T13:33:17Z

This is WIP and will stay as a draft pull request until there's an official libjpeg-turbo release that includes the changes necessary.

cgohlke · 2024-06-24T15:48:28Z

Thanks. I am aware of the ongoing work in libjpeg-turbo. Note that the JPEG codec in imagecodecs switches to the LJPEG codec for bit-depths not supported by libjpeg-turbo.

SimonSegerblomRex · 2024-06-24T16:32:23Z

> Note that the JPEG codec in imagecodecs switches to the LJPEG codec for bit-depths not supported by libjpeg-turbo.

Yes, ljpeg_decode seems to work fine and will still be needed as backup in jpeg_decode for images that libjpeg-turbo refuses to decode due to the issues discussed in libjpeg-turbo/libjpeg-turbo#586 and libjpeg-turbo/libjpeg-turbo#765. ljpeg_encode shouldn't be needed any longer though.
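
The fallback described above (try the libjpeg-turbo based decoder first, then fall back to the LJPEG decoder) can be sketched as follows. This is only an illustration of the pattern, not the actual imagecodecs implementation: `decode_with_fallback`, `strict_decoder` and `lenient_decoder` are hypothetical names, and the two stand-in decoders merely imitate jpeg8_decode and ljpeg_decode so that the snippet runs on its own.

```python
from typing import Callable, Sequence

Decoder = Callable[[bytes], object]


def decode_with_fallback(data: bytes, decoders: Sequence[Decoder]) -> object:
    """Try each decoder in order and return the first successful result."""
    errors = []
    for decode in decoders:
        try:
            return decode(data)
        except Exception as exc:  # a real codec would catch its own error type
            name = getattr(decode, "__name__", repr(decode))
            errors.append(f"{name}: {exc}")
    raise ValueError("all decoders failed: " + "; ".join(errors))


# Hypothetical stand-ins for jpeg8_decode and ljpeg_decode.
def strict_decoder(data: bytes) -> str:
    raise RuntimeError("Bogus Huffman table definition")


def lenient_decoder(data: bytes) -> str:
    return f"decoded {len(data)} bytes"


result = decode_with_fallback(b"\xff\xd8\xff\xd9", [strict_decoder, lenient_decoder])
print(result)
```

Collecting the error messages of the decoders that failed keeps the final exception informative when no decoder accepts the file.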

The tests fail due to the issue discussed here: libjpeg-turbo/libjpeg-turbo#765

tests/test_imagecodecs.py

SimonSegerblomRex · 2024-06-25T11:28:29Z

imagecodecs/imagecodecs.py

@@ -109,7 +109,7 @@
 - `libheif <https://github.com/strukturag/libheif>`_ 1.17.6
  (`libde265 <https://github.com/strukturag/libde265>`_ 1.0.15,
  `x265 <https://bitbucket.org/multicoreware/x265_git/src/master/>`_ 3.6)
- `libjpeg-turbo <https://github.com/libjpeg-turbo/libjpeg-turbo>`_ 3.0.3
+- `libjpeg-turbo <https://github.com/libjpeg-turbo/libjpeg-turbo>`_ 6ec8e41f50e5a83fe078732cbf0360272165ed45


This is the latest sha1 from the dev branch. No official release tag.

SimonSegerblomRex · 2024-06-25T11:31:31Z

I tested this with a 16-bit Lossless JPEG file as input:

import sys

from imagecodecs import imread, jpeg8_decode, jpeg8_encode
from numpy.testing import assert_array_equal

filename = sys.argv[1]

image = imread(filename)
if image.ndim > 2:
    image = image[..., 0].copy()  # copy to fix strides

for bit_depth in range(16, 1, -1):
    print(bit_depth)
    if bit_depth <= 8 and image.itemsize > 1:
        # FIXME: Should this really be necessary?
        image = image.astype("u1")
    enc = jpeg8_encode(
        image,
        lossless=True,
        predictor=1,
        bitspersample=bit_depth,
    )
    dec = jpeg8_decode(enc)
    assert_array_equal(image, dec)
    image <<= 1

It works, but the case with bit-depth <= 8 in a uint16 array should be handled in a better way.

EDIT: Fixed this with the check here.
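
For reference, the relation between bit depth and container type that the test script works around can be sketched like this. The helper names are hypothetical and this is not the check that was added to _jpeg8.pyx; it only assumes, as the script above does, that bit depths from 2 to 16 are used, with 8-bit containers up to 8 bits and 16-bit containers above that.

```python
def required_bits(max_value: int) -> int:
    """Smallest bit depth that can represent max_value (at least 1)."""
    if max_value < 0:
        raise ValueError("samples must be non-negative")
    return max(1, max_value.bit_length())


def container_dtype(bitspersample: int) -> str:
    """NumPy-style dtype string assumed for a given bit depth."""
    if not 2 <= bitspersample <= 16:
        raise ValueError(f"unsupported bit depth: {bitspersample}")
    return "u1" if bitspersample <= 8 else "u2"


def check_bit_depth(max_value: int, bitspersample: int) -> str:
    """Reject data that does not fit, otherwise return the container dtype."""
    needed = required_bits(max_value)
    if needed > bitspersample:
        raise ValueError(
            f"data needs {needed} bits but bitspersample is {bitspersample}"
        )
    return container_dtype(bitspersample)


print(check_bit_depth(4095, 12))
print(check_bit_depth(255, 8))
```

A check of this kind lets an encoder fail early with a clear message instead of silently truncating samples.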

imagecodecs/_jpeg8.pyx

And improve error handling.

SimonSegerblomRex · 2024-06-25T13:30:00Z

imagecodecs/_jpeg8.pyx

@@ -141,7 +141,7 @@ def jpeg8_encode(
        (src.dtype == numpy.uint8 or src.dtype == numpy.uint16)
        and src.ndim in {2, 3}
        # src.nbytes <= 2147483647 and  # limit to 2 GB
-        and samples in {1, 3, 4}
+        and samples in {1, 2, 3, 4}


Seems to work as expected with 2 components:

import sys

from imagecodecs import imread, jpeg8_decode, jpeg8_encode
from numpy.testing import assert_array_equal

filename = sys.argv[1]

image = imread(filename)
enc = jpeg8_encode(
    image,
    lossless=True,
    predictor=1,
    bitspersample=16,
)
dec = jpeg8_decode(enc)
assert_array_equal(image, dec)

Using this input file:

These files were created by a Lossless JPEG encoder (implemented by me...) that contained a bug that caused the largest Huffman code to contain all ones. These images can be decoded without problems (even by libjpeg-turbo, see the discussion in libjpeg-turbo/libjpeg-turbo#765), but libjpeg-turbo refuses to do so since they are not valid according to the JPEG spec. I have now recreated them using jpeg8_encode. dng0.ljp was not a valid file.
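
The bug described above (a longest Huffman code consisting only of 1 bits) can be detected from a table's code-length counts alone. The sketch below assigns canonical codes the way the JPEG specification describes (codes of one length are consecutive, and the running code is shifted left by one bit when moving to the next length) and reports whether any code is all ones. The function names are made up for illustration, and this is not libjpeg-turbo's actual validation code.

```python
from typing import List, Sequence, Tuple


def canonical_codes(bits: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (length, code) pairs; bits[i] is the number of codes of length i + 1."""
    codes = []
    code = 0
    for length, count in enumerate(bits, start=1):
        for _ in range(count):
            codes.append((length, code))
            code += 1
        code <<= 1  # codes of the next length are one bit longer
    return codes


def has_all_ones_code(bits: Sequence[int]) -> bool:
    """True if any assigned code consists only of 1 bits."""
    return any(code == (1 << length) - 1 for length, code in canonical_codes(bits))


# Code-length counts of the typical luminance DC table from the JPEG specification.
standard_dc = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# Same table with one extra 9-bit code, which becomes 111111111.
broken_dc = [0, 1, 5, 1, 1, 1, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0]

print(has_all_ones_code(standard_dc))  # False
print(has_all_ones_code(broken_dc))    # True
```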

SimonSegerblomRex · 2024-06-26T12:04:50Z

(I replaced the broken dng*.ljp files that were created using my broken Lossless JPEG encoder.)

I did a quick benchmark comparing jpeg8_decode and ljpeg_decode. jpeg8_decode is about 40 % faster using this input: Pentax-K-1-DNG-extracted.jpg (3696x4950, 2 components). (Note: Pentax DNG files are the only images I've found in the wild that are hit by this problem, so you need that patch to get past the "Bogus Huffman table definition" error.)
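
A comparison like this can be reproduced with a small timing helper such as the one below. It is a sketch, not the script used for the numbers above: `best_time` and `percent_faster` are hypothetical helpers, and "percent faster" is assumed here to mean the relative gain in throughput, which may differ from how the figure above was computed.

```python
import time
from typing import Callable


def best_time(func: Callable[[bytes], object], data: bytes, repeat: int = 5) -> float:
    """Return the shortest wall-clock time of `repeat` calls to func(data)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def percent_faster(t_fast: float, t_slow: float) -> float:
    """Throughput gain of the faster decoder over the slower one, in percent."""
    return (t_slow / t_fast - 1.0) * 100.0


# Example with measured times of 1.0 s and 1.4 s.
print(round(percent_faster(1.0, 1.4), 1))  # 40.0
```

In practice `func` would be jpeg8_decode or ljpeg_decode and `data` the bytes of the encoded file.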

Everything seems to work as expected now, but I guess we should wait for an official libjpeg-turbo tag.

SimonSegerblomRex · 2024-06-26T13:35:43Z

I found this source containing a lot of Lossless JPEG files (embedded in DICOM files). A quick test shows that libjpeg-turbo and lj92 produce slightly different results for some of them, e.g., gdcm-JPEG-LossLessThoravision.dcm. BitsPerSample is 15 and in the decoded arrays there are values as high as 65520 for lj92 and 65535 for libjpeg-turbo... something weird is going on here (even considering that the decoded values are probably supposed to be reinterpreted as signed values or something). Do you have any input regarding this file @malaterre?

EDIT: Solved by using gdcmraw to extract the JPEG file. Now this file behaves as expected both with lj92 and libjpeg-turbo.

malaterre · 2024-06-26T13:53:08Z

> I found this source containing a lot of Lossless JPEG files (embedded in DICOM files). A quick test shows that libjpeg-turbo and lj92 produce slightly different results for some of them, e.g., gdcm-JPEG-LossLessThoravision.dcm. BitsPerSample is 15 and in the decoded arrays there are values as high as 65520 for lj92 and 65535 for libjpeg-turbo... something weird is going on here (even considering that the decoded values are probably supposed to be reinterpreted as signed values or something). Do you have any input regarding this file @malaterre?

@SimonSegerblomRex What do you get if you use thorfdbg/libjpeg ?

SimonSegerblomRex · 2024-06-26T14:04:40Z

> @SimonSegerblomRex What do you get if you use thorfdbg/libjpeg ?

With thorfdbg/libjpeg I get:

reading a JPEG file failed - error -1038 - invalid stream, found invalid huffman code in entropy coded segment

and that's probably the right thing. The images decoded by lj92 and libjpeg-turbo are completely broken, so they would have been better off failing as well than trying to decode garbage.

SimonSegerblomRex · 2024-06-26T14:10:16Z

I found that lj92 fails to decode MARCONI_MxTWin-12-MONO2-JpegLossless-ZeroLengthSQ.dcm (just 0s out) while libjpeg-turbo decodes it without issues 👍

EDIT: After extracting and repairing the JPEG file using gdcmraw, it decodes as expected with lj92 as well.

malaterre · 2024-06-26T14:17:31Z

> > @SimonSegerblomRex What do you get if you use thorfdbg/libjpeg ?
>
> With thorfdbg/libjpeg I get:
>
> reading a JPEG file failed - error -1038 - invalid stream, found invalid huffman code in entropy coded segment
>
> and that's probably the right thing. The images decoded by lj92 and libjpeg-turbo are completely broken, so they would have been better off failing as well than trying to decode garbage.

What kind of command did you use ?

% gdcmraw gdcm-JPEG-LossLessThoravision.dcm  /tmp/bla.jpg
% jpeg /tmp/bla.jpg /tmp/bla.pgm
jpeg Copyright (C) 2012-2018 Thomas Richter, University of Stuttgart
and Accusoft

For license conditions, see README.license for details.


0 bytes memory not yet released.

15905134 bytes maximal required.

4197 allocations performed.

SimonSegerblomRex · 2024-06-26T14:35:11Z

EDIT: Using the output from gdcmraw (that's actually not part of the DICOM file) I get the same output using all three decoders 👍

First I just used this script to extract the JPEG file:

import re
import struct
import sys

SOI = struct.pack(">H", 0xFFD8)
SOF3 = struct.pack(">H", 0xFFC3)
EOI = struct.pack(">H", 0xFFD9)

with open(sys.argv[1], "rb") as f:
    data = f.read()

matches = re.finditer(b"(?=(" + SOI + b".*?" + SOF3 + b".+?" + EOI + b"))", data, re.S)
for i, match in enumerate(matches):
    with open(f"{i}.jpg", "wb") as f:
        print(i)
        f.write(match.group(1))

It seems like gdcmraw does some magic to repair the broken file.
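
As an alternative to the regular expression above, the header of an extracted file can be inspected by walking its marker segments. The sketch below reads the sample precision, height, width and component count from the first SOF3 (Lossless JPEG) segment. It assumes a well-formed header in which every segment before SOS has a two-byte length field and there are no fill bytes between segments; `find_sof3` is a hypothetical helper and says nothing about what gdcmraw actually does.

```python
import struct

SOI = 0xFFD8   # start of image
SOF3 = 0xFFC3  # start of frame, lossless (Huffman)
SOS = 0xFFDA   # start of scan


def find_sof3(data: bytes):
    """Return (precision, height, width, components) of the first SOF3 segment, or None."""
    if len(data) < 2 or struct.unpack_from(">H", data, 0)[0] != SOI:
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        marker, length = struct.unpack_from(">HH", data, pos)
        if marker >> 8 != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        if marker == SOS:
            return None  # entropy-coded data follows; stop scanning
        if marker == SOF3:
            return struct.unpack_from(">BHHB", data, pos + 4)
        pos += 2 + length  # the length field counts itself but not the marker
    return None


# Synthetic header: SOI, then an SOF3 segment describing a 16-bit, 3696x4950,
# single-component image, then EOI.
sample = (
    b"\xff\xd8"
    + struct.pack(">HHBHHB", SOF3, 11, 16, 4950, 3696, 1)
    + bytes([1, 0x11, 0])
    + b"\xff\xd9"
)
print(find_sof3(sample))  # (16, 4950, 3696, 1)
```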

SimonSegerblomRex · 2024-06-27T13:27:02Z

This is ready for code review (but there's still no new libjpeg-turbo release or tag).

cgohlke · 2024-06-27T16:27:38Z

Thank you. I will review this when libjpeg-turbo 3.1 is released.

SimonSegerblomRex force-pushed the libjpegturbo branch from be98be3 to 830f680 on June 24, 2024 13:29

Update libjpeg-turbo version (sha1) and tests

e268e6a

The tests fail due to the issue discussed here: libjpeg-turbo/libjpeg-turbo#765

Allow encoding 2 component images

7856b72

And improve error handling.

SimonSegerblomRex added 2 commits June 26, 2024 13:02

Enable testing some of the YCbCR files for libjpeg-turbo

54ad9e3

SimonSegerblomRex mentioned this pull request Jun 26, 2024

Lossless JPEG: Add support for decoding images with 14-bit data precision (needed by DNG) libjpeg-turbo/libjpeg-turbo#768

Closed

SimonSegerblomRex marked this pull request as ready for review June 27, 2024 13:27

cgohlke added the enhancement label Jun 27, 2024

cgohlke mentioned this pull request Aug 9, 2024

Relax bitspersample checks cgohlke/tifffile#265

Closed
