Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STY: Add two abbreviations for an inline image object #3048

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

j-t-1
Copy link
Contributor

@j-t-1 j-t-1 commented Jan 13, 2025

CCF and DCT appended to function decode_stream_data.

CCF and DCT appended to function decode_stream_data.
Copy link

codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.36%. Comparing base (8249981) to head (4f0c026).
Report is 84 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3048   +/-   ##
=======================================
  Coverage   96.36%   96.36%           
=======================================
  Files          52       52           
  Lines        8770     8770           
  Branches     1596     1596           
=======================================
  Hits         8451     8451           
  Misses        191      191           
  Partials      128      128           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@stefan6419846
Copy link
Collaborator

This apparently is not covered by any tests. Are you able to add corresponding ones?

@j-t-1
Copy link
Contributor Author

j-t-1 commented Jan 13, 2025

Short term maybe not. Could raise as an issue?

@stefan6419846
Copy link
Collaborator

We usually have the policy to only merge changes which received proper testing, thus merging it now and adding the tests later is not really the way to go. The easiest solution would be to identify the existing relevant test files and run them through the corresponding test after doing a search-and-replace run for the filter name.

@j-t-1
Copy link
Contributor Author

j-t-1 commented Jan 13, 2025

Fair. Okay I think you are saying find the tests for CCITT_FAX_DECODE and DCT_DECODE, then find the associated files containing /CCITTFaxDecode and /DCTDecode, and then change these to /CCF and /DCT?

@stefan6419846
Copy link
Collaborator

Yes.

@j-t-1
Copy link
Contributor Author

j-t-1 commented Jan 13, 2025

The abbreviations are mainly (always? I need to read the specification!) for the inline images, but this would work. Ideally the same PDF used for the /CCITTFaxDecode test could have a /CCF in a different stream (same for /DCTDecode and /DCT) to avoid proliferation of files.

@stefan6419846
Copy link
Collaborator

We can use the same PDF if we replace the offending specifiers on the fly. Alternatively, we might be able to construct a dummy example to just have an unit test of the corresponding function instead of a full-blown integration test, but both cases require further research/preparation.

@stefan6419846 stefan6419846 added the needs-test A test should be added before this PR is merged. label Jan 13, 2025
@j-t-1
Copy link
Contributor Author

j-t-1 commented Jan 14, 2025

In filters.py some filters that do not have parameters have a decode_parms argument, for example class ASCIIHexDecode. Removing decode_parms for these classes simplifies them. Thoughts?

@stefan6419846
Copy link
Collaborator

No, this should not be removed to have a consistent interface.

@j-t-1
Copy link
Contributor Author

j-t-1 commented Jan 15, 2025

The consistency of the interfaces is unused. This would be for a future use case?

@stefan6419846
Copy link
Collaborator

Not in our code, but its public API, so someone might use it in a way that such a parameter is always passed (although unused internally). To keep it simple and consistent, I am against removing them - regardless of whether they are needed or not.

@j-t-1
Copy link
Contributor Author

j-t-1 commented Jan 15, 2025

We have the kwargs argument, also unused, and that could serve the same purpose allowing the argument to be passed.

I think the filters code (including _xobj_image_helpers.py) could be simpler.

The docstring of ASCIIHexDecode describes decode_parms but it would be better (if I understand this correctly) to be something like decode_parms: unused; present for interface consistency.

@stefan6419846
Copy link
Collaborator

The discussions tend to get more and more out of scope for this PR. Please keep on topic for the actual changes to keep the overview.

@stefan6419846
Copy link
Collaborator

stefan6419846 commented Mar 13, 2025

I have not tested this with the existing alternative names, but for the new names, CCITT output formats differ and DCT even fails to load the image data.

This is the code I used for testing (and might be added to the existing tests as well):

def test_ccitt_fax_decode_and_ccf():
    data = RESOURCE_ROOT.joinpath("imagemagick-CCITTFaxDecode.pdf").read_bytes()
    reader = PdfReader(BytesIO(data))
    image1 = reader.pages[0].images[0].data

    data = data.replace(b"/CCITTFaxDecode", b"/CCF")
    reader = PdfReader(BytesIO(data))
    image2 = reader.pages[0].images[0].data

    assert image1 == image2


def test_dct_decode_and_dct():
    data = RESOURCE_ROOT.joinpath("jpeg.pdf").read_bytes()
    reader = PdfReader(BytesIO(data))
    image1 = reader.pages[0].images[0].data

    data = data.replace(b"/DCTDecode", b"/DCT")
    reader = PdfReader(BytesIO(data))
    image2 = reader.pages[0].images[0].data

    assert image1 == image2

@j-t-1
Copy link
Contributor Author

j-t-1 commented Mar 13, 2025

In the docstring of images it has:

reader.pages[0].images[0] # return first image

So is the .items()[0][1] part needed? Or maybe it should be .items()[0][0]?

@stefan6419846
Copy link
Collaborator

You are correct - I have updated my code. The items() approach works as well, but is indeed not really required here.

@stefan6419846 stefan6419846 added needs-change The PR/issue cannot be handled as issue and needs to be improved needs-test A test should be added before this PR is merged. and removed needs-test A test should be added before this PR is merged. labels Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-change The PR/issue cannot be handled as issue and needs to be improved needs-test A test should be added before this PR is merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants