Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of tag list in Creator.add_metadata (fix #125) #126

Merged
merged 5 commits into from
Feb 12, 2024

Conversation

IMayBeABitShy
Copy link
Contributor

@IMayBeABitShy IMayBeABitShy commented Feb 6, 2024

Fixes #125

Previously, the type annotations of Creator.config_metadata() allowed "Tags" to be a list of strings, but didn't handle the conversion to a string, which resulted in python-libzim raising an error.

This commit modifies Creator.add_metadata() to accept a list of strings and, if the key of the metadata is "Tags", join them into a single string. A regression test is included. The metadata validation seems have already been written with a tag list in mind (though there is no check that a single tag in such a list should not contain ";", not sure if this is intended).

Note: the join has been implemented in Creator.add_metadata, but the conversion of datetime.datetime for the "Date" metadata key happens in python-libzim. I think splitting the conversion logic between those two libraries is suboptimal, we should consider implementing the tag list conversion in python-libzim instead.

@kelson42 kelson42 requested a review from rgaudin February 6, 2024 16:44
@rgaudin
Copy link
Member

rgaudin commented Feb 6, 2024

Thank you ; it looks good (except maybe the comments). @benoit74 will review but merge might be delayed a bit as he is in the process of changing the setup of this repo so it may be easier for him to merge his stuff first.

@rgaudin rgaudin requested a review from benoit74 February 6, 2024 16:58
IMayBeABitShy added a commit to IMayBeABitShy/sotoki that referenced this pull request Feb 7, 2024
openzim#290)

This commit upgrades python-scraperlib to version 3.x to implement issue openzim#290.
It depends on openzim/python-scraperlib#126 and will likely need at least one
revision. Honestly, I am not happy with the splitted Illustration handling.
Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ! Minor change to remove comments.

src/zimscraperlib/zim/creator.py Outdated Show resolved Hide resolved
@benoit74
Copy link
Collaborator

benoit74 commented Feb 8, 2024

Note: the join has been implemented in Creator.add_metadata, but the conversion of datetime.datetime for the "Date" metadata key happens in python-libzim. I think splitting the conversion logic between those two libraries is suboptimal, we should consider implementing the tag list conversion in python-libzim instead.

@rgaudin any views on this?

@rgaudin
Copy link
Member

rgaudin commented Feb 8, 2024

Note: the join has been implemented in Creator.add_metadata, but the conversion of datetime.datetime for the "Date" metadata key happens in python-libzim. I think splitting the conversion logic between those two libraries is suboptimal, we should consider implementing the tag list conversion in python-libzim instead.

@rgaudin any views on this?

Sure bothers me as well. I'd prefer we implement the date/datetime conversion here as well and eventually (needs a major release) remove the conversion there as pylibzim is just a wrapper around the C libzim.

@benoit74
Copy link
Collaborator

benoit74 commented Feb 8, 2024

Sure bothers me as well. I'd prefer we implement the date/datetime conversion here as well and eventually (needs a major release) remove the conversion there as pylibzim is just a wrapper around the C libzim.

Issue is opened in python-libzim: openzim/python-libzim#188

@IMayBeABitShy
Copy link
Contributor Author

I've removed the comments as requested. I've also added the datetime conversion here so all metadata value conversion happens in python-scraperlib.

Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good.

Good idea to already add the conversion of Date here.

To everybody: ⚠️ please do not merge ⚠️ I will move the issue to Draft to avoid mistakes.

I have another PR with lots of code changes to migrate to our new Python conventions, I will merge this other PR first to limit the risk of conflicts / simplify resolution. Do not mind about it, I will take care of everything (just do not delete your branch / repo until then ;-)).

@benoit74 benoit74 marked this pull request as draft February 9, 2024 20:11
This fixes issue openzim#125. Previously, the type annotations of Creator.config_metadata()
allowed "Tags" to be a list of strings, but didn't handle the conversion to a string,
which resulted in python-libzim raising an error.

This commit modifies Creator.add_metadata() to accept a list of strings and, if the
key of the metadata is "Tags", join them into a single string. A regression test is
included.
I didn't think I'd ever encounter an open source project opposed to comments, but here we are.
Some comments will soon be outdated anyway.
Copy link

codecov bot commented Feb 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cfee792) 100.00% compared to head (10ca3fa) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #126   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           32        32           
  Lines         1332      1337    +5     
  Branches       225       227    +2     
=========================================
+ Hits          1332      1337    +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benoit74
Copy link
Collaborator

I rebased your changes and it made me realize there was a typing issue + we can mutualise your new test with the existing one, this is the intent of my last commit.

As mentioned in this last commit, I had to ignore the typing issue, it was hard to solve without TypeGuards (3.10) or would necessitate a significant code change. We can live with it for now.

@benoit74 benoit74 marked this pull request as ready for review February 12, 2024 16:53
@benoit74
Copy link
Collaborator

@rgaudin would you mind to review the last commit please, I'm not sure you will agree with my suggestion to ignore the typing issue for now.

@rgaudin
Copy link
Member

rgaudin commented Feb 12, 2024

Why not be safe with

if name == "Tags" and not isinstance(content, str) and isinstance(content, collections.abc.Iterable):
    content = ";".join(str(tag) for tag in content)

pyright ignore is mandatory because the type checker has no idea about
the impact of validate_metadata/validate_tag. This might be fixed by a
shared logic and a TypeGuard but is not available until Python 3.10 ;
other solution would be to transfer metadata to a way more typed
container after validation.
@benoit74
Copy link
Collaborator

Thank you very much! I chose to be even safer with

if (
            name == "Tags"
            and not isinstance(content, str)
            and not isinstance(content, bytes)
            and isinstance(content, collections.abc.Iterable)
        ):
            content = ";".join(content)

Just in case some day we begin to accept bytes ... don't see why but who knows.

@benoit74 benoit74 merged commit a7fe235 into openzim:main Feb 12, 2024
10 checks passed
benoit74 pushed a commit to openzim/sotoki that referenced this pull request Mar 27, 2024
#290)

This commit upgrades python-scraperlib to version 3.x to implement issue #290.
It depends on openzim/python-scraperlib#126 and will likely need at least one
revision. Honestly, I am not happy with the splitted Illustration handling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Type/Behavior mismatch for "Tags" parameter of Creator.config_metadata
3 participants