S3 cached images are missing #2136

Open
benoit74 opened this issue Jan 24, 2025 · 10 comments · May be fixed by #2143

@benoit74
Contributor

I restarted https://farm.openzim.org/recipes/wikipedia_hi_basketball. At that point, the S3 cache was mostly empty for this ZIM, since we emptied it a few months ago and the recipe has never succeeded since then.

The task is https://farm.openzim.org/pipeline/7005eeb5-b08e-4a28-91a7-0575990427e9

The nopic ZIM (which is produced first) seems OK (at least, all photos on the welcome page are there), but in the maxi ZIM all welcome page photos are missing, and quite a lot of articles have missing photos.

I started the recipe again, and this time all photos on the nopic welcome page are missing.

To me, it looks like all images that are fetched from the S3 cache (instead of being downloaded from the web) no longer make it into the ZIM.

I checked the S3 bucket and everything looks fine there. I have not yet had time to look into the ZIM to see what is in it.

@audiodude do you have time to work on this soon, or would you prefer that I do it? (I do not know exactly who is supposed to maintain mwoffliner now.) This is obviously a pretty major issue to solve.

@benoit74 benoit74 added bug regression First as tragedy, then as farce ;-) labels Jan 24, 2025
@benoit74
Contributor Author

Note: the first ZIMs are obviously gone from storage now, and I forgot to save them for later reference. Probably not a big deal, since the issue is more visible on the latest ZIMs anyway ^^

@benoit74 benoit74 changed the title from "S3 cached images are not missing" to "S3 cached images are missing" Jan 24, 2025
@audiodude
Member

> @audiodude do you have time to work on this soon, or would you prefer that I do it? (I do not know exactly who is supposed to maintain mwoffliner now.) This is obviously a pretty major issue to solve.

My understanding is that I'm supposed to be prioritizing mwoffliner 2.0, specifically using latest libzim. I can definitely take a look if you need help though.

@benoit74
Contributor Author

I will have a first look then

@benoit74
Contributor Author

The issue is pretty minor: the image is inside the ZIM, correctly converted to WebP, but the HTML does not point to the path where it is stored. I still have to confirm whether the image is stored at the wrong location or the HTML references the wrong path.
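
For readers who want to reproduce this kind of diagnosis, here is a minimal sketch of the check described above: given the set of entry paths present in a ZIM and the HTML of one article, list the image references that do not resolve to any entry. Everything in it is an assumption made for illustration: the function names are hypothetical and are not part of mwoffliner or libzim, the paths in the usage example are made up, and obtaining the entry paths and the article HTML from a real ZIM is left out. Query strings and fragments in `src` values are not handled.

```typescript
// Hypothetical helpers for illustration only; not mwoffliner or libzim APIs.

// Collect the src attribute of every <img> tag in an HTML string.
function extractImageSources(html: string): string[] {
  const sources: string[] = [];
  // "\ssrc" (not "\bsrc") so that attributes such as data-src are ignored.
  const imgSrc = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  let match: RegExpExecArray | null;
  while ((match = imgSrc.exec(html)) !== null) {
    sources.push(match[2]);
  }
  return sources;
}

// Percent-decode a reference, falling back to the raw value if it is malformed.
function safeDecode(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    return value;
  }
}

// Resolve a reference against the path of the article that contains it.
function resolveAgainst(articlePath: string, ref: string): string {
  // A leading "/" restarts from the root; otherwise start in the article's directory.
  const base = ref.startsWith("/") ? [] : articlePath.split("/").slice(0, -1);
  for (const segment of ref.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") base.pop();
    else base.push(segment);
  }
  return base.join("/");
}

// Return the resolved image paths that have no matching entry.
function findDanglingImages(
  articlePath: string,
  html: string,
  entryPaths: Set<string>,
): string[] {
  return extractImageSources(html)
    .map((src) => resolveAgainst(articlePath, safeDecode(src)))
    .filter((resolved) => !entryPaths.has(resolved));
}

// Usage with made-up paths: one image resolves, the other does not.
const exampleEntries = new Set<string>(["images/ball.webp"]);
const exampleHtml =
  '<p><img src="../images/ball.webp"><img src="../images/hoop.png.webp"></p>';
console.log(findDanglingImages("A/Basketball", exampleHtml, exampleEntries));
// prints [ 'images/hoop.png.webp' ]
```

An empty result for an article whose images are still not displayed would suggest looking at the stored content rather than the references; a non-empty result shows which paths the HTML expects, and these can then be compared with where the images were actually written.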

@kelson42
Collaborator

Yesterday I ran https://farm.openzim.org/pipeline/85a92f51-4df0-468e-8adb-877cf5001cee, which seems to be OK... besides the fact that the ZIM with images is significantly bigger than with 1.13.0

@benoit74
Contributor Author

Of course: in your case the S3 cache was empty, and nopic uses no pictures, even on the welcome page. The ZIM is going to be pretty different the next time you run it ^^

Please open an issue about the bigger ZIM if analysis is needed; it might just be that we have more items in the selection.

@audiodude
Member

> Yesterday I ran https://farm.openzim.org/pipeline/85a92f51-4df0-468e-8adb-877cf5001cee, which seems to be OK... besides the fact that the ZIM with images is significantly bigger than with 1.13.0

Please see comments in #2101 on image sizes between 1.13 and 1.14.

@kelson42
Collaborator

@benoit74 We will have to fix this S3 bug as a priority and make a patch release. I will prepare the 1.14.1 milestone.

@kelson42 kelson42 added this to the 1.14.1 milestone Jan 26, 2025
@benoit74 benoit74 self-assigned this Jan 27, 2025
@Optimus-NP

I am a student and currently don't have an AWS account. I'm trying to debug an issue, but I lack access to the AWS S3 service. I kindly request access to a developer AWS account so that I can investigate the issue and submit a pull request.

@benoit74
Contributor Author

@Optimus-NP sorry, but I'm already working on this complex issue.
