Re-evaluate the use of openfoodfacts-build-cache #9220

alexgarel · 2023-10-27T17:01:12Z

We are using a git repository as a cache for our taxonomy builds.

So far so good. But it is growing at a quite big pace (more than 2000 files in the taxonomies directory).

I'm not sure how long we should go with it but I propose to be able to clean it up from times to times.

I think this would mean:

having a CI task to:
- (easy) remove old files from time to time,
- (harder) and eventually cleanup history and git force push a new main branch (but instances pulling from them must ensure they pull with option to avoid being stuck by this)
to avoid loosing taxonomies linked to releases, we should attach corresponding compiled taxonomies in release, in a tar file (you can see how it is done for dist files)

alexgarel · 2023-10-27T17:01:51Z

@john-gom and @stephanegigandet, what do you think ? (It's not that urgent)

john-gom · 2023-10-30T18:21:11Z

Regarding the concern around losing taxonomies linked to releases, could we not just look in the image history? https://github.com/openfoodfacts/openfoodfacts-server/pkgs/container/openfoodfacts-server%2Fbackend/versions?filters%5Bversion_type%5D=tagged

Regarding cleaning up, I would be tempted to go with the easy option for now. I feel like the hard option might be superseded by other initiatives, e.g. if the taxonomy editor or some similar project gets to completion then it might make more sense for that project to build the taxonomies as an "image" so that PO can just download them.

I'd also still like to persuade @stephanegigandet that we shouldn't be using real taxonomies in our tests, but test taxonomies with specifically designed test data to exercise the desired parts of the code. If we want to specifically run tests on the real taxonomies then maybe that should be part of the taxonomy editor project (or ideally not allow users to make errors in the first place).

stephanegigandet · 2023-11-06T17:30:50Z

@alexgarel I don't think we really need to keep built taxonomies for old releases. They can always be rebuilt if for some reason they are needed.

stephanegigandet · 2023-11-06T17:46:10Z

@john-gom I can see the advantages of having test taxonomies, but would be very hard to test things like ingredients parsing thoroughly with a minimal crafted taxonomy. It's extra work to craft them and to maintain them. And on the reverse side, we are much less likely to spot issues, and much more likely to have discrepancies. For instance in the PR I did today #9247 , I changed the values of a property, and the code and the taxonomy has to be updated at the same time. That's why I want to keep taxonomies in the Product Opener taxonomy, and I prefer the tests to run on them. There are strong dependencies between the code and the taxonomies (especially their hierarchy and the properties).

alexgarel added task 🧬 Taxonomies - Rebuild Taxonomies are compiled before they can be used. labels Oct 27, 2023

teolemon added ✅ Task and removed task labels Nov 10, 2023

teolemon added this to 🍊 Open Food Facts Server issues Apr 4, 2024

teolemon moved this to To discuss and validate in 🍊 Open Food Facts Server issues Apr 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-evaluate the use of openfoodfacts-build-cache #9220

Re-evaluate the use of openfoodfacts-build-cache #9220

alexgarel commented Oct 27, 2023

alexgarel commented Oct 27, 2023

john-gom commented Oct 30, 2023

stephanegigandet commented Nov 6, 2023

stephanegigandet commented Nov 6, 2023

Re-evaluate the use of openfoodfacts-build-cache #9220

Re-evaluate the use of openfoodfacts-build-cache #9220

Comments

alexgarel commented Oct 27, 2023

alexgarel commented Oct 27, 2023

john-gom commented Oct 30, 2023

stephanegigandet commented Nov 6, 2023

stephanegigandet commented Nov 6, 2023