Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-evaluate the use of openfoodfacts-build-cache #9220

Open
alexgarel opened this issue Oct 27, 2023 · 4 comments
Open

Re-evaluate the use of openfoodfacts-build-cache #9220

alexgarel opened this issue Oct 27, 2023 · 4 comments
Labels
✅ Task 🧬 Taxonomies - Rebuild Taxonomies are compiled before they can be used.

Comments

@alexgarel
Copy link
Member

We are using a git repository as a cache for our taxonomy builds.

So far so good. But it is growing at a quite big pace (more than 2000 files in the taxonomies directory).

I'm not sure how long we should go with it but I propose to be able to clean it up from times to times.

I think this would mean:

  • having a CI task to:
    • (easy) remove old files from time to time,
    • (harder) and eventually cleanup history and git force push a new main branch (but instances pulling from them must ensure they pull with option to avoid being stuck by this)
  • to avoid loosing taxonomies linked to releases, we should attach corresponding compiled taxonomies in release, in a tar file (you can see how it is done for dist files)
@alexgarel alexgarel added task 🧬 Taxonomies - Rebuild Taxonomies are compiled before they can be used. labels Oct 27, 2023
@alexgarel
Copy link
Member Author

@john-gom and @stephanegigandet, what do you think ? (It's not that urgent)

@john-gom
Copy link
Contributor

Regarding the concern around losing taxonomies linked to releases, could we not just look in the image history? https://github.com/openfoodfacts/openfoodfacts-server/pkgs/container/openfoodfacts-server%2Fbackend/versions?filters%5Bversion_type%5D=tagged

Regarding cleaning up, I would be tempted to go with the easy option for now. I feel like the hard option might be superseded by other initiatives, e.g. if the taxonomy editor or some similar project gets to completion then it might make more sense for that project to build the taxonomies as an "image" so that PO can just download them.

I'd also still like to persuade @stephanegigandet that we shouldn't be using real taxonomies in our tests, but test taxonomies with specifically designed test data to exercise the desired parts of the code. If we want to specifically run tests on the real taxonomies then maybe that should be part of the taxonomy editor project (or ideally not allow users to make errors in the first place).

@stephanegigandet
Copy link
Contributor

@alexgarel I don't think we really need to keep built taxonomies for old releases. They can always be rebuilt if for some reason they are needed.

@stephanegigandet
Copy link
Contributor

@john-gom I can see the advantages of having test taxonomies, but would be very hard to test things like ingredients parsing thoroughly with a minimal crafted taxonomy. It's extra work to craft them and to maintain them. And on the reverse side, we are much less likely to spot issues, and much more likely to have discrepancies. For instance in the PR I did today #9247 , I changed the values of a property, and the code and the taxonomy has to be updated at the same time. That's why I want to keep taxonomies in the Product Opener taxonomy, and I prefer the tests to run on them. There are strong dependencies between the code and the taxonomies (especially their hierarchy and the properties).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✅ Task 🧬 Taxonomies - Rebuild Taxonomies are compiled before they can be used.
Projects
Status: To discuss and validate
Development

No branches or pull requests

4 participants