Move products_tags to Postgres to improve performance #8676
Comments
May want to consider using Citus. https://github.com/citusdata/citus
Evaluating Citus vs DuckDB. First, exported all the tags into a CSV file, then imported into Postgres.
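The exact import commands aren't preserved above; a minimal sketch, assuming the CSV has product_id, tag_type and value columns (the real column layout isn't shown in the thread):

```sql
-- Hypothetical table layout; the actual CSV columns aren't shown in the thread.
CREATE TABLE product_tags (
    product_id TEXT,
    tag_type   TEXT,
    value      TEXT
);

-- Bulk-load with COPY (here via psql's \copy), far faster than row-by-row INSERTs.
\copy product_tags FROM 'product_tags.csv' WITH (FORMAT csv, HEADER true)
```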
Import into Postgres took 10 minutes, 52 seconds. For DuckDB:
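The DuckDB import command isn't preserved either; a sketch using the same CSV:

```sql
-- DuckDB can create the table straight from the CSV, inferring column types.
CREATE TABLE product_tags AS
SELECT * FROM read_csv_auto('product_tags.csv');
```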
Queries:
Query 1: DuckDB 16s, Citus 39s
Query 2: DuckDB 50s, Citus 87s

The Citus figures are a lot slower than before. Investigating...
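The two benchmarked queries themselves aren't shown in the thread; a plausible shape for this kind of aggregate, using the hypothetical product_tags layout above ('categories', 'countries' and 'en:france' are illustrative values, not taken from the issue):

```sql
-- Query 1 (sketch): count products per tag value for one tag type.
SELECT value, count(*) AS product_count
FROM product_tags
WHERE tag_type = 'categories'
GROUP BY value
ORDER BY product_count DESC;

-- Query 2 (sketch): the same aggregate filtered on a second tag type via a
-- self-join, roughly what a multi-facet query looks like.
SELECT t1.value, count(*) AS product_count
FROM product_tags t1
JOIN product_tags t2 ON t2.product_id = t1.product_id
WHERE t1.tag_type = 'categories'
  AND t2.tag_type = 'countries'
  AND t2.value = 'en:france'
GROUP BY t1.value
ORDER BY product_count DESC;
```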
Tried inserting into Citus in a different order:
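The exact ordering isn't preserved; a sketch of what a different insert order could mean for Citus, where loading in distribution-key order lets each shard fill in long sequential runs (the choice of product_id as distribution column is an assumption):

```sql
-- Citus sketch: distribute the table across shards before loading.
-- create_distributed_table() is Citus's standard API for this.
SELECT create_distributed_table('product_tags', 'product_id');

-- Load rows pre-sorted by the distribution column so writes hit each shard
-- sequentially instead of hopping between shards row by row.
\copy product_tags FROM 'product_tags_sorted_by_product_id.csv' WITH (FORMAT csv, HEADER true)
```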
For DuckDB, re-exported the data sorted by tag_type and value and re-imported into a fresh database. Import took 2 minutes, 56 seconds.
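A sketch of that sorted re-export and re-import, assuming the export is done from the existing DuckDB database (the actual tooling used isn't stated):

```sql
-- Re-export sorted by tag_type and value so similar rows sit together,
-- which compresses better and speeds up scans filtered by tag_type.
COPY (SELECT * FROM product_tags ORDER BY tag_type, value)
TO 'product_tags_sorted.csv' (FORMAT csv, HEADER true);

-- In a fresh DuckDB database, re-import the sorted file.
CREATE TABLE product_tags AS
SELECT * FROM read_csv_auto('product_tags_sorted.csv');
```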
Tried another approach with Postgres: create a table for each tag type, starting with just the two being used, then re-run Query 1 and Query 2 against them.
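A sketch of the per-tag-type layout (table and column names are illustrative, not from the issue):

```sql
-- One table per tag type: the tag_type column becomes implicit, rows are
-- narrower, and each query scans only the tag type it actually needs.
CREATE TABLE product_categories_tags (
    product_id TEXT,
    value      TEXT
);

CREATE TABLE product_countries_tags (
    product_id TEXT,
    value      TEXT
);

-- Query 1 against the narrower table no longer filters on tag_type at all:
SELECT value, count(*) AS product_count
FROM product_categories_tags
GROUP BY value
ORDER BY product_count DESC;

-- Query 2 becomes a join between the two per-type tables:
SELECT c.value, count(*) AS product_count
FROM product_categories_tags c
JOIN product_countries_tags o ON o.product_id = c.product_id
WHERE o.value = 'en:france'
GROUP BY c.value
ORDER BY product_count DESC;
```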
Things to do:
Description
Many aggregate queries time out, even when using the products_tags collection in MongoDB. Initial tests indicate that Postgres would offer much better performance for this type of query.
Acceptance criteria
What would a demo look like
Show improved query performance
Notes
Strategy is to create a new openfoodfacts-data repo to wrap the data storage functionality. This would periodically query MongoDB for changed products (needs to check products_obsolete too) and update a cache in Postgres.
Can hopefully use the minion database for this, but might want to consider something separate.
Future work could include joining to taxonomies in the query to avoid the need for tag matching in Perl.
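A sketch of what that taxonomy join might look like; the taxonomies table and its columns are assumptions for illustration, not anything defined in this issue:

```sql
-- Hypothetical taxonomy join: resolve canonical tag names in SQL instead of
-- post-processing the tag list in Perl.
SELECT tax.canonical_name, count(*) AS product_count
FROM product_tags pt
JOIN taxonomies tax
  ON tax.tag_type = pt.tag_type
 AND tax.tag_id   = pt.value
WHERE pt.tag_type = 'categories'
GROUP BY tax.canonical_name
ORDER BY product_count DESC;
```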
Tasks
Part of epic