Copy prod data to staging (aka .net) #894

Open
alexgarel opened this issue Sep 12, 2022 · 3 comments

@alexgarel
Member

Problem

In production we have a lot of data, but very little in staging, because very few products are added there.
For Product Opener, we copy data from production (in MongoDB).

Copying data from production to staging is not that easy because the server_domain column has to be updated, which takes a lot of time.

Proposed solution

I'm not sure what the best option is, but I would suggest:

  • export the prod PostgreSQL database every week (pg_dump), which is not too hard to do with a cron job
    • IMO we should use the "custom" format, which is more efficient while still easy to modify
    • Note: there is a way to pass the password through a file mounted in the Docker container
  • copy the file to staging over SSH
  • modify the export with a script (even a sed command) to change .org to .net
    • check that this works with the custom format (hint: it's compressed)
  • import it into PostgreSQL on staging
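A minimal shell sketch of the steps above. Everything concrete in it is an assumption, not something decided in this issue: the database and user names (robotoff), the dump path, the staging host name, the password-file mount path and the exact domain strings (openfoodfacts.org to openfoodfacts.net) are hypothetical placeholders. For the password-file point it relies on libpq's PGPASSFILE variable, and it handles the "it's compressed" caveat by turning the custom-format dump back into plain SQL with pg_restore -f - before running sed.

```shell
#!/bin/sh
# Hypothetical sketch of a weekly prod -> staging copy.
# ASSUMPTIONS: host names, db/user names, paths and domain strings are placeholders.

# libpq reads the password from the file named by PGPASSFILE
# (one line per server: hostname:port:database:username:password),
# so a file mounted into the container keeps it off the command line.
PGPASSFILE="${PGPASSFILE:-/run/secrets/pgpass}"   # hypothetical mount path
export PGPASSFILE

DUMP="${DUMP:-/tmp/robotoff_prod.dump}"           # hypothetical dump path

# 1. On prod, e.g. from a weekly cron entry such as:
#    0 3 * * 0  . /opt/scripts/copy_prod.sh && dump_prod && copy_to_staging
#    -Fc selects the compressed "custom" format.
dump_prod() {
    pg_dump -Fc -h localhost -U robotoff -d robotoff -f "$DUMP"
}

# 2. Copy the dump to staging over SSH (hypothetical host name).
copy_to_staging() {
    scp "$DUMP" staging.example:"$DUMP"
}

# 3. Rewrite the domain in plain SQL read from stdin.
#    Only the full host name is matched, not every ".org" in the data.
rewrite_domain() {
    sed 's/openfoodfacts\.org/openfoodfacts.net/g'
}

# 4. On staging: sed cannot edit the compressed custom format directly, so
#    pg_restore writes a plain SQL script to stdout (-f -), which goes through
#    rewrite_domain and is replayed by psql. --clean --if-exists drops the
#    existing staging objects first.
restore_staging() {
    pg_restore --clean --if-exists --no-owner -f - "$DUMP" \
        | rewrite_domain \
        | psql -h localhost -U robotoff -d robotoff -v ON_ERROR_STOP=1
}
```

Usage, assuming the file is saved as copy_prod.sh: source it and run dump_prod then copy_to_staging on prod, then restore_staging on staging. Note that without pipefail, psql still runs if pg_restore fails midway, so a real script should check the dump before restoring.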
@raphael0202
Collaborator

As discussed with @alexgarel, it would be more interesting to drop the server_domain field entirely, as we have distinct environments for staging and production. This way we can import production data into staging without having to do any DB migration.
As we're still considering adding support for OpenBeautyFacts/OpenProductFacts/... to Robotoff, we're keeping the server_type field.

What needs to be done:

  • Create the missing prediction.server_type field (same field as product_insight.server_type), and fill it with off for all predictions in the DB.
  • Only use server_type instead of server_domain in the Robotoff codebase. Use settings.ROBOTOFF_INSTANCE to know which Product Opener server to use when updating products.
  • Delete product_insight.server_domain, prediction.server_domain and image_model.server_domain fields.
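One possible PostgreSQL translation of this checklist, wrapped in a shell function so it can be piped to psql. This is a hypothetical sketch, not the migration that was actually applied: the TEXT column type is an assumption (it should match whatever type product_insight.server_type uses), and the image_model model is assumed to live in the image table, as named in the commit message referencing this issue.

```shell
#!/bin/sh
# Hypothetical sketch of the schema change; column type and table names are assumptions.
migration_sql() {
    cat <<'EOF'
BEGIN;
-- New column; the constant DEFAULT fills every existing row with 'off'.
ALTER TABLE prediction
    ADD COLUMN IF NOT EXISTS server_type TEXT NOT NULL DEFAULT 'off';
-- server_domain is redundant once staging and prod have separate databases.
ALTER TABLE product_insight DROP COLUMN IF EXISTS server_domain;
ALTER TABLE prediction DROP COLUMN IF EXISTS server_domain;
ALTER TABLE image DROP COLUMN IF EXISTS server_domain;
COMMIT;
EOF
}
```

Usage: migration_sql | psql -d robotoff -v ON_ERROR_STOP=1 (database name assumed). Since PostgreSQL 11, adding a column with a constant default does not rewrite the table, so the fill step stays cheap even on a large prediction table.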

raphael0202 added a commit that referenced this issue Apr 12, 2023
- delete server_domain in image, product_insight and prediction tables
- add server_type field to image and prediction tables
- use ProductIdentifier (barcode + server_type) instead of barcode
  in codebase

See #894
raphael0202 added a commit that referenced this issue Apr 13, 2023
@raphael0202
Collaborator

raphael0202 commented Apr 14, 2023

Fixed by #1083.

What has been done:

  • created Image.server_type and Prediction.server_type, with off as default value (preprod and prod)
  • deleted ProductInsight.server_domain, Prediction.server_domain and Image.server_domain (preprod and prod)
  • updated the ES logo index on prod and preprod to add a server_type field: http -vv PUT http://localhost:9200/logo/_mapping properties:='{"server_type": {"type": "keyword"}}'. This is used to filter logos by server type when fetching nearest neighbors.
  • updated all existing logos in the logo index with server_type=off (preprod and prod): http POST "http://localhost:9200/logo/_update_by_query?conflicts=proceed&pretty" query:='{"bool": {"must_not": {"exists": {"field": "server_type"}}}}' script:='{"inline": "ctx._source.server_type = \"off\"", "lang": "painless"}'

Full docker command for the ES update by query:

docker exec -it robotoff_elasticsearch_1 curl -X POST "http://elastic:$ELASTIC_PASSWORD@localhost:9200/logo/_update_by_query?conflicts=proceed&pretty" -H 'Content-Type: application/json' -d'{"query": {"bool": {"must_not": {"exists": {"field": "server_type"}}}}, "script": {"inline": "ctx._source.server_type = \"off\"", "lang": "painless"}}'
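A sketch of the matching curl commands in the same docker exec style: the mapping update done with httpie above, plus a _count request that reuses the same must_not/exists query to confirm that no logo is left without server_type after the update by query. The container name and ELASTIC_PASSWORD come from the command above; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; container name and ELASTIC_PASSWORD as in the command above.
ES_CONTAINER="${ES_CONTAINER:-robotoff_elasticsearch_1}"

# Same filter as the update by query: logos with no server_type field.
missing_server_type_query() {
    printf '%s' '{"query": {"bool": {"must_not": {"exists": {"field": "server_type"}}}}}'
}

# curl equivalent of the httpie mapping update (adds a keyword field).
add_server_type_mapping() {
    docker exec "$ES_CONTAINER" curl -s -X PUT \
        "http://elastic:$ELASTIC_PASSWORD@localhost:9200/logo/_mapping" \
        -H 'Content-Type: application/json' \
        -d '{"properties": {"server_type": {"type": "keyword"}}}'
}

# Should report "count": 0 once the update by query has finished.
count_logos_missing_server_type() {
    docker exec "$ES_CONTAINER" curl -s -X POST \
        "http://elastic:$ELASTIC_PASSWORD@localhost:9200/logo/_count" \
        -H 'Content-Type: application/json' \
        -d "$(missing_server_type_query)"
}
```

Keeping the filter in one function means the update and the check cannot drift apart if the query is later changed.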

@raphael0202
Collaborator

Well, I missed the fact that this issue was not really about multi-platform support in Robotoff, so I'm reopening it.

raphael0202 reopened this Apr 14, 2023
teolemon removed the ✨ enhancement label Oct 18, 2024
3 participants