refactor level-2 branch to use remote db for all steps #12

Open
austensen opened this issue Mar 16, 2024 · 1 comment

Comments

@austensen
Member

We want to run this update job as a cron job in Kubernetes, but the current setup, which uses docker-compose to run a local Postgres as a second service, complicates that. Rather than adding a second k8s job for the database, we should refactor the level-2 branch to load the parsed data directly into the same remote AWS db we use for the final data. That way we also don't have to rebuild from pg_dump every time. The new steps would be something like:

  • upload the parsed xml directly into the aws db (maybe under a different schema, though it could probably all live in the same one)
  • export the aws db tables to csv files directly into the s3 bucket (see https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/postgresql-s3-export.html)
  • download the address csv from s3, run the geocoding, and upload the new csv to s3
  • load the updated geocoded address csv from s3 directly into the db (already doing this)
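
The export step could look roughly like the sketch below. It assumes the `aws_s3` extension is installed on the RDS instance and that the instance has an IAM role allowing writes to the bucket. The function name `export_table_to_s3`, the table name, bucket, key and region are all placeholders and not taken from this repo. The cursor is any DB-API cursor that uses `%s` placeholders (psycopg2, for example).

```python
import re

# Only plain or schema-qualified identifiers are accepted, because the table
# name has to be embedded in the query text that is handed to the extension.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")

EXPORT_SQL = """
SELECT * FROM aws_s3.query_export_to_s3(
    %s,
    aws_commons.create_s3_uri(%s, %s, %s),
    options := 'format csv, header true'
)
"""


def export_table_to_s3(cursor, table, bucket, key, region):
    """Ask Postgres on RDS to write `table` as a CSV object in S3.

    All arguments are hypothetical placeholders; nothing here reflects the
    actual table or bucket names used by this project.
    """
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    params = (f"SELECT * FROM {table}", bucket, key, region)
    cursor.execute(EXPORT_SQL, params)
    return params
```

With psycopg2 this would be called as `export_table_to_s3(conn.cursor(), "addresses", "some-bucket", "exports/addresses.csv", "us-east-1")`, again with made-up names.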
@zhik
Collaborator

zhik commented Mar 19, 2024

Additional tasks (low priority)

  • Batch the writes in parse_case. Writing to the remote db takes about one second per line (this may be due to my connection and might improve when running from a data center), compared to a few milliseconds with the local docker instance. As a result, processing one file took 4h59min8s (35691 xml lines).
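
A minimal sketch of what batching could look like, assuming parse_case can yield rows instead of writing each one immediately. The helpers `chunked` and `write_in_batches` and the `write_batch` callback are hypothetical and not part of the current code. The idea is to pay the network latency once per batch rather than once per line.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_in_batches(
    rows: Iterable[Row],
    write_batch: Callable[[List[Row]], None],
    size: int = 1000,
) -> int:
    """Send rows to `write_batch` one chunk at a time.

    Returns the number of calls made, i.e. the number of round trips
    if `write_batch` issues one statement per call.
    """
    round_trips = 0
    for batch in chunked(rows, size):
        write_batch(batch)
        round_trips += 1
    return round_trips
```

For a file of 35691 lines and a batch size of 1000, this makes 36 calls to `write_batch` instead of 35691 single-row writes. With psycopg2, `write_batch` could wrap `psycopg2.extras.execute_values`, which sends many rows in one INSERT statement; the right batch size would need to be measured.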
