Spin up a postgres database within our nightly build container #3003

Closed
bendnorman opened this issue Nov 1, 2023 · 1 comment

bendnorman commented Nov 1, 2023

#2996 created a Cloud SQL database for our nightly builds to write event logs to. This resolved our locked event log database issue but might not be an ideal solution for a few reasons:

  • We only need the database while the nightly builds are running. Managing the lifecycle of the Cloud SQL instance and its databases from within the VM invites bugs and doesn't fit the intent of a cloud database. For example, if the VM spins up the instance and a database but then runs out of memory during the ETL, the SQL instance won't be spun down and the database won't be deleted. The instance will be left running, so we'll lose $$ and the next run will use the old database. I generally don't think we should be managing GCP resources from within the VM; a higher-level orchestration tool should manage this.
  • Authentication with the Cloud SQL db is cumbersome. You have to create a static external IP address for the VM and then whitelist that IP. This might be challenging if we migrate to Batch and have arbitrary VMs running builds.
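
To make the first point concrete, here is a minimal sketch (not our build code) of why cleanup that lives inside the build process is fragile. The child script is a hypothetical stand-in for the ETL, and writing a marker file stands in for "delete the database and stop the Cloud SQL instance". It assumes the out-of-memory failure shows up as the process being killed outright, which is how the Linux OOM killer behaves.

```python
import os
import subprocess
import sys
import tempfile

# Child process: a stand-in for the ETL, with its cleanup in a finally block.
CHILD_SCRIPT = """
import sys, time
marker, seconds = sys.argv[1], float(sys.argv[2])
try:
    print("started", flush=True)
    time.sleep(seconds)  # stand-in for the long-running ETL
finally:
    # Stand-in for "delete the database and stop the Cloud SQL instance".
    open(marker, "w").close()
"""


def cleanup_ran(kill: bool) -> bool:
    """Run the child and report whether its cleanup step executed."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "cleanup_ran")
        seconds = "60" if kill else "0.1"
        with subprocess.Popen(
            [sys.executable, "-c", CHILD_SCRIPT, marker, seconds],
            stdout=subprocess.PIPE,
            text=True,
        ) as proc:
            proc.stdout.readline()  # wait until the child is inside the try block
            if kill:
                proc.kill()  # SIGKILL on POSIX; the process gets no chance to clean up
            proc.wait()
        return os.path.exists(marker)
```

`cleanup_ran(kill=False)` reports that the cleanup ran, while `cleanup_ran(kill=True)` reports that it did not. Anything that tears down cloud resources has to live outside the process that can be killed.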

We should explore running postgres within the catalystcoop/pudl-etl image. It's more common to run postgres in a separate container, but container-optimized VMs only allow one container per VM, so we'd need a complex setup. Separate containers are also mainly helpful for longer-running applications that might scale up and down with use. In our case, we're just using postgres as a concurrency-friendly version of a temporary sqlite database.
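
A rough sketch of what "postgres as a temporary database" could look like from inside the container, assuming the PostgreSQL server binaries (`initdb`, `pg_ctl`) are installed in the image and the process is not running as root (`initdb` refuses to run as root). The helper names, the port, the `postgres` superuser name, and trust authentication are all illustrative assumptions, not what the image actually does.

```python
import os
import shutil
import subprocess
import tempfile
from contextlib import contextmanager


def initdb_cmd(data_dir: str) -> list[str]:
    # Create a fresh cluster; trust auth is acceptable only because the
    # server listens on localhost inside a throwaway container.
    return ["initdb", "-D", data_dir, "-U", "postgres", "--auth=trust"]


def start_cmd(data_dir: str, port: int) -> list[str]:
    log_file = os.path.join(data_dir, "server.log")
    # -w waits until the server accepts connections; -o passes options to postgres.
    return ["pg_ctl", "-D", data_dir, "-l", log_file, "-o", f"-p {port}", "-w", "start"]


def stop_cmd(data_dir: str) -> list[str]:
    return ["pg_ctl", "-D", data_dir, "-m", "fast", "-w", "stop"]


@contextmanager
def temporary_postgres(port: int = 5433):
    """Start a throwaway postgres server and yield a connection URL for it."""
    data_dir = tempfile.mkdtemp(prefix="pgdata_")
    try:
        subprocess.run(initdb_cmd(data_dir), check=True)
        subprocess.run(start_cmd(data_dir, port), check=True)
        try:
            yield f"postgresql://postgres@localhost:{port}/postgres"
        finally:
            subprocess.run(stop_cmd(data_dir), check=True)
    finally:
        # The data directory is disposable, like a temporary sqlite file.
        shutil.rmtree(data_dir, ignore_errors=True)
```

The ETL would then run inside `with temporary_postgres() as url:`. Because the data directory lives in the container's filesystem, a crashed run leaves nothing behind once the VM is deleted, so the next run cannot pick up an old database.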

Pros

  • Don't have to deal with networking VMs to Cloud SQL
  • Don't have to pay for Cloud SQL
  • Simpler deployment setup
  • Don't need to pass secrets around

Cons

  • Running postgres in the container will consume some of the VM resources which might slow down the ETL.
  • Won't be a viable setup if we decide to host a full dagster deployment.

bendnorman commented

With #3211, the dagster storage logs are now written to a postgres database that runs inside of the nightly build VM. The dagster-storage Cloud SQL instance has been deleted. This will save us about $80 a month.
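
For anyone wiring up something similar, here is a hedged sketch of pointing Dagster's storage at a local postgres server by writing a `dagster.yaml` into `DAGSTER_HOME`. The `storage: postgres: postgres_db:` layout follows Dagster's documented instance config and needs the `dagster-postgres` package. The environment variable names and the helper function are placeholders for illustration, not necessarily what #3211 uses.

```python
from pathlib import Path

# Instance config that sends Dagster's run, event log, and schedule storage to
# postgres. The env var names are placeholders chosen for this sketch.
DAGSTER_YAML = """\
storage:
  postgres:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOST
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
"""


def write_dagster_yaml(dagster_home: str) -> Path:
    """Write the instance config into the directory DAGSTER_HOME points to."""
    path = Path(dagster_home) / "dagster.yaml"
    path.write_text(DAGSTER_YAML)
    return path
```

With the server running in the same VM, the hostname would be `localhost`, which is why no static IP or whitelisting is needed.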
