#2996 created a Cloud SQL database for our nightly builds to write event logs to. This resolved our locked event log database issue but might not be an ideal solution for a few reasons:
We only need the database while the nightly builds are running. Managing the lifecycle of the Cloud SQL instance and its databases from within the VM is likely to introduce bugs and doesn't fit the intent of a cloud database. For example, if the VM spins up the instance and a database but then runs out of memory during the ETL, the SQL instance won't be spun down and the database won't be deleted. The instance will be left running, so we'll keep paying for it, and the next run will use the old database. In general, I don't think we should be managing GCP resources from within the VM; a higher-level orchestration tool should handle this.
Authentication with the Cloud SQL database is cumbersome. You have to create a static external IP address for each VM and then whitelist that IP. This might be challenging if we migrate to Batch and have arbitrary VMs running builds.
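To illustrate the lifecycle concern, here is a minimal sketch of in-process cleanup. It makes no real GCP calls: `temporary_sql_instance`, `run_etl`, and the instance name are hypothetical stand-ins, and the `events` list just records what a real implementation would attempt.

```python
from contextlib import contextmanager

events = []  # stands in for real GCP API calls, which are not made here


@contextmanager
def temporary_sql_instance(name):
    """Hypothetical helper: create an instance, then always try to delete it."""
    events.append(f"create {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and on Python exceptions, but NOT if the
        # process is killed outright (e.g. by the kernel OOM killer) or
        # the VM itself goes away.
        events.append(f"delete {name}")


def run_etl(instance):
    """Hypothetical ETL step that fails with a catchable error."""
    raise RuntimeError("ETL failed")


try:
    with temporary_sql_instance("nightly-logs") as instance:
        run_etl(instance)
except RuntimeError:
    pass

print(events)
```

Cleanup like this only covers failures the process gets a chance to handle. An out-of-memory kill skips the `finally` block entirely, which is why teardown probably belongs in something that outlives the VM.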
We should explore running postgres within the catalystcoop/pudl-etl image. It's more common to run postgres in a separate container, but container-optimized VMs only allow one container per VM, so we'd need a more complex setup. Separate containers are also most helpful for longer-running applications that might scale up and down with use. In our case, we're just using postgres as a concurrency-friendly version of a temporary sqlite database.
Pros
Don't have to deal with networking VMs to Cloud SQL
Don't have to pay for Cloud SQL
Simpler deployment setup
Don't need to pass secrets around
Cons
Running postgres in the container will consume some of the VM resources which might slow down the ETL.
Won't be a viable setup if we decide to host a full dagster deployment.
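As a rough sketch of what the in-container setup might involve: wait for the local postgres server to start listening, then point Dagster's storage at it. The helper names, credentials, and database name below are assumptions for illustration. The nested config keys are meant to mirror the `storage.postgres.postgres_db` layout used in `dagster.yaml`, which should be checked against the Dagster docs for the version in use.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False.

    Note: an open port only means something is listening; it does not
    guarantee postgres is ready for queries (pg_isready checks that).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def dagster_storage_config(username, password, db_name,
                           hostname="127.0.0.1", port=5432):
    """Build a dict shaped like the storage section of dagster.yaml (assumed layout)."""
    return {
        "storage": {
            "postgres": {
                "postgres_db": {
                    "username": username,
                    "password": password,
                    "hostname": hostname,
                    "db_name": db_name,
                    "port": port,
                }
            }
        }
    }


# Placeholder values; postgres lives in the same container, so the host is local.
config = dagster_storage_config("dagster", "not-a-real-secret", "dagster_events")
```

Because the database is local to the container, the hostname is just the loopback address and no static IP or allowlisting is needed.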
With #3211, the dagster storage logs are now written to a postgres database that runs inside of the nightly build VM. The dagster-storage Cloud SQL instance has been deleted. This will save us about $80 a month.