Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch storage over to timescaledb #9

Merged
merged 1 commit into from
Nov 6, 2023
Merged

Switch storage over to timescaledb #9

merged 1 commit into from
Nov 6, 2023

Conversation

ghickman
Copy link
Contributor

@ghickman ghickman commented Nov 3, 2023

This assumes the datastore will have been configured with the timescale extension already. Locally this is handled by the timescaledb container, and in production we have already configured it.

The --database-url option for the CLI ensures we have that env var set. We could always pass it through (since it's in the context) but I wasn't sure if we actually needed it set via the CLI, but I wanted click to give us a consistent error when it's not set, and to do so immediately, rather than when the writer is called.

The writer class means we can ensure the appropriate table exists and has had a hypertable created for it when we enter the context block.

All tables have a unique constraint with the common _must_be_different suffix so the INSERT's can act as an UPSERT (using ON CONFLICT) when it's just the value that needs updating.

I tried out the schema-based method we discussed in slack:

  1. create table in another schema
  2. add data
  3. move to primary schema

This worked, as we expected, but didn't fit how we've been planning to run these scripts, where they are adding a single period of data (eg day, week, etc) on each invocation.


Looking forward I'm expecting the table definitions to move to something slightly more complex. The GitHub and Slack tables are very similar, having been copied over from influx, but I can easily imagine a world where we need entirely different tables, and even tables that are not timeseries (a primary reason we're doing this switch!). At that point I expect us to look into ORM[-adjacent] soluations. The INSERT call in .write() is already a little gnarly and I'm sure someone else (almost certainly SQLAlchemy…) has better code for this than we will want to maintain. I suspect this will also make table management a little easier too.

However it didn't seem worth it for this PR and we can address that when it comes up.

This assumes the datastore will have been configured with the timescale
extension already.  Locally this is handled by the timescaledb
container, and in production we have already configured it.

The --database-url option for the CLI ensures we have that env var set.
We could always pass it through (since it's in the context) but I wasn't
sure if we actually needed it set via the CLI.

The writer class means we can ensure the appropriate table exists and
has had a hypertable created for it.

All tables have a unique constraint with the common _must_be_different
suffix so the INSERT's can act as an UPSERT (with ON CONFLICT) when it's
just the value that needs updating.
Base automatically changed from slack to main November 6, 2023 11:01
@ghickman ghickman merged commit a10da2d into main Nov 6, 2023
4 checks passed
@ghickman ghickman deleted the timescale branch November 6, 2023 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants