Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move time-series data to TimescaleDB #71

Open
CyrusVorwald opened this issue Nov 8, 2024 · 1 comment
Open

Move time-series data to TimescaleDB #71

CyrusVorwald opened this issue Nov 8, 2024 · 1 comment

Comments

@CyrusVorwald
Copy link

CyrusVorwald commented Nov 8, 2024

The current implementation uses separate PostgreSQL tables for different time granularities (5m, 15m, 1h, etc.). TimescaleDB is a PostgreSQL extension to make SQL scalable for time-series data.

TimescaleDB is chosen over the TICK stack due to relative ease of code changes.

Instead of having separate tables for different time intervals, time-series data can be consolidated. For example:

CREATE TABLE pool_metrics (
    time TIMESTAMPTZ NOT NULL,
    pool_id INTEGER NOT NULL,
    chain_id INTEGER NOT NULL,
    price DOUBLE PRECISION NOT NULL,
    volume DOUBLE PRECISION NOT NULL,
    total_supply DOUBLE PRECISION,
    base_volume DOUBLE PRECISION,
    quote_volume DOUBLE PRECISION,
    base_lp_fees DOUBLE PRECISION,
    quote_lp_fees DOUBLE PRECISION,
    base_baln_fees DOUBLE PRECISION,
    quote_baln_fees DOUBLE PRECISION,
    block_height INTEGER,
    PRIMARY KEY(time, pool_id)
);

CREATE TABLE token_metrics (
    time TIMESTAMPTZ NOT NULL,
    address TEXT NOT NULL,
    chain_id INTEGER NOT NULL,
    price DOUBLE PRECISION NOT NULL,
    volume DOUBLE PRECISION,
    liquidity DOUBLE PRECISION,
    total_supply DOUBLE PRECISION,
    block_height INTEGER,
    PRIMARY KEY(time, address)
);

SELECT create_hypertable('pool_metrics', 'time');
SELECT create_hypertable('token_metrics', 'time');

CREATE INDEX ON pool_metrics (pool_id, time DESC);
CREATE INDEX ON token_metrics (address, time DESC);

Continuous materialized views automatically maintain pre-aggregated summaries of data. For example, pool_metrics can be grouped into 5-minute intervals with OHLCV data as follows:

CREATE MATERIALIZED VIEW pool_metrics_5m
WITH (timescaledb.continuous) AS
SELECT 
    time_bucket('5 minutes', time) AS bucket, -- Group data into 5-min buckets
    pool_id,
    chain_id,
    first(price, time) AS open,    -- First price in interval
    last(price, time) AS close,    -- Last price in interval
    max(price) AS high,            -- Highest price in interval
    min(price) AS low,             -- Lowest price in interval
    sum(volume) AS volume          -- Total volume in interval
FROM pool_metrics
GROUP BY bucket, pool_id, chain_id;

Raw data can be kept for 30 days and summaries can be kept for a year:

SELECT add_retention_policy('pool_metrics', INTERVAL '30 days');
SELECT add_retention_policy('pool_metrics_5m', INTERVAL '365 days');

The old existing tables would need to be migrated to the new ones. Application code such as models, CRUD operations, and cron jobs would need to be updated to reflect the new tables and materialized views. Monitoring may similarly need to be updated.

@robcxyz
Copy link
Contributor

robcxyz commented Nov 8, 2024

This would be a rather large upgrade but yes, totally agree with the suggestion for timescale but to do the summary stats through materialized views would take a rather large change to backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants