Cleanup routine using table partitioning #659

eskebab · 2025-02-11T13:13:26Z

Problem
We want to keep data for as long as necessary to do proper analysis. When log entries are no longer useful for troubleshooting, they should be deleted or partitioned off.
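With monthly partitions, "deleting" old entries can become dropping whole partitions. The PostgreSQL partitioning docs describe two options: `DROP TABLE` on the old partition, or `ALTER TABLE ... DETACH PARTITION` to keep it as a standalone table. A minimal sketch of choosing which partitions have expired, assuming a hypothetical retention of `keep_months` whole months and a hypothetical `trace_log_yYYYYmMM` naming scheme:

```python
from datetime import date


def expired_partitions(
    partitions: dict[str, tuple[date, date]], today: date, keep_months: int
) -> list[str]:
    """Names of partitions whose whole [lo, hi) range ends before the cutoff."""
    # Cutoff = first day of the month `keep_months` months before today's month.
    months = today.year * 12 + (today.month - 1) - keep_months
    cutoff = date(months // 12, months % 12 + 1, 1)
    # A partition is expired only if its exclusive upper bound is <= cutoff.
    return sorted(name for name, (_, hi) in partitions.items() if hi <= cutoff)


# Hypothetical partitions of events.trace_log.
parts = {
    "trace_log_y2025m01": (date(2025, 1, 1), date(2025, 2, 1)),
    "trace_log_y2025m02": (date(2025, 2, 1), date(2025, 3, 1)),
    "trace_log_y2025m03": (date(2025, 3, 1), date(2025, 4, 1)),
}
for name in expired_partitions(parts, date(2025, 3, 11), keep_months=1):
    # Either statement removes the data from events.trace_log.
    print(f"ALTER TABLE events.trace_log DETACH PARTITION events.{name};")
    print(f"DROP TABLE events.{name};")
```

Dropping a partition is much cheaper than a bulk `DELETE`, which is the main reason to partition by time for this kind of cleanup.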

Solution
We implement table partitioning using the built-in (declarative) partitioning in PostgreSQL. We can use the existing cron.schedule function if we are able to get the next month from context.
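"Getting the next month" amounts to two date computations: the first day of next month (inclusive lower bound) and the first day of the month after (exclusive upper bound). Inside PostgreSQL the lower bound can be written as `date_trunc('month', now()) + interval '1 month'`. The sketch below shows the same arithmetic in Python; the function names are hypothetical.

```python
from datetime import date


def first_of_next_month(d: date) -> date:
    """First day of the month following `d`, rolling over the year after December."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def next_month_range(today: date) -> tuple[date, date]:
    """Half-open range [start, end) covering the month after `today`."""
    start = first_of_next_month(today)
    end = first_of_next_month(start)  # exclusive, like FOR VALUES FROM (...) TO (...)
    return start, end


start, end = next_month_range(date(2025, 3, 11))
print(start, end)  # 2025-04-01 2025-05-01
```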

eskebab · 2025-03-04T09:19:20Z

We follow these guidelines for creating a partitioned table:
https://www.postgresql.org/docs/current/ddl-partitioning.html

Primary key must include the partition key column

To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint's columns must include all of the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.
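In practice this means a key such as `PRIMARY KEY (cloudeventid, "time")` rather than `PRIMARY KEY (cloudeventid)` when the table is partitioned by range on `"time"`; uniqueness is then guaranteed only for the pair. A minimal sketch of the column check, assuming plain-column partition keys (the function name and key choices are hypothetical):

```python
def pk_covers_partition_key(primary_key: list[str], partition_key: list[str]) -> bool:
    """True if every partition key column is also a primary key column.

    Models only plain-column partition keys; per the rule quoted above, a
    partition key containing expressions cannot back a primary key at all.
    """
    return set(partition_key) <= set(primary_key)


# Hypothetical: events.trace_log partitioned by RANGE ("time").
partition_key = ["time"]
print(pk_covers_partition_key(["cloudeventid"], partition_key))          # False
print(pk_covers_partition_key(["cloudeventid", "time"], partition_key))  # True
```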

Create an index on the range column

Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This automatically creates a matching index on each partition, and any partitions you create or attach later will also have such an index. An index or unique constraint declared on a partitioned table is “virtual” in the same way that the partitioned table is: the actual data is in child indexes on the individual partition tables.

Ensure that the enable_partition_pruning configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.
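Partition pruning lets the planner use a `WHERE` condition on the partition key to skip partitions whose range cannot match; `EXPLAIN` shows which partitions remain. The toy model below, assuming hypothetical monthly partitions, shows which partitions a time-range query would touch with and without pruning. It only models the idea, not the planner itself.

```python
from datetime import date

# Hypothetical monthly partitions, each covering the half-open range [lo, hi).
PARTITIONS = {
    "trace_log_y2025m02": (date(2025, 2, 1), date(2025, 3, 1)),
    "trace_log_y2025m03": (date(2025, 3, 1), date(2025, 4, 1)),
    "trace_log_y2025m04": (date(2025, 4, 1), date(2025, 5, 1)),
}


def partitions_to_scan(q_lo: date, q_hi: date, pruning: bool = True) -> list[str]:
    """Partitions read by a query with WHERE "time" >= q_lo AND "time" < q_hi."""
    if not pruning:
        # Toy model of pruning being disabled: no partition is ruled out up front.
        return sorted(PARTITIONS)
    # Keep only partitions whose [lo, hi) range intersects [q_lo, q_hi).
    return sorted(
        name for name, (lo, hi) in PARTITIONS.items() if lo < q_hi and q_lo < hi
    )


print(partitions_to_scan(date(2025, 3, 10), date(2025, 3, 20)))
print(partitions_to_scan(date(2025, 3, 10), date(2025, 3, 20), pruning=False))
```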

eskebab · 2025-03-04T10:59:46Z

CREATE TABLE events.trace_log_y2024m03 PARTITION OF events.trace_log
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

creates a range partition, and the following insert falls into that partition:
INSERT INTO events.trace_log(
cloudeventid, resource, eventtype, consumer, "time", subscriptionid, responsecode, subscriberendpoint, activity)
VALUES ('ec10976a-e507-4ef3-9ee8-1e44a52501be', 'resource', 'eventtype', 'consumer', '2025-03-01', 1, 200, 'http://localhost', 'TEST');

while this one fails, because the partition above is the only one available and its upper bound ('2025-04-01') is exclusive:
INSERT INTO events.trace_log(
cloudeventid, resource, eventtype, consumer, "time", subscriptionid, responsecode, subscriberendpoint, activity)
VALUES ('ec10976a-e507-4ef3-9ee8-1e44a52501be', 'resource', 'eventtype', 'consumer', '2025-04-01', 1, 200, 'http://localhost', 'TEST');

ERROR: no partition of relation "trace_log" found for row
Partition key of the failing row contains ("time") = (2025-04-01 00:00:00+02).
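The behaviour above follows from range bounds being half-open: `FROM` is inclusive and `TO` is exclusive. A minimal model of the routing, using the single partition created above (the function name is hypothetical and the error text is only modelled on the PostgreSQL message):

```python
from datetime import date

# The only partition, mirroring FOR VALUES FROM ('2025-03-01') TO ('2025-04-01').
PARTITIONS = {"trace_log_y2024m03": (date(2025, 3, 1), date(2025, 4, 1))}


def route(ts: date) -> str:
    """Partition a row with partition key `ts` lands in: lo <= ts < hi."""
    for name, (lo, hi) in PARTITIONS.items():
        if lo <= ts < hi:
            return name
    # No matching range and no DEFAULT partition: the insert is rejected.
    raise LookupError(f'no partition of relation "trace_log" found for row ({ts})')


print(route(date(2025, 3, 1)))  # trace_log_y2024m03
try:
    route(date(2025, 4, 1))
except LookupError as err:
    print(err)
```

A `DEFAULT` partition (PostgreSQL 11 and later) would catch such rows instead of raising the error; the alternative is to make sure next month's partition always exists ahead of time.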

eskebab · 2025-03-04T11:02:34Z

Trying to create a partition whose range overlaps an existing partition gives this error:
partition "trace_log_y2024m04" would overlap partition "trace_log_y2024m03"
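Two half-open ranges [a0, a1) and [b0, b1) overlap exactly when a0 < b1 and b0 < a1. Because the upper bound is exclusive, adjacent monthly ranges that share a boundary date do not overlap. A small sketch of the check; the second range in each call is hypothetical, since the range that triggered the error above is not shown:

```python
from datetime import date


def overlaps(a: tuple[date, date], b: tuple[date, date]) -> bool:
    """True if half-open ranges [a0, a1) and [b0, b1) share at least one value."""
    return a[0] < b[1] and b[0] < a[1]


existing = (date(2025, 3, 1), date(2025, 4, 1))
print(overlaps(existing, (date(2025, 3, 15), date(2025, 5, 1))))  # True: rejected
print(overlaps(existing, (date(2025, 4, 1), date(2025, 5, 1))))   # False: adjacent is fine
```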

eskebab · 2025-03-11T12:58:31Z

Since creating a partition can be made idempotent with CREATE TABLE IF NOT EXISTS ... PARTITION OF, a background service in .NET could be a good alternative to cron.schedule: it can safely run the same statement repeatedly, and we would have full control over the range of the coming month's partition.
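A sketch of the statements such a service could run on each tick, assuming a hypothetical `trace_log_yYYYYmMM` naming scheme and a hypothetical `months_ahead` setting. It is written in Python only to show the logic; a .NET `BackgroundService` would execute the same SQL.

```python
from datetime import date


def first_of_next_month(d: date) -> date:
    """First day of the month following `d`."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def partition_ddl(today: date, months_ahead: int = 1) -> list[str]:
    """Idempotent DDL for partitions covering the next `months_ahead` months."""
    statements = []
    start = first_of_next_month(today)
    for _ in range(months_ahead):
        end = first_of_next_month(start)  # exclusive upper bound
        name = f"events.trace_log_y{start.year}m{start.month:02d}"  # hypothetical naming
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF events.trace_log "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
        start = end
    return statements


for stmt in partition_ddl(date(2025, 3, 11), months_ahead=2):
    print(stmt)
```

Note that `IF NOT EXISTS` only checks the table name: rerunning with the same naming scheme is a no-op, but a differently named partition covering the same range would still raise the overlap error shown earlier.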

eskebab changed the title ~~cleanup routine~~ Cleanup routine Feb 13, 2025

eskebab changed the title ~~Cleanup routine~~ Cleanup routine using table partitioning Feb 25, 2025

eskebab self-assigned this Feb 28, 2025

eskebab mentioned this issue Mar 11, 2025

Add migration script for partitioning table trace_log #678 (Draft, 5 tasks)
