Cleanup routine using table partitioning #659

Open
eskebab opened this issue Feb 11, 2025 · 4 comments
@eskebab
Contributor

eskebab commented Feb 11, 2025

Problem
We want to keep data for as long as it is needed for proper analysis. Once log entries are no longer useful for troubleshooting, they should be deleted, or partitioned so that old data can be dropped.

Solution
We implement table partitioning using the built-in declarative partitioning in PostgreSQL. We can use the existing cron.schedule function if we are able to derive the next month from context.
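
A rough sketch of how the next month could be derived inside the database, assuming cron.schedule here is the one provided by the pg_cron extension. The function name events.create_next_trace_log_partition, the partition naming pattern and the schedule are hypothetical, not something decided in this issue.

```sql
-- Hypothetical helper: creates next month's partition of events.trace_log.
CREATE OR REPLACE FUNCTION events.create_next_trace_log_partition()
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    -- First day of next month, and first day of the month after that.
    range_start date := (date_trunc('month', now()) + interval '1 month')::date;
    range_end   date := (range_start + interval '1 month')::date;
    -- Assumed naming pattern, e.g. trace_log_y2025m04.
    partition_name text := 'trace_log_y' || to_char(range_start, 'YYYY')
                           || 'm' || to_char(range_start, 'MM');
BEGIN
    -- %I quotes the identifier, %L quotes the bound values as literals.
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS events.%I PARTITION OF events.trace_log
             FOR VALUES FROM (%L) TO (%L)',
        partition_name, range_start, range_end);
END;
$$;

-- Placeholder schedule: 03:00 on the 25th of every month.
SELECT cron.schedule('0 3 25 * *',
                     'SELECT events.create_next_trace_log_partition()');
```

Running the job some days before the month ends leaves room to notice a failed run before rows for the new month start arriving.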

@eskebab eskebab changed the title cleanup routine Cleanup routine Feb 13, 2025
@eskebab eskebab changed the title Cleanup routine Cleanup routine using table partitioning Feb 25, 2025
@eskebab eskebab self-assigned this Feb 28, 2025
@eskebab
Contributor Author

eskebab commented Mar 4, 2025

Following these guidelines for creating a partitioned table:
https://www.postgresql.org/docs/current/ddl-partitioning.html

The primary key must include the partition key column:

To create a unique or primary key constraint on a partitioned table, the partition keys must not include any expressions or function calls and the constraint's columns must include all of the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are not duplicates in different partitions.
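
For illustration, a partitioned parent table that satisfies this rule could look roughly like the sketch below. The column names are taken from the INSERT statements in this thread; the column types, the NOT NULL constraints and the choice of (cloudeventid, "time") as the key are assumptions, not the project's actual schema.

```sql
-- Sketch only: types and constraints are assumed.
CREATE TABLE events.trace_log (
    cloudeventid       uuid        NOT NULL,
    resource           text,
    eventtype          text,
    consumer           text,
    "time"             timestamptz NOT NULL,
    subscriptionid     integer,
    responsecode       integer,
    subscriberendpoint text,
    activity           text,
    -- The key must contain the partition key column "time".
    PRIMARY KEY (cloudeventid, "time")
) PARTITION BY RANGE ("time");
```

With this key, uniqueness is enforced per (cloudeventid, "time") pair, so the same cloudeventid can appear with different timestamps.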

Create an index on the range (partition key) column:

Create an index on the key column(s), as well as any other indexes you might want, on the partitioned table. (The key index is not strictly necessary, but in most scenarios it is helpful.) This automatically creates a matching index on each partition, and any partitions you create or attach later will also have such an index. An index or unique constraint declared on a partitioned table is “virtual” in the same way that the partitioned table is: the actual data is in child indexes on the individual partition tables.
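
As a small example, an index declared on the parent table is propagated to every partition. The index name below is made up.

```sql
-- Declared once on the partitioned table; PostgreSQL creates a matching
-- index on each existing partition and on partitions added later.
CREATE INDEX trace_log_time_idx ON events.trace_log ("time");
```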

Ensure that the enable_partition_pruning configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.
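
A quick way to check both the setting and its effect, sketched with the March 2025 range used elsewhere in this thread:

```sql
-- Shows the current value; the PostgreSQL default is "on".
SHOW enable_partition_pruning;

-- With pruning enabled, the plan should only scan partitions whose
-- bounds can contain rows from March 2025.
EXPLAIN
SELECT count(*)
FROM events.trace_log
WHERE "time" >= '2025-03-01'
  AND "time" <  '2025-04-01';
```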

@eskebab
Contributor Author

eskebab commented Mar 4, 2025

CREATE TABLE events.trace_log_y2024m03 PARTITION OF events.trace_log
FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

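This creates a range partition covering March 2025 (the partition name says y2024m03, but only the bounds decide where rows go). The following insert falls into that partition: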
INSERT INTO events.trace_log(
cloudeventid, resource, eventtype, consumer, "time", subscriptionid, responsecode, subscriberendpoint, activity)
VALUES ('ec10976a-e507-4ef3-9ee8-1e44a52501be', 'resource', 'eventtype', 'consumer', '2025-03-01', 1, 200, 'http://localhost', 'TEST');

This insert, on the other hand, fails: the upper bound of a range partition is exclusive, and the partition above is the only one available:
INSERT INTO events.trace_log(
cloudeventid, resource, eventtype, consumer, "time", subscriptionid, responsecode, subscriberendpoint, activity)
VALUES ('ec10976a-e507-4ef3-9ee8-1e44a52501be', 'resource', 'eventtype', 'consumer', '2025-04-01', 1, 200, 'http://localhost', 'TEST');

ERROR: no partition of relation "trace_log" found for row
Partition key of the failing row contains ("time") = (2025-04-01 00:00:00+02).
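
Two possible ways to make the second insert succeed, as a sketch. The partition names are hypothetical, and whether a default partition is wanted at all is an open question here.

```sql
-- Option 1: add the adjacent partition. The lower bound is inclusive,
-- so '2025-04-01' belongs to this partition.
CREATE TABLE events.trace_log_y2025m04 PARTITION OF events.trace_log
    FOR VALUES FROM ('2025-04-01') TO ('2025-05-01');

-- Option 2: add a catch-all partition for rows that match no range.
CREATE TABLE events.trace_log_default PARTITION OF events.trace_log DEFAULT;
```

A default partition hides a missing monthly partition instead of raising an error, and creating a new partition later fails if the default partition already holds rows that belong in the new range.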

@eskebab
Contributor Author

eskebab commented Mar 4, 2025

Trying to create a partition whose range overlaps an existing partition fails with this error:
partition "trace_log_y2024m04" would overlap partition "trace_log_y2024m03"
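
To see which ranges are already taken before creating a new partition, the system catalogs can be queried. A sketch:

```sql
-- Lists every partition of events.trace_log with its bound expression,
-- e.g. FOR VALUES FROM (...) TO (...).
SELECT c.relname                          AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS partition_bound
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'events.trace_log'::regclass
ORDER BY c.relname;
```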

@eskebab
Contributor Author

eskebab commented Mar 11, 2025

Since creating a table partition can be made idempotent (with CREATE TABLE IF NOT EXISTS), a background service in .NET could be a good alternative, because we would then have control over the range of the upcoming month's partition.
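
The statements such a service might send could look like the sketch below. The partition names, the dates and the retention period are placeholders; the service would compute them.

```sql
-- Safe to run repeatedly: a no-op if a table with this name already exists.
-- IF NOT EXISTS only checks the name, not that the bounds match.
CREATE TABLE IF NOT EXISTS events.trace_log_y2025m05 PARTITION OF events.trace_log
    FOR VALUES FROM ('2025-05-01') TO ('2025-06-01');

-- Cleanup: dropping an expired partition removes all of its rows at once,
-- without a row-by-row DELETE. Also safe to repeat.
DROP TABLE IF EXISTS events.trace_log_y2025m01;
```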
