Dropping columnstore table doesn't delete its files #22
Comments
Thanks for the bug report! |
Hey, any updates on this? Is there a design doc or implementation detail for resolving the issue? Besides dropped tables, once we have |
It's on our shortlist and planned for support soon.
Yes, both DDL (e.g. DROP TABLE, TRUNCATE, ALTER DROP COLUMN) and DML (e.g. UPDATE, DELETE) can render data files unused, requiring cleanup.
The challenge is that we can't delete these data files immediately upon executing those commands because:
Thus, data files can only be deleted once they have no remaining references. In a thread-based database, reference counting is straightforward, but Postgres is process-based: each connection runs in its own process and may crash without cleaning up its references.
Postgres faces the same issue with its heap table tuples. We're considering a similar approach by integrating the cleanup into Postgres' VACUUM command.
Contributions are welcome if you're interested! |
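To make the "no remaining references" rule concrete, here is a minimal sketch of VACUUM-style deferred deletion. It is not pg_mooncake's implementation: `DeadFileRegistry`, `mark_dead`, and the single `oldest_active_xid` horizon are hypothetical simplifications of how Postgres decides that dead heap tuples are no longer visible to anyone.

```python
from dataclasses import dataclass, field


@dataclass
class DeadFileRegistry:
    """Toy model (hypothetical): data files retired by committed transactions."""

    # Maps file name -> id of the transaction that made the file unused.
    dead_files: dict = field(default_factory=dict)

    def mark_dead(self, file_name: str, xid: int) -> None:
        """Record that transaction `xid` made `file_name` unused."""
        self.dead_files[file_name] = xid

    def vacuum(self, oldest_active_xid: int) -> list:
        """Return (and forget) the files that no running transaction can need.

        Assumption: a file is removable only when the transaction that
        retired it is older than every transaction still running.
        """
        removable = sorted(
            name
            for name, xid in self.dead_files.items()
            if xid < oldest_active_xid
        )
        for name in removable:
            del self.dead_files[name]
        return removable
```

Because the decision depends only on a persisted record and a horizon, rather than on per-process reference counts, a crashed backend cannot leave a stale reference behind.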
if we track would love to contribute if no one has taken this on yet :) |
Yes, something like that
Sounds great! Appreciate the help! |
Hey @dpxcc, I'd like to discuss the design with you before diving into the code. Thanks in advance for your time and insights!
I am planning to create a new heap table,
Alternative design options:
The code will be organized under |
Thanks for working on this! These are very good points.
I agree that using a new heap table to track dead data files is the better approach here.
For dropped tables, I'd prefer not to add a bgworker since it would require autoloading the extension, which we've been avoiding so far: it's hard to get pg_mooncake autoloaded on hosted PG services like Neon. Instead, I'd rather add a new command or better hook into
Also, there are more data files that need to be deleted. We also need to clean up data files created within an aborted transaction or, even worse, within a crashed session |
Yeah, avoiding autoloading is reasonable. What if we hook the cleanup at the very end of other tables' autovacuum? Automating this would significantly improve the user experience. Additionally, we could provide a UDF for manual control, since running
YES! Files produced by aborted transactions must indeed be addressed.
A straightforward solution is to perform a full listing of the object store, and cross-check it with entries in
A more PG-like approach would be to track
Let's create a new issue for handling aborted data files. We can clarify the design while I implement the "dropped table" one. |
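The "full listing and cross-check" idea above amounts to a set difference. The sketch below is only an illustration under stated assumptions: `find_orphaned_files` and its parameters are hypothetical, and the minimum-age filter is an added assumption meant to avoid flagging files that an in-flight transaction has written but not yet registered.

```python
def find_orphaned_files(listed, tracked, now, min_age_seconds):
    """Return object-store files that the catalog does not know about.

    listed:  dict mapping file name -> last-modified time (seconds)
    tracked: set of file names recorded in the tracking table
    now:     current time, in the same units as the values of `listed`
    min_age_seconds: grace period (assumption) so that files still being
        written by a running transaction are not reported as orphans
    """
    return sorted(
        name
        for name, modified_at in listed.items()
        if name not in tracked and now - modified_at >= min_age_seconds
    )
```

A full listing is simple but scales with the number of objects in the store, which is one reason tracking files up front is the more PG-like alternative mentioned above.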
I believe you can run
I agree that GC of untracked data files is better handled in a follow-up task
This way, even if PostgreSQL crashes, orphaned data files remain tracked in |
Looks good to me. |
I suppose VACUUM without a table name will have
It just sounds weird that when a user explicitly asks to vacuum one table, we end up also vacuuming other dropped tables |
VACUUM will construct the relation list if none is specified: https://github.com/postgres/postgres/blob/302cf15759233e654512979286ce1a5c3b36625f/src/backend/commands/vacuum.c#L563
When dropping a heap table, the storage is deleted synchronously (after commit): https://github.com/postgres/postgres/blob/8a695d7998be67445b9cd8e67faa684d4e87a40d/src/backend/catalog/storage.c#L657
Agreed. Ideally, users should not be aware of the vacuum process for dropped tables at all. |
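As a rough illustration of "deleted in sync (after commit)", the sketch below is loosely modeled on the pending-delete list in the Postgres `storage.c` linked above, where each queued deletion carries a flag saying whether it applies at commit or at abort. The Python class and method names are hypothetical, and the list lives only in memory, so it does not cover files left behind by a crashed session.

```python
class PendingDeletes:
    """Toy per-transaction list of file deletions deferred to commit or abort."""

    def __init__(self, delete_file):
        # delete_file: callable that physically removes one file (injected).
        self._delete_file = delete_file
        self._pending = []  # list of (file_name, at_commit) pairs

    def on_create(self, file_name):
        # A file created in this transaction must be removed if it aborts.
        self._pending.append((file_name, False))

    def on_drop(self, file_name):
        # A dropped file may only be removed once the drop has committed.
        self._pending.append((file_name, True))

    def finish(self, is_commit):
        # Apply only the deletions that match the transaction's outcome.
        for file_name, at_commit in self._pending:
            if at_commit == is_commit:
                self._delete_file(file_name)
        self._pending.clear()
```

The same list handles both dropped tables (delete on commit) and files written by an aborted transaction (delete on abort).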
No, utility hook gets called before
Thread 1 "postgres" hit Breakpoint 3, DuckdbUtilityHook_Cpp (pstmt=0xaaaacf051378, query_string=0xaaaacf050910 "VACUUM;", read_only_tree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, query_env=0x0, dest=0xaaaacf051738, qc=0xfffffd1214e8) at ../../src/pgduckdb/pgduckdb_ddl.cpp:139
139 VacuumStmt *stmt = castNode(VacuumStmt, pstmt->utilityStmt);
-exec p *stmt
$2 = {type = T_VacuumStmt, options = 0x0, rels = 0x0, is_vacuumcmd = true}
Thread 1 "postgres" hit Breakpoint 6, vacuum (relations=0x0, params=0xfffffd120c18, bstrategy=0xaaaacf190380, vac_context=0xaaaacf190280, isTopLevel=true) at vacuum.c:489
489 stmttype = (params->options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
-exec bt
#0 vacuum (relations=0x0, params=0xfffffd120c18, bstrategy=0xaaaacf190380, vac_context=0xaaaacf190280, isTopLevel=true) at vacuum.c:489
#1 0x0000aaaacca6b3a0 in ExecVacuum (pstate=0xaaaacf16b110, vacstmt=0xaaaacf0512c8, isTopLevel=true) at vacuum.c:449
#2 0x0000aaaaccd76610 in standard_ProcessUtility (pstmt=0xaaaacf051378, queryString=0xaaaacf050910 "VACUUM;", readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0xaaaacf051738, qc=0xfffffd1214e8) at utility.c:859
#3 0x0000ffffa3ad9fc8 in MooncakeHandleDDL (pstmt=0xaaaacf051378, query_string=0xaaaacf050910 "VACUUM;", read_only_tree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, query_env=0x0, dest=0xaaaacf051738, qc=0xfffffd1214e8) at ../../src/pgduckdb/pgduckdb_ddl.cpp:130
#4 0x0000ffffa3ada430 in DuckdbUtilityHook_Cpp (pstmt=0xaaaacf051378, query_string=0xaaaacf050910 "VACUUM;", read_only_tree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, query_env=0x0, dest=0xaaaacf051738, qc=0xfffffd1214e8) at ../../src/pgduckdb/pgduckdb_ddl.cpp:187
...
For dropping a heap table at transaction commit: what if there are concurrent transactions reading that dropped table? |
great! I'll proceed with this approach.
|
I see, that makes sense |
What happens?
When I drop a columnstore table, its files still exist and the disk space is not released. How can the file space be released when a columnstore table is dropped?
To Reproduce
create table sometable ..... using columnstore
insert into sometable .....
drop table sometable
OS:
linux
pg_mooncake Version:
latest docker image
Postgres Version:
latest docker image
Are you using pg_mooncake Docker, Neon, or the extension standalone?
pg_mooncake Docker Image