Snapshot still references old logs after cleanup_expired_logs_for
#3057
Labels
bug
Environment
Delta-rs version: 0.22.3
Binding: Python
Environment:
Bug
What happened: The function .is_deltatable returns False for an existing Delta table after several iterations of ingestion and cleanup via compact + vacuum + checkpoint.
What you expected to happen: I expect that a write_delta call with mode="append" or mode="overwrite" does not break the Delta table, that creating a checkpoint does not break the Delta table, and that vacuum cleans up all outdated files.
How to reproduce it: I have prepared a demo script that shows my findings. Please tell me if I am doing something wrong. My intuition so far is that delta-rs keeps some sort of state internally, so that, for example, before a vacuum it is a good idea to re-create dt with DeltaTable.
More details:
Initialize the Python environment. The script writes to the folder lake_delta.
Define a helper function that prints some information about the table.
I run compact to merge small files into bigger ones.
Create a checkpoint, and it harms the table: DeltaTable.is_deltatable(filename) returns False as a result.
I continue to ingest a new portion of data 5 times (don't ask me why).
Each time it shows False until the 5th iteration, which returns True, and the table has "healed" itself (?)
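To see what is actually on disk when is_deltatable flips between False and True, the _delta_log folder can be inspected without going through delta-rs at all. The following is only a minimal sketch, assuming the standard Delta log layout (20-digit zero-padded NNN.json commits, NNN.checkpoint.parquet checkpoints, and a _last_checkpoint JSON pointer with a "version" field). inspect_delta_log is a hypothetical helper name, not part of the deltalake API.

```python
import json
import re
from pathlib import Path

# Standard Delta log file names use 20-digit zero-padded version numbers.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def inspect_delta_log(table_path):
    """Summarize the contents of <table_path>/_delta_log (hypothetical helper)."""
    log_dir = Path(table_path) / "_delta_log"
    commits, checkpoints = set(), set()
    for entry in log_dir.iterdir():
        if m := COMMIT_RE.match(entry.name):
            commits.add(int(m.group(1)))
        elif m := CHECKPOINT_RE.match(entry.name):
            checkpoints.add(int(m.group(1)))

    # _last_checkpoint tells a reader which checkpoint to start from.
    last_checkpoint = None
    pointer = log_dir / "_last_checkpoint"
    if pointer.exists():
        last_checkpoint = json.loads(pointer.read_text())["version"]

    latest = max(commits | checkpoints, default=None)
    # A reader replays every commit after the checkpoint (or from version 0
    # when there is no checkpoint), so any gap in that range is suspicious.
    start = last_checkpoint + 1 if last_checkpoint is not None else 0
    end = latest if latest is not None else -1
    missing = [v for v in range(start, end + 1) if v not in commits]

    return {
        "commit_versions": sorted(commits),
        "checkpoint_versions": sorted(checkpoints),
        "last_checkpoint": last_checkpoint,
        "checkpoint_file_present": last_checkpoint in checkpoints,
        "missing_commits": missing,
    }
```

Calling inspect_delta_log("lake_delta") after each step and comparing the results between a False and a True iteration should show whether commit files or the checkpoint pointer differ between the two states.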
Try to use the existing dt to run a vacuum.
Now I try to fix it by re-reading the dt variable, and it seems to work, but it has already left 2 orphan files (there should be 6 files, but the directory contains 8 files).
Create a checkpoint again.
The next ingestion via write_table doesn't break anything.
I execute a compaction and cleanup as it is implemented in my project.
At the 13th version it breaks again ...
We make 14 new ingestions and the table eventually heals itself ...
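The orphan-file count mentioned above (8 files on disk where 6 are expected) can be checked the same way, directly on the file system. This is a rough sketch that assumes data files are the .parquet files outside _delta_log. list_data_files is a hypothetical helper name, not a deltalake function.

```python
from pathlib import Path


def list_data_files(table_path):
    """Return relative paths of Parquet data files, skipping _delta_log."""
    root = Path(table_path)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.parquet")
        # Checkpoints are Parquet too, but they live under _delta_log.
        if "_delta_log" not in p.relative_to(root).parts
    )
```

Comparing len(list_data_files("lake_delta")) before and after each vacuum with the number of files the table is expected to hold makes leftover files visible.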
I hope you can reproduce it locally
Need help
PS: Whatever bugs currently exist, many thanks for this project <3