Honestly, the issue report is quite convoluted, so I don't quite follow what you are trying to achieve and what is going wrong.
The only thing I can mention for now concerns this stage: "Try to use existing dt to make a vacuum."
The writes you did before that were made without passing the dt object, so the snapshot of your dt object was never updated and it was still referencing an older version.
Hi Ion!
The main problem so far is that if I run .optimize.compact and right after that run create_checkpoint, it breaks something internally, and the next DeltaTable.is_deltatable(filename) call returns False.
Please check lines 7-9 of the code.
If you prefer, I can split this issue into smaller ones, but I believe the problem here shows up in integration tests rather than in "unit" tests.
Environment
Delta-rs version: 0.22.3
Binding: Python
Environment:
Bug
What happened: Function .is_deltatable returns False for an existing Delta table after some iterations of ingestion and cleanup via compact + vacuum + checkpoint.
What you expected to happen: I expect that a write_delta function with mode="append" or mode="overwrite" does not break the Delta table, that the checkpoint function does not break the Delta table, and that the vacuum function cleans up all outdated files.
How to reproduce it: I've prepared a demo script to show my findings. Please tell me if I am doing something wrong. My intuition is that some sort of 'state' is stored within delta-rs, i.e. before a vacuum it is a good idea to re-initialize dt with DeltaTable.
More details:
Set up the Python script. It writes to a folder lake_delta.
Define a helper function to show some information about the table.
I run compact to merge small files into a bigger one.
Create a checkpoint, and it harms the table: DeltaTable.is_deltatable(filename) returns False as a result.
I continue to ingest a new portion of data 5 times (don't ask me why).
Each time it shows False, until the 5th iteration, which returns True, and the table "healed" itself (?)
Try to use the existing dt to make a vacuum.
Now I try to fix it by re-reading the dt variable, and it seems to work, but it has already left 2 orphan files (there should be 6 files, but the directory contains 8).
Create a checkpoint again.
The next ingestion via write_table doesn't break anything.
I execute a compaction and cleanup as implemented in my project.
On the 13th version it breaks again...
After 14 new ingestions the table eventually heals itself...
I hope you can reproduce it locally.
I need help.
PS: Whatever bugs currently exist, many thanks for this project <3