This issue clarifies the status of the DELETE operation in the Qbeast Spark library and outlines the next steps on the roadmap.
DELETE is a basic data management operation supported by all the major open table formats (Delta, Iceberg, and Hudi). It removes specific rows from a table and is usually implemented with one of two strategies:
Merge on Read. The rows are marked as deleted and are discarded at read time.
Copy on Write. The files that contain the affected records are removed, and their data is rewritten without the deleted records.
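To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It is a toy model, not how Delta, Iceberg, Hudi, or Qbeast implement deletes: `ToyTable`, `delete_merge_on_read`, `delete_copy_on_write`, and the `.rewritten` file suffix are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy table: file name -> rows, plus per-file delete markers (hypothetical)."""
    files: dict
    deleted: dict = field(default_factory=dict)

    def read(self):
        # Rows marked as deleted are filtered out at read time.
        rows = []
        for name in sorted(self.files):
            marked = self.deleted.get(name, set())
            rows.extend(r for r in self.files[name] if r not in marked)
        return rows


def delete_merge_on_read(table, predicate):
    # Files stay untouched; matching rows are only marked as deleted.
    for name, rows in table.files.items():
        hits = {r for r in rows if predicate(r)}
        if hits:
            table.deleted.setdefault(name, set()).update(hits)


def delete_copy_on_write(table, predicate):
    # Every file holding a matching row is dropped and rewritten without it.
    for name in list(table.files):
        rows = table.files[name]
        if any(predicate(r) for r in rows):
            del table.files[name]
            table.files[name + ".rewritten"] = [r for r in rows if not predicate(r)]


mor = ToyTable({"a": [1, 2, 3], "b": [4, 5]})
cow = ToyTable({"a": [1, 2, 3], "b": [4, 5]})
delete_merge_on_read(mor, lambda r: r == 2)
delete_copy_on_write(cow, lambda r: r == 2)
```

Both tables return the same rows on read, but only the Copy on Write table ends up with a different set of files, which is the part that matters for any metadata attached to the original files.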
Because Qbeast is interoperable with these formats, the operation can be executed through Delta's interface.
By default, Delta uses the Copy on Write mechanism: it deletes files and adds new ones. Deleting a file means that the AddFile entry carrying the corresponding Qbeast metadata is no longer available in the Snapshot, and the newly written file does not contain the tags needed to rebuild the OTree either.
Or, in other words: the operation could potentially harm the index structure.
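The effect described above can be sketched with a simplified model of the log. This is only an illustration under stated assumptions: the `AddFile` class below is a stand-in, not Delta's actual class, and the tag names `cube` and `revision` are placeholders for whatever metadata the index needs.

```python
from dataclasses import dataclass, field


@dataclass
class AddFile:
    """Simplified stand-in for an AddFile log entry (hypothetical model)."""
    path: str
    tags: dict = field(default_factory=dict)


def copy_on_write_delete(snapshot, touched_paths):
    """Drop each touched file and add a rewritten one that carries no tags."""
    kept = [f for f in snapshot if f.path not in touched_paths]
    rewritten = [
        AddFile(path=f.path + ".rewritten")
        for f in snapshot
        if f.path in touched_paths
    ]
    return kept + rewritten


def files_missing_index_metadata(snapshot, required_tags):
    """List the files that lack any of the tags needed to rebuild the index."""
    return [f.path for f in snapshot if not required_tags.issubset(f.tags)]


REQUIRED = {"cube", "revision"}  # placeholder tag names, assumed for the example
before = [
    AddFile("f1.parquet", {"cube": "A", "revision": "1"}),
    AddFile("f2.parquet", {"cube": "AB", "revision": "1"}),
]
after = copy_on_write_delete(before, {"f2.parquet"})
untagged_before = files_missing_index_metadata(before, REQUIRED)
untagged_after = files_missing_index_metadata(after, REQUIRED)
```

After the delete, the rewritten file is the only one reported as missing metadata, and the tags of the original `f2.parquet` are gone from the snapshot.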
Things to do:
Add an entry in the Documentation that addresses the current limitations.
Analyze the impact of missing blocks.
Analyze the impact of missing cubes.
Develop a mechanism to maintain a correct structure even if some files are missing OR develop a mechanism to ensure deletes maintain the index in a correct shape.
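As a starting point for the missing-cubes analysis, a check like the following could flag cubes that become unreachable once a parent disappears. It assumes a hypothetical encoding in which each cube is identified by a string, the root is the empty string, and a cube's parent is its identifier without the last character; the actual Qbeast cube identifiers may differ.

```python
def orphaned_cubes(present):
    """Return the cubes whose direct parent is absent (root is the empty string)."""
    return sorted(c for c in present if c != "" and c[:-1] not in present)


# Hypothetical tree: root -> A -> AB -> ABC, and root -> C.
full_tree = {"", "A", "AB", "ABC", "C"}
after_delete = full_tree - {"AB"}  # the files of cube "AB" were removed

orphans_full = orphaned_cubes(full_tree)
orphans_after = orphaned_cubes(after_delete)
```

In this toy tree, removing cube `AB` leaves `ABC` without a parent, so it is the only cube reported as orphaned.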
osopardo1 changed the title from "Analyse the impact of Delete operation in Qbeast OTree Index" to "Analyse the impact of Delete operation in Qbeast Index" on May 7, 2024