[BUG] metric `numDeletedRows` missing in Delta log when DELETING complete partition
#1423
Comments
delta/core/src/main/scala/org/apache/spark/sql/delta/commands/DeleteCommand.scala Lines 103 to 106 in 0eb4c7e
Thanks @sherlockbeard! That makes sense. I'll propose a change to the documentation so this is reflected there as well. So, if I want to know the number of deleted rows, I'll have to perform a count() right before the delete, with the same predicate. EDIT: I just realized that the documentation is not on GitHub yet (#1307). Would you happen to know how changes to the documentation can be made?
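The count-before-delete workaround described above could be sketched roughly as follows. This is only an illustration, not something taken from the Delta Lake code base: the helper name `count_then_delete` is hypothetical, `spark` is assumed to be an existing `SparkSession` with Delta Lake configured, and the table name and predicate are assumed to be trusted strings.

```python
def count_then_delete(spark, table_name, predicate):
    """Count the rows matching `predicate`, then delete them.

    Returns the count taken just before the DELETE. The two statements are
    separate, so a concurrent write between them can make the returned
    number differ from the rows actually removed.
    """
    # Same predicate for both statements, as suggested in the comment above.
    count_sql = f"SELECT COUNT(*) FROM {table_name} WHERE {predicate}"
    matching_rows = spark.sql(count_sql).collect()[0][0]

    spark.sql(f"DELETE FROM {table_name} WHERE {predicate}")
    return matching_rows
```

Usage would look like `count_then_delete(spark, "events", "part = 1")`, where `events` and `part` are made-up names. Because the predicate is interpolated into SQL text, this sketch should not be fed untrusted input.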
I am also not sure about it, but
@sherlockbeard I did not find the source for the relevant page (https://docs.delta.io/latest/delta-utility.html#operation-metrics-keys) on https://github.com/delta-io/website. @allisonport-db or @zsxwing, can you help out here: is the doc for https://docs.delta.io/latest/delta-utility.html#operation-metrics-keys already in Git? I think it is not there yet, but it is planned (#1307). In the meantime, how can we propose changes to the documentation?
@keen85 we are working on migrating our docs to https://github.com/delta-io/website. Will post the update here when it's done.
This should be solved by 2118e64 if the table has stats.
thanks @rahulsmahadev,
Describe the problem
When performing a DELETE operation on a Delta table, operation metrics are added to the Delta log / table history in the `operationMetrics` attribute, such as the number of rows deleted (`numDeletedRows`) and the number of files added and removed (`numAddedFiles`, `numRemovedFiles`).
See: https://docs.delta.io/latest/delta-utility.html#operation-metrics-keys
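As a rough illustration of where these metrics live, the sketch below reads the `commitInfo` action from the lines of a Delta log commit file (newline-delimited JSON under `_delta_log/`) and reports which of the metrics named above are absent. The helper names and the sample commit content are made up for this example and are not copied from a real table.

```python
import json

# Metric keys named in this issue; which keys a commit actually carries
# depends on the operation and the Delta Lake version.
EXPECTED_DELETE_METRICS = ("numDeletedRows", "numAddedFiles", "numRemovedFiles")


def read_operation_metrics(commit_lines):
    """Return operationMetrics from the first commitInfo action, or {}."""
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        if "commitInfo" in action:
            return action["commitInfo"].get("operationMetrics", {})
    return {}


def missing_metrics(metrics, expected=EXPECTED_DELETE_METRICS):
    """List the expected metric keys that are not present in `metrics`."""
    return [key for key in expected if key not in metrics]


# Hypothetical commit content, invented for illustration only.
sample_commit = [
    '{"commitInfo": {"operation": "DELETE", '
    '"operationMetrics": {"numRemovedFiles": "2", "numAddedFiles": "0"}}}',
    '{"remove": {"path": "part=1/file-a.parquet", "dataChange": true}}',
]

metrics = read_operation_metrics(sample_commit)
print(missing_metrics(metrics))  # prints ['numDeletedRows']
```

For a real table, the same information is exposed through the table history, so a check like `missing_metrics` could equally be applied to the `operationMetrics` column returned there.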
However, I noticed that when a complete partition of a partitioned table is deleted via the partition key, some of those metrics are missing, including the central metric of how many rows were deleted, `numDeletedRows`.
Steps to reproduce
Observed results
When only the partition key is used in the DELETE condition, the resulting entry in the Delta log does not contain all the operation metrics.
![image](https://user-images.githubusercontent.com/29750255/194708118-b6780936-6861-4ae2-a4db-3df23c46c6db.png)
Expected results
I'd like to see the `numDeletedRows` metric in the log also when partitions are deleted.
Environment information
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?