Delta+Azure is slow #104
Comments
This is something else we tried; the query time is around 4-5 s. It uses a range query, so it takes much more data into consideration.
How does the transaction log look? We have experienced similar performance issues for tables where the transaction log had a lot of entries.
@nfoerster thanks for reporting, I will investigate. This sounds like filter pushdown on partitions is not working properly. @kyrre I'm working on an optimization that should improve that: #110
How can I record the logs?
I am referring to the logs in the _delta_log folder. AFAIK this issue is due to limitations of the Azure API: microsoft/AzureStorageExplorer#134 (comment)
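To answer the question of how the transaction log looks, a minimal sketch like the following can count the entries in a `_delta_log` folder. It assumes the folder has been copied to a local path (the function name `summarize_delta_log` and the path handling are illustrative, not part of DuckDB or the delta extension) and relies only on the Delta file naming convention of 20-digit zero-padded versions.

```python
import re
from pathlib import Path

# Delta commit files are named <20-digit version>.json; checkpoints are
# <20-digit version>.checkpoint.parquet (optionally split into numbered parts).
COMMIT_RE = re.compile(r"^\d{20}\.json$")
CHECKPOINT_RE = re.compile(r"^\d{20}\.checkpoint(\.\d+\.\d+)?\.parquet$")


def summarize_delta_log(table_path):
    """Count commit and checkpoint files in a local <table_path>/_delta_log.

    Hypothetical helper for illustration; it only looks at file names.
    """
    log_dir = Path(table_path) / "_delta_log"
    names = [p.name for p in log_dir.iterdir() if p.is_file()]
    commits = sorted(n for n in names if COMMIT_RE.match(n))
    checkpoints = sorted(n for n in names if CHECKPOINT_RE.match(n))
    return {
        "commits": len(commits),
        "checkpoints": len(checkpoints),
        "latest_commit": commits[-1] if commits else None,
        "has_last_checkpoint": "_last_checkpoint" in names,
    }
```

A large number of commit files with no recent checkpoint would be consistent with the slow-log scenario described above; a table with a single version should show very few entries.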
Are you setting delta.logRetentionDuration to a really low value to achieve this?
We are running some tests now that the Azure and Delta extensions work together, but we are having serious trouble writing performant queries against our test Delta Lake table.
The Delta table has around 70 columns and 1.5 billion rows. It is partitioned on two levels: the first, on serialnumber, has around 270 partitions, and the second, based on year-month, has around 10-20. All files are Parquet, there is only one Delta table version, the data is compacted and vacuumed, and the metadata history is almost clean.
We are running the queries with DuckDB 1.1.1 from an Azure VM inside the same VNet as the blob store.
This is the setup:
As you can see, there is not much of a difference between one and two partition clauses, although the latter touches far less data. I think it scans the whole Delta table instead of pushing the filters down to the partitions.
Have you ever made similar observations? Any hints would be appreciated.
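To illustrate what partition pruning is expected to do here, the following is a rough sketch that filters the `add` actions of a Delta commit file by their `partitionValues`. It is not how DuckDB or the delta extension implements pushdown; the helper name `prune_add_actions`, the partition column name `yearmonth`, and the sample paths are assumptions made for the example. It also ignores `remove` actions, which is only reasonable for a single-version table like the one described.

```python
import json


def prune_add_actions(commit_lines, wanted):
    """Return paths of `add` actions whose partitionValues match `wanted`.

    commit_lines: iterable of JSON lines from a Delta commit file.
    wanted: dict of partition column -> value (Delta stores values as strings).
    Hypothetical helper for illustration only.
    """
    selected = []
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue
        action = json.loads(line)
        add = action.get("add")
        if add is None:
            continue  # skip metaData, protocol, commitInfo, remove, ...
        parts = add.get("partitionValues", {})
        if all(parts.get(col) == val for col, val in wanted.items()):
            selected.append(add["path"])
    return selected


# Made-up sample commit with three data files in two serialnumber partitions.
sample_commit = [
    '{"commitInfo": {"operation": "WRITE"}}',
    '{"add": {"path": "serialnumber=A1/yearmonth=2024-01/part-0.parquet", '
    '"partitionValues": {"serialnumber": "A1", "yearmonth": "2024-01"}}}',
    '{"add": {"path": "serialnumber=A1/yearmonth=2024-02/part-0.parquet", '
    '"partitionValues": {"serialnumber": "A1", "yearmonth": "2024-02"}}}',
    '{"add": {"path": "serialnumber=B7/yearmonth=2024-01/part-0.parquet", '
    '"partitionValues": {"serialnumber": "B7", "yearmonth": "2024-01"}}}',
]

one_clause = prune_add_actions(sample_commit, {"serialnumber": "A1"})
two_clauses = prune_add_actions(
    sample_commit, {"serialnumber": "A1", "yearmonth": "2024-01"}
)
```

With working pushdown, adding the second clause should shrink the set of files to read, so a query with two partition clauses would be expected to run noticeably faster than one with a single clause.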