Investigating simple queries in the Spark UI, we found that the Metadata time for the Qbeast datasource is higher than expected.
Here's a comparison of reads of a small (10-element) dataset with Parquet, Delta, and Qbeast:
[Screenshots: Metadata time for the Parquet, Delta, and Qbeast reads]
While Delta and Parquet spent only 2 ms on Metadata time, Qbeast took 593 ms. And this is for a small dataset; the situation could worsen, especially in high-append scenarios.
I've checked the execution plan and the configuration, and there doesn't seem to be much difference aside from the FileIndex used:
For Parquet, an InMemoryFileIndex is initialized.
For Delta, a PreparedDeltaFileIndex is initialized.
For Qbeast, a DefaultFileIndex is initialized.
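One way to check which FileIndex a scan resolved to, without digging through the UI, is to grep the physical plan text that `df.explain()` prints (Spark's `FileSourceScanExec` reports its index after `Location:`). A minimal sketch; the plan string below is illustrative, not output captured from this issue:

```python
import re
from typing import Optional

def file_index_of(plan: str) -> Optional[str]:
    """Return the FileIndex class named in a FileScan's Location line, if any."""
    # FileScan nodes print e.g. "Location: InMemoryFileIndex [file:/tmp/data]"
    # (some Spark versions omit the space before the bracket).
    m = re.search(r"Location:\s*(\w+FileIndex)", plan)
    return m.group(1) if m else None

# Illustrative plan text, as produced by df.explain() for a Parquet read:
example_plan = """\
== Physical Plan ==
*(1) ColumnarToRow
+- FileScan parquet [id#0L] Batched: true, DataFilters: [],
   Format: Parquet, Location: InMemoryFileIndex [file:/tmp/parquet_table],
   PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
"""

print(file_index_of(example_plan))  # InMemoryFileIndex
```

Running the same check against the Delta and Qbeast reads should report PreparedDeltaFileIndex and DefaultFileIndex respectively.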
Further investigation is needed. I'll keep the conversation going in this issue.