When enabling auto indexing, we call SparkColumnsToIndexSelector to choose the best columns for grouping the data.
This selection is based on statistics and correlations computed from the data itself, but if no data is provided, the current default behavior is to select the first N columns of the schema.
We should decide whether that fallback makes sense and define the minimum number of columns to index.
After some discussion, we agreed that, if the DataFrame is empty, it makes little sense to use AutoIndexing right away. The code should wait until some data is written before activating the feature.
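The agreed behavior can be sketched as follows. This is a minimal illustration in Python (the project itself is Scala/Spark); `columns_to_index` and its parameters are hypothetical names, not the actual SparkColumnsToIndexSelector API:

```python
def columns_to_index(schema, rows, n):
    """Return up to `n` column names to index, or None to defer.

    Hypothetical sketch: with no rows there are no statistics or
    correlations to drive the selection, so auto indexing is deferred
    until some data is written.
    """
    if not rows:
        return None  # empty DataFrame: wait for the first write
    # Placeholder for the statistics/correlation-based selection;
    # the questioned fallback simply takes the first n schema columns.
    return schema[:n]
```

With an empty row set the function returns `None` instead of falling back to the first N schema columns, which is the change discussed above.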