1064 profile array elements #1397
…lements_update Profile array elements update
Test: test_2_rounds_1k_duckdb. Percentage change: -5.9%
Test: test_2_rounds_1k_sqlite. Percentage change: 1.6%
I think you can ignore the postgres test failure. I am pretty certain that's now been fixed in master. You're welcome to perform another rebase if you'd like to check and confirm.
column_expressions: str | list[str],
top_n=10,
bottom_n=10,
cast_arrays_as_str=False,
Ah, this is False by default too, so users are going to get spammed by the warning.
This will matter less once this is implemented across the board, but it might be a little obnoxious if that takes a while.
'{gn}' as group_name,
(select count(value) from
(select unnest ({col_or_expr}
Is there a potential issue with unnest here, in that the function name changes depending on the dialect? E.g. in Spark it's explode. If so, you might consider adding something to each backend, similar to how we deal with random samples (where the dialect varies between linkers). Apologies if you're already aware.
Thanks for the heads up... yes this was on my radar and was more complicated to deal with as the 'translator' didn't translate the desired function, so Tom is going to work on applying this to the other backends. But I will ask him about the solution used with random samples as it's good to know.
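The per-backend approach suggested in this thread could look roughly like the sketch below. It is only an illustration, not Splink's actual code: the lookup table `ARRAY_EXPLODE_FUNCTIONS` and the helper `array_elements_sql` are hypothetical names, and the only facts taken from the discussion are that DuckDB uses `unnest` and Spark uses `explode`.

```python
# Hypothetical sketch: these names are NOT part of Splink's API.
# Maps a SQL dialect to the function that expands an array into rows.
ARRAY_EXPLODE_FUNCTIONS = {
    "duckdb": "unnest",
    "spark": "explode",
}


def array_elements_sql(col_or_expr: str, table: str, dialect: str) -> str:
    """Build a query returning one row per array element for the given dialect."""
    try:
        fn = ARRAY_EXPLODE_FUNCTIONS[dialect]
    except KeyError:
        # Fail loudly for backends where array profiling is not yet handled.
        raise NotImplementedError(
            f"Array element profiling is not supported for dialect {dialect!r}"
        )
    return f"select {fn}({col_or_expr}) as value from {table}"


print(array_elements_sql("tags", "df", "duckdb"))
print(array_elements_sql("tags", "df", "spark"))
```

Keeping the dialect difference in one place means the profiling SQL template itself would not need to change when another backend is added.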
5cc2023 to c04d98a
EDIT: Apologies if you read this, I put it on the wrong issue, it was meant for here
Type of PR
Is your Pull Request linked to an existing Issue or Pull Request?
Solves issue #1064 for DuckDB
Give a brief description for the solution you have provided
Can now profile the individual elements of an array with the DuckDB linker. This is the default behaviour. Can profile the whole array instead by setting cast_arrays_as_str=True.
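As a rough illustration of the difference between the two modes, here is a plain-Python analogy. It is not the Splink implementation (which does this in SQL); `value_counts` is a hypothetical helper, and it simply reuses the `cast_arrays_as_str` parameter name from the diff.

```python
from collections import Counter

# Toy data: each row holds an array-valued column.
rows = [["a", "b"], ["a"], ["a", "b"]]


def value_counts(rows, cast_arrays_as_str=False):
    """Count values either per array element or per whole array."""
    if cast_arrays_as_str:
        # Treat each array as a single value by casting it to a string.
        return Counter(str(row) for row in rows)
    # Default: count the individual elements across all arrays.
    return Counter(element for row in rows for element in row)


print(value_counts(rows))
print(value_counts(rows, cast_arrays_as_str=True))
```

With the default, the profile reflects how often each element occurs; with `cast_arrays_as_str=True`, it reflects how often each distinct array occurs.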
Have written a Backend Agnostic Test for this feature (test_profile_arrays_bat). Can edit this and remove test_profile_with_arrays_spark once the feature has been developed in Spark.
PR Checklist