-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drill uses aliases in group by statements generated by SQLA resulting in no data when aggregating columns #28443
Comments
To address the issue with Apache Drill and For instance, if your query is: SELECT length(n_name) AS n_name
FROM
(select * from cp.`tpch/nation.parquet`)
GROUP BY length(n_name)
LIMIT 10; Modify it to avoid using the alias in the SELECT length(n_name) AS n_name
FROM
(select * from cp.`tpch/nation.parquet`)
GROUP BY n_name
LIMIT 10; This adjustment ensures compatibility with Apache Drill's SQL parsing and execution behavior, working around limitations related to the use of aliases in Regarding appending "__" to aliases, this solution was specific to If you encounter further issues or have additional questions, feel free to ask for more assistance.
|
I have implemented a fix based on the suggestion. This can be found in PR #29455 . I'm currently maintaining an image with this fix for my company, keeping it up to date with your changes. I would greatly appreciate it if this issue can be resolved so that I can use the official versions. |
Bug description
GROUP BY
statements usually do not support the use of aliases, but it is supported in Drill since 2019. This added support results in the alias being picked inGROUP BY
statements instead of the source column if they have the same name. This is problematic if the alias refers to an aggregation of the source column.These types of statements are generated when creating a chart and aggregating the values of an axis:
Testing this statement in the SQL Lab against a Drill connection results in the following error:
and an exception in the logs:
The problem is the statement
GROUP BY length(n_name)
. The aggregationlength(n_name)
refers to the source columnn_name
, but the Drill engine uses the aliasn_name
instead. The resulting executed statement is thusGROUP BY length(length(n_name))
, which returns no data.Suggestion
There was a similar issue with
ORDER BY
statements which was fixed by appending "__" to the name of such aliases, ensuring that the source column is used. This was done in PR 13739 (#13739)The same fix can also be applied for
GROUP BY
statements, and the problematic statement above would instead look like this:How to reproduce the bug
An error message should appear in the results view
Screenshots/recordings
No response
Superset version
4.0.0
Python version
Not applicable
Node version
Not applicable
Browser
Firefox
Additional context
From the logs:
Checklist
The text was updated successfully, but these errors were encountered: