Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError #19320

Merged
merged 1 commit into from
Oct 20, 2024

Conversation

alexander-beedie
Copy link
Collaborator

@alexander-beedie alexander-beedie commented Oct 19, 2024

Ref #19310.

While many (but not all) SQL dialects allow a bare HAVING (eg: outside of GROUP BY context) it is often clearer to implement such constraints using a WHERE. However, currently we are silently ignoring such HAVING clauses :\

We should raise a syntax error here for now (can consider allowing such clauses in the future, though use of HAVING outside of GROUP BY is pretty niche - it requires implicitly treating all rows as a single group, and therefore for the SELECT to be composed of aggregate functions. In the meantime these queries can be written with a follow-on WHERE clause).

(TODO: support this use of HAVING later).

Example

import polars as pl

df = pl.DataFrame({
    "key": ["aaa", "aaa", "bbb", "bbb", "bbb"],
    "value": [100, 25, -200, 50, 135],
})

Bare HAVING clause now raises a SQLSyntaxError:

df.sql("""
  SELECT
    key,
    SUM(value) AS total, 
    COUNT(key) AS n_keys
  FROM self
  HAVING n_keys > 2
""")

# SQLSyntaxError: HAVING clause not valid outside of GROUP BY

In this case the caller can add the missing GROUP BY to fix the query:

df.sql("""
  SELECT 
    key,
    SUM(value) AS total, 
    COUNT(key) AS n_keys
  FROM self
  GROUP BY key
  HAVING n_keys > 2
""")

# shape: (1, 3)
# ┌─────┬───────┬────────┐
# │ key ┆ total ┆ n_keys │
# │ --- ┆ ---   ┆ ---    │
# │ str ┆ i64   ┆ u32    │
# ╞═════╪═══════╪════════╡
# │ bbb ┆ -15   ┆ 3      │
# └─────┴───────┴────────┘

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Oct 19, 2024
@alexander-beedie alexander-beedie added the A-sql Area: Polars SQL functionality label Oct 19, 2024
Copy link

codecov bot commented Oct 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.21%. Comparing base (46edcd8) to head (228c7cf).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19320      +/-   ##
==========================================
- Coverage   80.23%   80.21%   -0.02%     
==========================================
  Files        1523     1523              
  Lines      209529   209529              
  Branches     2434     2434              
==========================================
- Hits       168115   168083      -32     
- Misses      40859    40891      +32     
  Partials      555      555              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alippai
Copy link

alippai commented Oct 19, 2024

Most engines don't allow select count(col1) as n from df where n > 2, actually all I know require to write select count(col1) as n from df having n > 2.

It's still LGTM, raising an error is much better than ignoring a filter

@alexander-beedie
Copy link
Collaborator Author

alexander-beedie commented Oct 20, 2024

Most engines don't allow select count(col1) as n from df where n > 2, actually all I know require to write select count(col1) as n from df having n > 2.

Ahh, I didn't mean you could literally swap HAVING for WHERE (because you can't usually have aggregates in WHERE clauses), but rather that you can easily write the given query with a follow-on WHERE clause instead, eg:

select * from (select count(key) as n from df) where n > 2

I'll slightly edit/clarify the statement to make that clearer, and take a look at how much work it is to add the bare HAVING support later.

@ritchie46 ritchie46 merged commit 5b44a96 into pola-rs:main Oct 20, 2024
56 checks passed
@alexander-beedie alexander-beedie deleted the sql-invalid-having-clause branch October 20, 2024 10:41
@alippai
Copy link

alippai commented Oct 20, 2024

This only works if it’s part of the projection. You might want to filter on SUM and return COUNT. Ofc you can have more columns in the inner projection but it’s really getting verbose (however this is what I exactly do now)

@alippai
Copy link

alippai commented Oct 20, 2024

Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql Area: Polars SQL functionality fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants