
How to pass a where clause predicate to rewrite_data_files that uses year of a timestamp column #11789

Open
salimpadela opened this issue Dec 14, 2024 · 1 comment
Labels
question Further information is requested

Comments

@salimpadela

salimpadela commented Dec 14, 2024

Query engine

How do I pass a predicate in the where clause of rewrite_data_files using PySpark? If it matters, I am using AWS Glue to execute this job.

Question

I can't seem to figure out what is wrong with the way I am passing the where clause predicate to rewrite_data_files.

```python
spark.sql("""
    CALL glue_catalog.system.rewrite_data_files(
        table => 'my-awesome-table',
        where => "col1 IN ('CT') AND col2 IN (5) AND year(CAST(col3 AS DATE)) IN (1990)",
        strategy => 'binpack',
        options => map('min-input-files', '2'))
""")
```

This fails with:

```
Error Category: UNCLASSIFIED_ERROR; Failed Line Number: 3550; IllegalArgumentException: Cannot translate Spark expression: ((col1#50421 INSET CT AND col2#50422 INSET 10) AND year(cast(col3#50424 as date)) INSET 1990) to data source filter
```

I also tried `AND year(col3) IN (1990)` in the where clause.

If I don't pass `AND year(CAST(col3 AS DATE)) IN (1990)` in the where clause, it works fine.
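For reference, a sketch of the variant the issue reports as working, i.e. the same CALL with the `year()` predicate dropped (table and column names are the ones from this issue; the statement is built as a string here and would be executed via `spark.sql(sql)` in a live session):

```python
# Sketch: the same rewrite_data_files CALL without the year() predicate,
# which the reporter says runs fine. Names come from the issue itself.
sql = """
CALL glue_catalog.system.rewrite_data_files(
  table => 'my-awesome-table',
  where => "col1 IN ('CT') AND col2 IN (5)",
  strategy => 'binpack',
  options => map('min-input-files', '2'))
"""
# In a live session: spark.sql(sql)
```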

What am I missing here?

@salimpadela salimpadela added the question Further information is requested label Dec 14, 2024
@manuzhang
Contributor
You may try `glue_catalog.system.years(ts)` (where `ts` is a `TIMESTAMP`; you may cast your column to it).
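An untested sketch of that suggestion, assuming the Iceberg Spark function catalog exposes `years()` under the catalog's `system` namespace; comparing against `years()` of a literal timestamp sidesteps having to know the transform's raw output value. Table and column names are taken from the issue; whether this predicate translates to a data source filter in your Iceberg/Glue versions still needs verifying:

```python
# Sketch (not definitive): use Iceberg's years() transform function in the
# where clause instead of Spark's year(), comparing transform to transform.
sql = """
CALL glue_catalog.system.rewrite_data_files(
  table => 'my-awesome-table',
  where => "col1 IN ('CT') AND col2 IN (5)
            AND glue_catalog.system.years(CAST(col3 AS TIMESTAMP))
              = glue_catalog.system.years(TIMESTAMP '1990-01-01')",
  strategy => 'binpack',
  options => map('min-input-files', '2'))
"""
# In a live session: spark.sql(sql)
```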
