[BUG] Filters on partition columns not taking effect | Spark 3.5.0 | com.crealytics:spark-excel_2.12:3.5.0_0.20.2/3 and 3.5.1_0.20.4 #907
Comments
Please check these potential duplicates:
It is not a duplicate; it is similar: excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.2.2_0.18.5 + Spark 3.4.1.
@minnieshi I guess there might have been some change in the internal handling of predicate push-down in 3.5.
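One way to check that theory (a diagnostic sketch, assuming `df` and the `execution_date` column from the repro below) is to print the query plans and look for the predicate in the scan node:

```scala
import org.apache.spark.sql.functions.{col, lit}

// If push-down works, the physical plan's scan node lists the predicate
// under PushedFilters (or prunes partition directories); if the filter is
// silently dropped, the plan scans every partition.
df.where(col("execution_date") === lit(20231218)).explain(true)
```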
Am I using the newest version of the library?
I've read the similar OPEN issue:
[BUG] Filters on partition columns don't work | Spark 3.3.1 | com.crealytics:spark-excel_2.12:3.3.1_0.18.5 #727
Is there an existing issue for this?
Current Behavior
A filter on a partition column (the partition folder) does not take effect with the following version combinations:
spark-excel_2.12-3.5.0_0.20.2 + Spark 3.5.0
spark-excel_2.12-3.5.0_0.20.3 + Spark 3.5.0
spark-excel_2.12-3.5.1_0.20.4 + Spark 3.5.0
(I did not list 3.5.0_0.20.1 here because it has other issues: the same packaging error that older versions had, SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: excel.)
Spark 3.5 here means Databricks Runtime 15.4.
(screenshot: the spark-excel library installed on the cluster)
(screenshot: Databricks notebook (Scala) filter code)
Expected Behavior
DataFrame filters work on partition folders.
PS: the version combinations below do work:
spark-excel_2.12-3.2.4_0.20.4 + Spark 3.3.2
spark-excel_2.12-3.2.2_0.18.5 + Spark 3.3.2
Steps To Reproduce
See the notebook screenshot:
```scala
val df = spark.read
  .format("excel")                            // for V2 implementation
  .option("dataAddress", "0!A3")              // Optional, default: "A1"
  .option("header", "true")                   // Required
  .option("inferSchema", "true")              // Optional, default: false
  .option("treatEmptyValuesAsNulls", "true")
  .load(excelPath)
```
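For reference, filtering on a partition column presumes a Hive-style partitioned layout under excelPath, along these lines (directory and file names are hypothetical, not taken from the original report):

```
excelPath/
  execution_date=20231217/
    part-0001.xlsx
  execution_date=20231218/
    part-0001.xlsx
```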
I also tried to filter using an integer:
```scala
import org.apache.spark.sql.functions._

display(df.where(col("execution_date") === lit(20231218)).select("execution_date").distinct)
```
The filter did not take effect.
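A possible workaround sketch (untested against this bug; it assumes the hypothetical Hive-style layout shown above): point the reader at a single partition directory so that no partition pruning is needed, then re-add the partition column by hand:

```scala
import org.apache.spark.sql.functions.lit

// Workaround sketch: load only one partition directory directly.
// The partition column must be re-added manually, because Spark only
// derives it when it discovers partitions from the base path.
val dfOneDay = spark.read
  .format("excel")
  .option("dataAddress", "0!A3")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("treatEmptyValuesAsNulls", "true")
  .load(s"$excelPath/execution_date=20231218")
  .withColumn("execution_date", lit(20231218))
```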
Environment
Anything else?
No response