[BUG] Filters on partition columns not taking effect | Spark 3.5.0 | com.crealytics:spark-excel_2.12:3.5.0_0.20.2/3 and 3.5.1_0.20.4 #907

minnieshi · 2024-12-17T14:41:31Z

Am I using the newest version of the library?

I have made sure that I'm using the latest version of the library.
I've read the similar OPEN issue:
[BUG] Filters on partition columns don't work | Spark 3.3.1 | com.crealytics:spark-excel_2.12:3.3.1_0.18.5 #727

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

A filter on a partition column (a column derived from the partition folder) does not take effect with the following version combinations:

spark-excel_2.12-3.5.0_0.20.2 + Spark 3.5.0
spark-excel_2.12-3.5.0_0.20.3 + Spark 3.5.0
spark-excel_2.12-3.5.1_0.20.4 + Spark 3.5.0
(I did not list 3.5.0_0.20.1 here because it has a different problem, the same packaging error that older versions had:
SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: excel
)

Spark 3.5 here means Databricks Runtime 15.4.

The spark-excel library

databricks notebook (scala) filter code:

Expected Behavior

DataFrame filters should work on partition columns.
P.S. The following version combinations work:
spark-excel_2.12-3.2.4_0.20.4 + Spark 3.3.2
spark-excel_2.12-3.2.2_0.18.5 + Spark 3.3.2
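
To make the expected behavior concrete, here is a minimal pure-Scala model of what filtering on a partition column should amount to: only files under the matching partition folder are kept. The folder layout (Hive-style `execution_date=<value>` directories), the file paths, and the names `PartitionPruningSketch`, `partitionValue`, and `prune` are assumptions for illustration; the report does not state the actual layout, and this is not how Spark or spark-excel implement pruning internally.

```scala
object PartitionPruningSketch {
  // Hypothetical file layout: Hive-style "column=value" folders (assumed, not taken from the report).
  val files: Seq[String] = Seq(
    "/mnt/data/execution_date=20231217/report.xlsx",
    "/mnt/data/execution_date=20231218/report.xlsx",
    "/mnt/data/execution_date=20231218/extra.xlsx"
  )

  // Extract the value of a partition column from a path, if the path has such a folder.
  def partitionValue(path: String, column: String): Option[String] = {
    val prefix = column + "="
    path.split('/').collectFirst {
      case segment if segment.startsWith(prefix) => segment.stripPrefix(prefix)
    }
  }

  // Keep only the files whose partition folder matches the requested value.
  def prune(paths: Seq[String], column: String, value: String): Seq[String] =
    paths.filter(path => partitionValue(path, column).contains(value))

  // With a working filter, only the two files under execution_date=20231218 remain.
  val kept: Seq[String] = prune(files, "execution_date", "20231218")
}
```

A filter that "does not take effect" corresponds to getting rows from all three files back instead of only the files in `kept`.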

Steps To Reproduce

See the notebook screenshot.
val df = spark.read
  .format("excel") // for V2 implementation
  .option("dataAddress", "0!A3") // Optional, default: "A1"
  .option("header", "true") // Required
  .option("inferSchema", "true") // Optional, default: false
  .option("treatEmptyValuesAsNulls", "true")
  .load(excelPath)
I also tried filtering with an integer literal:
import org.apache.spark.sql.functions._
display(df.where(col("execution_date") === lit(20231218)).select("execution_date").distinct)
The filter did not take effect.
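
The symptom can also be checked programmatically rather than by eye. The sketch below is one possible way to do that, assuming the `df` and the `execution_date` column from the steps above; the object and method names (`FilterCheck`, `survivingValues`, `filterTookEffect`) are hypothetical helpers, not part of Spark or spark-excel.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

object FilterCheck {
  // Distinct values of `column` (rendered as strings) that remain after filtering on `value`.
  // Null values are reported as the string "null".
  def survivingValues(df: DataFrame, column: String, value: Int): Set[String] =
    df.where(col(column) === lit(value))
      .select(col(column).cast("string"))
      .distinct()
      .collect()
      .map(row => Option(row.getString(0)).getOrElse("null"))
      .toSet

  // True when the filtered result contains no value other than the requested one.
  def filterTookEffect(df: DataFrame, column: String, value: Int): Boolean =
    survivingValues(df, column, value).subsetOf(Set(value.toString))
}
```

In the notebook this would be called as `FilterCheck.filterTookEffect(df, "execution_date", 20231218)`. Based on the report, the affected version combinations would be expected to return `false` here, and `survivingValues` would list the partition values that leaked through.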

Environment

- Spark version:
- Spark-Excel version:
- OS:
- Cluster environment

Anything else?

No response


github-actions · 2024-12-17T14:41:43Z

Please check these potential duplicates:

[BUG] Filters on partition columns don't work | Spark 3.3.1 | com.crealytics:spark-excel_2.12:3.3.1_0.18.5 #727 (62.62%)
If this issue is a duplicate, please add any additional info to the ticket with the most information and close this one.

minnieshi · 2024-12-17T17:18:34Z

It is not a duplicate; it is similar.
What do you think, @nightscape? I can provide all the test-matrix notebooks, which contain the code and the rerun results, if that helps.

excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.2.2_0.18.5 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.2.2_0.18.5 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.2.4_0.20.4 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.2.4_0.20.4 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.1_0.18.7 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.2_0.19.0 + Spark 3.3.2
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.2_0.19.0 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.2_0.19.0 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.3_0.20.3 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.3_0.20.3 + spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.4_0.20.4 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.3.4_0.20.4 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.19.0 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.19.0 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.1 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.1 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.2 + Spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.2 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.3 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.3 + spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.4 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.1_0.20.4 + spark 3.4.1
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.4.3_0.20.4 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.5.0_0.20.1 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.5.0_0.20.2 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.5.0_0.20.3 + Spark 3.5.0
excel_reader_filter_poc-ISSUES-com.crealytics:spark-excel_2.12-3.5.1_0.20.4 + Spark 3.5.0
excel_reader_filter_poc-WORKS-com.crealytics:spark-excel_2.12-3.2.2_0.18.5 + Spark 3.3.2
excel_reader_filter_poc-WORKS-com.crealytics:spark-excel_2.12-3.2.4_0.20.4 + Spark 3.3.2

nightscape · 2025-01-03T22:17:15Z

@minnieshi I guess there might have been some change in the internal handling of predicate push-down in 3.5.
That would be interesting to find out.
I had quite some success in a similar case by asking Perplexity to read the Spark changelogs for relevant entries.
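
One way to start narrowing this down is to compare the query plans of a working and a failing version combination and see where the filter on the partition column ends up. The sketch below only collects the plan lines that mention a column; `PlanCheck`, `planText`, and `linesMentioning` are hypothetical helper names. Which plan node the filter appears in, and whether it shows up on the scan at all, depends on the Spark and spark-excel versions, so no particular output is assumed.

```scala
import org.apache.spark.sql.DataFrame

object PlanCheck {
  // Text of the parsed, analyzed, optimized and physical plans (what explain(true) prints).
  def planText(df: DataFrame): String = df.queryExecution.toString

  // Lines of the plan text that mention the given column, e.g. Filter or scan nodes.
  def linesMentioning(df: DataFrame, column: String): Seq[String] =
    planText(df).split("\n").filter(_.contains(column)).toSeq
}
```

For the case in this issue, one could run `PlanCheck.linesMentioning(df.where(col("execution_date") === lit(20231218)), "execution_date").foreach(println)` on both a working and a failing cluster and diff the output.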

github-actions bot added the potential-duplicate label Dec 17, 2024

minnieshi mentioned this issue Dec 18, 2024 in [BUG] ClassNotFoundException for 'excel.DefaultSource' while using API V2 #789 (closed)
