[AUDIT] [SPARK-49743][SQL] OptimizeCsvJsonExpr should not change schema fields when pruning GetArrayStructFields #11691

Closed
amahussein opened this issue Nov 4, 2024 · 1 comment
Labels
audit General label for audit related tasks Spark 3.5+ Spark 3.5+ issues Spark 4.0+ Spark 4.0+ issues

Comments

@amahussein
Collaborator

apache/spark@a4fb6cbfda2

This PR affects the from_json operator; at a minimum, we need to test the behavior on the plugin.

SELECT
  from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').a,
  from_json('[{"a": '||id||', "b": '|| (2*id) ||'}]', 'array<struct<a: INT, b: INT>>').A
FROM
  range(3) as t

Previously, the result would have been:

Array([ArraySeq(0),ArraySeq(null)], [ArraySeq(1),ArraySeq(null)], [ArraySeq(2),ArraySeq(null)])

The new result (verified through spark-shell) is:

Array([ArraySeq(0),ArraySeq(0)], [ArraySeq(1),ArraySeq(1)], [ArraySeq(2),ArraySeq(2)])
@amahussein amahussein added ? - Needs Triage Need team to review and classify audit General label for audit related tasks Spark 4.0+ Spark 4.0+ issues Spark 3.5+ Spark 3.5+ issues labels Nov 4, 2024
@revans2
Collaborator

revans2 commented Nov 4, 2024

I just looked at this a bit more deeply, and this is a bug in a logical plan optimization in Spark. What is more, we don't support top-level arrays in from_json yet, so this does not impact us at all.
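
For readers who want to see why a schema-pruning bug would produce null for `.A`, here is a toy Python model. It is not Spark code. It assumes, going by the title of SPARK-49743, that the old rule rebuilt the pruned schema from the field name as written in the query (`A`) instead of reusing the declared field (`a`), and that JSON keys are matched against schema field names exactly. All function names below are hypothetical.

```python
import json

# Declared schema from the issue's query: array<struct<a: INT, b: INT>>.
DECLARED_FIELDS = ["a", "b"]


def resolve_field(requested, declared=DECLARED_FIELDS):
    """Case-insensitive field lookup, as with Spark's default
    spark.sql.caseSensitive=false."""
    for name in declared:
        if name.lower() == requested.lower():
            return name
    raise KeyError(requested)


def parse_with_schema(text, fields):
    """Toy stand-in for from_json: keep only the schema's fields and match
    JSON keys exactly (assumption); missing keys become None."""
    return [{f: row.get(f) for f in fields} for row in json.loads(text)]


def extract_buggy(text, requested):
    # Assumed pre-fix behavior: the pruned schema uses the name as typed in
    # the query, so "A" never matches the JSON key "a".
    rows = parse_with_schema(text, [requested])
    return [row[requested] for row in rows]


def extract_fixed(text, requested):
    # Assumed post-fix behavior: the pruned schema reuses the declared field.
    field = resolve_field(requested)
    rows = parse_with_schema(text, [field])
    return [row[field] for row in rows]


# Same shape of input as the query in this issue, for id in range(3).
for i in range(3):
    doc = '[{"a": %d, "b": %d}]' % (i, 2 * i)
    print(extract_buggy(doc, "a"), extract_buggy(doc, "A"), extract_fixed(doc, "A"))
```

Under these assumptions, the buggy path returns the value for `.a` and `None` for `.A`, while the fixed path returns the same value for both spellings, which matches the before and after results reported above.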

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Nov 5, 2024
@mattahrens mattahrens closed this as not planned Won't fix, can't repro, duplicate, stale Nov 5, 2024