Add custom decoder in arrow-json #7442

Open
wants to merge 1 commit into
base: main

Conversation

cht42

@cht42 cht42 commented Apr 24, 2025

Similar behavior to #7015, but on the decoder side. This would add flexibility when decoding JSON.

For example, when trying to replicate Spark's JSON decoding: Spark decodes values that do not match the target type as null or as the raw string. This change would allow overriding the current StringDecoder to implement that behavior.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 24, 2025
@Blizzara
Contributor

FYI @tustvold @alamb @adriangb since you worked on the encoder one. This would allow us (I work with @cht42) to not vendor the code to get the spark-compatibility. Alternatively, if you have other ideas for how to reach our goal (we could also make the spark compat a config option, for example), let us know!

@tustvold
Contributor

tustvold commented Apr 24, 2025

I'm afraid I am unlikely to have time to review this in the near future, but I am not a big fan of making the tape public; it can change, and has changed, in non-trivial ways in the past.

> get the spark-compatibility

What does this look like? Is it just using nulls for fallible conversions, as that doesn't sound like a big lift, or is it something more intrusive?

@cht42
Author

cht42 commented Apr 24, 2025

It is more intrusive when the target type is string. In that case Spark uses the raw JSON text of the value instead of null. For example,

{"a": [1,2,3]}

with a target schema of map<str, str> is decoded in Spark as

{"a": "[1,2,3]"}
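A minimal sketch of the two behaviors described above, using only the Rust standard library. `Cell`, `Target`, and `decode_lenient` are hypothetical names for illustration; they are not part of arrow-json or of this PR. The function looks at the raw text of a single JSON value and, as a simplifying assumption, does not process escape sequences inside strings.

```rust
/// Hypothetical stand-in for one decoded value; not an arrow-json type.
#[derive(Debug, PartialEq)]
enum Cell {
    Null,
    Int(i64),
    Str(String),
}

/// Hypothetical target types, standing in for Arrow's Int64 and Utf8.
enum Target {
    Int64,
    Utf8,
}

/// Decode the raw text of one JSON value leniently:
/// - Int64 target: a value that does not parse as an integer becomes Null.
/// - Utf8 target: a JSON string loses its quotes; any other non-null value
///   (array, object, number, bool) is kept as its raw JSON text.
fn decode_lenient(raw: &str, target: &Target) -> Cell {
    let raw = raw.trim();
    if raw == "null" {
        return Cell::Null;
    }
    match target {
        Target::Int64 => raw.parse::<i64>().map(Cell::Int).unwrap_or(Cell::Null),
        Target::Utf8 => {
            let quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
            if quoted {
                // Assumption: no escape sequences, so stripping the quotes is enough.
                Cell::Str(raw[1..raw.len() - 1].to_string())
            } else {
                // Raw-string fallback: keep the JSON text verbatim.
                Cell::Str(raw.to_string())
            }
        }
    }
}
```

With this sketch, `decode_lenient("[1,2,3]", &Target::Utf8)` yields the string `[1,2,3]`, while `decode_lenient("\"abc\"", &Target::Int64)` yields `Cell::Null`. A real decoder would work from the parsed tape rather than raw text, which is why the raw-string case is the more intrusive of the two.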

For a plain fallback to null on parsing errors, #7230 is already open and would handle that.

@adriangb
Contributor

I have not looked at the code yet, but I will try to review it, although I may not be able to give much feedback since I haven't worked on this side of the codebase before and we do not use it internally.

Labels
arrow Changes to the arrow crate
4 participants