Support Arbitrary JSON values in JSON Reader (#4905) #4911
This does represent a fairly minor performance regression, but I'm not too concerned
Looks really solid to me -- thank you @tustvold
I had some documentation suggestions, but nothing I think that is required before merging
arrow-json/src/reader/mod.rs
Outdated
//! The reader is agnostic to whitespace, including `\n` and `\r`, and will ignore such characters.
//! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
//! but not limited to newline-delimited JSON.
A minor suggestion to improve the wording here
- //! The reader is agnostic to whitespace, including `\n` and `\r`, and will ignore such characters.
- //! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
- //! but not limited to newline-delimited JSON.
+ //! The reader ignores whitespace between JSON values, including `\n` and `\r`.
+ //! This allows parsing sequences of one or more arbitrarily formatted JSON values, including
+ //! but not limited to newline-delimited JSON.
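To illustrate what "ignores whitespace between JSON values" means in practice, here is a minimal std-only sketch of a boundary scanner. It is not the arrow-json implementation; `split_json_values` is a hypothetical helper, it does no validation, and it assumes adjacent scalars (numbers, `true`, `false`, `null`) are separated by whitespace.

```rust
/// Hypothetical helper: split `input` into its top-level JSON values.
/// Whitespace (space, tab, `\n`, `\r`) between values is skipped.
/// This only finds value boundaries; it does not validate the JSON.
fn split_json_values(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut values = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        // Skip whitespace between values.
        if matches!(bytes[i], b' ' | b'\t' | b'\n' | b'\r') {
            i += 1;
            continue;
        }
        let start = i;
        let mut depth = 0usize; // nesting depth of objects and arrays
        let mut in_string = false;
        while i < bytes.len() {
            let b = bytes[i];
            if in_string {
                match b {
                    b'\\' => i += 1, // skip the escaped byte as well
                    b'"' => in_string = false,
                    _ => {}
                }
                i += 1;
                // A closing quote at depth 0 ends a top-level string value.
                if !in_string && depth == 0 {
                    break;
                }
                continue;
            }
            match b {
                b'"' => in_string = true,
                b'{' | b'[' => depth += 1,
                b'}' | b']' => {
                    depth = depth.saturating_sub(1);
                    if depth == 0 {
                        i += 1;
                        break; // closed the outermost object or array
                    }
                }
                // Whitespace at depth 0 terminates a scalar value.
                b' ' | b'\t' | b'\n' | b'\r' if depth == 0 => break,
                _ => {}
            }
            i += 1;
        }
        values.push(&input[start..i.min(bytes.len())]);
    }
    values
}
```

With this sketch, newline-delimited input and input with values packed on one line produce the same list of values, which is the behaviour the doc comment describes.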
arrow-json/src/reader/mod.rs
Outdated
            schema,
        }
    }

    /// Create a new [`ReaderBuilder`] with the provided [`FieldRef`]
- /// Create a new [`ReaderBuilder`] with the provided [`FieldRef`]
+ /// Create a new [`ReaderBuilder`] that will parse JSON values with a root schema of [`FieldRef`].
Perhaps we can add a note in new() that says it requires the root to be an object like {..}.
I wonder if new_from_field might be a more descriptive name (this isn't making a new Field; it is making a new reader that reads data with the type of the field).
    false,
)?;
let (data_type, nullable) = match self.is_field {
    false => (DataType::Struct(self.schema.fields.clone()), false),
why not allow null root fields with structs?
Because RecordBatch can't support nulls at the root level
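A rough sketch of the selection being discussed, using simplified stand-in types rather than arrow's own `DataType` and `Field`. The `false` arm mirrors the diff above; the `true` arm (taking the type and nullability from the single root field) is an assumption, since that part of the change is not shown here. `root_type` is a hypothetical name.

```rust
/// Simplified stand-ins for arrow's `DataType` and `Field` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Choose the root decoder type and whether the root may be null.
fn root_type(fields: &[Field], is_field: bool) -> (DataType, bool) {
    match is_field {
        // Schema mode: each row becomes a row of a record batch, and a batch
        // has no validity mask of its own, so the root struct is never nullable.
        false => (DataType::Struct(fields.to_vec()), false),
        // Field mode (assumed): the root is an ordinary column, so it keeps
        // the nullability declared on the field.
        true => (fields[0].data_type.clone(), fields[0].nullable),
    }
}
```

The point of the sketch is only that nullability is forced to `false` in schema mode regardless of what the fields say, while in field mode it is read from the field.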
@@ -297,7 +297,7 @@ macro_rules! next {
pub struct TapeDecoder {
    elements: Vec<TapeElement>,

    num_rows: usize,
    cur_row: usize,
- cur_row: usize,
+ /// logical row being decoded
+ cur_row: usize,
Which issue does this PR close?
Closes #4905
Rationale for this change
We should support decoding JSON payloads that don't have an object as the root.
What changes are included in this PR?
Are there any user-facing changes?