Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: coerce_primitive for serde decoded data #5101

Merged
merged 4 commits into from
Nov 21, 2023

Conversation

fansehep
Copy link
Contributor

Which issue does this PR close?

Closes #5095

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 20, 2023
{"a": 1, "b": "A", "c": "T"},
{"a": 2, "b": "BB", "c": "F"},
{"a": 3, "b": 123, "c": false}
]"#;
Copy link
Contributor

@tustvold tustvold Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could test with some numbers that are larger than i32::MAX, i.e. require i64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could test with some numbers that are larger than i32::MAX, i.e. require i64

done, add i32::MAX and like i64::MAX tests.

@@ -89,6 +97,12 @@ impl<O: OffsetSizeTrait> ArrayDecoder for StringArrayDecoder<O> {
TapeElement::Number(idx) if coerce_primitive => {
builder.append_value(tape.get_string(idx));
}
TapeElement::I64(n) | TapeElement::I32(n) if coerce_primitive => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling of I64 is incorrect here, this will only use the low bits. There are some examples elsewhere in how to interpret this tape element

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling of I64 is incorrect here, this will only use the low bits. There are some examples elsewhere in how to interpret this tape element

I'm a little confused about the design here. Why we don't use a i64 to save the value?

Copy link
Contributor

@tustvold tustvold Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the TapeElement would then need to increase in size from the current 64-bits to 128-bits (for alignment reasons), tape decoding is exceptionally hot and so concerns like this actually matter 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the TapeElement would then need to increase in size from the current 64-bits to 128-bits (for alignment reasons), tape decoding is exceptionally hot and so concerns like this actually matter 😅

I get it. Just for performance. 😺

@tustvold tustvold changed the title fix: fix json decode number fix: coerce_primitive for serde decoded data Nov 20, 2023
Signed-off-by: fan <[email protected]>
_ => unreachable!(),
}
}
TapeElement::I32(n) if coerce_primitive => {
builder.append_value(n.to_string());
}
TapeElement::F32(n) | TapeElement::F64(n) if coerce_primitive => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the same for f64 please 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do the same for f64 please 😄

done.

Signed-off-by: fan <[email protected]>
Comment on lines 64 to 69
TapeElement::I64(n) | TapeElement::I32(n) if coerce_primitive => {
data_capacity += n.to_string().len();
}
TapeElement::F32(n) | TapeElement::F64(n) if coerce_primitive => {
data_capacity += n.to_string().len();
}
Copy link
Contributor

@tustvold tustvold Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TapeElement::I64(n) | TapeElement::I32(n) if coerce_primitive => {
data_capacity += n.to_string().len();
}
TapeElement::F32(n) | TapeElement::F64(n) if coerce_primitive => {
data_capacity += n.to_string().len();
}
TapeElement::I64(n) | TapeElement::I32(n) | TapeElement::F32(n) | TapeElement::F64(n) => {
data_capacity += 10; // An arbitrary estimate
}

This avoids serializing to a string multiple times, this could be improved if it becomes a bottleneck, but this is probably good enough

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improvement looks good.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good thank you, just a minor nit

@tustvold tustvold merged commit df69ef5 into apache:master Nov 21, 2023
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coerce_primitive not honored when decoding from serde object
2 participants