Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve map(varchar, json) canonicalization bug #24232

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

infvg
Copy link
Contributor

@infvg infvg commented Dec 10, 2024

The map function will not sort a json object by its keys, despite the json_parse function sorting the same input.
If implemented, this will sort json objects.

Description

map(varchar, json) would previously not sort the json values. Modify the map(varchar, json) to sort the resulting json value

Motivation and Context

#24207

canonicalization makes it possible for us to treat the underlying json as a varchar and thus simplifies group by's and comparisons.

Impact

map(varchar, json) result might have different ordering.

Test Plan

Unit test added

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Resolve a bug where map(varchar, json) does not canonicalize values :pr:`24232`

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 10, 2024
@prestodb-ci prestodb-ci requested review from a team, auden-woolfson and imsayari404 and removed request for a team December 10, 2024 13:55
@infvg infvg force-pushed the BUG_24207 branch 6 times, most recently from e2b74bc to 8e99d17 Compare December 12, 2024 07:21
The map function will not sort a json object by its keys, despite the
json_parse function sorting the same input.
If implemented, this will sort json objects.

Resolves prestodb#24207
@infvg infvg marked this pull request as ready for review December 15, 2024 13:06
@infvg infvg requested a review from a team as a code owner December 15, 2024 13:06
@infvg infvg requested a review from presto-oss December 15, 2024 13:06
@@ -956,8 +960,18 @@ static BlockBuilderAppender createBlockBuilderAppender(Type type)
return new VarcharBlockBuilderAppender(type);
case StandardTypes.JSON:
return (parser, blockBuilder, sqlFunctionProperties) -> {
String json = OBJECT_MAPPED_UNORDERED.writeValueAsString(parser.readValueAsTree());
JSON.writeSlice(blockBuilder, Slices.utf8Slice(json));
Slice slice = Slices.utf8Slice(OBJECT_MAPPED_UNORDERED.writeValueAsString(parser.readValueAsTree()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in understanding that this fixes two problems:

  1. Invalid JSON will suddenly be validated
  2. The contents of the map will be sorted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, invalid json (I assume in presto that refers to unsorted json) will become valid due to the sorting.
Incorrect json eg {"t":"qwdq", "qdwqd"} will still cause errors since both before and after my fix the json gets mapped to a tree object.

@tdcmeehan tdcmeehan self-assigned this Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants