-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Internal] JSON Binary Encoding: Adds support for encoding uniform arrays #4866
[Internal] JSON Binary Encoding: Adds support for encoding uniform arrays #4866
Conversation
… arrays of uniform number arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good!
Microsoft.Azure.Cosmos/src/Json/JsonBinaryEncoding.Enumerator.cs
Outdated
Show resolved
Hide resolved
Microsoft.Azure.Cosmos/src/Json/JsonBinaryEncoding.Enumerator.cs
Outdated
Show resolved
Hide resolved
In a future change, it may be good to consider adding some emulator equivalence tests that utilize these code paths. We can use supported serialization formats to get both binary and text responses and then transcribe the binary to text using the sdk and compare with the backend text response as a baseline. #Resolved |
- Moved the round-trip execution and validation logic from `JsonWriterTests` to the shared utility class `JsonTestUtils`. - Updated `JsonRoundTripTests` to utilize the common utility for execution and verification. - Enhanced round-trip verification with the addition of a new rewrite scenario and improved error reporting. - Added new set of tests to `JsonRoundTripTests` to cover uniform number arrays as well as arrays of uniform number arrays. **JsonNavigator Fixes** - Resolved an issue with writing a navigator node to an `IJsonWriter` instance when the node represents a uniform array or an item within such an array. - Previously, when writing a root node and both the navigator and writer used the Binary serialization format, direct copying of the node was allowed. This is no longer valid as the write options between them may differ. This logic has been replaced with recursive node writing. **Addressed PR Feedback**
…ithub.com/Azure/azure-cosmos-dotnet-v3 into users/sboshra/BinaryEncodingUniformArrays
I'm not too sure if we need to rely on the emulator for that. All the new tests I’ve added are direct ports from the backend, which include the expected binary output. In addition, each implementation should be self-sufficient when it comes to verification as long as it's able to verify round-tripping between different formats. This is true for both SDK and backend tests. In reply to: 2453253955 |
Does it make sense to put these changes behind a flag to minimize issues (since binary is on by default)? #Resolved |
For the JsonBinaryWriter, the EnableUniformNumberArray option must be specified to enable writing uniform number arrays; without this, the behavior remains unchanged. For JsonReader and JsonNavigator, there is no option to disable the reading of uniform number arrays, and I don't believe such a feature is needed. I have significantly expanded the test coverage, including comprehensive round-trip testing, to ensure confidence in implementing these changes. In reply to: 2455776367 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
… due to missing package dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…rays (#4866) ## Description Added full end-to-end support for writing and reading binary-encoded uniform number arrays, as well as nested arrays of uniform number arrays. **Uniform Number Arrays** A uniform number array is a JSON array where all items share the same numeric type. The encoding supports the following numeric types: - **Int8**: Signed 1-byte integer (-128 to 127) - **UInt8**: Unsigned 1-byte integer (0 to 255) - **Int16**: Signed 2-byte integer - **Int32**: Signed 4-byte integer - **Int64**: Signed 8-byte integer - **Float16**: 2-byte floating-point value (currently unsupported) - **Float32**: 4-byte floating-point value - **Float64**: 8-byte floating-point value Uniform number arrays are represented by these new type markers: - **ArrNumC1**: Uniform number array with a 1-byte item count - **ArrNumC2**: Uniform number array with a 2-byte item count Both type markers are encoded as follows: `| Type marker | Item type marker | Item count |` To maintain backward compatibility, writing uniform number arrays is controlled via the `EnableNumberArrays `write option. When enabled, at the end of writing an array, the writer checks if all values are numeric. It identifies the smallest numeric type that fits all values and compares the length of the uniform number array to the regular array. If the new length is less than or equal to the old one, the array is converted to a uniform number array. **Arrays of Uniform Number Arrays** This encoding enhancement allows for encoding multiple uniform number arrays with the same underlying numeric type and item count into a single contiguous array of numbers. The items in all arrays are preceded by a prefix indicating the common array encoding and the number of encoded arrays. Arrays of uniform number arrays are supported by these two new type-markers: - **ArrArrNumC1C1**: Array of 1-byte item count of common uniform number arrays with 1-byte item count. - **ArrArrNumC2C2**: Array of 2-byte item count of common uniform number arrays with 2-byte item count. Both new values are encoded as follows: `| Type-marker | Array type-marker | Number type-marker | Number item count | Array item count |` Similar to uniform number arrays, the writing of arrays of uniform number arrays is conditional on the `EnableNumberArrays` write option being specified. This ensures backward compatibility with readers and navigators that do not yet support this encoding. **JSON Serialization Testing** - Introduced a new set of tests for both uniform number arrays and nested arrays of uniform number arrays. - Enhanced the `JsonToken` class to support representation of uniform number array tokens. - Updated `JsonWriterTest` to include additional validation. This now not only checks the expected output but also verifies round-trip consistency across different formats and write options for all three rewrite scenarios: JSON Navigator, JSON Reader - Write All, and JSON Reader - Write Current Token. ## Type of change Please delete options that are not relevant. - [ ] New feature (non-breaking change which adds functionality) ## Closing issues
Description
Added full end-to-end support for writing and reading binary-encoded uniform number arrays, as well as nested arrays of uniform number arrays.
Uniform Number Arrays
A uniform number array is a JSON array where all items share the same numeric type. The encoding supports the following numeric types:
Uniform number arrays are represented by these new type markers:
Both type markers are encoded as follows:
| Type marker | Item type marker | Item count |
To maintain backward compatibility, writing uniform number arrays is controlled via the
EnableNumberArrays
write option. When enabled, at the end of writing an array, the writer checks if all values are numeric. It identifies the smallest numeric type that fits all values and compares the length of the uniform number array to the regular array. If the new length is less than or equal to the old one, the array is converted to a uniform number array.Arrays of Uniform Number Arrays
This encoding enhancement allows for encoding multiple uniform number arrays with the same underlying numeric type and item count into a single contiguous array of numbers. The items in all arrays are preceded by a prefix indicating the common array encoding and the number of encoded arrays.
Arrays of uniform number arrays are supported by these two new type-markers:
Both new values are encoded as follows:
| Type-marker | Array type-marker | Number type-marker | Number item count | Array item count |
Similar to uniform number arrays, the writing of arrays of uniform number arrays is conditional on the
EnableNumberArrays
write option being specified. This ensures backward compatibility with readers and navigators that do not yet support this encoding.JSON Serialization Testing
JsonToken
class to support representation of uniform number array tokens.JsonWriterTest
to include additional validation. This now not only checks the expected output but also verifies round-trip consistency across different formats and write options for all three rewrite scenarios: JSON Navigator, JSON Reader - Write All, and JSON Reader - Write Current Token.Type of change
Please delete options that are not relevant.
Closing issues