From f65d4e19a00955cc7b964c418708750055dde9d1 Mon Sep 17 00:00:00 2001
From: Ed Seidl
Date: Wed, 28 Feb 2024 03:35:08 -0800
Subject: [PATCH] PARQUET-2435: Clarify behavior of DELTA_BINARY_PACKED encoding (#231)

Address the issue of using more bits in the encoding than are used in the underlying type being encoded.
---
 Encodings.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Encodings.md b/Encodings.md
index aaf7a362f..5040094ff 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -245,7 +245,9 @@ Subtractions in steps 1) and 2) may incur signed arithmetic overflow, and so
 will the corresponding additions when decoding. Overflow should be allowed and
 handled as wrapping around in 2's complement notation so that the original
 values are correctly restituted. This may require explicit care in some programming
-languages (for example by doing all arithmetic in the unsigned domain).
+languages (for example by doing all arithmetic in the unsigned domain). Writers
+must not use more bits when bit packing the miniblock data than would be required
+to PLAIN encode the physical type (e.g. INT32 data must not use more than 32 bits).
 
 The following examples use 8 as the block size to keep the examples short,
 but in real cases it would be invalid.
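
The constraint added by this patch can be illustrated with a minimal Python sketch. It is not taken from any Parquet implementation: the helper names (`to_signed`, `adjusted_deltas`, `bit_width`, `decode`) are hypothetical, and the sketch assumes a single block with no miniblock splitting and no header or zigzag/ULEB128 handling. It only shows how computing deltas without wrap-around can require 33 bits for INT32 data, while wrapping in the 32-bit unsigned domain keeps the packed width within 32 bits and still round-trips.

```python
def to_signed(x, bits):
    """Reinterpret the low `bits` bits of x as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def adjusted_deltas(values, bits=None):
    """Return (min_delta, deltas - min_delta) for one block (hypothetical helper).

    bits=None uses unbounded Python integers (no wrap-around);
    bits=32 wraps every subtraction modulo 2**32, as a writer for INT32 would.
    """
    if bits is None:
        deltas = [b - a for a, b in zip(values, values[1:])]
        min_delta = min(deltas)
        return min_delta, [d - min_delta for d in deltas]
    mask = (1 << bits) - 1
    # Step 1: wrapped deltas, viewed as signed so that min() is meaningful.
    deltas = [to_signed(b - a, bits) for a, b in zip(values, values[1:])]
    min_delta = min(deltas)
    # Step 2: subtract the minimum and keep the result in the unsigned domain.
    return min_delta, [(d - min_delta) & mask for d in deltas]


def bit_width(adjusted):
    """Bits needed to bit pack the largest adjusted delta."""
    return max(adjusted).bit_length()


def decode(first, min_delta, adjusted, bits):
    """Rebuild the values with wrapping additions."""
    out = [first]
    for adj in adjusted:
        out.append(to_signed(out[-1] + min_delta + adj, bits))
    return out


INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
values = [INT32_MIN, INT32_MAX, INT32_MIN]

_, wide = adjusted_deltas(values, bits=None)
print(bit_width(wide))    # 33: more bits than PLAIN INT32 would use

min_delta, packed = adjusted_deltas(values, bits=32)
print(bit_width(packed))  # 2: wrapped arithmetic stays within 32 bits
print(decode(values[0], min_delta, packed, bits=32) == values)  # True
```

Because the adjusted deltas are masked to 32 bits, `bit_width` cannot exceed 32 in the wrapped case, which is the property the added sentence requires of writers.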