From c537700b6145ad4351fe2e4094bb004bbe1947be Mon Sep 17 00:00:00 2001
From: Joel Lubinitsky <33523178+joellubi@users.noreply.github.com>
Date: Thu, 8 Aug 2024 14:22:46 -0400
Subject: [PATCH] GH-17682: [Format] Add Bool8 Canonical Extension Type (#43234)

### Rationale for this change

Closes: #17682

Arrow Boolean arrays store values as individual bits, which is a very compact representation but does not match the layout of many systems with which it interoperates. By adding an 8-bit Boolean extension type, zero-copy compatibility with many systems can be improved at the cost of a larger physical representation.

Go implementation: #43323
C++ / Python implementation: #43488

### What changes are included in this PR?

Proposal and documentation for `Bool8` canonical extension type.

### Are these changes tested?

N/A

### Are there any user-facing changes?

N/A

* GitHub Issue: #17682

Lead-authored-by: Joel Lubinitsky
Co-authored-by: Joel Lubinitsky <33523178+joellubi@users.noreply.github.com>
Co-authored-by: Felipe Oliveira Carvalho
Signed-off-by: Joel Lubinitsky
---
 docs/source/format/CanonicalExtensions.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst
index 1d86fcf23c4f7..5658f949ceeaa 100644
--- a/docs/source/format/CanonicalExtensions.rst
+++ b/docs/source/format/CanonicalExtensions.rst
@@ -393,6 +393,28 @@ Examples:
 
  {"type_name": "OTHER", "vendor_name": "JDBC driver name"}
 
+8-bit Boolean
+=============
+
+Bool8 represents a boolean value using 1 byte (8 bits) to store each value instead of only 1 bit as in
+the original Arrow Boolean type. Although less compact than the original representation, Bool8 may have
+better zero-copy compatibility with various systems that also store booleans using 1 byte.
+
+* Extension name: ``arrow.bool8``.
+
+* The storage type of this extension is ``Int8`` where:
+
+  * **false** is denoted by the value ``0``.
+  * **true** can be specified using any non-zero value. Preferably ``1``.
+
+* Extension type parameters:
+
+  This type does not have any parameters.
+
+* Description of the serialization:
+
+  Metadata is an empty string.
+
 =========================
 Community Extension Types
 =========================
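
To make the storage rules in this patch concrete, here is a minimal sketch in plain Python (standard library only) of how a consumer might interpret `Int8` storage as Bool8 values, and how the byte-per-value layout compares in size with a bit-packed bitmap. The helper names `bool8_to_bools`, `bools_to_bool8`, and `pack_bits` are hypothetical and are not part of any Arrow library API; the bit packing assumes the least-significant-bit-first ordering used by Arrow bitmaps.

```python
import array


def bool8_to_bools(storage):
    """Interpret Int8 storage per the Bool8 rules: 0 is false, any non-zero value is true."""
    return [v != 0 for v in storage]


def bools_to_bool8(values):
    """Encode booleans as Int8 storage, writing the preferred values 0 and 1."""
    return array.array("b", (1 if v else 0 for v in values))


def pack_bits(values):
    """Pack booleans into a bitmap, least-significant bit first (illustrative helper)."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# Int8 storage as it might arrive from another system: several non-zero
# values, including negative ones, all of which mean "true".
storage = array.array("b", [0, 1, -1, 2, 0, 127, -128, 0, 1, 0])

decoded = bool8_to_bools(storage)
canonical = bools_to_bool8(decoded)
bitmap = pack_bits(decoded)

print(decoded)
print("bool8 bytes:", len(canonical.tobytes()), "bitmap bytes:", len(bitmap))
```

Ten values occupy 10 bytes as Bool8 storage and 2 bytes as a bitmap. That size difference is the trade-off the rationale describes: the larger layout can be shared as-is with systems that already store one boolean per byte.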