From 907987c3376bda185c9b95dcc5e92954d306b73a Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 26 Sep 2023 12:44:53 -0300 Subject: [PATCH 01/13] Format: Point out that there might be multiple physical layouts for each logical type --- docs/source/format/Columnar.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index afbe2a08ee28c..dc2334bb09fbc 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -100,8 +100,8 @@ Arrays are defined by a few pieces of metadata and data: Nested arrays additionally have a sequence of one or more sets of these items, called the **child arrays**. -Each logical data type has a well-defined physical layout. Here are -the different physical layouts defined by Arrow: +Each logical data type has one or more well-defined physical layouts. Here +are the different physical layouts defined by Arrow: * **Primitive (fixed-size)**: a sequence of values each having the same byte or bit width @@ -618,8 +618,8 @@ for the null struct but they are "hidden" by the struct array's validity bitmap. However, when treated independently, corresponding entries of the children array will be non-null. -Union Layout ------------- +Union Layouts +------------- A union is defined by an ordered sequence of types; each slot in the union can have a value chosen from these types. 
The types are named From e1426cf0012d0c8e0bf7dc0a6d2645ce130a2477 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 26 Sep 2023 12:45:33 -0300 Subject: [PATCH 02/13] Format: Use singular like all the other items in this list --- docs/source/format/Columnar.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index dc2334bb09fbc..5e02f27418411 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -108,7 +108,7 @@ are the different physical layouts defined by Arrow: * **Variable-size Binary**: a sequence of values each having a variable byte length. Two variants of this layout are supported using 32-bit and 64-bit length encoding. -* **Views of Variable-size Binary**: a sequence of values each having a +* **View of Variable-size Binary**: a sequence of values each having a variable byte length. In contrast to Variable-size Binary, the values of this layout are distributed across potentially multiple buffers instead of densely and sequentially packed in a single buffer. From bc4d8a4bd9ca306b5080aa855bd2c29cba96264f Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 26 Sep 2023 12:46:30 -0300 Subject: [PATCH 03/13] Format: Fix typos and remove trailing whitespace --- docs/source/format/Columnar.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 5e02f27418411..73882dfcc0a85 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -382,7 +382,7 @@ In both the long and short string cases, the first four bytes encode the length of the string and can be used to determine how the rest of the view should be interpreted. 
-In the short string case the string's bytes are inlined- stored inside the +In the short string case the string's bytes are inlined — stored inside the view itself, in the twelve bytes which follow the length. In the long string case, a buffer index indicates which data buffer @@ -858,19 +858,19 @@ are held in the second child array. For the purposes of determining field names and schemas, these child arrays are prescribed the standard names of **run_ends** and **values** respectively. -The values in the first child array represent the accumulated length of all runs +The values in the first child array represent the accumulated length of all runs from the first to the current one, i.e. the logical index where the current run ends. This allows relatively efficient random access from a logical index using binary search. The length of an individual run can be determined by subtracting two adjacent values. (Contrast this with run-length encoding, in which the lengths of the runs are represented directly, and in which random -access is less efficient.) +access is less efficient.) .. note:: Because the ``run_ends`` child array cannot have nulls, it's reasonable to consider why the ``run_ends`` are a child array instead of just a buffer, like the offsets for a :ref:`variable-size-list-layout`. This - layout was considered, but it was decided to use the child arrays. + layout was considered, but it was decided to use the child arrays. 
Child arrays allow us to keep the "logical length" (the decoded length) associated with the parent array and the "physical length" (the number From f070ce69a93dc617ab708f0f6f6736352ab91710 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 26 Sep 2023 12:47:03 -0300 Subject: [PATCH 04/13] Format: Introduce the ListView layout to the spec --- docs/source/format/Columnar.rst | 114 +++++++++++++++++++++++++++++++- 1 file changed, 111 insertions(+), 3 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 73882dfcc0a85..d2fe9ae58744c 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -118,6 +118,11 @@ are the different physical layouts defined by Arrow: variable-length sequence of values taken from a child data type. Two variants of this layout are supported using 32-bit and 64-bit length encoding. +* **View of Variable-size List**: a nested layout where each value is a + variable-length sequence of values taken from a child data type. This + layout differs from **Variable-size List** by having an additional + buffer containing the sizes of each list value. This removes a constraint + on the offsets buffer — it does not need to be in order. * **Struct**: a nested layout consisting of a collection of named child **fields** each having the same length but possibly different types. @@ -401,11 +406,17 @@ This layout is adapted from TU Munich's `UmbraDB`_. .. _variable-size-list-layout: -Variable-size List Layout -------------------------- +Variable-size List Layouts +-------------------------- List is a nested type which is semantically similar to variable-size -binary. It is defined by two buffers, a validity bitmap and an offsets +binary. There are two list layout variations — "list" and "list-view" — +and each variation can use either 32-bit or 64-bit offsets. 
+ +List Layout +~~~~~~~~~~~ + +The List layout is defined by two buffers, a validity bitmap and an offsets buffer, and a child array. The offsets are the same as in the variable-size binary case, and both 32-bit and 64-bit signed integer offsets are supported options for the offsets. Rather than referencing @@ -487,6 +498,103 @@ will be represented as follows: :: |-------------------------------|-----------------------| | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 | unspecified (padding) | +ListView Layout +~~~~~~~~~~~~~~~ + +The ListView layout is defined by three buffers instead of just two: +a validity bitmap, an offsets buffer, and an additional sizes buffer. +The sizes have the same bit width as the offsets and both 32-bit and 64-bit +signed integer options are supported. Like in the List layout, the offsets +reference the child array. + +Rather then inferring list lengths from the offsets, the sizes buffer +stores the length of each list in the array. This in turn allows offsets to be +out of order. Elements of the child array do not have to be stored in the +same order they logically appear in the list elements of the parent array. + +When a value is null, the corresponding offset and size can have arbitrary +values. When size is 0, the corresponding offset can have an arbitrary value. +If choosing a value is possible, we recommend setting offsets and sizes to 0 in +these cases. + +A list-view type is specified like ``ListView<T>``, where ``T`` is any type +(primitive or nested). In these examples we use 32-bit offsets where +the 64-bit offset version would be denoted by ``LargeListView<T>``.
+ +**Example Layout: ``List`` Array** + +We illustrate an example of ``ListView`` with length 4 having values:: + + [[12, -7, 25], null, [0, -127, 127, 50], []] + +will have the following representation: :: + + * Length: 4, Null count: 1 + * Validity bitmap buffer: + + | Byte 0 (validity bitmap) | Bytes 1-63 | + |--------------------------|-----------------------| + | 00001101 | 0 (padding) | + + * Offsets buffer (int32) + + | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-63 | + |------------|-------------|-------------|-------------|-----------------------| + | 0 | unspecified | 3 | unspecified | unspecified (padding) | + + * Sizes buffer (int32) + + | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-63 | + |------------|-------------|-------------|-------------|-----------------------| + | 3 | unspecified | 4 | 0 | unspecified (padding) | + + * Values array (Int8array): + * Length: 7, Null count: 0 + * Validity bitmap buffer: Not required + * Values buffer (int8) + + | Bytes 0-6 | Bytes 7-63 | + |------------------------------|-----------------------| + | 12, -7, 25, 0, -127, 127, 50 | unspecified (padding) | + +**Example Layout: ``ListView`` Array** + +We continue with the ``ListView`` type, but this instance illustrates out +of order offsets and sharing of child array values. 
It is an array with length 5 +having logical values:: + + [[12, -7, 25], null, [0, -127, 127, 50], [], [50, 12]] + +It will have the following representation: :: + + * Length: 5, Null count: 1 + * Validity bitmap buffer: + + | Byte 0 (validity bitmap) | Bytes 1-63 | + |--------------------------|-----------------------| + | 00011101 | 0 (padding) | + + * Offsets buffer (int32) + + | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-19 | Bytes 20-63 | + |------------|-------------|-------------|-------------|-------------|-----------------------| + | 4 | unspecified | 0 | unspecified | 3 | unspecified (padding) | + + * Sizes buffer (int32) + + | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-19 | Bytes 20-63 | + |------------|-------------|-------------|-------------|-------------|-----------------------| + | 3 | unspecified | 4 | 0 | 2 | unspecified (padding) | + + * Values array (Int8array): + * Length: 7, Null count: 0 + * Validity bitmap buffer: Not required + * Values buffer (int8) + + | Bytes 0-6 | Bytes 7-63 | + |------------------------------|-----------------------| + | 0, -127, 127, 50, 12, -7, 25 | unspecified (padding) | + Fixed-Size List Layout ---------------------- From f3842854f0f132c7c8a176f9727022d741dd8faf Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Fri, 22 Sep 2023 12:22:34 -0300 Subject: [PATCH 05/13] Format: Add ListView and LargeListView to Flatbuffers spec --- format/Schema.fbs | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/format/Schema.fbs b/format/Schema.fbs index fdaf623931760..cc50170c2c1ee 100644 --- a/format/Schema.fbs +++ b/format/Schema.fbs @@ -22,7 +22,8 @@ /// Version 1.1 - Add Decimal256. /// Version 1.2 - Add Interval MONTH_DAY_NANO. /// Version 1.3 - Add Run-End Encoded. -/// Version 1.4 - Add BinaryView, Utf8View, and variadicBufferCounts. +/// Version 1.4 - Add BinaryView, Utf8View, variadicBufferCounts, ListView, and +/// LargeListView.
namespace org.apache.arrow.flatbuf; @@ -97,6 +98,17 @@ table List { table LargeList { } +/// Represents the same logical types that List can, but contains offsets and +/// sizes allowing for writes in any order and sharing of child values among +/// list values. +table ListView { +} + +/// Same as ListVIew, but with 64-bit offsets and sizes, allowing to represent +/// extremely large data values. +table LargeListView { +} + table FixedSizeList { /// Number of list items per value listSize: int; @@ -451,6 +463,8 @@ union Type { RunEndEncoded, BinaryView, Utf8View, + ListView, + LargeListView, } /// ---------------------------------------------------------------------- From 077684afdcbf014e39c299ff3e8622bb40e1f4d6 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Sat, 30 Sep 2023 00:51:09 -0300 Subject: [PATCH 06/13] format/Schema.fbs: fix typo Co-authored-by: David Li --- format/Schema.fbs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/format/Schema.fbs b/format/Schema.fbs index cc50170c2c1ee..70d9634463d61 100644 --- a/format/Schema.fbs +++ b/format/Schema.fbs @@ -104,7 +104,7 @@ table LargeList { table ListView { } -/// Same as ListVIew, but with 64-bit offsets and sizes, allowing to represent +/// Same as ListView, but with 64-bit offsets and sizes, allowing to represent /// extremely large data values. 
table LargeListView { } From d88e00afd3fdff3d97ca9e919dd381dee6d9d4c3 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Sat, 30 Sep 2023 00:51:58 -0300 Subject: [PATCH 07/13] Improve the text after bkietz suggestions Co-authored-by: Benjamin Kietzman --- docs/source/format/Columnar.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index d2fe9ae58744c..21e1925a4c6cb 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -411,7 +411,8 @@ Variable-size List Layouts List is a nested type which is semantically similar to variable-size binary. There are two list layout variations — "list" and "list-view" — -and each variation can use either 32-bit or 64-bit offsets. +and each variation can be delimited by either 32-bit or 64-bit offset +integers. List Layout ~~~~~~~~~~~ @@ -452,7 +453,7 @@ will have the following representation: :: |------------|-------------|-------------|-------------|-------------|-----------------------| | 0 | 3 | 3 | 7 | 7 | unspecified (padding) | - * Values array (Int8array): + * Values array (Int8Array): * Length: 7, Null count: 0 * Validity bitmap buffer: Not required * Values buffer (int8) @@ -501,16 +502,15 @@ will be represented as follows: :: ListView Layout ~~~~~~~~~~~~~~~ -The ListView layout is defined by three buffers instead of just two: -a validity bitmap, an offsets buffer, and an additional sizes buffer. -The sizes have the same bit width as the offsets and both 32-bit and 64-bit -signed integer options are supported. Like in the List layout, the offsets -reference the child array. +The ListView layout is defined by three buffers: a validity bitmap, an offsets +buffer, and an additional sizes buffer. Sizes and offsets have identical bit +width and both 32-bit and 64-bit signed integer options are supported.
-Rather then inferring list lengths from the offsets, the sizes buffer -stores the length of each list in the array. This in turn allows offsets to be -out of order. Elements of the child array do not have to be stored in the -same order they logically appear in the list elements of the parent array. +As in the List layout, the offsets encode the start position of each slot in the +child array. In contrast to the List layout, list lengths are stored explicitly +in the sizes buffer instead of inferred. This allows offsets to be out of order. +Elements of the child array do not have to be stored in the same order they +logically appear in the list elements of the parent array. When a value is null, the corresponding offset and size can have arbitrary values. When size is 0, the corresponding offset can have an arbitrary value. @@ -521,7 +521,7 @@ A list-view type is specified like ``ListView<T>``, where ``T`` is any type (primitive or nested). In these examples we use 32-bit offsets where the 64-bit offset version would be denoted by ``LargeListView<T>``.
-**Example Layout: ``List`` Array** +**Example Layout: ``ListView`` Array** We illustrate an example of ``ListView`` with length 4 having values:: @@ -548,7 +548,7 @@ will have the following representation: :: |------------|-------------|-------------|-------------|-----------------------| | 3 | unspecified | 4 | 0 | unspecified (padding) | - * Values array (Int8array): + * Values array (Int8Array): * Length: 7, Null count: 0 * Validity bitmap buffer: Not required * Values buffer (int8) @@ -586,7 +586,7 @@ It will have the following representation: :: |------------|-------------|-------------|-------------|-------------|-----------------------| | 3 | unspecified | 4 | 0 | 2 | unspecified (padding) | - * Values array (Int8array): + * Values array (Int8Array): * Length: 7, Null count: 0 * Validity bitmap buffer: Not required * Values buffer (int8) From ddb19022add31a5c02372c80593f2694ad5ab275 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Mon, 2 Oct 2023 11:11:43 -0300 Subject: [PATCH 08/13] Columnar.rst: will -> it may because there are multiple possible representations Co-authored-by: Antoine Pitrou --- docs/source/format/Columnar.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 21e1925a4c6cb..ef65401a7fbb4 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -527,7 +527,7 @@ We illustrate an example of ``ListView`` with length 4 having values:: [[12, -7, 25], null, [0, -127, 127, 50], []] -will have the following representation: :: +It may have the following representation: :: * Length: 4, Null count: 1 * Validity bitmap buffer: From 42b8ec7a70fae474629dc420bc31d4d69f3858a9 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Mon, 2 Oct 2023 21:26:15 -0300 Subject: [PATCH 09/13] Columnar.rst: Be more strict about offset values --- docs/source/format/Columnar.rst | 18 +++++++++--------- 1 file changed, 9 
insertions(+), 9 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index ef65401a7fbb4..0317537f4cdcf 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -512,10 +512,10 @@ in the sizes buffer instead of inferred. This allows offsets to be out of order. Elements of the child array do not have to be stored in the same order they logically appear in the list elements of the parent array. -When a value is null, the corresponding offset and size can have arbitrary -values. When size is 0, the corresponding offset can have an arbitrary value. -If choosing a value is possible, we recommend setting offsets and sizes to 0 in -these cases. +When a value is null, the corresponding size is expected to be 0. When the size +is 0 because the value is null or because the value represents an empty list, +the corresponding offset may have any value between 0 and the length of the +child array (inclusive). A list-view type is specified like ``ListView<T>``, where ``T`` is any type (primitive or nested).
In these examples we use 32-bit offsets where @@ -540,13 +540,13 @@ It may have the following representation: :: | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-63 | |------------|-------------|-------------|-------------|-----------------------| - | 0 | unspecified | 3 | unspecified | unspecified (padding) | + | 0 | 7 | 3 | 0 | unspecified (padding) | * Sizes buffer (int32) | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-63 | |------------|-------------|-------------|-------------|-----------------------| - | 3 | unspecified | 4 | 0 | unspecified (padding) | + | 3 | 0 | 4 | 0 | unspecified (padding) | * Values array (Int8Array): * Length: 7, Null count: 0 @@ -565,7 +565,7 @@ having logical values:: [[12, -7, 25], null, [0, -127, 127, 50], [], [50, 12]] -It will have the following representation: :: +It may have the following representation: :: * Length: 5, Null count: 1 * Validity bitmap buffer: @@ -578,13 +578,13 @@ It will have the following representation: :: | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-19 | Bytes 20-63 | |------------|-------------|-------------|-------------|-------------|-----------------------| - | 4 | unspecified | 0 | unspecified | 3 | unspecified (padding) | + | 4 | 7 | 0 | 0 | 3 | unspecified (padding) | * Sizes buffer (int32) | Bytes 0-3 | Bytes 4-7 | Bytes 8-11 | Bytes 12-15 | Bytes 16-19 | Bytes 20-63 | |------------|-------------|-------------|-------------|-------------|-----------------------| - | 3 | unspecified | 4 | 0 | 2 | unspecified (padding) | + | 3 | 0 | 4 | 0 | 2 | unspecified (padding) | * Values array (Int8Array): * Length: 7, Null count: 0 From 4cf820195a232e6a0de0110896a49fae3b96389c Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 3 Oct 2023 14:44:26 -0300 Subject: [PATCH 10/13] Columnar.rst: Fix title level for 'Variadic Buffers' --- docs/source/format/Columnar.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git
a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 0317537f4cdcf..3a75dd82d0e8b 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -1230,7 +1230,7 @@ addresses between libraries, it is recommended to set ``size`` to the actual memory size rather than the padded size. Variadic buffers -^^^^^^^^^^^^^^^^ +---------------- Some types such as Utf8View are represented using a variable number of buffers. For each such Field in the pre-ordered flattened logical schema, there will be From 0d02bdc8ecdf0bd3b3bea803c42ec586ffa76575 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 3 Oct 2023 15:03:28 -0300 Subject: [PATCH 11/13] Columnar.rst: Reinforce the message that sizes bit-width match offsets' --- docs/source/format/Columnar.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 3a75dd82d0e8b..9b94f4d1889ac 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -518,8 +518,8 @@ the corresponding offset may have any value between 0 and the length of the child array (inclusive). A list-view type is specified like ``ListView<T>``, where ``T`` is any type -(primitive or nested). In these examples we use 32-bit offsets where -the 64-bit offset version would be denoted by ``LargeListView<T>``. +(primitive or nested). In these examples we use 32-bit offsets and sizes where +the 64-bit version would be denoted by ``LargeListView<T>``.
**Example Layout: ``ListView`` Array** From c13b2223d40d96e8865ee1bb425fafca6200100f Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Tue, 3 Oct 2023 15:03:49 -0300 Subject: [PATCH 12/13] Columnar.rst: Turn the invariants into a relation between offsets and sizes --- docs/source/format/Columnar.rst | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 9b94f4d1889ac..6763c09500060 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -512,10 +512,11 @@ in the sizes buffer instead of inferred. This allows offsets to be out of order. Elements of the child array do not have to be stored in the same order they logically appear in the list elements of the parent array. -When a value is null, the corresponding size is expected to be 0. When the size -is 0 because the value is null or because the value represents an empty list, -the corresponding offset may have any value between 0 and the length of the -child array (inclusive). +Every list-view value, including null values, must satisfy the following +invariants: :: + + 0 <= offsets[i] <= length of the child array + 0 <= offsets[i] + sizes[i] <= length of the child array A list-view type is specified like ``ListView<T>``, where ``T`` is any type (primitive or nested).
In these examples we use 32-bit offsets and sizes where From dd8991abd509f385d69c7eade9e4ed7e3bcd1461 Mon Sep 17 00:00:00 2001 From: Felipe Oliveira Carvalho Date: Thu, 5 Oct 2023 11:04:51 -0300 Subject: [PATCH 13/13] Columnar.rst: Don't say that logical types have multiple physical layouts --- docs/source/format/Columnar.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index 6763c09500060..3f8cd946292ea 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -100,8 +100,8 @@ Arrays are defined by a few pieces of metadata and data: Nested arrays additionally have a sequence of one or more sets of these items, called the **child arrays**. -Each logical data type has one or more well-defined physical layouts. Here -are the different physical layouts defined by Arrow: +Each logical data type has a well-defined physical layout. Here are +the different physical layouts defined by Arrow: * **Primitive (fixed-size)**: a sequence of values each having the same byte or bit width @@ -406,8 +406,8 @@ This layout is adapted from TU Munich's `UmbraDB`_. .. _variable-size-list-layout: -Variable-size List Layouts --------------------------- +Variable-size List Layout +------------------------- List is a nested type which is semantically similar to variable-size binary. There are two list layout variations — "list" and "list-view" — @@ -727,8 +727,8 @@ for the null struct but they are "hidden" by the struct array's validity bitmap. However, when treated independently, corresponding entries of the children array will be non-null. -Union Layouts -------------- +Union Layout +------------ A union is defined by an ordered sequence of types; each slot in the union can have a value chosen from these types. The types are named
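Review note on the invariants from PATCH 12/13: the following is a minimal Python sketch of how a reader might check them and decode a list-view array. It assumes the buffers are already available as plain Python lists rather than Arrow buffers. ``decode_list_view`` is a hypothetical helper and not an Arrow API, and the explicit non-negative size check is an added assumption that the patch text does not state. The sample data is the out-of-order example as it stands after PATCH 09/13.

```python
from typing import List, Optional, Sequence


def decode_list_view(
    validity: Sequence[bool],
    offsets: Sequence[int],
    sizes: Sequence[int],
    child: Sequence[int],
) -> List[Optional[List[int]]]:
    """Decode a list-view array held in plain Python sequences (hypothetical helper)."""
    if not (len(validity) == len(offsets) == len(sizes)):
        raise ValueError("validity, offsets and sizes must have the same length")

    decoded: List[Optional[List[int]]] = []
    for i, (valid, offset, size) in enumerate(zip(validity, offsets, sizes)):
        # Assumption beyond the patch text: a size is never negative.
        if size < 0:
            raise ValueError(f"slot {i}: negative size {size}")
        # Invariants from the patch; they apply to null slots as well.
        if not 0 <= offset <= len(child):
            raise ValueError(f"slot {i}: offset {offset} out of bounds")
        if not 0 <= offset + size <= len(child):
            raise ValueError(f"slot {i}: offset + size exceeds the child length")
        # Null slots decode to None; their offset and size are still validated.
        decoded.append(list(child[offset:offset + size]) if valid else None)
    return decoded


# Validity bitmap 00011101 read least-significant bit first.
validity = [True, False, True, True, True]
offsets = [4, 7, 0, 0, 3]
sizes = [3, 0, 4, 0, 2]
child = [0, -127, 127, 50, 12, -7, 25]

decoded = decode_list_view(validity, offsets, sizes, child)
print(decoded)
```

Because each slot carries its own size, the loop never compares neighbouring offsets, which is what lets slot 0 start at child index 4 and lets the value 50 be shared by slots 2 and 4.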