From 492860c3b148fb0b15cacf6cb4f1228abd294dda Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 30 Oct 2024 10:20:56 +0800 Subject: [PATCH 01/19] GH-465: Clarify backward-compatibility rules on LIST type --- LogicalTypes.md | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 7b4b2033..85a91650 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -691,10 +691,14 @@ should always be determined by the following rules: 1. If the repeated field is not a group, then its type is the element type and elements are required. 2. If the repeated field is a group with multiple fields, then its type is the - element type and elements are required. -3. If the repeated field is a group with one field and is named either `array` - or uses the `LIST`-annotated group's name with `_tuple` appended then the - repeated type is the element type and elements are required. + element type and elements are required. To be clear, if the group does not + have annotation, the element type resolves to a multi-field Tuple. If the + group is `LIST`-annotated or `MAP`-annotated, it should resolve to List or + Map type, respectively. +3. If the repeated field is a group (without annotation) with one `required` or + `optional` field, and is named either `array` or uses the `LIST`-annotated + group's name with `_tuple` appended, then the repeated type (a single-field + Tuple type) is the element type and elements are required. 4. Otherwise, the repeated field's type is the element type with the repeated field's repetition. 
@@ -727,6 +731,14 @@ optional group my_list (LIST) { required binary str (STRING); }; } + +// List> (outer list is nullable with non-null elements, +// inner list is non-null with non-null elements) +optional group my_list (LIST) { + repeated group array (LIST) { + repeated int32 array; + } +} ``` ### Maps From 71a40e23b1b408600d7611e80ba71a25f3d026f7 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 30 Oct 2024 20:41:43 +0800 Subject: [PATCH 02/19] fix rule 2 --- LogicalTypes.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 85a91650..c4bcc9bb 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -691,14 +691,12 @@ should always be determined by the following rules: 1. If the repeated field is not a group, then its type is the element type and elements are required. 2. If the repeated field is a group with multiple fields, then its type is the - element type and elements are required. To be clear, if the group does not - have annotation, the element type resolves to a multi-field Tuple. If the - group is `LIST`-annotated or `MAP`-annotated, it should resolve to List or - Map type, respectively. + element type and elements are required. In this case, the element type is + a Struct type with multiple fields. 3. If the repeated field is a group (without annotation) with one `required` or `optional` field, and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended, then the repeated type (a single-field - Tuple type) is the element type and elements are required. + Struct type) is the element type and elements are required. 4. Otherwise, the repeated field's type is the element type with the repeated field's repetition. 
@@ -710,7 +708,7 @@ optional group my_list (LIST) { repeated int32 element; } -// List> (nullable list, non-null elements) +// List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group element { required binary str (STRING); @@ -718,14 +716,14 @@ optional group my_list (LIST) { }; } -// List> (nullable list, non-null elements) +// List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group array { required binary str (STRING); }; } -// List> (nullable list, non-null elements) +// List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group my_list_tuple { required binary str (STRING); From 767f64ec1b59b20ce16b99027b59aa1bfc0c59ec Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Fri, 1 Nov 2024 17:44:06 +0800 Subject: [PATCH 03/19] add more clarification --- LogicalTypes.md | 39 +++++++++++++++++++++++---------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index c4bcc9bb..028580d0 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -684,31 +684,37 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer. For -backward-compatibility, the type of elements in `LIST`-annotated structures -should always be determined by the following rules: +Some existing data does not include the inner element layer, meaning that +`LIST` annotates a 2-level structure. In contrast to 3-level structure, The +repetition of the outer level `LIST`-annotated 2-level structure can be +`optional`, `required`, or `required`. For backward-compatibility, the type of +elements in `LIST`-annotated structures should always be determined by the +following rules: 1. If the repeated field is not a group, then its type is the element type and - elements are required. + elements are required. In this case, `LIST` annotates a 2-level structure. 2. 
If the repeated field is a group with multiple fields, then its type is the - element type and elements are required. In this case, the element type is - a Struct type with multiple fields. -3. If the repeated field is a group (without annotation) with one `required` or - `optional` field, and is named either `array` or uses the `LIST`-annotated + element type and elements are required. In this case, `LIST` annotates a + 2-level structure and the element type is a Struct type with multiple fields. +3. If the repeated field is a group (not `LIST`-annotated) with one `required` + or `optional` field, and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended, then the repeated type (a single-field - Struct type) is the element type and elements are required. + Struct type) is the element type and elements are required. This is a special + case of 3-level structure where the names are respected. 4. Otherwise, the repeated field's type is the element type with the repeated - field's repetition. + field's repetition. In this case, `LIST` annotates a 2-level structure. Note + that the repeated field cannot be a 3-level LIST whose repetition must be + `required` or `optional`. 
Examples that can be interpreted using these rules: ``` -// List (nullable list, non-null elements) +// Rule 1: List (nullable list, non-null elements) optional group my_list (LIST) { repeated int32 element; } -// List> (nullable list, non-null elements) +// Rule 2: List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group element { required binary str (STRING); @@ -716,22 +722,23 @@ optional group my_list (LIST) { }; } -// List> (nullable list, non-null elements) +// Rule 3: List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group array { required binary str (STRING); }; } -// List> (nullable list, non-null elements) +// Rule 3: List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group my_list_tuple { required binary str (STRING); }; } -// List> (outer list is nullable with non-null elements, -// inner list is non-null with non-null elements) +// List> +// Rule 4: nullable outer list with non-null elements +// Rule 1: non-null inner list with non-null elements optional group my_list (LIST) { repeated group array (LIST) { repeated int32 array; From fae84cbd11b090ff4e0cd0576df7150772fe2e5c Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Sat, 2 Nov 2024 08:55:12 +0800 Subject: [PATCH 04/19] address comment --- LogicalTypes.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 028580d0..42e60340 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -685,7 +685,7 @@ optional group my_list (LIST) { ``` Some existing data does not include the inner element layer, meaning that -`LIST` annotates a 2-level structure. In contrast to 3-level structure, The +`LIST` annotates a 2-level structure. In contrast to 3-level structure, the repetition of the outer level `LIST`-annotated 2-level structure can be `optional`, `required`, or `required`. 
For backward-compatibility, the type of elements in `LIST`-annotated structures should always be determined by the @@ -699,12 +699,11 @@ following rules: 3. If the repeated field is a group (not `LIST`-annotated) with one `required` or `optional` field, and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended, then the repeated type (a single-field - Struct type) is the element type and elements are required. This is a special - case of 3-level structure where the names are respected. + Struct type) is the element type and elements are required. 4. Otherwise, the repeated field's type is the element type with the repeated field's repetition. In this case, `LIST` annotates a 2-level structure. Note - that the repeated field cannot be a 3-level LIST whose repetition must be - `required` or `optional`. + that the repeated field cannot be a 3-level LIST, as such a LIST's repetition + must be `required` or `optional`. Examples that can be interpreted using these rules: @@ -746,6 +745,15 @@ optional group my_list (LIST) { } ``` +Some existing data does not even have the `LIST` annotation and simply uses +`repeated` repetition to annotate the element type. In this case, the element +type can be either a primitive type or a `LIST`-annotated 2-level group. 
+ +``` +// List (non-null list, non-null elements) +repeated int32 num; +``` + ### Maps `MAP` is used to annotate types that should be interpreted as a map from keys From 6c6dcf765b251a61dd7b8c8aaf760d0c170ace21 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Sun, 3 Nov 2024 22:47:06 +0800 Subject: [PATCH 05/19] fix typo --- LogicalTypes.md | 35 ++++++++++++----------------------- 1 file changed, 12 insertions(+), 23 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 42e60340..b1b96b04 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -684,25 +684,23 @@ optional group my_list (LIST) { } ``` -Some existing data does not include the inner element layer, meaning that -`LIST` annotates a 2-level structure. In contrast to 3-level structure, the -repetition of the outer level `LIST`-annotated 2-level structure can be -`optional`, `required`, or `required`. For backward-compatibility, the type of -elements in `LIST`-annotated structures should always be determined by the -following rules: +Some existing data does not include the inner element layer, meaning that `LIST` +annotates a 2-level structure. In contrast to 3-level structure, the repetition +of the outer level of 2-level structure can be `optional`, `required`, or +`repeated`. For backward-compatibility, the type of elements in `LIST`-annotated +2-level structures should always be determined by the following rules: 1. If the repeated field is not a group, then its type is the element type and - elements are required. In this case, `LIST` annotates a 2-level structure. -2. If the repeated field is a group with multiple fields, then its type is the - element type and elements are required. In this case, `LIST` annotates a - 2-level structure and the element type is a Struct type with multiple fields. + elements are required. +2. If the repeated field is a group with multiple fields, then its type (Struct + type with multiple fields) is the element type and elements are required. 3. 
If the repeated field is a group (not `LIST`-annotated) with one `required` or `optional` field, and is named either `array` or uses the `LIST`-annotated - group's name with `_tuple` appended, then the repeated type (a single-field - Struct type) is the element type and elements are required. + group's name with `_tuple` appended, then the repeated type (Struct type with + single field) is the element type and elements are required. 4. Otherwise, the repeated field's type is the element type with the repeated - field's repetition. In this case, `LIST` annotates a 2-level structure. Note - that the repeated field cannot be a 3-level LIST, as such a LIST's repetition + field's repetition. Note that the repeated field cannot be `LIST`-annotated + or `MAP`-annotated group with 3-level structure, as such a group's repetition must be `required` or `optional`. Examples that can be interpreted using these rules: @@ -745,15 +743,6 @@ optional group my_list (LIST) { } ``` -Some existing data does not even have the `LIST` annotation and simply uses -`repeated` repetition to annotate the element type. In this case, the element -type can be either a primitive type or a `LIST`-annotated 2-level group. 
- -``` -// List (non-null list, non-null elements) -repeated int32 num; -``` - ### Maps `MAP` is used to annotate types that should be interpreted as a map from keys From e9e5da4767cc2657827df0e8ee1230062f2c2bcb Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Sun, 3 Nov 2024 23:30:28 +0800 Subject: [PATCH 06/19] add new rule --- LogicalTypes.md | 53 +++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 42 insertions(+), 11 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index b1b96b04..6994dea7 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -670,6 +670,8 @@ optional group array_of_arrays (LIST) { #### Backward-compatibility rules +##### 3-level structure with different names + It is required that the repeated group of elements is named `list` and that its element field is named `element`. However, these names may not be used in existing data and should not be enforced as errors when reading. For example, @@ -684,24 +686,36 @@ optional group my_list (LIST) { } ``` +##### 2-level structure + Some existing data does not include the inner element layer, meaning that `LIST` annotates a 2-level structure. In contrast to 3-level structure, the repetition -of the outer level of 2-level structure can be `optional`, `required`, or -`repeated`. For backward-compatibility, the type of elements in `LIST`-annotated -2-level structures should always be determined by the following rules: +of 2-level structure can be `optional`, `required`, or `repeated`. + +``` + group (LIST) { + repeated ; +} +``` + +For backward-compatibility, the type of elements in `LIST`-annotated 2-level +structures should always be determined by the following rules: 1. If the repeated field is not a group, then its type is the element type and elements are required. 2. If the repeated field is a group with multiple fields, then its type (Struct type with multiple fields) is the element type and elements are required. -3. 
If the repeated field is a group (not `LIST`-annotated) with one `required` - or `optional` field, and is named either `array` or uses the `LIST`-annotated - group's name with `_tuple` appended, then the repeated type (Struct type with - single field) is the element type and elements are required. -4. Otherwise, the repeated field's type is the element type with the repeated - field's repetition. Note that the repeated field cannot be `LIST`-annotated - or `MAP`-annotated group with 3-level structure, as such a group's repetition - must be `required` or `optional`. +3. If the repeated field is a group with one `required` or `optional` field, + and is named either `array` or uses the `LIST`-annotated group's name with + `_tuple` appended, then the repeated type (Struct type with single field) is + the element type and elements are required. +4. If the repeated field is a `LIST`-annotated group with one `repeated` field, + then the element type is a list type with 2-level structure and elements are + required. +5. Otherwise, the repeated field's type is the element type with the repeated + field's repetition. Please note that the repeated field here (a group with + one field) cannot be `LIST`-annotated or `MAP`-annotated 3-level structure, + as such a group's repetition must be `required` or `optional`. Examples that can be interpreted using these rules: @@ -743,6 +757,23 @@ optional group my_list (LIST) { } ``` +##### 1-level structure without `LIST` annotation + +Some existing data does not even have the `LIST` annotation and simply uses +`repeated` repetition to annotate the element type. In this case, the element +type MUST be a primitive type and both the list and elements are required. 
+ +``` +// List (non-null list, non-null elements) +repeated int32 num; + +// Struct, List> (non-null list, non-null elements) +optional group whatever { + repeated int32 num; + repeated binary str (STRING); +} +``` + ### Maps `MAP` is used to annotate types that should be interpreted as a map from keys From 335b9a412e8490e5bd07854278f69da5e13fde22 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Tue, 5 Nov 2024 13:23:17 +0800 Subject: [PATCH 07/19] address feedback --- LogicalTypes.md | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 6994dea7..d0470194 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -753,7 +753,14 @@ optional group my_list (LIST) { optional group my_list (LIST) { repeated group array (LIST) { repeated int32 array; - } + }; +} + +// Rule 5: List>> (nullable outer list with non-null elements) +optional group my_list (LIST) { + repeated group foo { + repeated int32 bar; + }; } ``` @@ -767,11 +774,17 @@ type MUST be a primitive type and both the list and elements are required. 
// List (non-null list, non-null elements) repeated int32 num; -// Struct, List> (non-null list, non-null elements) +// Struct,List> (non-null list, non-null elements) optional group whatever { repeated int32 num; repeated binary str (STRING); } + +// List> (non-null list, non-null elements) +repeated group whatever { + required int32 num; + optional binary str (STRING); +} ``` ### Maps From 93e8a8356129362e718201ee94b3d130764488ae Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 6 Nov 2024 09:12:03 +0800 Subject: [PATCH 08/19] Update LogicalTypes.md Co-authored-by: Ed Seidl --- LogicalTypes.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index d0470194..2fcd0d9a 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -767,8 +767,8 @@ optional group my_list (LIST) { ##### 1-level structure without `LIST` annotation Some existing data does not even have the `LIST` annotation and simply uses -`repeated` repetition to annotate the element type. In this case, the element -type MUST be a primitive type and both the list and elements are required. +`repeated` repetition to annotate the element type. In this case both the list +and elements are required. ``` // List (non-null list, non-null elements) From dbb6550c1841c948c65259d7e337893131da0c65 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 13 Nov 2024 10:13:21 +0800 Subject: [PATCH 09/19] address comments --- LogicalTypes.md | 56 ++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 31 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 2fcd0d9a..e0c0d32a 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -670,7 +670,7 @@ optional group array_of_arrays (LIST) { #### Backward-compatibility rules -##### 3-level structure with different names +##### 3-level structure with different field names It is required that the repeated group of elements is named `list` and that its element field is named `element`. 
However, these names may not be used in @@ -703,19 +703,15 @@ structures should always be determined by the following rules: 1. If the repeated field is not a group, then its type is the element type and elements are required. -2. If the repeated field is a group with multiple fields, then its type (Struct - type with multiple fields) is the element type and elements are required. -3. If the repeated field is a group with one `required` or `optional` field, - and is named either `array` or uses the `LIST`-annotated group's name with - `_tuple` appended, then the repeated type (Struct type with single field) is - the element type and elements are required. -4. If the repeated field is a `LIST`-annotated group with one `repeated` field, - then the element type is a list type with 2-level structure and elements are - required. +2. If the repeated field is a group with multiple fields, then its type is the + element type and elements are required. +3. If the repeated field is a group with a `repeated` field, then the repeated + field is the element type because the type cannot be a 3-level list. +4. If the repeated field is a group with one field and is named either `array` + or uses the `LIST`-annotated group's name with `_tuple` appended then the + repeated type is the element type and elements are required. 5. Otherwise, the repeated field's type is the element type with the repeated - field's repetition. Please note that the repeated field here (a group with - one field) cannot be `LIST`-annotated or `MAP`-annotated 3-level structure, - as such a group's repetition must be `required` or `optional`. + field's repetition. 
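The rule set in this revision can be read as a small decision procedure on the reader side. Below is a minimal Python sketch of that procedure. It is not an API from any Parquet library: `Field`, its attributes, and `resolve_list_element` are hypothetical names. The sketch also assumes that the final "Otherwise" rule describes the regular 3-level layout. In that layout, the repeated group's single field is the element, and that field's own repetition decides whether elements may be null.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Field:
    """Hypothetical, minimal stand-in for a Parquet schema node."""
    name: str
    repetition: str                      # "required", "optional" or "repeated"
    annotation: Optional[str] = None     # e.g. "LIST", "MAP", "STRING" or None
    children: List["Field"] = field(default_factory=list)
    primitive: Optional[str] = None      # e.g. "int32"; None means a group

    @property
    def is_group(self) -> bool:
        return self.primitive is None


def resolve_list_element(list_group: Field) -> Tuple[Field, bool]:
    """Return (element field, elements_are_required) for a LIST-annotated group."""
    if list_group.annotation != "LIST" or len(list_group.children) != 1:
        raise ValueError("expected a LIST-annotated group with exactly one field")
    repeated = list_group.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the field inside a LIST-annotated group must be repeated")

    # Rule 1: a repeated primitive is the element; elements are required.
    if not repeated.is_group:
        return repeated, True
    # Rule 2: a repeated group with several fields is itself the element (a struct).
    if len(repeated.children) > 1:
        return repeated, True
    only = repeated.children[0]
    # Rule 3: the group's single field is repeated, so the group cannot be the
    # middle level of a 3-level list; the group itself is the element.
    if only.repetition == "repeated":
        return repeated, True
    # Rule 4: legacy names mark the group itself as a one-field element.
    if repeated.name in ("array", list_group.name + "_tuple"):
        return repeated, True
    # Rule 5 (assumed to mean the regular 3-level layout): the single field is
    # the element, and its own repetition decides whether elements may be null.
    return only, only.repetition == "required"


if __name__ == "__main__":
    # Rule 3 example: a LIST-annotated 2-level list nested inside another one.
    nested = Field("my_list", "optional", annotation="LIST", children=[
        Field("array", "repeated", annotation="LIST", children=[
            Field("array", "repeated", primitive="int32")])])
    element, required = resolve_list_element(nested)
    print(element.name, element.annotation, required)  # array LIST True
```

Rules 1 to 4 all return the repeated field itself with required elements. Only the fallback descends one level. As a result, a 3-level list with non-standard names, such as a repeated group named `element` that wraps `str`, still resolves to its inner field.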
Examples that can be interpreted using these rules: @@ -725,7 +721,7 @@ optional group my_list (LIST) { repeated int32 element; } -// Rule 2: List> (nullable list, non-null elements) +// Rule 2: List> (nullable list, non-null elements) optional group my_list (LIST) { repeated group element { required binary str (STRING); @@ -733,30 +729,28 @@ optional group my_list (LIST) { }; } -// Rule 3: List> (nullable list, non-null elements) +// Rule 3: List> (nullable outer list, non-null elements) optional group my_list (LIST) { - repeated group array { - required binary str (STRING); + repeated group array (LIST) { + repeated int32 array; }; } -// Rule 3: List> (nullable list, non-null elements) +// Rule 4: List> (nullable list, non-null elements) optional group my_list (LIST) { - repeated group my_list_tuple { + repeated group array { required binary str (STRING); }; } -// List> -// Rule 4: nullable outer list with non-null elements -// Rule 1: non-null inner list with non-null elements +// Rule 4: List> (nullable list, non-null elements) optional group my_list (LIST) { - repeated group array (LIST) { - repeated int32 array; + repeated group my_list_tuple { + required binary str (STRING); }; } -// Rule 5: List>> (nullable outer list with non-null elements) +// Rule 5: List>> (nullable outer list, non-null elements) optional group my_list (LIST) { repeated group foo { repeated int32 bar; @@ -767,21 +761,21 @@ optional group my_list (LIST) { ##### 1-level structure without `LIST` annotation Some existing data does not even have the `LIST` annotation and simply uses -`repeated` repetition to annotate the element type. In this case both the list -and elements are required. +`repeated` repetition to annotate the element type. For backward-compatibility, +both the list and elements are `required`. 
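The fallback for unannotated data can be sketched in the same way. Under this fallback, a repeated field that is neither annotated by `LIST`/`MAP` nor contained in such a group is read as a required list of required elements. The surrounding text also says that implementations should use either `LIST`/`MAP` annotations or unannotated repeated fields, but not both, so the sketch includes a check for mixing the two styles. This is hypothetical Python: `Field`, `as_legacy_list` and `mixes_list_styles` are illustrative names, not part of any Parquet library. "Contained by" is assumed to mean the field's direct parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COLLECTIONS = ("LIST", "MAP")


@dataclass
class Field:
    """Hypothetical, minimal stand-in for a Parquet schema node."""
    name: str
    repetition: str                      # "required", "optional" or "repeated"
    annotation: Optional[str] = None     # e.g. "LIST", "MAP", "STRING" or None
    children: List["Field"] = field(default_factory=list)
    primitive: Optional[str] = None      # e.g. "int32"; None means a group


def as_legacy_list(f: Field, parent_annotation: Optional[str]) -> Optional[dict]:
    """Describe a bare repeated field as a list, or return None if the rule does not apply."""
    if f.repetition != "repeated" or f.annotation in COLLECTIONS:
        return None
    if parent_annotation in COLLECTIONS:
        return None  # part of a LIST/MAP structure, handled by those rules instead
    # The field itself is the element; neither the list nor its elements are nullable.
    return {"list_nullable": False, "element": f, "element_nullable": False}


def mixes_list_styles(root: Field) -> bool:
    """True if a schema uses LIST/MAP annotations and bare repeated fields together."""
    seen_annotated = False
    seen_bare = False

    def walk(node: Field, parent_annotation: Optional[str]) -> None:
        nonlocal seen_annotated, seen_bare
        if node.annotation in COLLECTIONS:
            seen_annotated = True
        elif as_legacy_list(node, parent_annotation) is not None:
            seen_bare = True
        for child in node.children:
            walk(child, node.annotation)

    walk(root, None)
    return seen_annotated and seen_bare
```

For example, a top-level `repeated int32 num;` yields a description with both `list_nullable` and `element_nullable` set to `False`. The repeated `list` group inside a `LIST`-annotated group is left to the `LIST` rules.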
``` // List (non-null list, non-null elements) repeated int32 num; -// Struct,List> (non-null list, non-null elements) -optional group whatever { +// Tuple, List> (non-null list, non-null elements) +optional group my_list { repeated int32 num; repeated binary str (STRING); } -// List> (non-null list, non-null elements) -repeated group whatever { +// List> (non-null list, non-null elements) +repeated group my_list { required int32 num; optional binary str (STRING); } From 3f7238cd8895709a904737d4109108f918fd2591 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 13 Nov 2024 20:31:31 +0800 Subject: [PATCH 10/19] relocate 1-level structure example --- LogicalTypes.md | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index e0c0d32a..b8d19cde 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -609,6 +609,17 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field. +``` +// List (non-null list, non-null elements) +repeated int32 num; + +// List> (non-null list, non-null elements) +repeated group my_list { + required int32 num; + optional binary str (STRING); +} +``` + Implementations should use either `LIST` and `MAP` annotations _or_ unannotated repeated fields, but not both. When using the annotations, no unannotated repeated types are allowed. @@ -758,29 +769,6 @@ optional group my_list (LIST) { } ``` -##### 1-level structure without `LIST` annotation - -Some existing data does not even have the `LIST` annotation and simply uses -`repeated` repetition to annotate the element type. For backward-compatibility, -both the list and elements are `required`. 
- -``` -// List (non-null list, non-null elements) -repeated int32 num; - -// Tuple, List> (non-null list, non-null elements) -optional group my_list { - repeated int32 num; - repeated binary str (STRING); -} - -// List> (non-null list, non-null elements) -repeated group my_list { - required int32 num; - optional binary str (STRING); -} -``` - ### Maps `MAP` is used to annotate types that should be interpreted as a map from keys From d0a7212887514bbdb0a1754da82bf4309cbb7dc4 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Fri, 15 Nov 2024 10:04:35 +0800 Subject: [PATCH 11/19] Update LogicalTypes.md Co-authored-by: Antoine Pitrou --- LogicalTypes.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/LogicalTypes.md b/LogicalTypes.md index b8d19cde..8b2a65ad 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -681,6 +681,11 @@ optional group array_of_arrays (LIST) { #### Backward-compatibility rules +Modern writers should always produce the 3-level LIST structure shown above. +However, historically data files have been produced that use different structures to +represent list-like data, and readers may include compatibility measures to interpret +them as intended. + ##### 3-level structure with different field names It is required that the repeated group of elements is named `list` and that From 76c3a8de873797eb4103a1cd7d405284dd55a9eb Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Fri, 15 Nov 2024 10:18:18 +0800 Subject: [PATCH 12/19] remove illegal example --- LogicalTypes.md | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 8b2a65ad..ad1164c9 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -620,9 +620,9 @@ repeated group my_list { } ``` -Implementations should use either `LIST` and `MAP` annotations _or_ unannotated -repeated fields, but not both. When using the annotations, no unannotated -repeated types are allowed. 
+For all fields in the schema, implementations should use either `LIST` and +`MAP` annotations _or_ unannotated repeated fields, but not both. When using +the annotations, no unannotated repeated types are allowed. ### Lists @@ -765,13 +765,6 @@ optional group my_list (LIST) { required binary str (STRING); }; } - -// Rule 5: List>> (nullable outer list, non-null elements) -optional group my_list (LIST) { - repeated group foo { - repeated int32 bar; - }; -} ``` ### Maps From f31454450dfefff8781ae3e58154796c503da420 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Tue, 19 Nov 2024 10:51:49 +0800 Subject: [PATCH 13/19] address feedback --- LogicalTypes.md | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index ad1164c9..42cbc935 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -682,9 +682,9 @@ optional group array_of_arrays (LIST) { #### Backward-compatibility rules Modern writers should always produce the 3-level LIST structure shown above. -However, historically data files have been produced that use different structures to -represent list-like data, and readers may include compatibility measures to interpret -them as intended. +However, historically data files have been produced that use different structures +to represent list-like data, and readers may include compatibility measures to +interpret them as intended. ##### 3-level structure with different field names @@ -704,9 +704,11 @@ optional group my_list (LIST) { ##### 2-level structure -Some existing data does not include the inner element layer, meaning that `LIST` -annotates a 2-level structure. In contrast to 3-level structure, the repetition -of 2-level structure can be `optional`, `required`, or `repeated`. +Some existing data does not include the inner element layer, resulting in a +`LIST` that annotates a 2-level structure. 
Unlike the 3-level structure, the +repetition of a 2-level structure can be `optional`, `required`, or `repeated`. +When it is `repeated`, the `LIST`-annotated 2-level structure can only serve as +an element within another `LIST`-annotated 2-level structure. ``` group (LIST) { @@ -714,15 +716,17 @@ of 2-level structure can be `optional`, `required`, or `repeated`. } ``` -For backward-compatibility, the type of elements in `LIST`-annotated 2-level -structures should always be determined by the following rules: +For backward-compatibility, the type of elements in `LIST`-annotated structures +should always be determined by the following rules if they cannot be determined +as 3-level structures: 1. If the repeated field is not a group, then its type is the element type and elements are required. 2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required. -3. If the repeated field is a group with a `repeated` field, then the repeated - field is the element type because the type cannot be a 3-level list. +3. If the repeated field is a group with one field and the repetition of that + field is `repeated`, then its type is the element type and elements are + required. 4. If the repeated field is a group with one field and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended then the repeated type is the element type and elements are required. From 04b6bb09d9120410f2e2199faa25e633dcb23e3a Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Sat, 23 Nov 2024 14:49:25 +0800 Subject: [PATCH 14/19] refine rule 3 --- LogicalTypes.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 42cbc935..41080dd0 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -724,9 +724,8 @@ as 3-level structures: elements are required. 2. If the repeated field is a group with multiple fields, then its type is the element type and elements are required. 
-3. If the repeated field is a group with one field and the repetition of that - field is `repeated`, then its type is the element type and elements are - required. +3. If the repeated field is a group with one field with `repeated` repetition, + then its type is the element type and elements are required. 4. If the repeated field is a group with one field and is named either `array` or uses the `LIST`-annotated group's name with `_tuple` appended then the repeated type is the element type and elements are required. From a640115b153362d099aa5eb376769a25c61b09ff Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Sun, 24 Nov 2024 22:33:37 +0800 Subject: [PATCH 15/19] Update LogicalTypes.md Co-authored-by: emkornfield --- LogicalTypes.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 41080dd0..a521d3f0 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -681,7 +681,7 @@ optional group array_of_arrays (LIST) { #### Backward-compatibility rules -Modern writers should always produce the 3-level LIST structure shown above. +New writer implementations should always produce the 3-level LIST structure shown above. However, historically data files have been produced that use different structures to represent list-like data, and readers may include compatibility measures to interpret them as intended. From 8e6d781054ccac4dfba35d8b69d75e97bc2288f5 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Sun, 24 Nov 2024 22:35:41 +0800 Subject: [PATCH 16/19] wrap lines --- LogicalTypes.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index a521d3f0..8a3dd552 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -681,10 +681,10 @@ optional group array_of_arrays (LIST) { #### Backward-compatibility rules -New writer implementations should always produce the 3-level LIST structure shown above. 
-However, historically data files have been produced that use different structures -to represent list-like data, and readers may include compatibility measures to -interpret them as intended. +New writer implementations should always produce the 3-level LIST structure shown +above. However, historically data files have been produced that use different +structures to represent list-like data, and readers may include compatibility +measures to interpret them as intended. ##### 3-level structure with different field names From 535856beb6d0fc208ad7c3df7ea759ac7e5821bc Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 27 Nov 2024 14:01:41 +0800 Subject: [PATCH 17/19] remove non-recommended examples --- LogicalTypes.md | 24 +----------------------- 1 file changed, 1 insertion(+), 23 deletions(-) diff --git a/LogicalTypes.md b/LogicalTypes.md index 8a3dd552..bcdacd2c 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -609,17 +609,6 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field. -``` -// List (non-null list, non-null elements) -repeated int32 num; - -// List> (non-null list, non-null elements) -repeated group my_list { - required int32 num; - optional binary str (STRING); -} -``` - For all fields in the schema, implementations should use either `LIST` and `MAP` annotations _or_ unannotated repeated fields, but not both. When using the annotations, no unannotated repeated types are allowed. @@ -686,8 +675,6 @@ above. However, historically data files have been produced that use different structures to represent list-like data, and readers may include compatibility measures to interpret them as intended. -##### 3-level structure with different field names - It is required that the repeated group of elements is named `list` and that its element field is named `element`. 
However, these names may not be used in existing data and should not be enforced as errors when reading. For example, @@ -702,23 +689,14 @@ optional group my_list (LIST) { } ``` -##### 2-level structure - Some existing data does not include the inner element layer, resulting in a `LIST` that annotates a 2-level structure. Unlike the 3-level structure, the repetition of a 2-level structure can be `optional`, `required`, or `repeated`. When it is `repeated`, the `LIST`-annotated 2-level structure can only serve as an element within another `LIST`-annotated 2-level structure. -``` - group (LIST) { - repeated ; -} -``` - For backward-compatibility, the type of elements in `LIST`-annotated structures -should always be determined by the following rules if they cannot be determined -as 3-level structures: +should always be determined by the following rules: 1. If the repeated field is not a group, then its type is the element type and elements are required. From 47ed28e32325bce8b2828153f2b9e07f8eae41db Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 27 Nov 2024 17:59:09 +0800 Subject: [PATCH 18/19] add warning --- LogicalTypes.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/LogicalTypes.md b/LogicalTypes.md index bcdacd2c..21a92b29 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -609,6 +609,20 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated by `LIST` or `MAP` should be interpreted as a required list of required elements where the element type is the type of the field. +``` +WARNING: writers should not produce list types like these examples! They are +just for the purpose of reading existing data for backward-compatibility. 
+ +// List (non-null list, non-null elements) +repeated int32 num; + +// List> (non-null list, non-null elements) +repeated group my_list { + required int32 num; + optional binary str (STRING); +} +``` + For all fields in the schema, implementations should use either `LIST` and `MAP` annotations _or_ unannotated repeated fields, but not both. When using the annotations, no unannotated repeated types are allowed. @@ -713,6 +727,9 @@ should always be determined by the following rules: Examples that can be interpreted using these rules: ``` +WARNING: writers should not produce list types like these examples! They are +just for the purpose of reading existing data for backward-compatibility. + // Rule 1: List (nullable list, non-null elements) optional group my_list (LIST) { repeated int32 element; From f825961ecf1fae3b532e2b343174185531fb1353 Mon Sep 17 00:00:00 2001 From: Gang Wu Date: Wed, 4 Dec 2024 18:00:40 +0800 Subject: [PATCH 19/19] add example for rule 5 --- LogicalTypes.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/LogicalTypes.md b/LogicalTypes.md index 21a92b29..72940152 100644 --- a/LogicalTypes.md +++ b/LogicalTypes.md @@ -763,6 +763,13 @@ optional group my_list (LIST) { required binary str (STRING); }; } + +// Rule 5: List (nullable list, nullable elements) +optional group my_list (LIST) { + repeated group element { + optional binary str (STRING); + }; +} ``` ### Maps
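The final rule list in this series is order-sensitive. The Python sketch below shows how a reader might apply rules 1 to 5 to a `LIST`-annotated group. It is not taken from parquet-java, Arrow, or any other implementation. `Node` and `resolve_list_element` are hypothetical names invented for illustration. Rule 5 is read the way the "Rule 5" example in the last patch reads it: the repeated group's single child is the element and keeps its own repetition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, minimal stand-in for a Parquet schema node (not a real API)."""
    name: str
    repetition: str                          # "required", "optional" or "repeated"
    children: Optional[List["Node"]] = None  # None means a primitive field
    annotation: Optional[str] = None         # e.g. "LIST"

    @property
    def is_group(self) -> bool:
        return self.children is not None


def resolve_list_element(list_node: Node) -> Tuple[Node, bool]:
    """Return (element node, elements_are_required) for a LIST-annotated group."""
    if list_node.annotation != "LIST" or not list_node.children or len(list_node.children) != 1:
        raise ValueError("expected a LIST-annotated group with exactly one child")
    repeated = list_node.children[0]
    if repeated.repetition != "repeated":
        raise ValueError("the child of a LIST-annotated group must be repeated")

    # Rule 1: a repeated primitive is itself the element; elements are required.
    if not repeated.is_group:
        return repeated, True
    # Rule 2: a repeated group with several fields is a struct element.
    if len(repeated.children) > 1:
        return repeated, True
    only = repeated.children[0]
    # Rule 3: a single child with `repeated` repetition means the group itself
    # is the element (e.g. the inner list of a 2-level List<List<...>>).
    if only.repetition == "repeated":
        return repeated, True
    # Rule 4: the legacy names `array` and `<list name>_tuple` mark a
    # one-field struct element.
    if repeated.name in ("array", list_node.name + "_tuple"):
        return repeated, True
    # Rule 5: otherwise treat it as a 3-level list. The single child is the
    # element and keeps its own repetition (matches the "Rule 5" example).
    return only, only.repetition == "required"
```

Rule 4 must be checked before rule 5. Otherwise `repeated group array { required binary str (STRING); }` would resolve to a `String` element instead of the single-field struct that legacy writers intended.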