GH-465: Clarify backward-compatibility rules on LIST type #466

Merged
merged 19 commits into from
Dec 8, 2024
65 changes: 53 additions & 12 deletions LogicalTypes.md
@@ -609,9 +609,20 @@ that is neither contained by a `LIST`- or `MAP`-annotated group nor annotated
by `LIST` or `MAP` should be interpreted as a required list of required
elements where the element type is the type of the field.

Implementations should use either `LIST` and `MAP` annotations _or_ unannotated
repeated fields, but not both. When using the annotations, no unannotated
repeated types are allowed.
```
// List<Integer> (non-null list, non-null elements)
repeated int32 num;

// List<Tuple<Integer, String>> (non-null list, non-null elements)
repeated group my_list {
required int32 num;
optional binary str (STRING);
}
```

Contributor:

I think this example is counter-productive. We don't want anyone using un-annotated lists and maps. While the paragraph above explains how to interpret un-annotated repeated fields, I don't want anyone to see an example here and think that it is something that should be copied. I think it is already clear enough and I would simply move on rather than drawing attention to this as a possibility.

Member Author:

That makes sense. Let me remove these examples first. I think a follow-up is to deprecate it by moving it to the backward compatibility section and adding strong words to discourage writers from emitting it.

For all fields in the schema, implementations should use either `LIST` and
`MAP` annotations _or_ unannotated repeated fields, but not both. When using
the annotations, no unannotated repeated types are allowed.

### Lists

@@ -670,6 +681,13 @@ optional group array_of_arrays (LIST) {

#### Backward-compatibility rules

Modern writers should always produce the 3-level LIST structure shown above.
However, historically data files have been produced that use different structures
to represent list-like data, and readers may include compatibility measures to
interpret them as intended.

##### 3-level structure with different field names

It is required that the repeated group of elements is named `list` and that
its element field is named `element`. However, these names may not be used in
existing data and should not be enforced as errors when reading. For example,
@@ -684,44 +702,67 @@ optional group my_list (LIST) {
}
```

Some existing data does not include the inner element layer. For
backward-compatibility, the type of elements in `LIST`-annotated structures
should always be determined by the following rules:
##### 2-level structure

Some existing data does not include the inner element layer, resulting in a
`LIST` that annotates a 2-level structure. Unlike the 3-level structure, the
repetition of a 2-level structure can be `optional`, `required`, or `repeated`.
When it is `repeated`, the `LIST`-annotated 2-level structure can only serve as
an element within another `LIST`-annotated 2-level structure.

```
<list-repetition> group <name> (LIST) {
repeated <element-type> <element-name>;
}
```
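
The placement constraint described above can be sketched as a small check. This is an illustrative helper, not part of any Parquet library: the function name `two_level_placement_ok` and the `parent_is_two_level_list` flag are assumptions made for this example.

```python
VALID_REPETITIONS = {"required", "optional", "repeated"}


def two_level_placement_ok(repetition: str, parent_is_two_level_list: bool) -> bool:
    """Check where a LIST-annotated 2-level group may appear (hypothetical helper)."""
    if repetition not in VALID_REPETITIONS:
        raise ValueError(f"unknown repetition: {repetition!r}")
    if repetition == "repeated":
        # A repeated 2-level list is only valid as the element of another
        # LIST-annotated 2-level structure.
        return parent_is_two_level_list
    # optional and required 2-level lists carry no extra placement constraint here.
    return True
```

A reader could run a check like this while walking the schema, passing `True` for the flag only when the enclosing group is itself a `LIST`-annotated 2-level structure.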

For backward-compatibility, the type of elements in `LIST`-annotated structures
should always be determined by the following rules if they cannot be determined
as 3-level structures:

1. If the repeated field is not a group, then its type is the element type and
elements are required.
2. If the repeated field is a group with multiple fields, then its type is the
element type and elements are required.
3. If the repeated field is a group with one field and is named either `array`
3. If the repeated field is a group with one field with `repeated` repetition,
then its type is the element type and elements are required.
4. If the repeated field is a group with one field and is named either `array`
or uses the `LIST`-annotated group's name with `_tuple` appended then the
repeated type is the element type and elements are required.
4. Otherwise, the repeated field's type is the element type with the repeated
5. Otherwise, the repeated field's type is the element type with the repeated
field's repetition.
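
The order in which the five rules of the new revision are tried can be sketched as a small classifier. This is a minimal sketch under stated assumptions, not the behavior of any particular reader: the `Field` class and the `matching_rule` function are hypothetical names introduced here, and the sketch only reports which rule matches rather than building the resulting element type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, simplified schema node; not a real Parquet library type."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: Optional[List["Field"]] = None  # None means a primitive field

    @property
    def is_group(self) -> bool:
        return self.children is not None


def matching_rule(list_name: str, repeated: Field) -> int:
    """Return the number (1-5) of the first backward-compatibility rule that applies.

    `list_name` is the name of the LIST-annotated group and `repeated` is the
    repeated field directly inside it.
    """
    if repeated.repetition != "repeated":
        raise ValueError("expected the repeated field inside a LIST-annotated group")
    if not repeated.is_group:
        return 1  # repeated primitive: it is the element itself
    if len(repeated.children) > 1:
        return 2  # group with multiple fields: the group is the element
    if len(repeated.children) == 1:
        only_child = repeated.children[0]
        if only_child.repetition == "repeated":
            return 3  # single repeated child: the group is the element
        if repeated.name in ("array", f"{list_name}_tuple"):
            return 4  # legacy naming: the group is the element
    return 5  # otherwise: fall through to the final rule


# Schemas mirroring the examples in this section.
rule_1 = Field("element", "repeated")
rule_3 = Field("array", "repeated", [Field("array", "repeated")])
rule_4 = Field("my_list_tuple", "repeated", [Field("str", "required")])
assert matching_rule("my_list", rule_1) == 1
assert matching_rule("my_list", rule_3) == 3
assert matching_rule("my_list", rule_4) == 4
```

Checking the single-child `repeated` case before the name check matches the rule numbering of the new revision, so a group named `array` that wraps a repeated field is classified under rule 3 rather than rule 4.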

Examples that can be interpreted using these rules:

```
// List<Integer> (nullable list, non-null elements)
// Rule 1: List<Integer> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated int32 element;
}

// List<Tuple<String, Integer>> (nullable list, non-null elements)
// Rule 2: List<Tuple<String, Integer>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group element {
required binary str (STRING);
required int32 num;
};
}

// List<OneTuple<String>> (nullable list, non-null elements)
// Rule 3: List<List<Integer>> (nullable outer list, non-null elements)
optional group my_list (LIST) {
repeated group array (LIST) {
repeated int32 array;
};
}

// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group array {
required binary str (STRING);
};
}

// List<OneTuple<String>> (nullable list, non-null elements)
// Rule 4: List<OneTuple<String>> (nullable list, non-null elements)
optional group my_list (LIST) {
repeated group my_list_tuple {
required binary str (STRING);