Merge pull request #17 from bigo-sg/allow_map_key_optional
Allow Parquet map key to be optional

(cherry picked from commit 0d6d07f)
Avogar authored and nikitamikhaylov committed Oct 15, 2024
1 parent 90f01a1 commit 8b74b9c
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions cpp/src/parquet/arrow/schema.cc
@@ -564,10 +564,23 @@ Status MapToSchemaField(const GroupNode& group, LevelInfo current_levels,
return Status::Invalid("Key-value map node must have 1 or 2 child elements. Found: ",
key_value.field_count());
}

/*
* If the Parquet file was written by Flink, the key type of a map column may be optional, like this:
* optional group event_info (MAP) {
* repeated group key_value {
* optional binary key (UTF8);
* optional binary value (UTF8);
* }
* }
*
* Refer to: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/types/#constructured-data-types
const Node& key_node = *key_value.field(0);
if (!key_node.is_required()) {
return Status::Invalid("Map keys must be annotated as required.");
}
*/

// Arrow doesn't support 1 column maps (i.e. Sets). The options are to either
// make the values column nullable, or process the map as a list. We choose the latter
// as it is simpler.
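
The effect of commenting out the check can be illustrated with a small, self-contained sketch. It does not use the Arrow or Parquet C++ types: `ToyNode`, `Repetition`, `ValidateKeyValue`, `MakeFlinkStyleKeyValue`, and `CheckFlinkStyleSchema` are hypothetical names introduced here for illustration only, and the `require_required_key` flag is an assumption that stands in for "before this commit" (true) versus "after this commit" (false).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for Parquet schema nodes. These are NOT the real
// parquet::schema types; they only model what this example needs.
enum class Repetition { kRequired, kOptional, kRepeated };

struct ToyNode {
  std::string name;
  Repetition repetition;
  std::vector<ToyNode> children;  // empty for primitive leaves
};

// Returns an empty string if the key_value group is accepted,
// otherwise an error message.
// require_required_key == true  mimics the behavior before this commit;
// require_required_key == false mimics the behavior after it.
std::string ValidateKeyValue(const ToyNode& key_value, bool require_required_key) {
  if (key_value.children.size() != 1 && key_value.children.size() != 2) {
    return "Key-value map node must have 1 or 2 child elements.";
  }
  const ToyNode& key = key_value.children[0];
  if (require_required_key && key.repetition != Repetition::kRequired) {
    return "Map keys must be annotated as required.";
  }
  return "";
}

// Builds the Flink-style group from the comment in the diff:
// both the key and the value are optional.
ToyNode MakeFlinkStyleKeyValue() {
  return ToyNode{"key_value", Repetition::kRepeated,
                 {ToyNode{"key", Repetition::kOptional, {}},
                  ToyNode{"value", Repetition::kOptional, {}}}};
}

// Checks both behaviors against the Flink-style schema.
void CheckFlinkStyleSchema() {
  const ToyNode kv = MakeFlinkStyleKeyValue();
  // With the old check in place, the schema is rejected.
  assert(ValidateKeyValue(kv, true) == "Map keys must be annotated as required.");
  // With the check disabled, the same schema is accepted.
  assert(ValidateKeyValue(kv, false).empty());
}
```

In this sketch the child-count check still applies in both modes, as in the diff, where only the `is_required()` test on the key node is disabled.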