-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: Enhance map handling to support NULL map values #18531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes handling of NULL map values (entire maps being NULL, not null keys/values within maps) in DataFusion's map functions. The changes address an issue where NULL map values caused incorrect "map key cannot be null" errors.
Key changes:
- Refactored
make_map_array_internalto properly track and preserve NULL map entries using nulls bitmap - Updated validation logic to distinguish between NULL maps and null keys within maps
- Added comprehensive test coverage for NULL map handling in both memory and Parquet storage
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| datafusion/sqllogictest/test_files/map.slt | Removed NULL values from duplicate key test; added extensive tests for NULL map handling including memory tables, map operations, and Parquet storage |
| datafusion/functions-nested/src/map.rs | Refactored map validation and array construction to handle NULL maps correctly by tracking nulls bitmap, building proper offsets, and handling empty array edge cases |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c0fa584 to
c735a53
Compare
d69074c to
2f715b0
Compare
2f715b0 to
87d6f93
Compare
Which issue does this PR close?
Closes #18530
Rationale for this change
The
make_mapfunction has an overly strict null check that cannot distinguish between:The premature null check at line 66 (
if keys.null_count() > 0) rejects ANY null in the keys array, even when it represents a valid NULL map value. This causes failures when directly callingmake_map_batchwith Arrow arrays containing NULL list elements.What changes are included in this PR?
1. Fixed
make_map_batchfunctioncan_evaluate_to_const && keys.null_count() > 0), routes tomake_map_array_internalwhich handles them correctlyvalidate_map_keys2. Enhanced
make_map_array_internalfunctionlist_to_arrays()transformationflattened_keys.null_count() > 0after concatenation to catch null keys within mapsAre these changes tested?
Yes, comprehensive tests are included:
Unit tests (map.rs):
test_make_map_with_null_maps(): Directly tests NULL map handling at the function leveltest_make_map_with_null_key_within_map_should_fail(): Verifies null keys are still rejectedExisting tests: All existing map-related tests pass, confirming no regression.
Are there any user-facing changes?
No