Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37072: [C++] MakeArrayOfNulls should respect Field::nullable #38252

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

bkietz
Copy link
Member

@bkietz bkietz commented Oct 12, 2023

Rationale for this change

MakeArrayOfNulls didn't examine Field::nullable, so it could produce nested arrays whose children were null even if the schema said they couldn't have nulls. Additionally validation didn't look at Field::nullable so these malformed arrays don't fail tests.

Are these changes tested?

Yes, mostly by fixtures already in place.

Behavior

This PR includes breaking changes to public APIs.

The Field::nullable flag is now enforce in array validation, so now (for example) an array with nulls which corresponds to a non nullable field will fail validation.

@bkietz bkietz requested a review from felipecrv October 12, 2023 20:58
@felipecrv
Copy link
Contributor

Many tests are failing now.

@bkietz bkietz force-pushed the 37072-make-array-of-nulls branch from 4152f7a to 807b476 Compare October 13, 2023 17:35
@bkietz bkietz requested a review from pitrou October 16, 2023 15:12
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Oct 16, 2023
Comment on lines +544 to +545
"Invalid: Values array invalid: Invalid: Expected 2 buffers in array "
"of type int32, got 3"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't a bit weird to have "Invalid: " twice in the message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but this is the format of other child array validation failures (since the parent array's error message contains the child array's). Possibly a follow up issue is in order to simplify those; I think it'd be most valuable to provide a field path to the child which failed validation and the "leaf" error message.

class NullArrayFactory {
public:
struct GetBufferLength {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really delete it, just refactored its logic into NullArrayFactory. This way the code which requests preallocated zero buffer is in the same function as the code which uses the zero buffer, which seemed more clear to me.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, questions and suggestions below.

The PR description should note that this is a behavior change.

cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/array_test.cc Show resolved Hide resolved
cpp/src/arrow/array/array_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/validate.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/validate.cc Outdated Show resolved Hide resolved
cpp/src/arrow/extension_type.h Outdated Show resolved Hide resolved
@pitrou pitrou added the Breaking Change Includes a breaking change to the API label Oct 18, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 18, 2023
@bkietz bkietz force-pushed the 37072-make-array-of-nulls branch from 2b2d779 to d73aa0b Compare November 16, 2023 14:28
@felipecrv
Copy link
Contributor

Getting merged and me rebasing the LIST_VIEW PR would be a good idea.

@bkietz bkietz force-pushed the 37072-make-array-of-nulls branch from d73aa0b to b34ace5 Compare November 27, 2023 17:46
@bkietz bkietz requested a review from mapleFU November 28, 2023 17:51
cpp/src/arrow/array/validate.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/validate.cc Show resolved Hide resolved
@@ -43,7 +43,7 @@ class ARROW_EXPORT ExtensionType : public DataType {
static constexpr const char* type_name() { return "extension"; }

/// \brief The type of array used to represent this extension type's data
const std::shared_ptr<DataType>& storage_type() const { return storage_type_; }
std::shared_ptr<DataType> storage_type() const override { return storage_type_; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not also override storage_type_ref?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended to remove storage_type_ref altogether

cpp/src/arrow/type.h Show resolved Hide resolved
cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/util.cc Outdated Show resolved Hide resolved
cpp/src/arrow/array/util.cc Show resolved Hide resolved
auto req = [](auto type) { return field("", std::move(type), /*nullable=*/false); };

// union with no nullable fields cannot represent a null
ASSERT_RAISES(Invalid, MakeArrayOfNull(dense_union({req(int8())}), length));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be successful if length is 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type cannot support any number of nulls, I would say that the special case of zero length is not worth allowing

cpp/src/arrow/array/array_test.cc Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Dec 12, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting change review Awaiting change review Breaking Change Includes a breaking change to the API Component: C++
Projects
None yet
Development

Successfully merging this pull request may close these issues.

null structarrays are poorly handled by cast
3 participants