Allows concatenation to ignore validation and adds method on table to… #63
mask = np.zeros(num_rows, dtype=bool)
for name, validator in self._column_validators.items():
    indices, _ = validator.failures(self.table.column(name))
    mask[indices.to_numpy()] = True
NumPy just makes things so easy sometimes. I was curious to see how you were going to do this with pyarrow compute functions.
I implemented it both ways, and the NumPy version appeared to be the more efficient of the two in both compute and memory.
@@ -1116,7 +1140,7 @@ def _encode_attr_dict(cls, attrs: dict[str, Any]) -> dict[bytes, bytes]:
             result[k.encode("utf8")] = descriptor.to_bytes(pytyped)
         return result

-    def apply_mask(self, mask: pa.BooleanArray | np.ndarray[bool, Any] | list[bool]) -> Self:
+    def apply_mask(self, mask: pa.BooleanArray | npt.NDArray[np.bool_] | list[bool]) -> Self:
Nice, there are already tests for these three types:
Lines 1051 to 1080 in f7c92ae
def test_apply_mask_numpy():
    values = Pair.from_kwargs(x=[1, 2, 3], y=[4, 5, 6])
    mask = np.array([True, False, True])
    have = values.apply_mask(mask)
    np.testing.assert_array_equal(have.x, [1, 3])

def test_apply_mask_pylist():
    values = Pair.from_kwargs(x=[1, 2, 3], y=[4, 5, 6])
    mask = [True, False, True]
    have = values.apply_mask(mask)
    np.testing.assert_array_equal(have.x, [1, 3])

def test_apply_mask_pyarrow():
    values = Pair.from_kwargs(x=[1, 2, 3], y=[4, 5, 6])
    mask = pa.array([True, False, True], pa.bool_())
    have = values.apply_mask(mask)
    np.testing.assert_array_equal(have.x, [1, 3])

def test_apply_mask_wrong_size():
    values = Pair.from_kwargs(x=[1, 2, 3], y=[4, 5, 6])
    mask = [True, False]
    with pytest.raises(ValueError):
        values.apply_mask(mask)
… separate valid from invalid rows