GH-42112: [Python] Array gracefully fails on non-cpu device #42113

danepitkin · 2024-06-11T22:55:06Z

Rationale for this change

Common Array APIs should not segfault or abort on non-cpu devices.

What changes are included in this PR?

device_type and is_cpu methods added to the Array class
Any function that segfaults, aborts, or gives incorrect results on non-cpu devices now raises an exception

Are these changes tested?

Unit tests added

Are there any user-facing changes?

device_type and is_cpu methods added to the Array class
GitHub Issue: [Python] Array fails gracefully on non-cpu devices #42112

github-actions · 2024-06-11T22:55:32Z

⚠️ GitHub issue #42112 has been automatically assigned in GitHub to PR creator.

danepitkin · 2024-06-11T22:56:37Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-11T22:57:15Z

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/9473856632

danepitkin · 2024-06-12T01:33:30Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-12T01:33:37Z

Revision: 534f33e

Submitted crossbow builds: ursacomputing/crossbow @ actions-ac535722c7

Task	Status
test-cuda-python

github-actions · 2024-06-12T01:35:53Z

Revision: 9f3efef

Submitted crossbow builds: ursacomputing/crossbow @ actions-8413b104ec

Task	Status
test-cuda-python

jorisvandenbossche

Thanks Dane, looking good!

jorisvandenbossche · 2024-06-12T07:09:16Z

python/pyarrow/array.pxi

+        self._assert_cpu()
+        cdef int64_t total_buffer_size
        total_buffer_size = TotalBufferSize(deref(self.ap))


My first thought was that getting the buffer size should be possible without looking at the actual data. But it seems that, to avoid counting identical buffers twice (a buffer can be reused within a single array), it uses the buffer's address to distinguish buffers. Currently DoTotalBufferSize uses buffer->data() for that. data() returns null for non-cpu data, but since the address is never actually dereferenced, I think this could use buffer->address() instead, which always returns the address, even for non-cpu data.

(but that could be fine for a follow-up as well)

That checks out! I tested it locally and witnessed a segfault, but hadn't dug into the reasoning.
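To make the deduplication point concrete, here is a minimal pure-Python sketch of counting each buffer once by its address. `FakeBuffer` and `total_buffer_size` are hypothetical stand-ins written for this note, not the Arrow C++ or pyarrow API; the only behaviour taken from the thread is that `data` is null for non-cpu memory while the address is always available.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FakeBuffer:
    """Hypothetical stand-in for an Arrow buffer (not the real API)."""
    address: int          # always known, even for device memory
    size: int             # size in bytes
    is_cpu: bool = True

    @property
    def data(self) -> Optional[int]:
        # Models the behaviour described above: no CPU pointer for device memory.
        return self.address if self.is_cpu else None


def total_buffer_size(buffers: Iterable[FakeBuffer]) -> int:
    """Sum buffer sizes, counting a buffer reused within one array only once."""
    seen = set()
    total = 0
    for buf in buffers:
        # Key on the address, which is never dereferenced here.
        if buf.address in seen:
            continue
        seen.add(buf.address)
        total += buf.size
    return total


shared = FakeBuffer(address=0x1000, size=64, is_cpu=False)
other = FakeBuffer(address=0x2000, size=32, is_cpu=False)
print(total_buffer_size([shared, other, shared]))  # 96
```

Keying on `data` instead would collapse every non-cpu buffer onto the same null key in this model, which is why the address is the safer identity.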

jorisvandenbossche · 2024-06-12T07:10:23Z

python/pyarrow/array.pxi

        for i in range(len(self)):
            yield self.getitem(i)

    def __repr__(self):
+        self._assert_cpu()


I think this one is not needed, because __repr__ ends up calling to_string, which already has the check.

Good point!

jorisvandenbossche · 2024-06-12T07:10:41Z

python/pyarrow/array.pxi

@@ -1307,6 +1319,8 @@ cdef class Array(_PandasConvertible):
        -------
        bool
        """
+        self._assert_cpu()
+        self.other._assert_cpu()


Suggested change

-        self.other._assert_cpu()
+        other._assert_cpu()

LOL, thank you. Nice catch.

jorisvandenbossche · 2024-06-12T07:14:56Z

python/pyarrow/array.pxi

@@ -1404,8 +1424,9 @@ cdef class Array(_PandasConvertible):
        -------
        sliced : RecordBatch
        """
-        cdef:
-            shared_ptr[CArray] result
+        self._assert_cpu()


I think that slicing actually works?

You are right. I tried printing the resulting RecordBatch, which segfaults. But .slice() itself does not segfault.
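A rough model of why slicing can succeed while printing cannot: a slice only adjusts offset and length metadata and shares the underlying buffer, whereas rendering the values has to read the bytes. `DeviceBuffer` and `ArrayView` below are hypothetical names for illustration, and the sketch skips bounds checking entirely.

```python
class DeviceBuffer:
    """Hypothetical buffer whose bytes cannot be read from the host."""

    def __init__(self, size):
        self.size = size

    def read(self, index):
        raise NotImplementedError("buffer is not CPU-accessible")


class ArrayView:
    """Hypothetical array: a shared buffer plus offset/length metadata."""

    def __init__(self, buffer, offset=0, length=None):
        self.buffer = buffer
        self.offset = offset
        self.length = buffer.size - offset if length is None else length

    def slice(self, offset, length):
        # Metadata only: the buffer is shared and never read.
        return ArrayView(self.buffer, self.offset + offset, length)

    def to_list(self):
        # Materialising values needs the actual bytes, so this touches the buffer.
        return [self.buffer.read(self.offset + i) for i in range(self.length)]


view = ArrayView(DeviceBuffer(size=10))
sliced = view.slice(2, 3)
print(sliced.offset, sliced.length)  # 2 3
```

Calling `sliced.to_list()` in this model raises `NotImplementedError`, which is the graceful counterpart of the segfault seen when printing.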

jorisvandenbossche · 2024-06-12T07:19:05Z

python/pyarrow/array.pxi

+        return self.device_type == DeviceAllocationType.CPU
+
+    def _assert_cpu(self):
+        if not self.is_cpu:


If we want to speed this up a bit (it's called in many places, although it should typically add only tiny overhead), you could inline code similar to what you used above for the Python attributes, like if self.sp_array.get().device_type() != CDeviceAllocationType_kCPU: .., and make it a cdef instead of a def (not entirely sure this is worth it).

I think it's worth doing, good suggestion.
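As a pure-Python illustration of the guard being discussed, the sketch below models `device_type`, `is_cpu`, and an `_assert_cpu` helper that compares the stored device type directly rather than going through the property. `FakeArray`, the two-member enum, and the error message text are assumptions made for this note; the real implementation lives in Cython and is not reproduced here.

```python
from enum import Enum


class DeviceAllocationType(Enum):
    # Only two members are modelled; values are illustrative.
    CPU = 1
    CUDA = 2


class FakeArray:
    """Hypothetical pure-Python model of an Array with guarded methods."""

    def __init__(self, values, device_type=DeviceAllocationType.CPU):
        self._values = list(values)
        self._device_type = device_type

    @property
    def device_type(self):
        return self._device_type

    @property
    def is_cpu(self):
        return self.device_type == DeviceAllocationType.CPU

    def _assert_cpu(self):
        # Direct comparison, skipping the property lookup, in the spirit of
        # the "inline the check" suggestion. The message is illustrative.
        if self._device_type != DeviceAllocationType.CPU:
            raise NotImplementedError("only implemented for data on a CPU device")

    def equals(self, other):
        self._assert_cpu()
        other._assert_cpu()  # guard the argument itself, not an attribute of self
        return self._values == other._values

    def __getitem__(self, i):
        self._assert_cpu()
        return self._values[i]


cpu = FakeArray([1, 2, 3])
gpu = FakeArray([1, 2, 3], DeviceAllocationType.CUDA)
print(cpu.is_cpu, gpu.is_cpu)  # True False
```

With this model, `gpu[0]` and `cpu.equals(gpu)` both raise `NotImplementedError` instead of reaching code that would read device memory.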

jorisvandenbossche · 2024-06-12T07:21:36Z

python/pyarrow/tests/test_array.py

+
+    # Supported
+    arr.validate()
+    arr.validate(full=True)


I am wondering if this will actually be supported for more complex data types. For example, for variable-size binary, it checks whether the actual offsets are valid (e.g. not negative, not out of bounds, increasing).

Great catch, let me experiment and find out.

You are right! There is an abort with validate(full=True) when using a variable-size binary type.

Check failed: is_cpu() not a CPU buffer (device: CudaDevice(device_number=0, name="NVIDIA RTX A5000")) Aborted
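The kind of check that forces full validation to read the data can be sketched as follows. This is an illustrative pure-Python model, not Arrow's validation code: `validate_binary_offsets` and `validate` are hypothetical helpers, and the `is_cpu` flag stands in for the array's device check. It follows the outcome described in this thread, where only `validate(full=True)` needs the CPU guard.

```python
def validate_binary_offsets(offsets, data_length):
    """Check variable-size binary offsets against the data buffer length."""
    if not offsets:
        return  # nothing to check in this simplified model
    if offsets[0] < 0:
        raise ValueError("first offset is negative")
    for prev, cur in zip(offsets, offsets[1:]):
        if cur < prev:
            raise ValueError("offsets are not non-decreasing")
    if offsets[-1] > data_length:
        raise ValueError("last offset is out of bounds of the data buffer")


def validate(offsets, data_length, *, is_cpu, full=False):
    """Cheap validation is metadata-only; full validation reads the offsets."""
    if full:
        if not is_cpu:
            # Reading offsets requires CPU-accessible memory.
            raise NotImplementedError("full validation needs CPU-accessible data")
        validate_binary_offsets(offsets, data_length)


validate([0, 3, 3, 7], 7, is_cpu=True, full=True)   # passes silently
validate([0, 3, 3, 7], 7, is_cpu=False)             # cheap path, no data access
```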

jorisvandenbossche · 2024-06-12T07:22:48Z

python/pyarrow/tests/test_array.py

+    with pytest.raises(NotImplementedError):
+        arr.fill_null(0)
+    with pytest.raises(NotImplementedError):
+        arr.__getitem__(0)


Suggested change

-        arr.__getitem__(0)
+        arr[0]

jorisvandenbossche · 2024-06-12T07:25:33Z

python/pyarrow/tests/test_array.py

+    with pytest.raises(NotImplementedError):
+        arr.getitem(0)


This one can be removed: getitem is a cdef function, which is not callable from Python, so this will error for that reason. It is already covered by testing __getitem__.

jorisvandenbossche · 2024-06-12T07:26:30Z

python/pyarrow/tests/test_array.py

+        arr.__dlpack__()
+    with pytest.raises(NotImplementedError):
+        arr.__dlpack_device__()


Maybe add a TODO comment here that this should be supported in the future

danepitkin · 2024-06-24T21:53:34Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-24T21:55:48Z

Revision: ace94ec

Submitted crossbow builds: ursacomputing/crossbow @ actions-570bed1718

Task	Status
test-cuda-python

danepitkin · 2024-06-24T23:05:22Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-24T23:07:34Z

Revision: abe942e

Submitted crossbow builds: ursacomputing/crossbow @ actions-26aaa9352b

Task	Status
test-cuda-python

danepitkin · 2024-06-24T23:25:32Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-24T23:27:50Z

Revision: d5a547f

Submitted crossbow builds: ursacomputing/crossbow @ actions-b8e234a0f3

Task	Status
test-cuda-python

jorisvandenbossche

Updates look good! There is still a test failure.

python/pyarrow/tests/test_array.py

Co-authored-by: Joris Van den Bossche <[email protected]>

danepitkin · 2024-06-25T20:28:54Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-25T20:31:10Z

Revision: 1c903e2

Submitted crossbow builds: ursacomputing/crossbow @ actions-2a8fac0c0e

Task	Status
test-cuda-python

danepitkin · 2024-06-25T21:40:50Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-25T21:43:05Z

Revision: c12765a

Submitted crossbow builds: ursacomputing/crossbow @ actions-599aea54a2

Task	Status
test-cuda-python

danepitkin · 2024-06-25T21:55:51Z

@github-actions crossbow submit test-cuda-python

github-actions · 2024-06-25T21:58:04Z

Revision: 9740a90

Submitted crossbow builds: ursacomputing/crossbow @ actions-6cbf11bdde

Task	Status
test-cuda-python

jorisvandenbossche

Looks good, thanks!

conbench-apache-arrow · 2024-06-26T16:20:34Z

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a42ec1d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.

…ache#42113)

### Rationale for this change

Common `Array` APIs should not segfault or abort on non-cpu devices.

### What changes are included in this PR?

* `device_type` and `is_cpu` methods added to the `Array` class
* Any function that segfaults, aborts, or gives incorrect results on non-cpu devices now raises an exception

### Are these changes tested?

* Unit tests added

### Are there any user-facing changes?

* `device_type` and `is_cpu` methods added to the `Array` class

* GitHub Issue: apache#42112

Lead-authored-by: Dane Pitkin <[email protected]>
Co-authored-by: Dane Pitkin <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>

danepitkin requested a review from jorisvandenbossche June 11, 2024 22:55

github-actions bot added Component: Python awaiting review Awaiting review labels Jun 11, 2024

danepitkin mentioned this pull request Jun 11, 2024

MINOR: [Dev] Remove Dane from collaborators list #41589

Merged

jorisvandenbossche reviewed Jun 12, 2024

github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jun 12, 2024

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 24, 2024

danepitkin added 6 commits June 24, 2024 16:24

[Python] Array gracefully fails on non-cpu device

2fa2f74

Lint

e6d8181

Remove from staticmethod

b5ecd31

Cleanup

7e21746

Fix cdef function

55640d1

Fix test case

d5a547f

danepitkin force-pushed the danepitkin/python-array-is-cpu branch from ff03596 to d5a547f Compare June 24, 2024 23:25

jorisvandenbossche reviewed Jun 25, 2024

github-actions bot removed the awaiting change review Awaiting change review label Jun 25, 2024

github-actions bot added the awaiting changes Awaiting changes label Jun 25, 2024

Update python/pyarrow/tests/test_array.py

1c903e2

Co-authored-by: Joris Van den Bossche <[email protected]>

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 25, 2024

Fix test case

c12765a

Check for CPU if validate(full=True)

9740a90

jorisvandenbossche approved these changes Jun 26, 2024

jorisvandenbossche merged commit a42ec1d into apache:main Jun 26, 2024
12 of 14 checks passed

jorisvandenbossche removed the awaiting change review Awaiting change review label Jun 26, 2024

jorisvandenbossche mentioned this pull request Jun 26, 2024

[Python] Array fails gracefully on non-cpu devices #42112

Closed

github-actions bot added the awaiting merge Awaiting merge label Jun 26, 2024

jorisvandenbossche mentioned this pull request Aug 1, 2024

[Python] Current assertion of CPU-accessible data in Array methods is specific to CPU device type #43511

Open

felipecrv mentioned this pull request Aug 2, 2024

[C++] Compute functions should fail gracefully when given non-CPU resident Arrays #43541

Open
