Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-42112: [Python] Array gracefully fails on non-cpu device #42113

Merged

Conversation

danepitkin
Copy link
Member

@danepitkin danepitkin commented Jun 11, 2024

Rationale for this change

Common Array APIs should not segfault or abort on non-cpu devices.

What changes are included in this PR?

  • device_type and is_cpu methods added to the Array class
  • Any function that segfaults, aborts, or gives incorrect results on non-cpu devices now raises an exception

Are these changes tested?

  • Unit tests added

Are there any user-facing changes?

Copy link

⚠️ GitHub issue #42112 has been automatically assigned in GitHub to PR creator.

@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Only contributors can submit requests to this bot. Please ask someone from the community for help with getting the first commit in.
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/9473856632

@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: 534f33e

Submitted crossbow builds: ursacomputing/crossbow @ actions-ac535722c7

Task Status
test-cuda-python GitHub Actions

Copy link

Revision: 9f3efef

Submitted crossbow builds: ursacomputing/crossbow @ actions-8413b104ec

Task Status
test-cuda-python GitHub Actions

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Dane, looking good!

Comment on lines +1221 to 1223
self._assert_cpu()
cdef int64_t total_buffer_size
total_buffer_size = TotalBufferSize(deref(self.ap))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought was that getting the buffer size should be possible without looking at the actual data. But so it seems that to avoid counting identical buffers twice (if they are reused in a single array, which is possible), it uses the buffer's address to distinguish buffers. Currently that uses buffer->data() in DoTotalBufferSize. data() will return null for non-cpu data, but since it doesn't actually use that address, I think this could also use buffer->address() which will return the address always even for non-cpu data.

(but that could be fine for a follow-up as well)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That checks out! I tested it locally and witnessed a segfault, but hadn't dug into the reasoning.

for i in range(len(self)):
yield self.getitem(i)

def __repr__(self):
self._assert_cpu()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should not be needed because this ends up calling to_string which already has it, I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

@@ -1307,6 +1319,8 @@ cdef class Array(_PandasConvertible):
-------
bool
"""
self._assert_cpu()
self.other._assert_cpu()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.other._assert_cpu()
other._assert_cpu()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, thank you. Nice catch.

@@ -1404,8 +1424,9 @@ cdef class Array(_PandasConvertible):
-------
sliced : RecordBatch
"""
cdef:
shared_ptr[CArray] result
self._assert_cpu()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that slicing actually works?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I tried printing the resulting RecordBatch, which segfaults. But .slice() itself does not segfault.

return self.device_type == DeviceAllocationType.CPU

def _assert_cpu(self):
if not self.is_cpu:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we would want to speed this up a bit (it's called in many places, although typically should only give tiny overhead), you could also inline similar code as you used above for the python attributes, like if self.sp_array.get().device_type() != CDeviceAllocationType_kCPU: .., and make it a cdef instead of def (not entirely sure this is worth it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its worth doing, good suggestion


# Supported
arr.validate()
arr.validate(full=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if this will actually be supported for more complex data types. For example for variable size binary, it will check the actual offsets if they are correct numbers (eg not negative, not out of bounds, increasing, etc)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, let me experiment and find out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! There is an abort with validate(full=True) when using variable binary type.

Check failed: is_cpu() not a CPU buffer (device: CudaDevice(device_number=0, name="NVIDIA RTX A5000"))
Aborted

python/pyarrow/tests/test_array.py Show resolved Hide resolved
with pytest.raises(NotImplementedError):
arr.fill_null(0)
with pytest.raises(NotImplementedError):
arr.__getitem__(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
arr.__getitem__(0)
arr[0]

Comment on lines 4063 to 4064
with pytest.raises(NotImplementedError):
arr.getitem(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can be removed (getitem is a cdef function which is not callable from python, so this will error for that reason), this is tested through testing __getitem__

Comment on lines 4084 to 4086
arr.__dlpack__()
with pytest.raises(NotImplementedError):
arr.__dlpack_device__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO comment here that this should be supported in the future

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jun 12, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 24, 2024
@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: ace94ec

Submitted crossbow builds: ursacomputing/crossbow @ actions-570bed1718

Task Status
test-cuda-python GitHub Actions

@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: abe942e

Submitted crossbow builds: ursacomputing/crossbow @ actions-26aaa9352b

Task Status
test-cuda-python GitHub Actions

@danepitkin danepitkin force-pushed the danepitkin/python-array-is-cpu branch from ff03596 to d5a547f Compare June 24, 2024 23:25
@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: d5a547f

Submitted crossbow builds: ursacomputing/crossbow @ actions-b8e234a0f3

Task Status
test-cuda-python GitHub Actions

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good! There is still some test failure

python/pyarrow/tests/test_array.py Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the awaiting change review Awaiting change review label Jun 25, 2024
@github-actions github-actions bot added the awaiting changes Awaiting changes label Jun 25, 2024
Co-authored-by: Joris Van den Bossche <[email protected]>
@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 25, 2024
Copy link

Revision: 1c903e2

Submitted crossbow builds: ursacomputing/crossbow @ actions-2a8fac0c0e

Task Status
test-cuda-python GitHub Actions

@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: c12765a

Submitted crossbow builds: ursacomputing/crossbow @ actions-599aea54a2

Task Status
test-cuda-python GitHub Actions

@danepitkin
Copy link
Member Author

@github-actions crossbow submit test-cuda-python

Copy link

Revision: 9740a90

Submitted crossbow builds: ursacomputing/crossbow @ actions-6cbf11bdde

Task Status
test-cuda-python GitHub Actions

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

@jorisvandenbossche jorisvandenbossche merged commit a42ec1d into apache:main Jun 26, 2024
12 of 14 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting change review Awaiting change review label Jun 26, 2024
@github-actions github-actions bot added the awaiting merge Awaiting merge label Jun 26, 2024
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a42ec1d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Jul 9, 2024
…ache#42113)

### Rationale for this change

Common `Array` APIs should not segfault or abort on non-cpu devices.

### What changes are included in this PR?

* `device_type` and `is_cpu` methods added to the `Array` class
* Any function that segfaults, aborts, or gives incorrect results on non-cpu devices now raises an exception 

### Are these changes tested?

* Unit tests added

### Are there any user-facing changes?

* `device_type` and `is_cpu` methods added to the `Array` class
* GitHub Issue: apache#42112

Lead-authored-by: Dane Pitkin <[email protected]>
Co-authored-by: Dane Pitkin <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants