Add retry with SpillableHostBuffer to GPU scans #12233

Open
wants to merge 2 commits into base: branch-25.04

Conversation

@firestarman (Collaborator) commented Feb 26, 2025

Contributes to #8890.

This PR adds retry support with SpillableHostBuffer to GPU scans when reading data from files into host memory, to address some CPU OOMs in GPU scans reported from customer queries. These host memory allocations currently run without the protection of the retry framework.

The OOM error looks like the following:

Caused by: com.nvidia.spark.rapids.jni.CpuRetryOOM: Could not complete allocation after 1000 retries
        at com.nvidia.spark.rapids.HostAlloc.alloc(HostAlloc.scala:244)
        at com.nvidia.spark.rapids.HostAlloc.allocate(HostAlloc.scala:250)
        at ai.rapids.cudf.HostMemoryBuffer.allocate(HostMemoryBuffer.java:138)
        at ai.rapids.cudf.HostMemoryBuffer.allocate(HostMemoryBuffer.java:149)
        at com.nvidia.spark.rapids.ParquetPartitionReaderBase.$anonfun$readPartFile$1(GpuParquetScan.scala:1938)
        at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:30)
        at com.nvidia.spark.rapids.ParquetPartitionReaderBase.readPartFile(GpuParquetScan.scala:1936)

This change is covered by existing tests.
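
For context, here is a minimal sketch of the pattern being added: the host allocation and read run under the retry framework, and the resulting buffer is handed to the spill framework together with its usable length. Every name below (CpuRetryOOMStub, HostBuf, SpillableHostBufStub, and the withRetryNoSplit signature) is a simplified stand-in, not the actual spark-rapids API or this PR's code.

    // Sketch only: retry-protected host allocation feeding a spillable wrapper.

    // Stand-in for a retryable CPU OOM (the real one is
    // com.nvidia.spark.rapids.jni.CpuRetryOOM).
    final class CpuRetryOOMStub(msg: String) extends RuntimeException(msg)

    // Stand-in for a closeable host buffer.
    trait HostBuf extends AutoCloseable

    // Stand-in for a spill framework handle: the buffer plus its "right size".
    final case class SpillableHostBufStub(buf: HostBuf, dataLength: Long)

    object RetryScanSketch {
      // Stand-in for the retry framework: re-run the block when a retryable
      // OOM is thrown (the real framework also spills and waits in between).
      def withRetryNoSplit[T](maxAttempts: Int)(block: => T): T = {
        var attempt = 0
        var result: Option[T] = None
        while (result.isEmpty) {
          try {
            result = Some(block)
          } catch {
            case _: CpuRetryOOMStub if attempt < maxAttempts - 1 =>
              attempt += 1 // the real framework spills before retrying
          }
        }
        result.get
      }

      // Read one part of a file into host memory under retry protection, and
      // make the buffer spillable instead of holding a raw host allocation.
      def readPartFile(allocate: Long => HostBuf,
                       read: HostBuf => Long,
                       estimatedSize: Long): SpillableHostBufStub =
        withRetryNoSplit(maxAttempts = 3) {
          val buf = allocate(estimatedSize)
          try {
            val actualSize = read(buf) // may be smaller than the estimate
            // Keep the whole allocation and track the valid length alongside
            // it, so spill accounting sees the full buffer.
            SpillableHostBufStub(buf, actualSize)
          } catch {
            case t: Throwable =>
              buf.close()
              throw t
          }
        }
    }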

@binmahone (Collaborator):

Related to #8890.

@firestarman (Collaborator Author):

build

winningsix previously approved these changes Feb 26, 2025
@@ -1134,7 +1139,10 @@ abstract class MultiFileCoalescingPartitionReaderBase(
logDebug(s"$getFileFormatShortName Coalescing reading estimates the initTotalSize:" +
s" $initTotalSize, and the true size: $finalBufferSize")
withResource(finalBuffer) { _ =>
finalBuffer.slice(0, finalBufferSize)
SpillableHostBuffer(
Collaborator:

Not directly related to this PR, but it reminds me that, generally speaking, a SpillableHostBuffer on a sliced HostMemoryBuffer could, IMHO, be buggy in some cases @abellina @revans2. Say I originally have a HostMemoryBuffer x, and another buffer y sliced from x. When y is spilled and read back from disk, y no longer shares the underlying buffer with x, right? If the code writer assumes any content written to x is visible through y, then the spill framework would be introducing a bug. Should we make a sliced HostMemoryBuffer non-spillable at all?
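
A tiny sketch of the aliasing being assumed here (hypothetical snippet; it relies only on HostMemoryBuffer.slice sharing the parent's memory, and how the spill framework would behave is exactly the open question):

    import ai.rapids.cudf.HostMemoryBuffer
    import com.nvidia.spark.rapids.Arm.withResource

    object SliceAliasingSketch {
      def main(args: Array[String]): Unit = {
        // x and y share the same underlying host allocation while both are live.
        withResource(HostMemoryBuffer.allocate(16)) { x =>
          withResource(x.slice(0, 8)) { y =>
            x.setByte(0, 42.toByte)
            assert(y.getByte(0) == 42) // a write to x is visible through the slice y
            // If y were handed to the spill framework, spilled to disk, and later
            // materialized again, it would be backed by a fresh allocation, so a
            // write to x after that point would no longer be visible through y.
          }
        }
      }
    }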

Collaborator:

But marking a sliced HostMemoryBuffer as non-spillable has side effects, too. It would essentially make the original HostMemoryBuffer x unspillable as well, because x always has an outstanding reference coming from the sliced buffer.

Collaborator:

Also, SpillableHostBuffer on a sliced HostMemoryBuffer would mistakenly increase totalSize of SpillableHostStore:

def trackNoSpill(handle: HostSpillableHandle[_]): Unit = {

Collaborator:

In this case, we are slicing only to "right size" the buffer. The alternatives are to memcpy it out to another right-sized buffer, to create a spillable buffer that knows about the slice, or to track the "right size" alongside it for later slicing.

In terms of tracking the memory, we'll be undercounting it because we have 0->finalBufferSize, not the whole length. We will let it go and spill it because we hold onto the only reference in the spill framework. When we bring it back after spill it will be of size finalBufferSize, not the original, but I think this is OK because we didn't mean to keep the original.

So I think the main issue is undercounting.
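
A rough sketch of the "track the 'right size' alongside it" option (hypothetical wrapper, not this PR's actual change): keep the full allocation with the spill framework and remember how many bytes are valid, slicing only when the data is consumed.

    import ai.rapids.cudf.HostMemoryBuffer

    // Hypothetical wrapper pairing a full host allocation with the number of
    // valid bytes; the full buffer is what would be made spillable, so the
    // spill accounting sees its true length.
    final case class SizedHostBuffer(buffer: HostMemoryBuffer, dataLength: Long)
        extends AutoCloseable {
      require(dataLength <= buffer.getLength, "valid length exceeds the allocation")

      // Slice down to the valid bytes only at the point of use.
      def sliceToData(): HostMemoryBuffer = buffer.slice(0, dataLength)

      override def close(): Unit = buffer.close()
    }

The author's follow-up below takes this same direction for the PR; the wrapper here only illustrates the bookkeeping.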

@firestarman (Collaborator Author), Feb 27, 2025:

Thanks for the discussion. So should I make any update to fix this "undercounting"?

Collaborator:

@abellina For this case I think it's okay to keep the code as it is, because we can assume finalBufferSize is close to totalBufferSize. But I brought up this question to discuss the general case.

@firestarman (Collaborator Author), Feb 27, 2025:

OK, I fully get what is going on here. I will resolve this by tracking the "right size" alongside it for later slicing.

@firestarman (Collaborator Author):

Updated

binmahone previously approved these changes Feb 27, 2025
Signed-off-by: Firestarman <[email protected]>
@firestarman (Collaborator Author):

build
