We hit the OOM error below in some customer queries, which caused Spark task retries.
```
Caused by: com.nvidia.spark.rapids.jni.GpuSplitAndRetryOOM: GPU OutOfMemory: could not split inputs and retry
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$NoInputSpliterator.split(RmmRapidsRetryIterator.scala:386)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:588)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:517)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:291)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:185)
	at com.nvidia.spark.rapids.cudf_utils.HostConcatResultUtil$.getColumnarBatch(HostConcatResultUtil.scala:54)
	at com.nvidia.spark.rapids.GpuShuffleCoalesceIterator.$anonfun$next$4(GpuShuffleCoalesceExec.scala:229)
```
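The step that moves the coalesced buffer to the GPU (`HostConcatResultUtil.getColumnarBatch`, called through `withRetryNoSplit` in the trace above) currently supports only retry without splitting. We could implement split-and-retry for this step to improve its stability.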
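A minimal sketch of the split-and-retry idea, assuming the coalesced input can be modeled as a sequence of host buffer sizes that can be divided between two smaller GPU copies. `SimulatedSplitAndRetryOOM`, `withSplitAndRetry`, and `budgetedCopy` are hypothetical names for illustration only, not spark-rapids APIs; a real implementation would also have to close and re-track the buffers it splits, which this sketch ignores.

```scala
object SplitAndRetrySketch {
  // Hypothetical stand-in for a "GPU OOM, split and retry" signal.
  // This is NOT the spark-rapids exception class; it only models the idea.
  final class SimulatedSplitAndRetryOOM(msg: String) extends RuntimeException(msg)

  /** Apply `op` to `input`. If `op` signals an OOM, split the input in half
    * and process each half separately (recursively). A piece with at most one
    * element cannot be split, so the error is rethrown. Results keep input order.
    */
  def withSplitAndRetry[A, B](input: Vector[A])(op: Vector[A] => B): Vector[B] =
    try Vector(op(input))
    catch {
      case e: SimulatedSplitAndRetryOOM =>
        if (input.size <= 1) throw e // nothing left to split: give up
        val (left, right) = input.splitAt(input.size / 2)
        withSplitAndRetry(left)(op) ++ withSplitAndRetry(right)(op)
    }

  /** Hypothetical "copy to GPU": succeeds only if the combined size of the
    * buffers fits in `budget`, and returns the number of bytes copied.
    */
  def budgetedCopy(budget: Long): Vector[Long] => Long = bufs => {
    val total = bufs.sum
    if (total > budget) throw new SimulatedSplitAndRetryOOM(s"need $total, have $budget")
    total
  }
}
```

With a 100-byte budget, four 40-byte buffers (160 bytes total) fail on the first attempt and then succeed as two 80-byte halves, so `withSplitAndRetry(Vector(40L, 40L, 40L, 40L))(budgetedCopy(100L))` returns `Vector(80L, 80L)`. A single buffer that alone exceeds the budget cannot be split, so the error propagates, which is analogous to the "could not split inputs and retry" failure above.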