Nightly cudf_udf test builds recently started failing with exceptions like the following:
INFO: Process 1469 found CUDA visible device(s): 0
Traceback (most recent call last):
File "/home/...../jars/rapids-4-spark_2.12-24.12.0-SNAPSHOT-cuda11.jar/rapids/daemon.py", line 131, in manag
er
File "/home/...../jars/rapids-4-spark_2.12-24.12.0-SNAPSHOT-cuda11.jar/rapids/worker.py", line 37, in initia
lize_gpu_mem
from cudf import rmm
File "/opt/conda/lib/python3.10/site-packages/cudf/__init__.py", line 19, in <module>
_setup_numba()
File "/opt/conda/lib/python3.10/site-packages/cudf/utils/_numba.py", line 121, in _setup_numba
shim_ptx_cuda_version = _get_cuda_build_version()
File "/opt/conda/lib/python3.10/site-packages/cudf/utils/_numba.py", line 16, in _get_cuda_build_version
from cudf._lib import strings_udf
File "/opt/conda/lib/python3.10/site-packages/cudf/_lib/__init__.py", line 4, in <module>
from . import (
File "avro.pyx", line 1, in init cudf._lib.avro
File "utils.pyx", line 1, in init cudf._lib.utils
File "column.pyx", line 1, in init cudf._lib.column
File "/opt/conda/lib/python3.10/site-packages/rmm/__init__.py", line 17, in <module>
from rmm import mr
File "/opt/conda/lib/python3.10/site-packages/rmm/mr.py", line 14, in <module>
from rmm.pylibrmm.memory_resource import (
File "/opt/conda/lib/python3.10/site-packages/rmm/pylibrmm/__init__.py", line 15, in <module>
from .device_buffer import DeviceBuffer
File "device_buffer.pyx", line 1, in init rmm.pylibrmm.device_buffer
AttributeError: module 'cuda.ccudart' has no attribute '__pyx_capi__'
INFO: Process 1503 found CUDA visible device(s): 0
24/11/05 14:10:10 ERROR Executor: Exception in task 2.0 in stage 1.0 (TID 8)
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:121)
at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:137)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:136)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:106)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:121)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:162)
at org.apache.spark.sql.rapids.execution.python.GpuArrowEvalPythonExec.$anonfun$internalDoExecuteColumnar$2(GpuArrowEvalPythonExec.scala:456)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
[.....]
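For reference, the failure is not Spark-specific: it surfaces as soon as rmm is imported in the worker's Python environment. A minimal diagnostic sketch (assumptions: run in the same conda environment the cudf_udf workers use, and cuda-python/rmm expose `__version__`) would be:

```python
# Diagnostic sketch (hypothetical): confirm the cuda-python / rmm mismatch
# outside of Spark, in the same environment the failing executor uses.
import cuda
print("cuda-python version:", cuda.__version__)

# Importing rmm exercises the same Cython init path (device_buffer.pyx) that
# fails in daemon.py; with an incompatible cuda-python it should raise
# AttributeError: module 'cuda.ccudart' has no attribute '__pyx_capi__'
import rmm
print("rmm imported OK:", rmm.__version__)
```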
We need to update our test environment to pin the version of cuda-python.
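As a sketch of what that pin might look like (assuming the test image installs its Python dependencies via pip; the version bound below is a placeholder, not a verified cutoff), the constraint could be added to the environment's requirements:

```
# Hypothetical pin for the cudf_udf test environment; the upper bound is a
# placeholder and should be set to the last cuda-python release known to work
# with the nightly rmm/cuDF packages.
cuda-python<12.6.1
```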
The cudf-udf pipeline is designed to monitor nightly CUDF-py changes.
I recommend keeping it running against the latest nightly CUDF build unless we decide not to wait for the fix in this release. Thanks!