No module named 'cudf' while running spark-rapids with AWS EMR-7.3 #11668
Comments
The rapids-4-spark jar provides only the minimal Java bindings needed to run SparkSQL/DataFrame API queries. For Pandas-like Python …
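The jar does not ship the cudf Python library, so it can help to check whether cudf is importable in a given Python environment before submitting the job. Below is a minimal, stdlib-only sketch. The helper name `module_available` is hypothetical. It only inspects the interpreter it runs in. The failing import may happen in the executors' Python workers rather than on the driver, so a check on the driver alone can be misleading.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by this Python interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    print("cudf importable:", module_available("cudf"))
```

To run the same check on the executors, one option is to call it inside a small job, for example `spark.sparkContext.parallelize(range(4), 4).map(lambda _: module_available("cudf")).collect()`, which returns one result per partition.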
@gerashegalov Thanks for the guidance. I would also like to ask how to control the number of concurrent GPU tasks created by the UDF (I am using predict_batch_udf). I have tried spark.rapids.sql.concurrentGpuTasks, but it doesn't control the number of concurrent tasks on the GPU. Currently, the number of tasks on each GPU equals 1/spark.task.resource.gpu.amount. Can you please help me with that?
You can edit this init script and add a version of it that runs after the spark_rapids one to install the cudf Python library: https://github.com/NVIDIA/spark-rapids-ml/blob/branch-24.10/notebooks/aws-emr/init-bootstrap-action.sh

The number of concurrent predict_batch_udf tasks is determined by the per-task and per-executor resource settings, as you say. Are you hoping to have different task concurrency per stage? The …
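The concurrency rule from the answer can be sketched as a simplified model. An executor runs as many tasks at once as its scarcest resource allows. With one GPU per executor, the GPU limit is therefore 1/spark.task.resource.gpu.amount. The function name and the executor sizes in the example are hypothetical. The model ignores other custom resources and dynamic allocation, so treat it as an approximation of Spark's scheduler, not its implementation.

```python
from fractions import Fraction


def max_concurrent_tasks(executor_cores: int, task_cpus: int,
                         executor_gpus: int, task_gpu_amount: float) -> int:
    """Simplified model of Spark task slots per executor.

    executor_cores  -> spark.executor.cores
    task_cpus       -> spark.task.cpus
    executor_gpus   -> spark.executor.resource.gpu.amount
    task_gpu_amount -> spark.task.resource.gpu.amount
    """
    cpu_slots = executor_cores // task_cpus
    # Exact Fraction arithmetic avoids binary floating-point rounding in the division.
    gpu_slots = int(Fraction(executor_gpus) / Fraction(str(task_gpu_amount)))
    # The scarcest resource determines how many tasks run at once.
    return min(cpu_slots, gpu_slots)


if __name__ == "__main__":
    # Hypothetical executor: 12 cores, 1 GPU, each task requesting 0.25 of a GPU.
    print(max_concurrent_tasks(12, 1, 1, 0.25))  # 4
```

Under this model, getting n concurrent tasks per GPU means setting spark.task.resource.gpu.amount to 1/n. spark.executor.cores must also be at least n × spark.task.cpus. Otherwise the CPU limit wins.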
Thanks for the great work on this awesome library.
I am using spark-rapids with EMR-7.3 for deep learning model inference with predict_batch_udf.
I have been following the provided documentation for AWS EMR, along with the instructions for enabling GPU scheduling for pandas_udf described in the link. I am passing --py-files ${SPARK_RAPIDS_PLUGIN_JAR} in the spark-submit command, and I have also added the following to the config.json file:
"spark.rapids.sql.python.gpu.enabled": "true"
to enable GPU scheduling for the pandas UDF. The instances I am using are m5.4xlarge (master) and g4dn.12xlarge (core).
However, the job fails with the error: No module named 'cudf'.
spark-submit command:
spark-submit --deploy-mode client --py-files /usr/lib/spark/jars/rapids-4-spark_2.12-24.06.1-amzn-0.jar s3://<my-bucket>/rapids-code.py
The following lines are from the EMR error log: