UnsatisfiedLinkError in KMeansDAL during training with OneCCL in OAP MLlib #399
Comments
@madhushreeb39250 You can re-pull from master; I have merged the new code.
Hi @minmingzhu, thank you for your response. I re-pulled and rebuilt OAP MLlib. The build was successful, but it fails while running the KMeans algorithm. I found the following error in the Spark worker log:
Could you please help me with this issue?
@madhushreeb39250 Can you provide the stdout and stderr of the worker that reported the error? Can you also provide your Spark configuration files, such as spark-env.sh and spark-defaults.conf?
Thank you for the response, @minmingzhu. Here are the log and conf files:
Hi madhushreeb39250, I would like to know which version of Intel GPU you are using.
Maybe you can try adding this to spark-defaults.conf:
And add this to spark-env.sh:
I assume you are using Intel GPUs. To find out how many GPUs are on each node, just run "source $ONEAPI_ROOT, sycl-ls". By the way, the spark.worker.resourcesFile file format is as follows. You can also refer to the Spark documentation (https://spark.apache.org/docs/latest/spark-standalone.html#resource-allocation-and-configuration-overview).
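For reference, the Spark standalone documentation linked above describes the resources file as a JSON array of allocations, each with an `id` (`componentName`, `resourceName`) and a list of `addresses`. Below is a minimal sketch that builds such a structure. The helper names are hypothetical, and the GPU addresses `0` and `1` are placeholder assumptions rather than values from this thread.

```python
import json


def build_worker_resources(resource_name, addresses):
    """Build the list structure that Spark's standalone docs show for
    spark.worker.resourcesFile (one allocation entry for one resource)."""
    return [
        {
            "id": {"componentName": "spark.worker", "resourceName": resource_name},
            # Spark expects addresses as strings, so convert defensively.
            "addresses": [str(a) for a in addresses],
        }
    ]


def write_worker_resources(path, resource_name, addresses):
    """Write the resources file as JSON to the given path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_worker_resources(resource_name, addresses), f, indent=2)


if __name__ == "__main__":
    # Placeholder values (assumption): two GPUs with addresses 0 and 1.
    print(json.dumps(build_worker_resources("gpu", [0, 1])))
```

On a CPU-only machine this file should not be needed at all.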
Thank you for the suggestion, @minmingzhu. I am not trying to use a GPU here; there are no GPUs available on this machine.
Hi @madhushreeb39250,
add this to spark-defaults.conf:
Hi @minmingzhu, I tried re-pulling master and still face the same issue. I tried both Java 8 and Java 11, and I tried building with and without the CPU-ONLY option. I am also using the recommended version of oneAPI. The same error persists. Could you please suggest how to proceed from here?
Hi @madhushreeb39250, did you add the environment settings to spark-env.sh and spark-defaults.conf? You could also try this version of oneAPI: wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/20f4e6a1-6b0b-4752-b8c1-e5eacba10e01/l_BaseKit_p_2024.0.0.49564_offline.sh && sh ./l_BaseKit_p_2024.0.0.49564_offline.sh
Hi @minmingzhu, thank you for the suggestion. The error in the worker log:
Hi @madhushreeb39250, I haven't encountered this problem. Could you provide the stdout and stderr of the Spark worker?
Hi @madhushreeb39250, I can see that KMeans printed its result in your Spark master log.
Hi @minmingzhu, yes, I am able to get the centroids of the KMeans clusters. I was just not sure whether this is the expected behaviour of the algorithm, as I was also getting an error along with the KMeans results. Here are the stderr and stdout files of the failing Spark worker.
I encountered an error while running KMeans clustering with OAP MLlib, specifically when using the KMeansDAL implementation. The application fails with an UnsatisfiedLinkError, which points to an issue with loading the native CCL library in OneCCL$.c_init.
Here’s the full error trace:
Caused by: java.lang.UnsatisfiedLinkError: com.intel.oap.mllib.OneCCL$.c_init(IILjava/lang/String;Lcom/intel/oap/mllib/CCLParam;)I
at com.intel.oap.mllib.OneCCL$.c_init(Native Method)
at com.intel.oap.mllib.OneCCL$.init(OneCCL.scala:32)
at com.intel.oap.mllib.clustering.KMeansDALImpl.$anonfun$train$4(KMeansDALImpl.scala:71)
at com.intel.oap.mllib.clustering.KMeansDALImpl.$anonfun$train$4$adapted(KMeansDALImpl.scala:70)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:907)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:907)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
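Note that the message names a native method and its descriptor rather than saying `no ... in java.library.path`. As far as I understand, that form usually means the JVM could not resolve the JNI symbol for `c_init` in any library it had loaded, for example because the OAP MLlib native library was not loaded on that executor, or because the jar and the native library come from different builds. Below is a minimal sketch, following the name-mangling rules in the JNI specification, that derives the symbol names the JVM would look for. The helper names are hypothetical, and which `.so` file to inspect is an open question here.

```python
def jni_mangle(name):
    """Escape a class name, method name, or argument descriptor
    using the JNI name-mangling rules (BMP characters only)."""
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")                   # package separators become '_'
        elif ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("_0%04x" % ord(ch))    # e.g. '$' becomes _00024
    return "".join(out)


def jni_short_name(class_name, method_name):
    """Symbol name the JVM tries first for a native method."""
    return "Java_" + jni_mangle(class_name) + "_" + jni_mangle(method_name)


def jni_long_name(class_name, method_name, descriptor):
    """Symbol name with the argument signature appended (used for overloads)."""
    args = descriptor[descriptor.index("(") + 1 : descriptor.index(")")]
    return jni_short_name(class_name, method_name) + "__" + jni_mangle(args)


if __name__ == "__main__":
    # Class, method, and descriptor are taken from the trace above.
    cls = "com.intel.oap.mllib.OneCCL$"
    desc = "(IILjava/lang/String;Lcom/intel/oap/mllib/CCLParam;)I"
    print(jni_short_name(cls, "c_init"))
    print(jni_long_name(cls, "c_init", desc))
```

The resulting names can then be searched for among the exported symbols of the native library used by the executors (for example with `nm -D`). If neither name is present, a mismatch between the jar and the native build is a plausible cause.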
Environment:
OAP MLlib version: 1.6.0
Spark version: 3.3.3
OneAPI CCL version: 2021.8.0
Java version: OpenJDK 8
Additional Information:
The native libraries for CCL are installed under /opt/intel/oneapi/ccl/2021.8.0/lib/cpu/ and /opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp/.
Java environment variables seem correctly configured.
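One way to double-check the last point on each worker is to confirm that the directories above exist, contain shared objects, and appear in the `LD_LIBRARY_PATH` seen by the process. A rough sketch follows. The function name is hypothetical, and it only inspects the environment of the process running it, which may differ from what the Spark executors inherit.

```python
import os
from pathlib import Path


def check_native_dirs(dirs, ld_library_path):
    """For each directory, report whether it exists, whether it is listed
    in the given LD_LIBRARY_PATH string, and which *.so* files it holds."""
    entries = {
        os.path.normpath(p) for p in ld_library_path.split(os.pathsep) if p
    }
    report = {}
    for d in dirs:
        path = Path(d)
        exists = path.is_dir()
        libs = sorted(p.name for p in path.glob("*.so*")) if exists else []
        report[d] = {
            "exists": exists,
            "on_path": os.path.normpath(d) in entries,
            "libs": libs,
        }
    return report


if __name__ == "__main__":
    # Directories as listed in the issue text above.
    ccl_dirs = [
        "/opt/intel/oneapi/ccl/2021.8.0/lib/cpu/",
        "/opt/intel/oneapi/ccl/2021.8.0/lib/cpu_gpu_dpcpp/",
    ]
    result = check_native_dirs(ccl_dirs, os.environ.get("LD_LIBRARY_PATH", ""))
    for directory, info in result.items():
        print(directory, info)
```

Running this under the same user and shell setup that starts the Spark worker gives the most representative result.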
Could you please assist in resolving this issue? Thank you!