[GLUTEN-5074][VL] fix: UDF load error in yarn-cluster mode #5075
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
if (!canAccessSparkFiles) {
  throw new IllegalArgumentException(
    "On yarn-client mode, driver only accepts absolute paths, but got " + f)
val uri = Utils.resolveURI(f)
Looks like `f` can also be a relative filename or file tag here without a scheme. Would this logic be able to handle such a case?
e.g.
--files /path/to/gluten/cpp/build/velox/udf/examples/libmyudf.so
--conf spark.gluten.sql.columnar.backend.velox.udfLibraryPaths=libmyudf.so
Yes, if `f` is `libmyudf.so`, `Utils.resolveURI()` will return `file://${PWD}/libmyudf.so`.
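To make the behaviour described here concrete, the following is a minimal Scala sketch of how a scheme-less name ends up as a `file:` URI under the current working directory. `ResolveSketch.resolve` is a hypothetical stand-in written only for illustration: it approximates what `Utils.resolveURI()` is described as doing in this thread and is not Spark's implementation. The `hdfs://namenode/...` value is an assumed example.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

// Hypothetical, simplified stand-in for the resolution step discussed above.
object ResolveSketch {
  def resolve(path: String): URI = {
    // Keep the value as-is when it already parses as a URI with a scheme.
    val withScheme = Try(new URI(path)).toOption.filter(_.getScheme != null)
    // Otherwise treat it as a local path; a relative name is resolved
    // against the current working directory of the JVM.
    withScheme.getOrElse(new File(path).getAbsoluteFile.toURI)
  }
}

object ResolveSketchDemo extends App {
  println(ResolveSketch.resolve("libmyudf.so"))                    // file: URI under the working directory
  println(ResolveSketch.resolve("hdfs://namenode/udf/libmyudf.so")) // scheme is preserved
}
```

Under these assumptions, a bare `libmyudf.so` always maps to a location in the working directory, regardless of where `--files` originally read the library from.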
But `file://${PWD}/libmyudf.so` is not the expected path, right? It should be `/path/to/gluten/cpp/build/velox/udf/examples/libmyudf.so`.
The call to Utils.resolveURI() is used to determine whether the file is local or remote.
In your example, libmyudf.so is a local file, and we will not use uri.path directly.
The --files argument copies this file to a different destination directory. When --master=yarn is specified, the file is copied to the working directory on all nodes (both the driver and executors). In local mode, the files are added using SparkContext.addFile, and they can then be accessed using SparkFiles.get(f).
> In your example, libmyudf.so is a local file
That doesn't make sense to me. `libmyudf.so` here should refer to the file uploaded by `--files`, but here it's resolved as a path relative to the runtime directory.
> The call to Utils.resolveURI() is used to determine whether the file is local or remote.
It would be better to check whether it's a relative path first. Although it's resolved as a local file and passes the `if` condition, the path doesn't even exist.
> When --master=yarn is specified, the file is copied to the working directory on all nodes (both the driver and executors).
Based on my previous experience, in yarn-client mode the files are copied to all executor containers and the AM container, so they won't be copied to the driver node. In this case, if we only use the two configurations below
--files /path/to/gluten/cpp/build/velox/udf/examples/libmyudf.so
--conf spark.gluten.sql.columnar.backend.velox.udfLibraryPaths=libmyudf.so
the driver will fail to get the actual path for `libmyudf.so`, and that's the reason for adding `spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths`.
Thank you for your patient explanation, Rong!
I previously thought that under yarn-client mode, `spark.yarn.dist.files` would also copy the files to the driver. I got the AM and driver mixed up.
Let's fix this issue.
I would suggest preserving two kinds of configuration for `spark.gluten.sql.columnar.backend.velox.udfLibraryPaths`:
- relative path
- URI
And make `spark.gluten.sql.columnar.backend.velox.udfLibraryPaths=/path/to/...` invalid.
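One way to read this suggestion is as a small validation step over the configured value. The sketch below is only an illustration under that reading: `UdfPathKind`, `classify`, and `validate` are hypothetical names that do not come from the PR, and the example paths are assumed.

```scala
import java.io.File
import java.net.URI
import scala.util.Try

// Hypothetical classification of one entry of udfLibraryPaths.
object UdfPathKind {
  sealed trait Kind
  case object RelativeName extends Kind // e.g. libmyudf.so, shipped via --files
  case object SchemeUri extends Kind    // e.g. file:///... or hdfs://...
  case object BareAbsolute extends Kind // e.g. /opt/udf/libmyudf.so, rejected here

  def classify(entry: String): Kind = {
    val scheme = Try(new URI(entry)).toOption.flatMap(u => Option(u.getScheme))
    if (scheme.isDefined) SchemeUri
    else if (new File(entry).isAbsolute) BareAbsolute
    else RelativeName
  }

  // Splits a comma-separated value and rejects bare absolute paths.
  def validate(conf: String): Either[String, List[String]] = {
    val entries = conf.split(",").map(_.trim).filter(_.nonEmpty).toList
    entries.find(e => classify(e) == BareAbsolute) match {
      case Some(bad) => Left(s"Expected a relative path or a URI, but got $bad")
      case None      => Right(entries)
    }
  }
}
```

Checking for a scheme first keeps `file:///...` values valid while still rejecting a bare `/path/to/...`, which matches the two accepted forms listed above.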
@marin-ma I have already fixed the code according to the previous discussion in the comments. Could you help review it again?
backends-velox/src/main/scala/org/apache/spark/sql/expression/UDFResolver.scala
4c3a092 to 76bde19
@@ -82,7 +82,8 @@ class VeloxUdfSuiteLocal extends VeloxUdfSuite {
   override val master: String = "local[2]"
   override protected def sparkConf: SparkConf = {
     super.sparkConf
-      .set("spark.gluten.sql.columnar.backend.velox.udfLibraryPaths", udfLibPath)
+      .set("spark.files", udfLibPath)
+      .set("spark.gluten.sql.columnar.backend.velox.udfLibraryPaths", "libmyudf.so")
Please do not hard-code "libmyudf.so" here. Use a helper function to extract the filename from `udfLibPath`. Note that `udfLibPath` can also be a comma-separated string with multiple paths.
Could you also update the examples in the document with either a relative path or a URI?
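A helper along the lines requested here could look like the sketch below. `UdfLibNames.fileNames` is a hypothetical name, not the helper that was eventually added to the test suite; it assumes each entry is a `/`-separated path and that blank entries can be dropped.

```scala
import java.io.File

// Hypothetical helper: turns "/a/libone.so,/b/libtwo.so" into "libone.so,libtwo.so".
object UdfLibNames {
  def fileNames(udfLibPath: String): String =
    udfLibPath
      .split(",")
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(p => new File(p).getName) // keep only the last path component
      .mkString(",")
}
```

In the test configuration, `spark.files` would then receive `udfLibPath` unchanged, while `udfLibraryPaths` would receive `UdfLibNames.fileNames(udfLibPath)`.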
LGTM. Thanks!
What changes were proposed in this pull request?
- Remove `driverUdfLibPath` and retain only `udfLibPath`, as there is no need to differentiate between the driver and executor. The method used to access the file is determined by whether `--master=yarn` is specified.
- Call `VeloxBackend::initUdf` in both the driver and the executor.
- `UdfResolver` does not load the UDF repeatedly; it only retrieves the function signatures.
(Fixes: #5074)
How was this patch tested?
I tested with and without the --files/--archives arguments in local, yarn-client, and yarn-cluster modes.