
Commit

fix
kecookier committed Mar 23, 2024
1 parent 1086721 commit d4af23e
Showing 3 changed files with 18 additions and 1 deletion.
@@ -65,6 +65,7 @@ object BackendSettings extends BackendSettingsApi {
val SHUFFLE_SUPPORTED_CODEC = Set("lz4", "zstd")

val GLUTEN_VELOX_UDF_LIB_PATHS = getBackendConfigPrefix() + ".udfLibraryPaths"
+val GLUTEN_VELOX_DRIVER_UDF_LIB_PATHS = getBackendConfigPrefix() + ".driver.udfLibraryPaths"

val MAXIMUM_BATCH_SIZE: Int = 32768

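Both keys are built by appending a suffix to `getBackendConfigPrefix()`. The full key names used in `docs/get-started/Velox.md` suggest that this prefix resolves to `spark.gluten.sql.columnar.backend.velox`. The sketch below hard-codes that value as an assumption to show the resulting key names. The `VeloxConfKeys` object is hypothetical and is not Gluten's code.

```scala
// Hypothetical stand-in for BackendSettings. The prefix value is inferred
// from the documented key names, not read from Gluten itself.
object VeloxConfKeys {
  val Prefix: String = "spark.gluten.sql.columnar.backend.velox"

  // Library paths used by executors (and by the driver as a fallback).
  val UdfLibPaths: String = Prefix + ".udfLibraryPaths"

  // Driver-only override introduced by this commit.
  val DriverUdfLibPaths: String = Prefix + ".driver.udfLibraryPaths"
}

object VeloxConfKeysDemo {
  def main(args: Array[String]): Unit = {
    println(VeloxConfKeys.UdfLibPaths)
    println(VeloxConfKeys.DriverUdfLibPaths)
  }
}
```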
@@ -84,6 +84,10 @@ object UDFResolver extends Logging {

private val LIB_EXTENSION = ".so"

+private lazy val isDriver: Boolean =
+"driver".equals(SparkEnv.get.executorId)


def registerUDF(name: String, bytes: Array[Byte]): Unit = {
registerUDF(name, TypeConverter.from(bytes))
}
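The new `isDriver` flag relies on Spark giving the driver the executor ID `"driver"`, while executors receive numeric IDs such as `"1"`. The sketch below isolates that check as a pure function so it can run without a `SparkEnv`. The name `ExecutorRole` is hypothetical. Putting the literal on the left of `equals` keeps the check null-safe.

```scala
// Hypothetical, Spark-free version of the isDriver check in UDFResolver.
object ExecutorRole {
  // Executor ID that Spark assigns to the driver process.
  private val DriverExecutorId = "driver"

  // Literal-first equals: returns false (instead of throwing) for a null ID.
  def isDriver(executorId: String): Boolean = DriverExecutorId.equals(executorId)
}

object ExecutorRoleDemo {
  def main(args: Array[String]): Unit = {
    Seq("driver", "1", null).foreach { id =>
      println(s"$id -> ${ExecutorRole.isDriver(id)}")
    }
  }
}
```

Because the original is a `lazy val`, Gluten evaluates the check once, on first access, and reuses the result afterward.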
@@ -114,7 +118,13 @@ object UDFResolver extends Logging {

def resolveUdfConf(conf: java.util.Map[String, String]): Unit = {
val sparkConf = SparkEnv.get.conf
-val udfLibPaths = sparkConf.getOption(BackendSettings.GLUTEN_VELOX_UDF_LIB_PATHS)
+val udfLibPaths = if (isDriver) {
+  sparkConf
+    .getOption(BackendSettings.GLUTEN_VELOX_DRIVER_UDF_LIB_PATHS)
+    .orElse(sparkConf.getOption(BackendSettings.GLUTEN_VELOX_UDF_LIB_PATHS))
+} else {
+  sparkConf.getOption(BackendSettings.GLUTEN_VELOX_UDF_LIB_PATHS)
+}

udfLibPaths match {
case Some(paths) =>
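After this change, `resolveUdfConf` looks up library paths in a fixed order. On the driver, it uses `driver.udfLibraryPaths` if that key is set and otherwise falls back to `udfLibraryPaths`. Executors use only `udfLibraryPaths`. The sketch below reproduces that order as a pure function over a plain `Map[String, String]`, which stands in for `SparkConf`. The `UdfPathResolution` object and its `resolve` signature are hypothetical. They make the lookup order easy to check without a Spark session.

```scala
// Hypothetical, Spark-free model of the lookup order in resolveUdfConf.
object UdfPathResolution {
  val UdfLibPaths = "spark.gluten.sql.columnar.backend.velox.udfLibraryPaths"
  val DriverUdfLibPaths = "spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths"

  // `conf` stands in for SparkConf; Map.get returns an Option like SparkConf.getOption.
  def resolve(conf: Map[String, String], isDriver: Boolean): Option[String] =
    if (isDriver) {
      // Driver: prefer the driver-specific key, fall back to the shared one.
      conf.get(DriverUdfLibPaths).orElse(conf.get(UdfLibPaths))
    } else {
      // Executors ignore the driver-specific key.
      conf.get(UdfLibPaths)
    }
}

object UdfPathResolutionDemo {
  import UdfPathResolution._

  def main(args: Array[String]): Unit = {
    val conf = Map(
      UdfLibPaths -> "libmyudf.so",
      DriverUdfLibPaths -> "file:///path/to/libmyudf.so")
    println(resolve(conf, isDriver = true))  // driver override wins
    println(resolve(conf, isDriver = false)) // executors use the shared key
    println(resolve(Map(UdfLibPaths -> "libmyudf.so"), isDriver = true)) // driver fallback
  }
}
```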
6 changes: 6 additions & 0 deletions docs/get-started/Velox.md
@@ -435,16 +435,22 @@ target_link_libraries(myudf PRIVATE ${VELOX_LIBRARY})

Gluten loads the UDF libraries at runtime. You can upload UDF libraries via `--files` or `--archives` and configure the library paths using the provided Spark configuration, which accepts a comma-separated list of library paths.

+Note: in YARN client mode, the uploaded files are not reachable on the driver side. Copy those files to a location the driver can reach and set `spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths`. This configuration is also useful when the library paths differ between the driver and the executors. If it is not set, the driver falls back to `spark.gluten.sql.columnar.backend.velox.udfLibraryPaths`.

- Use `--files`
```shell
--files /path/to/gluten/cpp/build/velox/udf/examples/libmyudf.so
--conf spark.gluten.sql.columnar.backend.velox.udfLibraryPaths=libmyudf.so
+# Needed for YARN client mode
+--conf spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths=file:///path/to/libmyudf.so
```

- Use `--archives`
```shell
--archives /path/to/udf_archives.zip#udf_archives
--conf spark.gluten.sql.columnar.backend.velox.udfLibraryPaths=udf_archives
+# Needed for YARN client mode
+--conf spark.gluten.sql.columnar.backend.velox.driver.udfLibraryPaths=file:///path/to/udf_archives.zip
```

- Specify URI
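The path setting takes a comma-separated list. The examples in the Velox.md hunk mix two forms: bare names for files shipped with `--files`/`--archives`, and `file://` URIs for the driver. The sketch below splits such a value and uses `java.net.URI` to tell the two forms apart. It is an illustration under those assumptions, not Gluten's own parsing code, and the name `UdfPathList` is hypothetical. Entries containing characters that are illegal in URIs, such as spaces, would make `new URI` throw `URISyntaxException`.

```scala
import java.net.URI

// Hypothetical helper: splits a comma-separated udfLibraryPaths value and
// records which entries carry an explicit URI scheme (e.g. file://).
object UdfPathList {
  final case class Entry(raw: String, scheme: Option[String])

  def parse(value: String): Seq[Entry] =
    value
      .split(",")
      .map(_.trim)          // tolerate spaces after commas
      .filter(_.nonEmpty)   // drop empty entries such as ",,"
      .map(p => Entry(p, Option(new URI(p).getScheme)))
      .toSeq
}

object UdfPathListDemo {
  def main(args: Array[String]): Unit = {
    UdfPathList
      .parse("libmyudf.so, file:///path/to/libmyudf.so,,udf_archives")
      .foreach(e => println(s"${e.raw} -> ${e.scheme.getOrElse("<no scheme>")}"))
  }
}
```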

