Build fails with ROCm on Gentoo Linux #10793
Comments
@ekuznetsov139 @draganmladjenovic @pemeliya maybe one of you can help? |
@Eiji7 As for using clang, the enablement for it is not yet upstreamed, so gcc is the only working option. Anything that can build the latest llvm should suffice (I think; anyway, the oldest I've used is gcc 9). As for not using /opt/rocm, I'm not sure how far that would get you. You either hack rocm_configure or recreate /opt/rocm via symlinks. There are also some pieces of the runtime that assume /opt/rocm if ROCM_PATH is not set, so be sure to have it set at all times. I think the first step is to have a working g++ that can compile a cpp hello world (with std::cout and such). What else? You can set the gcc path via https://github.com/openxla/xla/blob/main/.bazelrc#L253 and not rely on PATH, if it helps. |
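The first step suggested above (a g++ that can build a C++ hello world outside of Bazel) could be checked with a sketch like the one below. The file name `hello.cpp`, the helper name `write_hello`, and the fallback to `g++` when `CXX` is unset are illustrative assumptions, not something from this thread.

```shell
#!/usr/bin/env bash
# Sanity check for the host C++ toolchain, run outside of Bazel.
# Assumptions: hello.cpp and write_hello are names made up for this sketch;
# CXX falls back to g++ when it is not set.

write_hello() {
  # Write a minimal C++ program that uses std::cout to the given path.
  cat > "$1" <<'EOF'
#include <iostream>

int main() {
  std::cout << "hello world" << std::endl;
  return 0;
}
EOF
}

workdir="$(mktemp -d)"
write_hello "$workdir/hello.cpp"

cxx="${CXX:-g++}"
if command -v "$cxx" >/dev/null 2>&1; then
  # Compile and run. A failure here points at the compiler installation
  # itself rather than at XLA or Bazel.
  if "$cxx" -std=c++17 "$workdir/hello.cpp" -o "$workdir/hello"; then
    "$workdir/hello"
  else
    echo "compile failed with $cxx" >&2
  fi
else
  echo "$cxx not found on PATH" >&2
fi
```

If this already fails, there is little point in debugging the Bazel toolchain configuration before the compiler installation is fixed.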
@draganmladjenovic I have removed everything and started all over again, this time with the latest commits. Looks like something has changed, but I still have some errors.
I have no idea how to fix the next problem:
I have same case as other people i.e. Here are things I have modified/added:
So here is the updated BUILD file:
load("//xla/stream_executor:build_defs.bzl", "if_cuda_or_rocm",)
load("@local_config_cuda//cuda:build_defs.bzl", "if_cuda",)
load("@local_config_rocm//rocm:build_defs.bzl", "if_rocm",)
load("@tsl//tsl:tsl.bzl", "if_with_tpu_support")
load("@tsl//tsl:tsl.bzl", "tsl_grpc_cc_dependencies",)
load("@tsl//tsl:tsl.bzl", "transitive_hdrs",)
load("@rules_pkg//pkg:tar.bzl", "pkg_tar")
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")
package(default_visibility=["//visibility:private"])
# Shared library which contains the subset of XLA required for EXLA
cc_binary(
name = "libxla_extension.so",
deps = [
"//xla:xla_proto_cc_impl",
"//xla:xla_data_proto_cc_impl",
"//xla/service:hlo_proto_cc_impl",
"//xla/service/memory_space_assignment:memory_space_assignment_proto_cc_impl",
"//xla/service:buffer_assignment_proto_cc_impl",
"//xla/service/gpu:backend_configs_cc_impl",
"//xla/service/gpu/model:hlo_op_profile_proto_cc_impl",
"//xla/stream_executor:device_description_proto_cc_impl",
"//xla:autotune_results_proto_cc_impl",
"//xla/stream_executor:stream_executor_impl",
"//xla/stream_executor/gpu:gpu_init_impl",
"//xla/stream_executor/host:host_platform",
"//xla:literal",
"//xla:shape_util",
"//xla:status",
"//xla:statusor",
"//xla:types",
"//xla:util",
"//xla/client:xla_computation",
"//xla/mlir/utils:error_util",
"//xla/mlir_hlo",
"//xla/mlir_hlo:all_passes",
"//xla/pjrt:mlir_to_hlo",
"//xla/client/lib:lu_decomposition",
"//xla/client/lib:math",
"//xla/client/lib:qr",
"//xla/client/lib:svd",
"//xla/client/lib:self_adjoint_eig",
"//xla/client/lib:sorting",
"//xla/mlir_hlo:mhlo_passes",
"//xla/translate/hlo_to_mhlo:hlo_to_mlir_hlo",
"//xla/pjrt:interpreter_device",
"//xla/pjrt:pjrt_client",
"//xla/pjrt:pjrt_compiler",
"//xla/pjrt:tfrt_cpu_pjrt_client",
"//xla/pjrt:pjrt_c_api_client",
"//xla/pjrt/distributed",
"//xla/pjrt/gpu:se_gpu_pjrt_client",
"//xla/pjrt/distributed:client",
"//xla/pjrt/distributed:service",
"//xla:autotuning_proto_cc_impl",
"@com_google_absl//absl/types:span",
"@com_google_absl//absl/types:optional",
"@com_google_absl//absl/base:log_severity",
"@com_google_protobuf//:protobuf",
"@llvm-project//llvm:Support",
"@llvm-project//mlir:FuncDialect",
"@llvm-project//mlir:IR",
"@llvm-project//mlir:Parser",
"@llvm-project//mlir:Pass",
"@llvm-project//mlir:ReconcileUnrealizedCasts",
"@llvm-project//mlir:SparseTensorDialect",
"@tsl//tsl/platform:errors",
"@tsl//tsl/platform:fingerprint",
# "@tsl//tsl/platform:float8",
"@tsl//tsl/platform:statusor",
"@tsl//tsl/platform:env_impl",
"@tsl//tsl/platform:tensor_float_32_utils",
"@tsl//tsl/profiler/utils:time_utils_impl",
"@tsl//tsl/profiler/backends/cpu:annotation_stack_impl",
"@tsl//tsl/profiler/backends/cpu:traceme_recorder_impl",
"@tsl//tsl/protobuf:protos_all_cc_impl",
"@tsl//tsl/protobuf:dnn_proto_cc_impl",
"@tsl//tsl/framework:allocator",
"@tsl//tsl/framework:allocator_registry_impl",
# "@tsl//tsl/util:determinism",
]
# GRPC Dependencies (needed for PjRt distributed)
+ tsl_grpc_cc_dependencies()
+ if_cuda_or_rocm([
"//xla/service:gpu_plugin",
])
+ if_cuda([
"//xla/stream_executor:cuda_platform"
])
+ if_rocm([
"//xla/stream_executor:rocm_platform"
]),
copts = ["-fvisibility=default"],
linkopts= select({
"@tsl//tsl:macos": [
# We set the install_name, such that the library is looked up
# in the RPATH at runtime, otherwise the install_name is an
# arbitrary path within bazel workspace
"-Wl,-install_name,@rpath/libxla_extension.so",
# We set RPATH to the same dir as libxla_extension.so, so that
# loading PjRt plugins in the same directory works out of the box
"-Wl,-rpath,@loader_path/",
],
"//conditions:default": [
"-Wl,-soname,libxla_extension.so",
"-Wl,-rpath='$$ORIGIN'",
],
}),
features = ["-use_header_modules"],
linkshared = 1,
)
# Transitive hdrs gets all headers required by deps, including
# transitive dependencies, it seems though it generates a lot
# of unused headers as well
transitive_hdrs(
name = "xla_extension_dep_headers",
deps = [
":libxla_extension.so",
]
)
# This is the genrule used by TF install headers to correctly
# map headers into a directory structure
genrule(
name = "xla_extension_headers",
srcs = [
":xla_extension_dep_headers",
],
outs = ["include"],
cmd = """
mkdir $@
for f in $(SRCS); do
d="$${f%/*}"
d="$${d#bazel-out/*/genfiles/}"
d="$${d#bazel-out/*/bin/}"
if [[ $${d} == *local_config_* ]]; then
continue
fi
if [[ $${d} == external* ]]; then
extname="$${d#*external/}"
extname="$${extname%%/*}"
if [[ $${TF_SYSTEM_LIBS:-} == *$${extname}* ]]; then
continue
fi
d="$${d#*external/farmhash_archive/src}"
d="$${d#*external/$${extname}/}"
fi
# Remap third party paths
d="$${d/third_party\\/llvm_derived\\/include\\/llvm_derived/llvm_derived}"
# Remap llvm paths
d="$${d/llvm\\/include\\/llvm/llvm}"
d="$${d/llvm\\/include\\/llvm-c/llvm-c}"
# Remap mlir paths
d="$${d/mlir\\/include\\/mlir/mlir}"
# Remap google path
d="$${d/src\\/google/google}"
# Remap grpc paths
d="$${d/include\\/grpc/grpc}"
# Remap tfrt paths
d="$${d/include\\/tfrt/tfrt}"
# Remap ml_dtypes paths
d="$${d/_virtual_includes\\/int4\\/ml_dtypes/ml_dtypes}"
d="$${d/_virtual_includes\\/float8\\/ml_dtypes/ml_dtypes}"
mkdir -p "$@/$${d}"
cp "$${f}" "$@/$${d}/"
done
# Files in xla/mlir_hlo include sibling headers from mhlo, so we
# need to mirror them in includes
cp -r $@/xla/mlir_hlo/mhlo $@
""",
)
genrule(
name = "libtpu_whl",
outs = ["libtpu.whl"],
cmd = """
libtpu_version="0.1.dev20231102"
libtpu_storage_path="https://storage.googleapis.com/cloud-tpu-tpuvm-artifacts/wheels/libtpu-nightly/libtpu_nightly-$${libtpu_version}-py3-none-any.whl"
wget -O "$@" "$$libtpu_storage_path"
"""
)
genrule(
name = "libtpu_so",
srcs = [
":libtpu_whl"
],
outs = ["libtpu.so"],
cmd = """
unzip -p "$(SRCS)" libtpu/libtpu.so > "$@"
"""
)
# This genrule remaps libxla_extension.so to lib/libxla_extension.so
genrule(
name = "xla_extension_lib",
srcs = [
":libxla_extension.so",
]
+ if_with_tpu_support([
":libtpu_so"
]),
outs = ["lib"],
cmd = """
mkdir $@
mv $(SRCS) $@
"""
)
# See https://github.com/bazelbuild/rules_pkg/issues/517#issuecomment-1492917994
genrule(
name = "xla_extension",
outs = ["xla_extension.tar.gz"],
srcs = [
":xla_extension_lib",
":xla_extension_headers",
],
cmd = """
mkdir xla_extension
cp -r $(SRCS) xla_extension
tar czf "$@" xla_extension
"""
)
The command has not been changed:
bazel build --define "framework_shared_object=false" -c opt --config=rocm --action_env=HIP_PLATFORM=hcc --action_env=TF_ROCM_AMDGPU_TARGETS="gfx1100" //xla/extension:xla_extension --verbose_failures
Here is a complete log:
INFO: Reading 'startup' options from $HOME/tmp/extension/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=129
INFO: Reading rc options for 'build' from $HOME/tmp/extension/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from $HOME/tmp/extension/.bazelrc:
'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Found applicable config definition build:short_logs in file $HOME/tmp/extension/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file $HOME/tmp/extension/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:rocm in file $HOME/tmp/extension/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --config=no_tfrt
INFO: Found applicable config definition build:no_tfrt in file $HOME/tmp/extension/.bazelrc: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/ir,tensorflow/compiler/mlir/tfrt/ir/mlrt,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ifrt,tensorflow/compiler/mlir/tfrt/tests/mlrt,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/compiler/mlir/tfrt/transforms/mlrt,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/runtime_fallback/test,tensorflow/core/runtime_fallback/test/gpu,tensorflow/core/runtime_fallback/test/saved_model,tensorflow/core/runtime_fallback/test/testdata,tensorflow/core/tfrt/stubs,tensorflow/core/tfrt/tfrt_session,tensorflow/core/tfrt/mlrt,tensorflow/core/tfrt/mlrt/attribute,tensorflow/core/tfrt/mlrt/kernel,tensorflow/core/tfrt/mlrt/bytecode,tensorflow/core/tfrt/mlrt/interpreter,tensorflow/compiler/mlir/tfrt/translate/mlrt,tensorflow/compiler/mlir/tfrt/translate/mlrt/testdata,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,tensorflow/core/tfrt/utils/debug,tensorflow/core/tfrt/saved_model/python,tensorflow/core/tfrt/graph_executor/python,tensorflow/core/tfrt/saved_model/utils
INFO: Found applicable config definition build:linux in file $HOME/tmp/extension/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file $HOME/tmp/extension/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Analyzed target //xla/extension:xla_extension (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: $HOME/.cache/bazel/_bazel_$USER/6b422d8728a5f643d1627bf83880014b/external/com_github_grpc_grpc/src/compiler/BUILD:80:18: Linking external/com_github_grpc_grpc/src/compiler/grpc_cpp_plugin [for tool] failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @com_github_grpc_grpc//src/compiler:grpc_cpp_plugin)
(cd $HOME/.cache/bazel/_bazel_$USER/6b422d8728a5f643d1627bf83880014b/execroot/xla && \
exec env - \
PATH=$HOME/.asdf/installs/bazel/6.5.0/bin:$HOME/.asdf/shims:$HOME/.asdf/bin:$HOME/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/bin:/usr/lib/llvm/17/bin:/etc/eselect/wine/bin:/usr/libexec/gcc/x86_64-pc-linux-gnu/13 \
PWD=/proc/self/cwd \
external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt-exec-50AE0418/bin/external/com_github_grpc_grpc/src/compiler/grpc_cpp_plugin-2.params)
# Configuration: 3d0bdd74a8dc039c68e3f05fc81ecf844634421361581216e7d11eb3409b5a52
# Execution platform: @local_execution_config_platform//:platform
$HOME/.cache/bazel/_bazel_$USER/6b422d8728a5f643d1627bf83880014b/execroot/xla/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:162: SyntaxWarning: invalid escape sequence '\.'
re.search('\.cpp$|\.cc$|\.c$|\.cxx$|\.C$', f)]
$HOME/.cache/bazel/_bazel_$USER/6b422d8728a5f643d1627bf83880014b/execroot/xla/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:23: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
import pipes
gcc: fatal error: ‘-fuse-linker-plugin’, but liblto_plugin.so not found
compilation terminated.
Target //xla/extension:xla_extension failed to build
ERROR: $HOME/tmp/extension/xla/extension/BUILD:126:8 Executing genrule //xla/extension:xla_extension_headers failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target @com_github_grpc_grpc//src/compiler:grpc_cpp_plugin)
(cd $HOME/.cache/bazel/_bazel_$USER/6b422d8728a5f643d1627bf83880014b/execroot/xla && \
exec env - \
PATH=$HOME/.asdf/installs/bazel/6.5.0/bin:$HOME/.asdf/shims:$HOME/.asdf/bin:$HOME/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/bin:/usr/lib/llvm/17/bin:/etc/eselect/wine/bin:/usr/libexec/gcc/x86_64-pc-linux-gnu/13 \
PWD=/proc/self/cwd \
external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt-exec-50AE0418/bin/external/com_github_grpc_grpc/src/compiler/grpc_cpp_plugin-2.params)
# Configuration: 3d0bdd74a8dc039c68e3f05fc81ecf844634421361581216e7d11eb3409b5a52
# Execution platform: @local_execution_config_platform//:platform
INFO: Elapsed time: 0.266s, Critical Path: 0.06s
INFO: 33 processes: 33 internal.
FAILED: Build did NOT complete successfully |
@Eiji7 I would still suggest that you get a working gcc installation outside of bazel. Try building a hello world with -flto and see if it works. |
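A sketch of that check, building the same tiny program without LTO, with `-flto`, and with `-flto -fuse-linker-plugin` (the flag named in the error message). The file name `lto_hello.cpp`, the helper `try_build`, and the fallback to `g++` when `CXX` is unset are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Compare a plain build with LTO builds of the same program, outside of Bazel.
# Assumptions: lto_hello.cpp and try_build are names made up for this sketch;
# CXX falls back to g++ when it is not set.

cxx="${CXX:-g++}"
workdir="$(mktemp -d)"

cat > "$workdir/lto_hello.cpp" <<'EOF'
#include <iostream>

int main() {
  std::cout << "hello from an LTO build" << std::endl;
  return 0;
}
EOF

try_build() {
  # $1: label for the report; remaining arguments: extra compiler flags.
  label="$1"
  shift
  if "$cxx" "$@" "$workdir/lto_hello.cpp" -o "$workdir/lto_hello" 2>"$workdir/err.log"; then
    echo "$label: ok"
  else
    echo "$label: FAILED"
    # Show the compiler's own diagnostics for the failing variant.
    cat "$workdir/err.log" >&2
    return 1
  fi
}

if command -v "$cxx" >/dev/null 2>&1; then
  try_build "plain"
  try_build "with -flto" -flto
  try_build "with -flto -fuse-linker-plugin" -flto -fuse-linker-plugin
else
  echo "$cxx not found on PATH" >&2
fi
```

If only the LTO variants fail, the problem is likely in how the compiler driver locates its linker plugin rather than in the compiler as a whole.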
@draganmladjenovic On Some information about installed gcc (list installed, files, USE flags)
Anyway with or without
|
Hmm. Try setting action_env=CROSSTOOL_VERBOSE=1 to see the command line gcc is invoked with. Maybe that could give us some clue. |
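As a sketch, this is the build command quoted earlier in the thread with that one setting added as a Bazel `--action_env` flag. Passing it on the command line (rather than through `.bazelrc`) is an assumption; everything else is copied from the original invocation. The script only prints the command, so it can be reviewed before running it from the XLA checkout.

```shell
#!/usr/bin/env bash
# Assemble the original build command plus the verbose-crosstool setting.
# Assumption: the setting is passed via --action_env on the command line.

extra_flag="--action_env=CROSSTOOL_VERBOSE=1"

set -- bazel build \
  --define "framework_shared_object=false" \
  -c opt \
  --config=rocm \
  --action_env=HIP_PLATFORM=hcc \
  --action_env=TF_ROCM_AMDGPU_TARGETS="gfx1100" \
  "$extra_flag" \
  //xla/extension:xla_extension \
  --verbose_failures

# Print the command for review; replace this line with "$@" to execute it.
printf '%s\n' "$*"
```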
@draganmladjenovic Here is the command:
I can confirm that running it standalone gives the same error. I have tried to use
|
This is not only about Gentoo. I am not able to reproduce the ROCm build even in the provided rocm dockerfile with Ubuntu 20 + gcc 9. I also tried Fedora 40 (which includes rocm), and it becomes a rabbit hole of unmet dependencies. |
I see some work done for spack on this front: https://github.com/spack/spack/pull/44095/files. Maybe this helps you. |
What was the issue? Which commit did you use? It is very hard to find a working build of xla for ROCm due to upstream not having a ROCm CI. |
I am interested in the same stack as @Eiji7: elixir-nx. Here is where I am stuck right now: elixir-nx/xla#81 (comment) |
@draganmladjenovic I do not see anything related to my recent error:
gcc: fatal error: ‘-fuse-linker-plugin’, but liblto_plugin.so not found
As I already wrote, all
$ file /usr/libexec/gcc/x86_64-pc-linux-gnu/13/liblto_plugin.so
/usr/libexec/gcc/x86_64-pc-linux-gnu/13/liblto_plugin.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, stripped
I have found something interesting …
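Since the file exists but gcc reports it as not found, one thing worth comparing is the plugin's location against the directories the gcc driver actually searches. The sketch below assumes the plugin is resolved through the "programs" list printed by `gcc -print-search-dirs`; the helper name `find_in_dirs` and the fallback to `gcc` when `CC` is unset are illustrative.

```shell
#!/usr/bin/env bash
# Check whether liblto_plugin.so is present in any directory that the gcc
# driver searches for its helper programs.
# Assumptions: find_in_dirs is a name made up for this sketch; CC falls back
# to gcc; the plugin is looked up via the "programs" search list.

find_in_dirs() {
  # $1: file name, $2: colon-separated directory list.
  # Prints the first existing match and returns 0, otherwise returns 1.
  name="$1"
  old_ifs="$IFS"
  IFS=':'
  for dir in $2; do
    if [ -e "$dir/$name" ]; then
      IFS="$old_ifs"
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}

cc="${CC:-gcc}"
if command -v "$cc" >/dev/null 2>&1; then
  # The "programs: =dir1:dir2:..." line lists where the driver looks.
  dirs="$("$cc" -print-search-dirs | sed -n 's/^programs: =//p')"
  find_in_dirs liblto_plugin.so "$dirs" \
    || echo "liblto_plugin.so is not in the program search dirs of $cc" >&2
else
  echo "$cc not found on PATH" >&2
fi
```

Running this both in a normal shell and with the stripped-down `PATH` shown in the failing Bazel action may show whether a different gcc driver is being picked up in each case.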
|
Hi, I would like to use ROCm in the Elixir language. For this I have started playing with xla from elixir-nx, which uses this repository under the hood within its Makefile. I have already described my case and the problems I was trying to solve in this issue: elixir-nx/xla#81
Looks like I have installed all the required dependencies, as the bazel call:
passed all ROCm-related checks and fails on compilation of cpp files. Because of that I feel that I did most of the work and a simple step or two is just missing. Since the gfx1100 AMD target is not well tested yet, I got a suggestion to use docker. 😞
Unfortunately I'm that type of developer who is politically incorrect, i.e. wants to solve a problem rather than use a workaround, someone who is curious how things work and someone who wants to do things "right" without going the "easy" way (as if Gentoo itself is not enough). 😅
Hopefully there is someone in the community who has already tried ROCm on Gentoo or someone who could help me investigate the root problem in my environment. As I wrote in the linked issue, I'm fine helping with tests, so maybe I can put my 2 cents here and there once I am able to compile xla-related stuff. 🤝