XLA_TARGET=rocm compilation failed with "crosstool_wrapper_driver_is_not_gcc failed" #68

Awlexus · 2023-12-27T22:44:34Z

Hi, I've been trying to get GPU support running, but I keep running into this issue. I was first looking at this issue to get it running. I added the dependencies like this:

# mix.exs
      {:nx, github: "elixir-nx/nx", sparse: "nx", override: true},
      {:exla, github: "elixir-nx/nx", sparse: "exla", override: true}

I made sure to install the dependencies mentioned in this comment (adjusted for arch linux):

$ sudo pacman -S miopen-hip hipfft rocrand \
    hipsparse hipsolver rccl hip-runtime-amd \
    rocfft roctracer hipblas rocm-device-libs rocsolver rocblas

And then tried to compile it with $ XLA_BUILD=true XLA_TARGET=rocm mix compile

Compilation logs


==> xla

Compiling 2 files (.ex)

Generated xla app

mkdir -p /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb && 

cd /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb && 

git init && 

git remote add origin https://github.com/openxla/xla.git && 

git fetch --depth 1 origin 771e38178340cbaaef8ff20f44da5407c15092cb && 

git checkout FETCH_HEAD && 

rm /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelversion

Initialized empty Git repository in /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.git/

From https://github.com/openxla/xla

branch            771e38178340cbaaef8ff20f44da5407c15092cb -> FETCH_HEAD

Note: switching to 'FETCH_HEAD'.

You are in 'detached HEAD' state. You can look around, make experimental

changes and commit them, and you can discard any commits you make in this

state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may

do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 771e381 [XLA:GPU] Check tensor_float_32_execution_enabled() in Triton codegen too

rm -f /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/extension && 

ln -s "/hdd/programming/elixir/fusemega/deps/xla/extension" /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/extension && 

cd /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb && 

bazel build --define "framework_shared_object=false" -c opt   --config=rocm --action_env=HIP_PLATFORM=hcc --action_env=TF_ROCM_AMDGPU_TARGETS="gfx900,gfx906,gfx908,gfx90a,gfx1030" //xla/extension:xla_extension && 

mkdir -p /home/awlex/.cache/xla/0.6.0/cache/build/ && 

cp -f /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/bazel-bin/xla/extension/xla_extension.tar.gz /home/awlex/.cache/xla/0.6.0/cache/build/xla_extension-x86_64-linux-gnu-rocm.tar.gz

Starting local Bazel server and connecting to it...

INFO: Reading 'startup' options from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --windows_enable_symlinks

INFO: Options provided by the client:

Inherited 'common' options: --isatty=0 --terminal_columns=80

INFO: Reading rc options for 'build' from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc:

Inherited 'common' options: --experimental_repo_remote_exec

INFO: Reading rc options for 'build' from /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc:

'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility

INFO: Found applicable config definition build:short_logs in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --output_filter=DONT_MATCH_ANYTHING

INFO: Found applicable config definition build:v2 in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1

INFO: Found applicable config definition build:rocm in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --config=no_tfrt

INFO: Found applicable config definition build:no_tfrt in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/ir,tensorflow/compiler/mlir/tfrt/ir/mlrt,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ifrt,tensorflow/compiler/mlir/tfrt/tests/mlrt,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/compiler/mlir/tfrt/transforms/mlrt,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/runtime_fallback/test,tensorflow/core/runtime_fallback/test/gpu,tensorflow/core/runtime_fallback/test/saved_model,tensorflow/core/runtime_fallback/test/testdata,tensorflow/core/tfrt/stubs,tensorflow/core/tfrt/tfrt_session,tensorflow/core/tfrt/mlrt,tensorflow/core/tfrt/mlrt/attribute,tensorflow/core/tfrt/mlrt/kernel,tensorflow/core/tfrt/mlrt/bytecode,tensorflow/core/tfrt/mlrt/interpreter,tensorflow/compiler/mlir/tfrt/translate/mlrt,tensorflow/compiler/mlir/tfrt/translate/mlrt/testdata,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,tensorflow/core/tfrt/utils/debug,tensorflow/core/tfrt/saved_model/python,tensorflow/core/tfrt/graph_executor/python,tensorflow/core/tfrt/saved_model/utils

INFO: Found applicable config definition build:linux in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes

INFO: Found applicable config definition build:dynamic_kernels in file /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS

Loading:

Loading: 0 packages loaded

DEBUG: /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/third_party/repo.bzl:132:14:

Warning: skipping import of repository 'llvm-raw' because it already exists.

Loading: 0 packages loaded

Loading: 0 packages loaded

Loading: 0 packages loaded

Loading: 0 packages loaded

Loading: 0 packages loaded

Loading: 0 packages loaded

currently loading: xla/extension

Analyzing: target //xla/extension:xla_extension (1 packages loaded, 0 targets configured)

Analyzing: target //xla/extension:xla_extension (36 packages loaded, 14 targets configured)

Analyzing: target //xla/extension:xla_extension (36 packages loaded, 14 targets configured)

Analyzing: target //xla/extension:xla_extension (179 packages loaded, 13765 targets configured)

INFO: Analyzed target //xla/extension:xla_extension (182 packages loaded, 16076 targets configured).

INFO: Found 1 target...

[0 / 196] [Prepa] Writing script xla/extension/xla_extension_headers.genrule_script.sh

[44 / 4,961] Compiling src/google/protobuf/compiler/cpp/file.cc; 3s local ... (12 actions, 11 running)

[71 / 4,961] Compiling src/google/protobuf/compiler/cpp/field.cc; 1s local ... (12 actions, 11 running)

[97 / 4,961] Compiling src/google/protobuf/compiler/command_line_interface.cc; 4s local ... (12 actions, 11 running)

[133 / 4,961] Compiling src/google/protobuf/util/internal/protostream_objectsource.cc; 2s local ... (12 actions, 11 running)

[173 / 4,961] Compiling src/google/protobuf/descriptor.cc; 8s local ... (12 actions, 11 running)

[256 / 5,172] Compiling llvm/lib/TableGen/TGParser.cpp [for host]; 3s local ... (12 actions, 11 running)

[339 / 5,172] Compiling llvm/lib/Support/KnownBits.cpp [for host]; 2s local ... (12 actions, 11 running)

[448 / 5,400] Compiling llvm/lib/Support/VirtualFileSystem.cpp; 4s local ... (12 actions, 11 running)

[542 / 5,400] Compiling llvm/lib/Support/Caching.cpp; 1s local ... (12 actions, 11 running)

[897 / 6,732] Compiling mlir/tools/mlir-tblgen/OpDocGen.cpp; 4s local ... (12 actions, 11 running)

[958 / 6,732] Compiling llvm/utils/TableGen/GlobalISelCombinerEmitter.cpp [for host]; 12s local ... (12 actions, 11 running)

[1,130 / 6,732] Compiling mlir/tools/mlir-tblgen/OpDefinitionsGen.cpp [for host]; 10s local ... (12 actions, 11 running)

[1,362 / 6,732] Generating code from table: lib/Target/AMDGPU/AMDGPU.td @llvm-project//llvm:AMDGPUCommonTableGen__gen_register_bank_genrule; 22s local ... (12 actions, 11 running)

[1,813 / 6,732] Generating code from table: lib/Target/AMDGPU/AMDGPU.td @llvm-project//llvm:AMDGPUCommonTableGen__gen_asm_matcher_genrule; 16s local ... (12 actions, 11 running)

[2,160 / 6,907] Compiling xla/hlo/utils/hlo_sharding_util.cc; 5s local ... (12 actions running)

[2,285 / 6,907] Compiling xla/service/hlo_rematerialization.cc; 12s local ... (12 actions running)

[2,446 / 6,907] Compiling llvm/lib/IR/AutoUpgrade.cpp; 7s local ... (12 actions running)

[2,624 / 6,907] Compiling xla/service/gpu/cub_sort_kernel.cu.cc; 14s local ... (12 actions, 11 running)

[2,758 / 6,907] Compiling xla/service/gpu/cub_sort_kernel.cu.cc; 12s local ... (12 actions, 11 running)

[2,942 / 6,907] Compiling src/cpu/x64/gemm/f32/jit_avx2_f32_copy_an_kern_autogen.cpp; 8s local ... (12 actions, 11 running)

[3,150 / 6,907] Compiling src/cpu/x64/jit_uni_resampling_kernel.cpp; 13s local ... (12 actions, 11 running)

[3,366 / 6,907] Compiling src/cpu/x64/jit_brgemm_conv.cpp; 45s local ... (12 actions, 11 running)

[3,642 / 6,908] Compiling llvm/lib/Passes/PassBuilder.cpp; 45s local ... (12 actions, 11 running)

[3,929 / 6,908] Compiling mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp; 56s local ... (12 actions, 11 running)

[4,340 / 6,908] Compiling stablehlo/dialect/StablehloOps.cpp; 41s local ... (12 actions, 11 running)

[4,628 / 6,908] Compiling mlir/lib/Dialect/SPIRV/IR/SPIRVOpDefinition.cpp; 33s local ... (12 actions, 11 running)

[4,938 / 6,908] Compiling llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp; 13s local ... (12 actions, 11 running)

[5,317 / 6,908] Compiling llvm/lib/Target/X86/X86ISelLowering.cpp; 22s local ... (12 actions, 11 running)

[5,761 / 6,908] Compiling mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp; 33s local ... (12 actions, 11 running)

[6,253 / 6,908] Compiling xla/mlir_hlo/mhlo/IR/hlo_ops.cc; 67s local ... (12 actions, 11 running)

[6,854 / 6,909] Compiling xla/service/gpu/runtime/fused_attention.cc; 64s local ... (12 actions, 11 running)

ERROR: /home/awlex/.cache/xla_extension/xla-771e38178340cbaaef8ff20f44da5407c15092cb/xla/service/gpu/BUILD:257:11: Compiling xla/service/gpu/ir_emitter_unnested.cc failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer ... (remaining 356 arguments skipped)

/home/awlex/.cache/bazel/_bazel_awlex/74b6e6c2abb213e1ba59aee5534c65a2/execroot/xla/external/local_config_rocm/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc:23: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13

import pipes

In file included from ./xla/shape_util.h:36,

from ./xla/index_util.h:25,

from ./xla/literal.h:41,

from ./xla/hlo/ir/dfs_hlo_visitor.h:26,

from ./xla/hlo/ir/hlo_computation.h:32,

from ./xla/service/gpu/ir_emitter_unnested.h:31,

from xla/service/gpu/ir_emitter_unnested.cc:16:

external/com_google_absl/absl/log/check.h:57: warning: "CHECK" redefined

57 | #define CHECK(condition) ABSL_LOG_INTERNAL_CHECK_IMPL((condition), #condition)

|

In file included from external/tsl/tsl/platform/logging.h:26,

from external/tsl/tsl/platform/status.h:34,

from ./xla/status.h:19,

from ./xla/statusor.h:18,

from ./xla/hlo/ir/hlo_opcode.h:24,

from ./xla/hlo/ir/dfs_hlo_visitor.h:25:

external/tsl/tsl/platform/default/logging.h:308: note: this is the location of the previous definition

308 | #define CHECK(condition)              

|

external/com_google_absl/absl/log/check.h:65: warning: "QCHECK" redefined

65 | #define QCHECK(condition) ABSL_LOG_INTERNAL_QCHECK_IMPL((condition), #condition)

|

external/tsl/tsl/platform/default/logging.h:542: note: this is the location of the previous definition

542 | #define QCHECK(condition) CHECK(condition)

|

external/com_google_absl/absl/log/check.h:88: warning: "DCHECK" redefined

88 | #define DCHECK(condition) ABSL_LOG_INTERNAL_DCHECK_IMPL((condition), #condition)

|

external/tsl/tsl/platform/default/logging.h:521: note: this is the location of the previous definition

521 | #define DCHECK(condition) 

|

external/com_google_absl/absl/log/check.h:116: warning: "CHECK_EQ" redefined

116 | #define CHECK_EQ(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:499: note: this is the location of the previous definition

499 | #define CHECK_EQ(val1, val2) CHECK_OP(Check_EQ, ==, val1, val2)

|

external/com_google_absl/absl/log/check.h:118: warning: "CHECK_NE" redefined

118 | #define CHECK_NE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:500: note: this is the location of the previous definition

500 | #define CHECK_NE(val1, val2) CHECK_OP(Check_NE, !=, val1, val2)

|

external/com_google_absl/absl/log/check.h:120: warning: "CHECK_LE" redefined

120 | #define CHECK_LE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:501: note: this is the location of the previous definition

501 | #define CHECK_LE(val1, val2) CHECK_OP(Check_LE, <=, val1, val2)

|

external/com_google_absl/absl/log/check.h:122: warning: "CHECK_LT" redefined

122 | #define CHECK_LT(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:502: note: this is the location of the previous definition

502 | #define CHECK_LT(val1, val2) CHECK_OP(Check_LT, <, val1, val2)

|

external/com_google_absl/absl/log/check.h:124: warning: "CHECK_GE" redefined

124 | #define CHECK_GE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:503: note: this is the location of the previous definition

503 | #define CHECK_GE(val1, val2) CHECK_OP(Check_GE, >=, val1, val2)

|

external/com_google_absl/absl/log/check.h:126: warning: "CHECK_GT" redefined

126 | #define CHECK_GT(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:504: note: this is the location of the previous definition

504 | #define CHECK_GT(val1, val2) CHECK_OP(Check_GT, >, val1, val2)

|

external/com_google_absl/absl/log/check.h:128: warning: "QCHECK_EQ" redefined

128 | #define QCHECK_EQ(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:543: note: this is the location of the previous definition

543 | #define QCHECK_EQ(x, y) CHECK_EQ(x, y)

|

external/com_google_absl/absl/log/check.h:130: warning: "QCHECK_NE" redefined

130 | #define QCHECK_NE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:544: note: this is the location of the previous definition

544 | #define QCHECK_NE(x, y) CHECK_NE(x, y)

|

external/com_google_absl/absl/log/check.h:132: warning: "QCHECK_LE" redefined

132 | #define QCHECK_LE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:545: note: this is the location of the previous definition

545 | #define QCHECK_LE(x, y) CHECK_LE(x, y)

|

external/com_google_absl/absl/log/check.h:134: warning: "QCHECK_LT" redefined

134 | #define QCHECK_LT(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:546: note: this is the location of the previous definition

546 | #define QCHECK_LT(x, y) CHECK_LT(x, y)

|

external/com_google_absl/absl/log/check.h:136: warning: "QCHECK_GE" redefined

136 | #define QCHECK_GE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:547: note: this is the location of the previous definition

547 | #define QCHECK_GE(x, y) CHECK_GE(x, y)

|

external/com_google_absl/absl/log/check.h:138: warning: "QCHECK_GT" redefined

138 | #define QCHECK_GT(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:548: note: this is the location of the previous definition

548 | #define QCHECK_GT(x, y) CHECK_GT(x, y)

|

external/com_google_absl/absl/log/check.h:140: warning: "DCHECK_EQ" redefined

140 | #define DCHECK_EQ(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:531: note: this is the location of the previous definition

531 | #define DCHECK_EQ(x, y) _TF_DCHECK_NOP(x, y)

|

external/com_google_absl/absl/log/check.h:142: warning: "DCHECK_NE" redefined

142 | #define DCHECK_NE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:532: note: this is the location of the previous definition

532 | #define DCHECK_NE(x, y) _TF_DCHECK_NOP(x, y)

|

external/com_google_absl/absl/log/check.h:144: warning: "DCHECK_LE" redefined

144 | #define DCHECK_LE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:533: note: this is the location of the previous definition

533 | #define DCHECK_LE(x, y) _TF_DCHECK_NOP(x, y)

|

external/com_google_absl/absl/log/check.h:146: warning: "DCHECK_LT" redefined

146 | #define DCHECK_LT(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:534: note: this is the location of the previous definition

534 | #define DCHECK_LT(x, y) _TF_DCHECK_NOP(x, y)

|

external/com_google_absl/absl/log/check.h:148: warning: "DCHECK_GE" redefined

148 | #define DCHECK_GE(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:535: note: this is the location of the previous definition

535 | #define DCHECK_GE(x, y) _TF_DCHECK_NOP(x, y)

|

external/com_google_absl/absl/log/check.h:150: warning: "DCHECK_GT" redefined

150 | #define DCHECK_GT(val1, val2) 

|

external/tsl/tsl/platform/default/logging.h:536: note: this is the location of the previous definition

536 | #define DCHECK_GT(x, y) _TF_DCHECK_NOP(x, y)

|

xla/service/gpu/ir_emitter_unnested.cc: In member function ‘tsl::Status xla::gpu::IrEmitterUnnested::EmitCubDeviceRadixSort(mlir::Operation*)’:

xla/service/gpu/ir_emitter_unnested.cc:1472:33: error: ‘CubSortThunk’ was not declared in this scope

1472 |   auto thunk = std::make_unique<CubSortThunk>(

|                                 ^~~~~~~~~~~~

xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: no matching function for call to ‘make_unique<<expression error> >(xla::gpu::Thunk::ThunkInfo, xla::PrimitiveType, std::optional<xla::PrimitiveType>, std::vector<xla::BufferAllocation::Slice>&, std::vector<xla::BufferAllocation::Slice>&, xla::BufferAllocation::Slice&, bool)’

1472 |   auto thunk = std::make_unique<CubSortThunk>(

|                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1474 |       GetShape(op->getOperand(0)).element_type(),

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1475 |       radix_sort_op.getInputs().size() == 2

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())

|           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1477 |           : std::nullopt,

|           ~~~~~~~~~~~~~~~

1478 |       operands, results, scratch, radix_sort_op.getDescending());

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/memory:78,

from ./xla/service/gpu/ir_emitter_unnested.h:21:

/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1069:5: note: candidate: ‘template<class _Tp, class ... _Args> std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& ...)’

1069 |     make_unique(_Args&&... __args)

|     ^~~~~~~~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1069:5: note:   template argument deduction/substitution failed:

xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: template argument 1 is invalid

1472 |   auto thunk = std::make_unique<CubSortThunk>(

|                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1474 |       GetShape(op->getOperand(0)).element_type(),

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1475 |       radix_sort_op.getInputs().size() == 2

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())

|           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1477 |           : std::nullopt,

|           ~~~~~~~~~~~~~~~

1478 |       operands, results, scratch, radix_sort_op.getDescending());

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1084:5: note: candidate: ‘template<class _Tp> std::__detail::__unique_ptr_array_t<_Tp> std::make_unique(size_t)’

1084 |     make_unique(size_t __num)

|     ^~~~~~~~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1084:5: note:   candidate expects 1 argument, 7 provided

/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1094:5: note: candidate: ‘template<class _Tp, class ... _Args> std::__detail::__invalid_make_unique_t<_Tp> std::make_unique(_Args&& ...)’ (deleted)

1094 |     make_unique(_Args&&...) = delete;

|     ^~~~~~~~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/../../../../include/c++/13.2.1/bits/unique_ptr.h:1094:5: note:   template argument deduction/substitution failed:

xla/service/gpu/ir_emitter_unnested.cc:1472:46: error: template argument 1 is invalid

1472 |   auto thunk = std::make_unique<CubSortThunk>(

|                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^

1473 |       Thunk::ThunkInfo::WithProfileAnnotation(op),

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1474 |       GetShape(op->getOperand(0)).element_type(),

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1475 |       radix_sort_op.getInputs().size() == 2

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1476 |           ? std::optional(GetShape(op->getOperand(1)).element_type())

|           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1477 |           : std::nullopt,

|           ~~~~~~~~~~~~~~~

1478 |       operands, results, scratch, radix_sort_op.getDescending());

|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In file included from /usr/include/unistd.h:226,

from external/com_google_absl/absl/base/internal/thread_identity.h:27,

from external/com_google_absl/absl/synchronization/mutex.h:70,

from external/com_google_absl/absl/strings/internal/cordz_info.h:31,

from external/com_google_absl/absl/strings/cord.h:91,

from external/com_google_absl/absl/container/internal/hash_function_defaults.h:56,

from external/com_google_absl/absl/container/flat_hash_map.h:41,

from ./xla/hlo/ir/hlo_computation.h:26:

external/tsl/tsl/concurrency/async_value.h: In instantiation of ‘static void tsl::internal::ConcreteAsyncValue<T>::VerifyOffsets() [with T = tsl::DummyValueForErrorAsyncValue]’:

external/tsl/tsl/concurrency/async_value.h:536:18:   required from ‘tsl::internal::ConcreteAsyncValue<T>::ConcreteAsyncValue(absl::lts_20230802::Status) [with T = tsl::DummyValueForErrorAsyncValue]’

external/tsl/tsl/concurrency/async_value.h:727:30:   required from here

external/tsl/tsl/concurrency/async_value.h:702:28: warning: ‘offsetof’ within non-standard-layout type ‘tsl::internal::ConcreteAsyncValue<tsl::DummyValueForErrorAsyncValue>’ is conditionally-supported [-Winvalid-offsetof]

702 |     static_assert(offsetof(ConcreteAsyncValue, data_store.data) ==

|                            ^

external/tsl/tsl/concurrency/async_value.h:706:28: warning: ‘offsetof’ within non-standard-layout type ‘tsl::internal::ConcreteAsyncValue<tsl::DummyValueForErrorAsyncValue>’ is conditionally-supported [-Winvalid-offsetof]

706 |     static_assert(offsetof(ConcreteAsyncValue, data_store.error) ==

|                            ^

Target //xla/extension:xla_extension failed to build

Use --verbose_failures to see the command lines of failed build steps.

INFO: Elapsed time: 2466.368s, Critical Path: 158.87s

INFO: 6899 processes: 469 internal, 6430 local.

FAILED: Build did NOT complete successfully

FAILED: Build did NOT complete successfully

make: *** [Makefile:26: /home/awlex/.cache/xla/0.6.0/cache/build/xla_extension-x86_64-linux-gnu-rocm.tar.gz] Error 1

could not compile dependency :xla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile xla --force", update it with "mix deps.update xla" or clean it with "mix deps.clean xla"

==> fusemega

** (Mix) Could not compile with "make" (exit status: 2).

You need to have gcc and make installed. If you are using

Ubuntu or any other Debian-based system, install the packages

"build-essential". Also install "erlang-dev" package if not

included in your Erlang/OTP version. If you're on Fedora, run

"dnf group install 'Development Tools'".


jonatanklosko · 2023-12-28T06:33:42Z

Hey @Awlexus, this could be an issue with the build environment. To be sure, you can alternatively build with the Docker scripts (./build.sh rocm), then point XLA_ARCHIVE_URL at the resulting archive, e.g. XLA_ARCHIVE_URL=file:///path/to/build.tar.gz.
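A small sketch of that second step, in case it helps. `archive_url` is a hypothetical helper name (it is not part of the xla package) and the path in the usage comment is a placeholder; all it does is turn a local path into the `file://` form:

```shell
# Sketch: build a file:// URL for XLA_ARCHIVE_URL from a local archive path.
# `archive_url` is a made-up helper name, not something the xla package ships.
archive_url() {
  case "$1" in
    # Already absolute: prefix it as-is.
    /*) printf 'file://%s\n' "$1" ;;
    # Relative: resolve against the current directory (trailing slash trimmed).
    *)  printf 'file://%s/%s\n' "${PWD%/}" "$1" ;;
  esac
}

# Usage (placeholder path, adjust to wherever your build wrote the archive):
#   XLA_ARCHIVE_URL="$(archive_url /path/to/build.tar.gz)" mix compile
```

With the variable set this way the locally built archive should be picked up instead of a fresh Bazel build, assuming the archive matches your target.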

In case your GPU uses gfx1100 (7900 XTX), you may need to use a more recent XLA revision as per #63 (comment) (either by setting OPENXLA_GIT_REV with mix compile or changing the Makefile directly in case of the Docker build).
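To make that check concrete, here is a rough sketch. The target list is copied from the bazel command in the compilation log; `arch_in_targets` and `is_full_sha` are hypothetical helper names. The full-SHA check is an assumption based on the log, where the revision is fetched with `git fetch --depth 1 origin <rev>`, which generally wants a complete commit hash when you pin a commit rather than a branch:

```shell
# Target list as it appears in the bazel invocation from the compilation log.
DEFAULT_TARGETS="gfx900,gfx906,gfx908,gfx90a,gfx1030"

# Succeeds if an ISA name (e.g. gfx1030) is an exact entry in a comma-separated list.
arch_in_targets() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Succeeds only for a full 40-character lowercase hex commit hash.
# (Assumption: you are pinning a commit, not a branch or tag name.)
is_full_sha() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Usage sketch; the revision below is a placeholder you have to fill in yourself:
#   arch_in_targets gfx1100 "$DEFAULT_TARGETS" || echo "not a default target"
#   is_full_sha "$rev" && OPENXLA_GIT_REV="$rev" XLA_BUILD=true XLA_TARGET=rocm mix compile
```

The comma padding in `arch_in_targets` keeps a prefix such as `gfx90` from matching `gfx900`.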

Awlexus · 2023-12-28T22:49:54Z

Thanks @jonatanklosko, I was able to compile it by using a more recent XLA git ref, but I could not get it to run on the GPU. I tried again using the Docker script to build it (which took a long time) and got the same error. It was able to allocate the memory, but shortly afterwards the program was killed by the operating system. I'm not sure where exactly this error comes from.

Error log


2023-12-28 23:43:05.394087: E xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
[info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[info] XLA service 0x7fa4c018dc30 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
[info]   StreamExecutor device (0): AMD Radeon RX 6900 XT, AMDGPU ISA version: gfx1030
[info] Using BFC allocator.
[info] XLA backend allocating 15446782771 bytes on device 0 for BFCAllocator.
fish: Job 1, 'iex -S mix phx.server $argv' terminated by signal SIGSEGV (Address boundary error)
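For scale, the size in the allocation line can be reproduced with a bit of arithmetic. This is only a sketch under two assumptions that do not appear in the log: that the client preallocates 90% of device memory (which I believe is the EXLA default for `memory_fraction`), and that the card reports 16368 MiB in total. `prealloc_bytes` is a made-up helper name:

```shell
# prealloc_bytes TOTAL_MIB PERCENT
# Bytes reserved when PERCENT of a device with TOTAL_MIB of memory is preallocated.
# Integer arithmetic, so the result is truncated toward zero.
prealloc_bytes() {
  echo $(( $1 * 1024 * 1024 * $2 / 100 ))
}

# ASSUMPTIONS: 16368 MiB reported by the device, 90% memory fraction.
prealloc_bytes 16368 90   # prints 15446782771, the figure in the log line
```

If those assumptions hold, nearly the whole card is reserved up front, so the allocation succeeding says little about whether kernels can actually be loaded and run.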

jonatanklosko · 2023-12-29T05:49:42Z

Hmm, do you do any Nx stuff on boot? Does the error happen every time? I assume it doesn't happen if you use CPU only? You can also try ELIXIR_ERL_OPTIONS="+sssdio 128 +sssdcpu 128", though it rather helps with segfaults.
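If you want to try those flags without clobbering options you already set, something like this sketch should work. Per the `erl` documentation, `+sssdio` and `+sssdcpu` set the suggested stack size (in kilowords) for dirty IO and dirty CPU scheduler threads. `with_dirty_stack_flags` is a hypothetical wrapper name:

```shell
# Run a command with the dirty-scheduler stack-size flags appended to
# whatever ELIXIR_ERL_OPTIONS already contains (or set fresh if it is empty).
# `with_dirty_stack_flags` is a made-up name for this sketch.
with_dirty_stack_flags() {
  ELIXIR_ERL_OPTIONS="${ELIXIR_ERL_OPTIONS:+$ELIXIR_ERL_OPTIONS }+sssdio 128 +sssdcpu 128" "$@"
}

# Usage:
#   with_dirty_stack_flags iex -S mix phx.server
```

The variable is only set for the wrapped command, so it is easy to compare runs with and without the flags.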

Awlexus · 2023-12-31T18:09:24Z

Sorry for the late reply, I was away for a bit.

I'm not sure what changed since then, but now I'm getting a different error message. I had already written out a reply before I noticed the change, so I've added it at the end in case it is helpful.

I then ran into the error message (RuntimeError) bitcode module not found at ./opencl.bc, which I was able to resolve by setting ROCM_PATH=/opt/rocm (mentioning this in case someone else runs into it).

Now I'm running into the following error, which soon afterwards makes the process abort with SIGABRT:

2023-12-31 18:56:44.607676: E xla/stream_executor/plugin_registry.cc:90] Invalid plugin kind specified: DNN
[info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
[info] XLA service 0x7fe7ac1707a0 initialized for platform ROCM (this does not guarantee that XLA will be used). Devices:
[info]   StreamExecutor device (0): AMD Radeon RX 6900 XT, AMDGPU ISA version: gfx1030
[info] Using BFC allocator.
[info] XLA backend allocating 15446782771 bytes on device 0 for BFCAllocator.

...

beam.smp: /usr/src/debug/hip-runtime-amd/clr-rocm-5.7.1/hipamd/src/hip_code_object.cpp:762: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.

Old Reply

do you do any Nx stuff on boot?

I've added a serving of openai/whisper to my application's supervision tree, but that should be all

    {:ok, model_info} = Bumblebee.load_model({:hf, @whisper_model})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, @whisper_model})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, @whisper_model})
    {:ok, generation_config} = Bumblebee.load_generation_config({:hf, @whisper_model})
    generation_config = Bumblebee.configure(generation_config, max_new_tokens: 100)

    serving =
      Bumblebee.Audio.speech_to_text_whisper(
        model_info,
        featurizer,
        tokenizer,
        generation_config,
        compile: [batch_size: 4],
        chunk_num_seconds: 30,
        stream: true,
        defn_options: [compiler: EXLA]
      )

Does the error happen every time? I assume it doesn't happen if you use CPU only?

Yes, it happens every time, before the serving is able to complete a single run

jonatanklosko · 2024-01-02T09:04:13Z

Hmm, this looks like /opt/rocm is likely a symlink to a more specific version like /opt/rocm-5.7.1, let's set ROCM_PATH to that just to be sure. Otherwise maybe there's a certain ROCM HIP package missing in the environment?
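Something along these lines would check both things at once. It is a sketch: the helper names are hypothetical, and the `amdgcn/bitcode/` location is an assumption about how the ROCm device libraries are laid out on your system, so adjust it if your packages differ:

```shell
# Print the canonical, symlink-free form of a ROCm directory (default /opt/rocm).
resolve_rocm_path() {
  readlink -f "${1:-/opt/rocm}"
}

# Succeeds if the directory contains the OpenCL bitcode module.
# ASSUMPTION: device-lib bitcode is installed under amdgcn/bitcode/.
has_opencl_bitcode() {
  [ -f "$1/amdgcn/bitcode/opencl.bc" ]
}

# Usage:
#   ROCM_PATH="$(resolve_rocm_path)"
#   has_opencl_bitcode "$ROCM_PATH" || echo "opencl.bc not found under $ROCM_PATH"
#   export ROCM_PATH
```

If `/opt/rocm` is a real directory rather than a symlink, `resolve_rocm_path` simply prints it back unchanged.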

Awlexus · 2024-01-04T13:29:30Z

I'm running Arch Linux and rely on the packages provided there, so I'm not sure what I could be missing. I have installed every package that pops up when I search for rocm, but just to be sure I've provided a list of the installed packages below.

Hmm, this looks like /opt/rocm is likely a symlink to a more specific version

/opt/rocm really just links to the packages installed on my system.

$ ls -lah /opt
drwxr-xr-x 34 root root 4.0K Dec 31 18:53  rocm/

Installed rocm packages

comgr 5.7.1-1
    Compiler support library for ROCm LLVM
hip-runtime-amd 5.7.1-1
    Heterogeneous Interface for Portability ROCm
hipblas 5.7.1-1
    ROCm BLAS marshalling library
hsa-rocr 5.7.1-1
    HSA Runtime API and runtime for ROCm
magma-hip 2.7.2-2
    Matrix Algebra on GPU and Multicore Architectures (with ROCm/HIP)
python-pytorch-rocm 2.1.2-1
    Tensors and Dynamic neural networks in Python with strong GPU acceleration (with ROCm)
python-torchvision-rocm 0.16.2-1
    Datasets, transforms, and models specific to computer vision (with ROCM support)
rccl 5.7.1-1
    ROCm Communication Collectives Library
rocalution 5.7.1-1
    Next generation library for iterative sparse solvers for ROCm platform
rocblas 5.7.1-1
    Next generation BLAS implementation for ROCm platform
rocfft 5.7.1-1
    Next generation FFT implementation for ROCm
rocm-clang-ocl 5.7.1-1
    OpenCL compilation with clang compiler
rocm-cmake 5.7.1-1
    CMake modules for common build tasks needed for the ROCm software stack
rocm-core 5.7.1-1
    AMD ROCm core package (version files)
rocm-device-libs 5.7.1-1
    ROCm Device Libraries
rocm-hip-libraries 5.7.1-2
    Develop certain applications using HIP and libraries for AMD platforms
rocm-hip-runtime 5.7.1-2
    Packages to run HIP applications on the AMD platform
rocm-hip-sdk 5.7.1-2
    Develop applications using HIP and libraries for AMD platforms
rocm-language-runtime 5.7.1-2
    ROCm runtime
rocm-llvm 5.7.1-1
    Radeon Open Compute - LLVM toolchain (llvm, clang, lld)
rocm-ml-libraries 5.7.1-2
    Packages for key Machine Learning libraries
rocm-ml-sdk 5.7.1-2
    develop and run Machine Learning applications optimized for AMD platforms
rocm-opencl-runtime 5.7.1-1
    OpenCL implementation for AMD
rocm-opencl-sdk 5.7.1-2
    Develop OpenCL-based applications for AMD platforms
rocm-smi-lib 5.7.1-1
    ROCm System Management Interface Library
rocminfo 5.7.1-1
    ROCm Application for Reporting System Info
rocrand 5.7.1-1
    Pseudo-random and quasi-random number generator on ROCm
rocsolver 5.7.1-1
    Subset of LAPACK functionality on the ROCm platform
rocsparse 5.7.1-1
    BLAS for sparse computation on top of ROCm
rocthrust 5.7.1-1
    Port of the Thrust parallel algorithm library atop HIP/ROCm
roctracer 5.7.1-1
    ROCm tracer library for performance tracing
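Since a search for "rocm" only matches packages whose name or description contains that word, checking by exact package name may be a more direct way to spot a missing dependency. A rough sketch, assuming the package names from the `pacman -S` command in the first post; `missing_pkgs` and the `PKG_QUERY` override are hypothetical names introduced here, not part of any tool.

```shell
# Package names copied from the install command in the first post (Arch naming).
REQUIRED_PKGS="miopen-hip hipfft rocrand hipsparse hipsolver rccl hip-runtime-amd rocfft roctracer hipblas rocm-device-libs rocsolver rocblas"

# Print each given package that the query command does not report as installed.
# PKG_QUERY is a hypothetical hook for swapping the query command;
# it defaults to `pacman -Q`, which exits non-zero for packages that are not installed.
missing_pkgs() {
  for pkg in "$@"; do
    ${PKG_QUERY:-pacman -Q} "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Only run the real check where pacman exists.
if command -v pacman >/dev/null 2>&1; then
  missing_pkgs $REQUIRED_PKGS
fi
```

An empty result means every listed package is installed; it does not rule out other causes such as version mismatches.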

jonatanklosko · 2024-01-05T04:26:25Z

I see. It must be something environment related, given that others managed to run it with that revision, but I don't have any more guesses right now.

One alternative would be running stuff inside Docker, though that's not exactly convenient. Or you could try building with the latest openxla revision to see if it's something fixed upstream, but note that this usually requires some adjustments in the build file and/or in exla (depending on how much the xla APIs changed).
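For the Docker route, a rough sketch of what the invocation might look like. The image name is an assumption (substitute whichever ROCm image matches your setup), and `rocm_docker_cmd` is a made-up helper that only prints the command for review. The `--device` and `--group-add` flags are the usual way to expose an AMD GPU to a container.

```shell
# Assumed image name; override ROCM_IMAGE with the image you actually use.
ROCM_IMAGE="${ROCM_IMAGE:-rocm/dev-ubuntu-22.04}"

# Hypothetical helper: print the docker command instead of running it,
# so it can be inspected first. /dev/kfd and /dev/dri are the AMD GPU
# device nodes; the video group grants access to them.
# Note: a working directory containing spaces would need manual quoting.
rocm_docker_cmd() {
  echo docker run -it --rm \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video \
    -v "$PWD":/workspace -w /workspace \
    "$ROCM_IMAGE"
}

rocm_docker_cmd
```

Inside the container you would still need Elixir and the build toolchain installed before compiling with `XLA_TARGET=rocm`.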

jonatanklosko · 2024-05-22T08:55:43Z

We just had a new release, see #82 (comment). You can try it with ROCm 6.0, and if there are issues, leave a comment on #82 :)

jonatanklosko closed this as completed May 22, 2024
