
Testing s390x with RUST_BACKTRACE=1 in QEMU crashes #9719

Closed
alexcrichton opened this issue Dec 3, 2024 · 3 comments · Fixed by #9725
Labels
cranelift:area:s390x Issues related to Cranelift's s390x backend

Comments

alexcrichton commented Dec 3, 2024

In landing #9702 I was wrestling with an s390x-specific failure on CI. The problem seems to stem from running with RUST_BACKTRACE=1 and using std::backtrace, which an updated version of anyhow now does. That code has all landed, so the current main branch of Wasmtime fails with:

$ export RUST_BACKTRACE=1
$ export CARGO_PROFILE_DEV_OPT_LEVEL=2
$ cargo test -p wasmtime-wasi --target s390x-unknown-linux-gnu
...
    Finished `test` profile [optimized + debuginfo] target(s) in 56.55s
     Running unittests src/lib.rs (target/s390x-unknown-linux-gnu/debug/deps/wasmtime_wasi-e36f6fb1833db8d7)

running 15 tests
test host::filesystem::test::table_readdir_works ... ok
test stdio::test::memory_stdin_stream ... ok
test random::test::deterministic ... ok
test stdio::test::async_stdout_stream_unblocks ... ok
test pipe::test::backpressure_read_stream ... ok
test stdio::test::async_stdin_stream ... ok
test pipe::test::infinite_read_stream ... ok
test pipe::test::finite_read_stream ... ok
test pipe::test::empty_read_stream ... ok
test pipe::test::sink_write_stream ... ok
test pipe::test::closed_write_stream ... ok
test pipe::test::multiple_chunks_write_stream ... ok
test pipe::test::multiple_chunks_read_stream ... ok
test pipe::test::backpressure_write_stream ... ok
test pipe::test::backpressure_write_stream_with_flush ... ok

test result: ok. 15 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.36s

     Running tests/all/main.rs (target/s390x-unknown-linux-gnu/debug/deps/all-d771c2793612b780)

running 196 tests
test async_::preview1_clock_time_get ... ok
test async_::preview1_fd_filestat_get ... ok
test api::api_time ... ok
test api::api_reactor ... ok
test async_::preview1_big_random_buf ... ok
qemu: uncaught target signal 11 (Segmentation fault) - core dumped
error: test failed, to rerun pass `-p wasmtime-wasi --test all`

Caused by:
  process didn't exit successfully: `qemu-s390x -L /usr/s390x-linux-gnu -E LD_LIBRARY_PATH=/usr/s390x-linux-gnu/lib -E WASMTIME_TEST_NO_HOG_MEMORY=1 /home/alex/code/wasmtime/target/s390x-unknown-linux-gnu/debug/deps/all-d771c2793612b780` (signal: 11, SIGSEGV: invalid memory reference)
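
As a rough sketch of the trigger (not the failing test itself; assumes a Rust toolchain and anyhow version with std backtrace support): constructing an anyhow error with RUST_BACKTRACE=1 set captures a std::backtrace::Backtrace, and capturing it walks the stack through the same unwind info this crash implicates.

// Minimal sketch of the backtrace capture that exercises the unwinder;
// "boom" is just a placeholder message.
fn main() {
    // With RUST_BACKTRACE=1 in the environment, anyhow captures a
    // std::backtrace::Backtrace when the error is created, which walks
    // the stack via the registered (here: faulty) unwind info.
    let err = anyhow::anyhow!("boom");
    println!("{err:?}"); // Debug output includes the captured backtrace
}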

@uweigand would you be able to help take a closer look at this? I'm not sure whether this is a Wasmtime/JIT code issue (unwind info, maybe?) or perhaps something else in rustc.

alexcrichton added the cranelift:area:s390x label Dec 3, 2024

uweigand commented Dec 3, 2024

The good news is that I can reproduce this natively, so it's not a QEMU issue. The segfault happens in MD_FALLBACK_FRAME_STATE_FOR in libgcc because of an invalid PC. This typically indicates a problem with the unwind info in a lower frame, and indeed GDB isn't able to unwind fully either. I'll need to look into where this comes from.


uweigand commented Dec 4, 2024

This is a regression introduced with the tail-call ABI: if a function has incoming tail-call stack arguments, the DWARF rule for unwinding the caller's SP is incorrect. I'm working on a fix.
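
To make the failure mode concrete, here is a minimal sketch of the arithmetic (plain Rust, not Wasmtime code). The CFA address and tail-call argument size are hypothetical numbers chosen for illustration, and the frame-layout facts (and the direction of the discrepancy) are taken from the commit message below.

// On s390x, the correct unwound SP is a fixed offset from the CFA, but
// the SP value saved in the register save area still reflects the
// incoming tail-call stack arguments, so restoring SP from that slot
// (a DW_CFA_offset-style memory rule) lands in the wrong place.
fn main() {
    let cfa: u64 = 0x1000;    // hypothetical canonical frame address
    let save_area: u64 = 160; // fixed s390x register save area size
    let tail_args: u64 = 32;  // hypothetical incoming tail-call arg bytes

    let unwound_sp = cfa - save_area;             // always correct
    let saved_slot = cfa - save_area - tail_args; // stale by tail_args

    assert_ne!(unwound_sp, saved_slot);
    println!("correct SP: {unwound_sp:#x}, stale slot: {saved_slot:#x}");
}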

uweigand added a commit to uweigand/wasmtime that referenced this issue Dec 4, 2024
On s390x, the unwound SP is always at current CFA - 160.  Therefore,
the default rule used on most other platforms (which sets the
unwound SP to the current CFA) is incorrect, so we need to provide
an explicit DWARF CFI rule to unwind SP.

With the platform ABI, the caller's SP is always stored in the
register save area like other call-saved GPRs, so we can simply
use a normal DW_CFA_offset rule.  However, with the new tail-call
ABI, the value saved in that slot is incorrect - it is not
corrected for the incoming tail-call stack arguments that will
have been removed as the tail call returns.

To fix this without introducing unnecessary run-time overhead,
we can simply use a DW_CFA_val_offset rule that will set the
unwound SP to CFA - 160, which is always correct.  However, the
current UnwindInst abstraction does not allow any way to generate
this DWARF CFI instruction.  Therefore, we introduce a new
UnwindInst::RegStackOffset rule for this purpose.

Fixes: bytecodealliance#9719
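
As a rough sketch of how such a directive might be lowered (this is not the actual Cranelift patch; the UnwindInst variant shape, its field names, and the use of the gimli crate's DWARF-writing API are assumptions for illustration):

// Hedged sketch only; the real UnwindInst::RegStackOffset in Cranelift
// may differ. DW_CFA_val_offset says: the register's unwound *value*
// is CFA + offset (no memory load), which is exactly what the
// "SP = CFA - 160" rule needs.
use gimli::write::CallFrameInstruction;
use gimli::Register;

// Hypothetical unwind directive mirroring the one described above.
enum UnwindInst {
    RegStackOffset { reg: Register, cfa_offset: i32 },
}

fn to_cfi(inst: &UnwindInst) -> CallFrameInstruction {
    match inst {
        UnwindInst::RegStackOffset { reg, cfa_offset } => {
            // Emits a DW_CFA_val_offset rule via gimli's write API.
            CallFrameInstruction::ValOffset(*reg, *cfa_offset)
        }
    }
}

fn main() {
    // %r15 is the s390x stack pointer (DWARF register number 15).
    let sp = Register(15);
    let _rule = to_cfi(&UnwindInst::RegStackOffset { reg: sp, cfa_offset: -160 });
}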

uweigand commented Dec 4, 2024

The above PR fixes the issue for me.

github-merge-queue bot pushed a commit that referenced this issue Dec 4, 2024 (same commit message as above; Fixes: #9719)