tcmulib_get_next_command is broken on aarch64 #688

geordieintheshellcode · 2022-10-04T16:35:06Z

Repro steps

Grab this branch (https://github.com/geordieintheshellcode/tcmu-runner/tree/tcmu-get-next-command-broken). It's just master with some scripts and a fix to consumer.c to make it not when receiving IO.
Build and run the consumer example. consumer handles any TCMU backstores with the "foo" handler. For WRITE SCSI commands we'll simply drop them on the floor and return TCMU_STS_OK.
Run "create_foo.sh" to create a TCMU backstore backed by the foo handler and hook it up to a SCSI HBA/LUN.
You'll see a bunch of benign error messages in dmesg, that's purely because the "foo" handler is only partially implement and will reject read SCSI commands with an error.
Now find the block device and run a fio test against it: sudo FILE=/dev/<dev_name> BS=4k QD=128 fio randwrite.fio

Almost immediately you'll see the "cmd_id X not found, ring is broken" message in dmesg, and a bunch of writes will fail:

[Tue Oct  4 15:45:50 2022] cmd_id 0 not found, ring is broken
[Tue Oct  4 15:45:50 2022] ring broken, not handling completions
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#174 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#174 Sense Key : Not Ready [current]
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#174 Add. Sense: Logical unit communication failure
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#174 CDB: Write(10) 2a 00 00 17 69 70 00 00 08 00
[Tue Oct  4 15:45:50 2022] blk_update_request: I/O error, dev sda, sector 1534320 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#175 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#175 Sense Key : Not Ready [current]
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#175 Add. Sense: Logical unit communication failure
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#175 CDB: Write(10) 2a 00 00 19 a3 18 00 00 08 00
[Tue Oct  4 15:45:50 2022] blk_update_request: I/O error, dev sda, sector 1680152 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#176 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#176 Sense Key : Not Ready [current]
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#176 Add. Sense: Logical unit communication failure
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#176 CDB: Write(10) 2a 00 00 1c bb 80 00 00 08 00
[Tue Oct  4 15:45:50 2022] blk_update_request: I/O error, dev sda, sector 1883008 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#177 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#177 Sense Key : Not Ready [current]
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#177 Add. Sense: Logical unit communication failure
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#177 CDB: Write(10) 2a 00 00 15 b0 a0 00 00 08 00
[Tue Oct  4 15:45:50 2022] blk_update_request: I/O error, dev sda, sector 1421472 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#178 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#178 Sense Key : Not Ready [current]
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#178 Add. Sense: Logical unit communication failure
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#178 CDB: Write(10) 2a 00 00 0c 4c d0 00 00 08 00
[Tue Oct  4 15:45:50 2022] blk_update_request: I/O error, dev sda, sector 806096 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#179 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#179 Sense Key : Not Ready [current]
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#179 Add. Sense: Logical unit communication failure
[Tue Oct  4 15:45:50 2022] sd 0:0:1:0: [sda] tag#179 CDB: Write(10) 2a 00 00 17 98 38 00 00 08 00

The same test happily runs for hours on x86.

Hypothesis

I think the issue is caused by us not reading the command ring head index with the correct memory barrier semantics. On the kernel side, in target_core_user.c, we can see that every time a new command is enqueued we first fill the entry and then update the ring buffer head index using the UPDATE_HEAD macro (https://elixir.bootlin.com/linux/v4.18/source/drivers/target/target_core_user.c#L1008). UPDATE_HEAD does an smp_store_release() to the head index. From my understanding, a store release barrier should be paired with a load acquire barrier. i.e. the read of the cmd head index in tcmulib_get_next_command should use acquire semantics. Running with the following patch causes the issue to go away, but I don't know if it's purely fortuitous or actually fixes the issue. The idea is to read the head index using an atomic load - which should (I think) be equivalent to a load acquire barrier:

diff --git a/libtcmu.c b/libtcmu.c
index 46a6fbc..ec36309 100644
--- a/libtcmu.c
+++ b/libtcmu.c
@@ -1109,7 +1109,9 @@ device_cmd_head(struct tcmu_device *dev)
 {
        struct tcmu_mailbox *mb = dev->map;

-       return (struct tcmu_cmd_entry *) ((char *) mb + mb->cmdr_off + mb->cmd_head);
+       uint32_t mb_head = __atomic_load_n(&mb->cmd_head, __ATOMIC_SEQ_CST);
+       struct tcmu_cmd_entry* e = (struct tcmu_cmd_entry *) ((char *) mb + mb->cmdr_off + mb_head);
+       return e;
 }

 static inline struct tcmu_cmd_entry *

The patch also appears to work with a less punitive __ATOMIC_ACQUIRE load.

I'd appreciate someone checking my working here as I'm not massively clued up on this area! BTW I found this document useful in describing the expected memory barrier semantics for ring buffers: https://www.kernel.org/doc/Documentation/core-api/circular-buffers.rst.

The text was updated successfully, but these errors were encountered:

Use the appropriate atomic load instruction to ensure we synchronize the read of the command ring head index with any previous stores. Not doing this causes command ring corruption on aarch64: [Tue Oct 4 15:45:50 2022] cmd_id 0 not found, ring is broken [Tue Oct 4 15:45:50 2022] ring broken, not handling completions See issue open-iscsi#688. PR open-iscsi#693. Signed-off-by: Xiubo Li <[email protected]>

suuyaoo · 2023-05-26T04:55:59Z

The macro TCMU_UPDATE_RB_TAIL to update cmd_tail should also use atomic store ?

l11990 · 2024-07-09T03:22:29Z

I think it is the problem of tcmulib_processing_start and tcmulib_processing_complete.
tcmulib_processing_start and during the process of all commands, there may be another tcmulib_processing_start. When the current tcmulib_processing_start is not finished, the next tcmulib_processing_start may come. When the current tcmulib_processing_start processes all commands and finish to call tcmulib_processing_complete, it may cause the kernel's ring buffer broken
Using reference count to solve this problem, I have fixed this issue in my job on aarch64

This was referenced Oct 4, 2022

Potential race condition in aio_command_finish? #687

Open

bugfix/dp 636 tcmulib get next command broken aarch64 ondat/tcmu-runner#1

Merged

geordieintheshellcode mentioned this issue Jan 18, 2023

Fix tcmulib get next command on aarch64 #693

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tcmulib_get_next_command is broken on aarch64 #688

tcmulib_get_next_command is broken on aarch64 #688

geordieintheshellcode commented Oct 4, 2022 •

edited

Loading

suuyaoo commented May 26, 2023

l11990 commented Jul 9, 2024

tcmulib_get_next_command is broken on aarch64 #688

tcmulib_get_next_command is broken on aarch64 #688

Comments

geordieintheshellcode commented Oct 4, 2022 • edited Loading

Repro steps

Hypothesis

suuyaoo commented May 26, 2023

l11990 commented Jul 9, 2024

geordieintheshellcode commented Oct 4, 2022 •

edited

Loading