vmmplatsupport, vpci: Fix unaligned writes #123

elmankku · 2024-01-17T05:05:48Z

Data of unaligned configuration space writes were masked out because of incorrect mask.

lsf37 · 2024-01-27T23:53:21Z

libsel4vmmplatsupport/src/arch/arm/devices/vpci.c

-    mask = get_vcpu_fault_data_mask(vcpu);
+    seL4_Word s = (get_vcpu_fault_address(vcpu) & 0x3) * 8;
+    mask = get_vcpu_fault_data_mask(vcpu) >> s;
    value = get_vcpu_fault_data(vcpu) & mask;


Should this be corrected inside the function get_vcpu_fault_data_mask? Otherwise other occurrences will still get the same wrong mask.

Could you explain a bit more why the new mask is correct? It appears to take the bottom two bits of the fault address, shifting these left by 3 and then uses that value to shift right the mask. Why does that work as a mask? (It appears to make the previous mask smaller, masking out more data, but I'm probably missing something)

Could you explain a bit more why the new mask is correct?

Sure.

As a background, this was discovered while I was debugging why MSI(-X) didn't work with the vPCI. Turns out unaligned writes are fairly uncommon (aside from MSI(-X), only one on a regular device - writing 0 there apparently was not critical). MSI-X message control register is 16bit register at offset 2 of the MSI-X capability (talking about PCI capabilities here :)):

https://github.com/torvalds/linux/blob/41bccc98fb7931d63d03f326a746ac4d429c1dd3/include/uapi/linux/pci_regs.h#L324

The vPCI was incorrectly writing 0 to the virtual device, and that caused guest hang when attempting to use MSIs.

According to fault.h documentation:

Retrieve a mask for the data within the aligned word that the fault is attempting to access

Indeed, it is shifting the mask to the aligned word:

seL4_projects_libs/libsel4vm/src/arch/arm/fault.c

Line 494 in 4662171

mask <<= (addr & 0x3) * 8;

For 8bit and 16bit sized accesses we get following masks:

8 bit 0xC0000000 => 0xFF << (0xC0000000 & 0x3) * 8 == 0x000000FF 0xC0000001 => 0xFF << (0xC0000001 & 0x3) * 8 == 0x0000FF00 0xC0000002 => 0xFF << (0xC0000002 & 0x3) * 8 == 0x00FF0000 0xC0000003 => 0xFF << (0xC0000003 & 0x3) * 8 == 0xFF000000 16 bit 0xC0000000 => 0xFFFF << (0xC0000000 & 0x3) * 8 == 0x0000FFFF 0xC0000002 => 0xFFFF << (0xC0000002 & 0x3) * 8 == 0xFFFF0000

However for the writes the the fault data seems not to be shifted like the mask is. Under the hood, it seems to directly get the fault data from the vcpu registers. I think the confusing part is that this is where fault data handling for write and read faults differ: when setting the fault data to the read, it must be shifted to properly handle the alignment while it is not shifted with the writes.

I traced unaligned reads and writes to the PCI to clarify:

... bootlog .. vpci: read: value read from device 0x0000c001 vpci: read: fault addr 0xe001009a vpci: read: fault data 0xc0010000 // see how the data read from the device must is shifted for the fault vpci: read: fault mask 0xffff0000 [ 6.664614] kernel: read msi-x control: 0xc001 [ 6.706475] kernel: write msi-x control: 0x8001 vpci: write: fault addr 0xe001009a vpci: write: fault data 0x00008001 // this is the fault data vpci: write: fault mask 0xffff0000 // this is the fault mask from get_vcpu_fault_mask vpci: write: mask shifted 0x0000ffff // this is with the fix applied vpci: write: value 0x00008001 // this is the data that is written to the device. Would be 0x00000000 without the fix which is incorrect ... bootlog ...

Within the same file, fault.c, seL4_Word fault_emulate(fault_t *f, seL4_Word o) handles the mask like here in this patch:

seL4_projects_libs/libsel4vm/src/arch/arm/fault.c

Line 439 in 4662171

s = (f->addr & 0x3) * 8;

However, we can see that the read/write difference is not handled f.ex. here:

seL4_projects_libs/libsel4vm/src/sel4_arch/aarch64/fault.c

Line 136 in 4662171

data = fault_get_data(fault) & fault_get_data_mask(fault);

and here:

seL4_projects_libs/libsel4vm/src/sel4_arch/arm_hyp/fault.c

Line 162 in 4662171

data = fault_get_data(fault) & fault_get_data_mask(fault);

Should this be corrected inside the function get_vcpu_fault_data_mask?

As such, no, that would break the reads. However the API in its current form is very prone to produce errors. I think there should be convenience functions to get/set fault data without having to deal with the alignment.

lsf37 · 2024-01-27T23:55:57Z

Do you have a test case that shows the error and that we can use to make sure the fix works and the error doesn't reappear down the line?

elmankku · 2024-01-29T14:01:45Z

Do you have a test case that shows the error and that we can use to make sure the fix works and the error doesn't reappear down the line?

Unfortunately no, not at the moment. Can't think any trivial way to test this.

Data of unaligned configuration space writes were masked out because of incorrect mask. Signed-off-by: Markku Ahvenjärvi <[email protected]>

lsf37 reviewed Jan 27, 2024

View reviewed changes

vmmplatsupport, vpci: Fix unaligned writes

d050fb3

Data of unaligned configuration space writes were masked out because of incorrect mask. Signed-off-by: Markku Ahvenjärvi <[email protected]>

elmankku force-pushed the upstream/pci-config-unaligned-writes branch from ec97e18 to d050fb3 Compare February 23, 2024 13:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vmmplatsupport, vpci: Fix unaligned writes #123

vmmplatsupport, vpci: Fix unaligned writes #123

elmankku commented Jan 17, 2024

lsf37 Jan 27, 2024 •

edited

Loading

elmankku Jan 29, 2024

lsf37 commented Jan 27, 2024

elmankku commented Jan 29, 2024

vmmplatsupport, vpci: Fix unaligned writes #123

Are you sure you want to change the base?

vmmplatsupport, vpci: Fix unaligned writes #123

Conversation

elmankku commented Jan 17, 2024

lsf37 Jan 27, 2024 • edited Loading

Choose a reason for hiding this comment

elmankku Jan 29, 2024

Choose a reason for hiding this comment

lsf37 commented Jan 27, 2024

elmankku commented Jan 29, 2024

lsf37 Jan 27, 2024 •

edited

Loading