Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acle-fmv-features.c fails on Fedora #109304

Closed
nikic opened this issue Sep 19, 2024 · 11 comments · Fixed by llvm/llvm-test-suite#167
Closed

acle-fmv-features.c fails on Fedora #109304

nikic opened this issue Sep 19, 2024 · 11 comments · Fixed by llvm/llvm-test-suite#167

Comments

@nikic
Copy link
Contributor

nikic commented Sep 19, 2024

This test from llvm-test-suite (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/UnitTests/AArch64/acle-fmv-features.c) fails on Fedora with the following output:


flagm
flagm2
dotprod
sha3
rdm
lse
sha2
sha1
aes
pmull
rcpc
rcpc2
fcma
jscvt
dpb
dpb2
bf16
i8mm
dit
fp16
ssbs2
	UPASS
bti
simd
fp
crc
sme
sme2

Note the UPASS on ssbs2. This indicates that the ssbs feature was not detected as available, but executing the assembly code for it did not crash. The assembly is (https://github.com/llvm/llvm-test-suite/blob/e77ce1bfc8dbe38b9bf5a3098cb954e58d125020/SingleSource/UnitTests/AArch64/acle-fmv-features.c#L189-L195):

    asm volatile (
        "mrs x0, SSBS" "\n"
        "msr SSBS, x0" "\n"
        : : : "x0"
    );

I'm not sure whether this indicates some kind of kernel bug where a trap is incorrectly suppressed, or an incorrect assumption in the test that this necessarily has to trap (rather than, say, being silently ignored).

Failure was observed on at least 6.10.10-100.fc39.aarch64 and 6.10.9-200.fc40.aarch64 kernels.

@llvmbot
Copy link
Member

llvmbot commented Sep 19, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Nikita Popov (nikic)

This test from llvm-test-suite (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/UnitTests/AArch64/acle-fmv-features.c) fails on Fedora with the following output:

flagm
flagm2
dotprod
sha3
rdm
lse
sha2
sha1
aes
pmull
rcpc
rcpc2
fcma
jscvt
dpb
dpb2
bf16
i8mm
dit
fp16
ssbs2
	UPASS
bti
simd
fp
crc
sme
sme2

Note the UPASS on ssbs2. This indicates that the ssbs feature was not detected as available, but executing the assembly code for it did not crash. The assembly is (https://github.com/llvm/llvm-test-suite/blob/e77ce1bfc8dbe38b9bf5a3098cb954e58d125020/SingleSource/UnitTests/AArch64/acle-fmv-features.c#L189-L195):

    asm volatile (
        "mrs x0, SSBS" "\n"
        "msr SSBS, x0" "\n"
        : : : "x0"
    );

I'm not sure whether this indicates some kind of kernel bug where a trap is incorrectly suppressed, or an incorrect assumption in the test that this necessarily has to trap (rather than, say, being silently ignored).

@efriedma-quic
Copy link
Collaborator

From the manual: "This register is present only when FEAT_SSBS is implemented. Otherwise, direct accesses to SSBS are UNDEFINED." So there's probably something wrong with the feature detection (maybe a bug in the code that parses cpuinfo).

@labrinea
Copy link
Collaborator

The feature detection uses hwcaps (see

), which supposedly checks that ID_AA64PFR1_EL1.SSBS == 0b0010. It could be that the hardware it run on had ID_AA64PFR1_EL1.SSBS == 0b0001, where the register is still available.

@labrinea
Copy link
Collaborator

It's perhaps my bad because of this patch #95149 (FYI @Wilco1)

@labrinea
Copy link
Collaborator

labrinea commented Sep 20, 2024

Actually even if we detected ssbs the test would still fail I think? I am not sure. Perhaps we need a different test for ssbs2 and use the existing one for ssbs.

@labrinea labrinea self-assigned this Sep 20, 2024
@andrewcarlotti
Copy link

andrewcarlotti commented Sep 20, 2024

It could be that the hardware it run on had ID_AA64PFR1_EL1.SSBS == 0b0001, where the register is still available.

Accesses to the register with the MSR/MRS instructions are not possible without FEAT_SSBS2. The latest Arm ARM still says FEAT_SSBS, but the correction appears in the published list of known issues.

I suspect that the test failures might result from the kernel masking out support for the feature bit, without trapping the instructions. In particular, I notice that the feature is masked out when ARM64_WORKAROUND_SPECULATIVE_SSBS is set (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c#n2314). This workaround affects Cortex-X4 and Neoverse-V3 cores.

@nikic
Copy link
Contributor Author

nikic commented Sep 20, 2024

I suspect that the test failures might result from the kernel masking out support for the feature bit, without trapping the instructions. In particular, I notice that the feature is masked out when ARM64_WORKAROUND_SPECULATIVE_SSBS is set (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c#n2314). This workaround affects Cortex-X4 and Neoverse-V3 cores.

In the log I see "CPU features: detected: SSBS not fully self-synchronizing", so it looks like ARM64_WORKAROUND_SPECULATIVE_SSBS is indeed active here.

@labrinea
Copy link
Collaborator

labrinea commented Sep 20, 2024

Could the value of the PSTATE.SSBS bit read by mrs be of any use here?

@Wilco1
Copy link

Wilco1 commented Sep 20, 2024

Could the value of the PSTATE.SSBS bit read by mrs be of any use here?

I don't think that makes a difference. You could try ID_AA64PFR1_EL1.SSBS but the kernel might have set it back to 1 as well.

Either way, this proves you can't write these negative tests since the kernel can disable features. As this example shows, it doesn't mean underlying instructions will always trap - it just means the feature is turned off by the kernel. As long as we correctly detect that, the test should pass.

@nikic
Copy link
Contributor Author

nikic commented Sep 30, 2024

So is the conclusion here that we should just not check the UPASS case for the ssbs2 feature?

@labrinea
Copy link
Collaborator

Yes, something like that. I see value in keeping the negative tests for other features so perhaps create an exception for SSBS?

labrinea added a commit to labrinea/llvm-test-suite that referenced this issue Sep 30, 2024
When running try_ssbs2() on hardware which is affected by the "SSBS
not fully self-synchronizing" errata, the linux kernel mutes the
detection of ssbs2 via hardware caps. As a result the default version
ends up running the ssbs2 code which was expected to trap originally.

To work around this UPASS I have created a whitelist of functions
which are expected to return a different status result.

This fixes llvm/llvm-project#109304.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants