This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Add support for register-based IBF and EBF #29

Open
mbitsnbites opened this issue May 16, 2023 · 0 comments

Comments

@mbitsnbites
Member

In register-based IBF/EBF, the <offset:width> arguments are packed in a register as follows:

| Bits  | Meaning |
|-------|---------|
| 0..4  | offset  |
| 8..12 | width   |

Use an IBF (or PACK.H?) instruction to pack the arguments into a single register before issuing the intended instruction (i.e. only two instructions in total).
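
As a rough illustration of the layout above, the following sketch packs and unpacks the control word in plain C++. Only the bit positions (offset in bits 0..4, width in bits 8..12) come from the table; the helper names `pack_bf_ctrl`, `bf_ctrl_offset` and `bf_ctrl_width` are hypothetical, and the sketch does not model what the IBF/EBF instructions themselves do with the fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; only the bit layout is taken from the issue text.
constexpr std::uint32_t kBfFieldMask = 0x1Fu;  // each field is 5 bits wide
constexpr unsigned kBfOffsetShift = 0;         // offset lives in bits 0..4
constexpr unsigned kBfWidthShift = 8;          // width lives in bits 8..12

// Pack <offset:width> into a single 32-bit control word.
inline std::uint32_t pack_bf_ctrl(std::uint32_t offset, std::uint32_t width) {
    // Assumption: callers pass values that already fit in 5 bits.
    assert(offset <= kBfFieldMask && width <= kBfFieldMask);
    return (offset << kBfOffsetShift) | (width << kBfWidthShift);
}

// Recover the offset field from a packed control word.
inline std::uint32_t bf_ctrl_offset(std::uint32_t ctrl) {
    return (ctrl >> kBfOffsetShift) & kBfFieldMask;
}

// Recover the width field from a packed control word.
inline std::uint32_t bf_ctrl_width(std::uint32_t ctrl) {
    return (ctrl >> kBfWidthShift) & kBfFieldMask;
}
```

For example, `pack_bf_ctrl(3, 5)` yields `0x0503`, which is the value the packing instruction would need to leave in the register before the register-based IBF/EBF is issued.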

See define_insn "*ibfsi3" and friends in mrisc32.md for the current immediate-based versions.

Example C++ code: https://godbolt.org/z/6GWzq8hxa

mbitsnbites pushed a commit that referenced this issue Jun 20, 2023
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs	r3, FPSCR_nzcvqc
bic	r3, r3, #536870912
orr	r3, r3, r2, lsl #29
vmsr	FPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

The bic + orr pair is slower, and it is also wrong: if the
*carry input is greater than 1, we risk overwriting the top two
bits of the FPSCR register (the N and Z flags).

This turned out to be a problem in the header file and the solution was
to simply add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.
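
The difference between the two sequences can be modelled in plain C++. This is only a sketch of the bit arithmetic, not the intrinsic implementation: the function names are made up for illustration, and the FPSCR is treated as an ordinary 32-bit value with the carry flag at bit 29, as in the assembly above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCarryBit = 29;  // position of C in FPSCR_nzcvqc

// Models the bic + orr sequence: clear bit 29, then OR in the shifted carry.
// Any bits of `carry` above bit 0 leak into bits 30 and 31 (Z and N).
inline std::uint32_t set_carry_bic_orr(std::uint32_t fpscr, std::uint32_t carry) {
    fpscr &= ~(1u << kCarryBit);
    return fpscr | (carry << kCarryBit);
}

// Models BFI Rs, Rt, #29, #1: only the lowest bit of `carry` is inserted,
// which is what masking the carry input to its lowest bit achieves.
inline std::uint32_t set_carry_bfi(std::uint32_t fpscr, std::uint32_t carry) {
    fpscr &= ~(1u << kCarryBit);
    return fpscr | ((carry & 1u) << kCarryBit);
}
```

With a carry input of 3 and a cleared FPSCR, the first model returns `0x60000000` (C and Z both set) while the second returns `0x20000000` (only C set). For carry inputs of 0 or 1 the two agree.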

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

	* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
	(__arm_vadcq_u32): Likewise.
	(__arm_vadcq_m_s32): Likewise.
	(__arm_vadcq_m_u32): Likewise.
	(__arm_vsbcq_s32): Likewise.
	(__arm_vsbcq_u32): Likewise.
	(__arm_vsbcq_m_s32): Likewise.
	(__arm_vsbcq_m_u32): Likewise.
	* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:
	* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.