microblaze: Fix -Os right shift optimization is allowed into delay slot #37

alpsayin · 2024-10-18T16:53:54Z

During picolibc testing, it's found that -Os produces assembly code that compiler
squeezes into a single delay slot. Thus, only the first instruction emitted by this optimization is run and the rest is skipped.
Optimization is generated by

  [(set (match_operand:SI 0 "register_operand" "=&d")
       (ashiftrt:SI (match_operand:SI 1 "register_operand"  "d")
                   (match_operand:SI 2 "immediate_operand" "I")))]
  "(INTVAL (operands[2]) > 5 && optimize_size)"
  {
    operands[3] = gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);

    output_asm_insn ("ori\t%3,r0,%2", operands);
    if (REGNO (operands[0]) != REGNO (operands[1]))
        output_asm_insn ("addk\t%0,%1,r0", operands);

    output_asm_insn ("addik\t%3,%3,-1", operands);
    output_asm_insn ("bneid\t%3,.-4", operands);
    return "sra\t%0,%0";
  }
  [(set_attr "type"    "arith")
  (set_attr "mode"    "SI")
  (set_attr "length"  "20")]

But arith type is NOT disallowed from going into delay slot (see below):

gcc/gcc/config/microblaze/microblaze.md

Line 466 in 428d8d7

    
           [(and (eq_attr "type" "!branch,call,jump,icmp,multi,no_delay_arith,no_delay_load,no_delay_store,no_delay_imul,no_delay_move,darith")

"Optimized" code is between [191b8-191c8]

   191a8:	bc830194 	bgti	r3, 404		// 1933c
    if (subnormal_y) { /* subnormal y */
   191ac:	b0007ff0 	imm	32752
   191b0:	a47c0000 	andi	r3, r28, 0
   191b4:	be2301ec 	bneid	r3, 492		// 193a0
   191b8:	a2400014 	ori	r18, r0, 20
   191bc:	131e0000 	addk	r24, r30, r0
   191c0:	3252ffff 	addik	r18, r18, -1
   191c4:	be32fffc 	bneid	r18, -4		// 191c0
   191c8:	93180001 	sra	r24, r24
...
        iy = (hy >> 20) - 1023;
   193a0:	b810fe40 	brid	-448		// 191e0
   193a4:	3318fc01 	addik	r24, r24, -1023

where operands are:

operands[0] = r24
operands[1] = r29
operands[2] = 20
operands[3] = r18

As a result, this code returns a iy (r24) value of (whatever was in r24) - 1023`

The fix is simple. I've redeclared this size-optimization as multi which is

Not delay-slot allowed
Also the same type for other shift optimizations (they're left shift optimizations)
gcc/gcc/config/microblaze/microblaze.md

Line 2070 in 428d8d7

[(set_attr "type" "multi")
gcc/gcc/config/microblaze/microblaze.md

Line 2483 in 428d8d7

[(set_attr "type" "multi")

For context, non-size-optimized code is N copies of:

sra	r24, r24 /* arithmetic 1-bit shift r24 and put result into r24 */

And as per the above optimization rule, if N <= 5 we'll still get 5 copies of sra instruction.

Currently under test via zephyrproject-rtos/sdk-ng#647

alpsayin · 2024-10-19T14:26:34Z

Fix seems to have worked, new disassembly as below:

previously	now
if (subnormal_y) { /* subnormal y */	if (subnormal_y) { /* subnormal y */
191ac: b0007ff0 imm 32752	19198: b0007ff0 imm 32752
191b0: a47c0000 andi r3, r28, 0	1919c: a47c0000 andi r3, r28, 0
191b4: be2301ec bneid r3, 492 // 193a0	191a0: bc2301d8 bnei r3, 472 // 19378
/* jump to `else` but also somehow loop 20x single bit right shift */	/* jump to `else` no delay slot */
191b8: a2400014 ori r18, r0, 20
191bc: 131e0000 addk r24, r30, r0
191c0: 3252ffff addik r18, r18, -1
191c4: be32fffc bneid r18, -4 // 191c0
191c8: 93180001 sra r24, r24

After the jump

previously	now
iy = (hy >> 20) - 1023;	iy = (hy >> 20) - 1023;
/jump to whatever but with delay-slot /	/* loop 20: single bit right shift */
193a0: b810fe40 brid -448 // 191e0	19378: a2400014 ori r18, r0, 20
193a4: 3318fc01 addik r24, r24, -1023	1937c: 131e0000 addk r24, r30, r0
/* -1023 executed and branched */	19380: 3252ffff addik r18, r18, -1
	19384: be32fffc bneid r18, -4 // 19380
	19388: 93180001 sra r24, r24
	1938c: b810fe2c brid -468 // 191b8
	19390: 3318fc01 addik r24, r24, -1023

CC @keith-packard you can grab a copy of the toolchain from https://github.com/zephyrproject-rtos/sdk-ng/actions/runs/11407925329?pr=647 if it's of any interest.

p.s. new disassembly again but monospace cuz it hurts my eyes:

   19194:	bc830180 	bgti	r3, 384		// 19314
    if (subnormal_y) { /* subnormal y */
   19198:	b0007ff0 	imm	32752
   1919c:	a47c0000 	andi	r3, r28, 0
   191a0:	bc2301d8 	bnei	r3, 472		// 19378
...
        iy = (hy >> 20) - 1023;
   19378:	a2400014 	ori	r18, r0, 20
   1937c:	131e0000 	addk	r24, r30, r0
   19380:	3252ffff 	addik	r18, r18, -1
   19384:	be32fffc 	bneid	r18, -4		// 19380
   19388:	93180001 	sra	r24, r24
   1938c:	b810fe2c 	brid	-468		// 191b8
   19390:	3318fc01 	addik	r24, r24, -1023

compared to:

   191a8:	bc830194 	bgti	r3, 404		// 1933c
    if (subnormal_y) { /* subnormal y */
   191ac:	b0007ff0 	imm	32752
   191b0:	a47c0000 	andi	r3, r28, 0
   191b4:	be2301ec 	bneid	r3, 492		// 193a0
   191b8:	a2400014 	ori	r18, r0, 20
   191bc:	131e0000 	addk	r24, r30, r0
   191c0:	3252ffff 	addik	r18, r18, -1
   191c4:	be32fffc 	bneid	r18, -4		// 191c0
   191c8:	93180001 	sra	r24, r24
...
        iy = (hy >> 20) - 1023;
   193a0:	b810fe40 	brid	-448		// 191e0
   193a4:	3318fc01 	addik	r24, r24, -1023

During picolibc testing, it's found that `-Os` produces assembly code that compiler squeezes into a single delay slot. Thus, only the first instruction (the one in delay slot) emitted by this optimization is executed and the rest is skipped. Signed-off-by: Alp Sayin <[email protected]>

This was referenced Oct 18, 2024

microblaze: pull in gcc/bintuils/gdb patches from meta-xilinx zephyrproject-rtos/sdk-ng#647

Open

MicroBlaze Port zephyrproject-rtos/zephyr#53576

Draft

alpsayin force-pushed the zephyr-gcc-12.2.0-bad-Os-shift-optimisation branch from 39eb6ac to 0e3007a Compare October 18, 2024 17:06

alpsayin marked this pull request as ready for review October 19, 2024 14:26

alpsayin force-pushed the zephyr-gcc-12.2.0-bad-Os-shift-optimisation branch from 0e3007a to 815ae62 Compare October 19, 2024 15:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

microblaze: Fix -Os right shift optimization is allowed into delay slot #37

microblaze: Fix -Os right shift optimization is allowed into delay slot #37

alpsayin commented Oct 18, 2024 •

edited

Loading

alpsayin commented Oct 19, 2024 •

edited

Loading

microblaze: Fix -Os right shift optimization is allowed into delay slot #37

Are you sure you want to change the base?

microblaze: Fix -Os right shift optimization is allowed into delay slot #37

Conversation

alpsayin commented Oct 18, 2024 • edited Loading

alpsayin commented Oct 19, 2024 • edited Loading

alpsayin commented Oct 18, 2024 •

edited

Loading

alpsayin commented Oct 19, 2024 •

edited

Loading