Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify sign-extension of immediates during instruction decoding #43

Open
michael-swan opened this issue Jun 3, 2024 · 3 comments
Open

Comments

@michael-swan
Copy link

michael-swan commented Jun 3, 2024

Your code contains several parts that looks like:

uint32_t imm = ir >> 20;
int32_t imm_se = imm | (( imm & 0x800 )?0xfffff000:0);

or

uint32_t immm4 = ((ir & 0xf00)>>7) | ((ir & 0x7e000000)>>20) | ((ir & 0x80) << 4) | ((ir >> 31)<<12);
if( immm4 & 0x1000 ) immm4 |= 0xffffe000;

but this could instead be:

int32_t imm = (int32_t)ir >> 20;

and

uint32_t immm4 = ((ir & 0xf00)>>7) | ((ir & 0x7e000000)>>20) | ((ir & 0x80) << 4) | (uint32_t)((int32_t)ir >> 19);

respectively.

This generates simpler code by omitting branches and replacing shr with sar. The compiler cannot reason as well as you might think in this situation, it does indeed introduce branches in both cases in the extent code. This change should behave exactly the same but be simpler code and possibly a little bit faster. Someone should go through all such sign-extension logic on instruction decoding operations and apply such fixes.

@cnlohr
Copy link
Owner

cnlohr commented Jun 3, 2024

Woah... I never thought about that! That's very cool. I won't be able to get to this in the next day or so - I will check into it soon after.

@cnlohr
Copy link
Owner

cnlohr commented Jun 4, 2024

For completeness sake, this is the correct phrasing for the second one:

uint32_t immm4 = ((ir & 0xf00)>>7) | ((ir & 0x7e000000)>>20) | ((ir & 0x80) << 4) | ((uint32_t)((int32_t)ir >> 19) & 0xfffff000);

I am at a bit of an impass. I like the code the way you put it better, but doing optimizations takes the code from 15036 compiled to the same size for the second optimization (And appears to be the same code)... and if I do the first optimization, the code 15148 bytes!! And diffing the .lst's, I can't even figure out why. The compiler just like loses its mind when it goes through a signed typecast.

The first seems like it should be such an obvious win, but the compiler's fighting me and it's got hands.

@cnlohr
Copy link
Owner

cnlohr commented Jun 4, 2024

#44

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants