x86 MLIL code does not split variable for same register used with multiple widths #6364

Open
SlidyBat opened this issue Jan 26, 2025 · 0 comments
Labels
State: Awaiting Triage (issue is waiting for more in-depth triage from a developer)

Version and Platform (required):

  • Binary Ninja Version: 4.3.6738-dev Personal (1e4f6eac)
  • OS: macOS
  • OS Version: Sonoma 14.6.1
  • CPU Architecture: M1

Bug Description:
When an x86 register is used several times at different widths, Binary Ninja does not split it into separate variables in MLIL/HLIL. This results in confusing HLIL.

For this assembly:

00000000  488b4730           mov     rax, qword [rdi+0x30]
00000004  8a4002             mov     al, byte [rax+0x2]
00000007  8806               mov     byte [rsi], al
00000009  c3                 retn

This MLIL is generated by default:

00000000    void* sub_0(void* arg1, char* arg2)
   0 @ 00000000  result = [arg1 + 0x30].q
   1 @ 00000004  result.al = [result + 2].b
   2 @ 00000007  [arg2].b = result.al
   3 @ 00000009  return result

Binary Ninja types result as void* and then reuses the same variable to hold the byte loaded through it.
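To make the partial write concrete, here is a minimal Python sketch of the four instructions. It relies on the x86-64 rule that writing an 8-bit register such as al leaves the upper 56 bits of rax unchanged. The dict-based memory model, the addresses, and the helper name `write_al` are assumptions made up for illustration and are not part of Binary Ninja.

```python
MASK64 = (1 << 64) - 1

def write_al(rax: int, value: int) -> int:
    """Model `mov al, ...`: replace only the low 8 bits of rax."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

def sub_0(memory: dict, rdi: int, rsi: int) -> int:
    """Step through the four instructions from the listing above."""
    rax = memory[rdi + 0x30]              # mov rax, qword [rdi+0x30]
    rax = write_al(rax, memory[rax + 2])  # mov al, byte [rax+0x2]
    memory[rsi] = rax & 0xFF              # mov byte [rsi], al
    return rax                            # retn: rax still carries the upper pointer bits

# Hypothetical memory contents, chosen only for illustration.
# Each key is an address; the value is whatever is loaded from it.
memory = {0x1030: 0x7FFF_0000_2000, 0x7FFF_0000_2002: 0xAB, 0x5000: 0}
rax = sub_0(memory, rdi=0x1000, rsi=0x5000)
print(hex(rax))             # 0x7fff000020ab
print(hex(memory[0x5000]))  # 0xab
```

After the byte load, rax is a mix of the old pointer's upper bits and the new low byte, which is why a single full-width variable with an `.al` sub-access is a faithful description of the register, even though only the low byte matters to the rest of the function.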

Although this is semantically correct, it leads to very confusing HLIL:

00000000    void* sub_0(void* arg1, char* arg2)
00000004        void* result
00000004        result.b = *(*(arg1 + 0x30) + 2)
00000007        *arg2 = result.b
00000009        return result

From this code, it looks like result has been erroneously typed as void* even though it is only ever used as a uint8_t.

Trying to "fix" this by changing the type of result to uint8_t makes things even worse (note that the .b is applied after the first dereference instead of the second):

00000000    uint8_t sub_0(void* arg1, uint8_t* arg2)
00000004        uint8_t result = *((*(arg1 + 0x30)).b + 2)
00000007        *arg2 = result
00000009        return result

Steps To Reproduce:

  1. Create a new window and paste the following x86-64 machine code as hex: 488b47308a40028806c3
  2. Create an x86-64 function at offset 0 and view its MLIL. Observe that result is reused as both the pointer value and the byte value.
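As a quick sanity check that the hex string in step 1 matches the disassembly listing, the bytes can be split with plain Python. The instruction lengths below are read off the listing in this report, not computed by a disassembler.

```python
code = bytes.fromhex("488b47308a40028806c3")

# Instruction lengths taken from the disassembly listing in this report.
lengths = [4, 3, 2, 1]

offset = 0
instructions = []
for n in lengths:
    # Record the start offset and the raw bytes of each instruction.
    instructions.append((offset, code[offset:offset + n].hex()))
    offset += n

for addr, raw in instructions:
    print(f"{addr:08x}  {raw}")
```

The printed offsets (0, 4, 7, 9) and byte groups should line up with the addresses and encodings shown in the assembly listing.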

Expected Behavior:

I would expect result to be split into 2 variables, one for the pointer value and one for the byte value.
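One way to picture the expected split is the toy heuristic below. It is only a sketch for straight-line code and is not how Binary Ninja implements variable splitting: the `Access` type, the `split_by_width` function, and the rule itself are hypothetical. It also assumes the final return only needs the low byte, which is the interpretation IDA chose.

```python
from dataclasses import dataclass

@dataclass
class Access:
    kind: str   # "def" or "use"
    width: int  # number of bytes of the register touched

def split_by_width(accesses):
    """Assign a variable index to each access of one register.

    Toy rule: a definition starts a new variable when every use before the
    next definition reads no more bytes than that definition wrote.
    Otherwise it is treated as a partial update of the current variable.
    Uses that precede any definition get index -1 (the incoming value).
    """
    labels = []
    current = -1
    for i, acc in enumerate(accesses):
        if acc.kind == "def":
            uses = []
            for later in accesses[i + 1:]:
                if later.kind == "def":
                    break
                uses.append(later)
            if current < 0 or all(u.width <= acc.width for u in uses):
                current += 1
        labels.append(current)
    return labels

# rax/al accesses in sub_0, with the return treated as not needing full width:
# def rax (8), use rax as the load address (8), def al (1), use al for the store (1)
example = [Access("def", 8), Access("use", 8), Access("def", 1), Access("use", 1)]
print(split_by_width(example))  # [0, 0, 1, 1]
```

With this rule the pointer load and the byte load land in two different variables. A sequence where the full register is read again after the byte write, such as def 8, def 1, use 8, stays a single variable because the narrow write really is a partial update there.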

It is possible to fix this manually with the "Split Variable at Definition" feature, but this feels like something Binary Ninja should be able to handle automatically.

For comparison, here is how the IDA pseudo-code for the same assembly looks:

char __fastcall sub_0(__int64 a1, _BYTE *a2)
{
  char result; // al

  result = *(_BYTE *)(*(_QWORD *)(a1 + 48) + 2LL);
  *a2 = result;
  return result;
}

Screenshots/Video Recording:
I can record a video if needed, although hopefully the explanation above is clear enough.

xusheng6 added the State: Awaiting Triage label Feb 4, 2025