Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLIL SSA Lifting error when an intrinsic returns two values which both get used later #6265

Open
SlidyBat opened this issue Dec 20, 2024 · 3 comments
Assignees
Labels
Component: Core Issue needs changes to the core Effort: Low Issue should take < 1 week Impact: Medium Issue is impactful with a bad, or no, workaround Type: Bug Issue is a non-crashing bug with repro steps
Milestone

Comments

@SlidyBat
Copy link

Version and Platform (required):

  • Binary Ninja Version: 4.3.6594-dev Personal (c7f7c5e4)
  • OS: macOS Sonoma
  • OS Version: 14.6.1
  • CPU Architecture: M1

Bug Description:
For x86/x64 binaries, the rdtsc instruction isn't lifted very nicely, and produces hard to read HLIL.

For this assembly:

00000000  0f31               rdtsc   
00000002  89c7               mov     edi, eax
00000004  48c1e220           shl     rdx, 0x20
00000008  4809d7             or      rdi, rdx
0000000b  4889f8             mov     rax, rdi
0000000e  c3                 retn     {__return_addr}

This HLIL is produced:

00000000        int0_t tsc
00000000        int32_t temp0
00000000        temp0, temp0 = _rdtsc(tsc)
0000000e        int32_t temp1
0000000e        return zx.q(temp0) | zx.q(temp1) << 0x20

I would say there are 3 issues with this code:

  1. Unnecessary tsc variable/argument, adds a line that just clutters output. This issue is already reported in RDTSC intrinsic takes TSC register as an argument #4032.
  2. temp0/temp1 are meant to represent eax/edx, but output makes it seem like only temp0 is assigned to by the rdtsc call and temp1 is just some uninitialized variable. Should probably be temp0, temp1 = _rdtsc()
  3. The output is overly verbose, it would be nicer if the output of _rdtsc was just a single 64-bit variable and the HLIL was simply return _rdtsc()

For comparison, here is the IDA pseudo-code for the same function:

unsigned __int64 sub_0()
{
  return __rdtsc();
}

Steps To Reproduce:

  1. Create new empty window in Binary Ninja using Ctrl+N
  2. Paste following assembly from raw hex: 0f3189c748c1e2204809d74889f8c3
  3. Create x64 function
  4. Observe HLIL

Expected Behavior:
Ideally, the HLIL would just be return _rdtsc().

@xusheng6
Copy link
Member

I am repurposing this issue specifically to track the analysis issue in the second bulletin. 1) already has its own issue, and 3) would not be possible before we have the infrastructure for #3349. I have also added a comment there suggesting the case of rdtsc in addition to functions like strcmp

@xusheng6 xusheng6 added this to the Gallifrey milestone Dec 23, 2024
@xusheng6 xusheng6 changed the title Improve lifting of RDTSC on x86 Lifting error when an intrinsic returns two values and both get used later Dec 23, 2024
@xusheng6
Copy link
Member

Looking at the LLIL and LLIL (SSA), it seems that we somehow screw it up when we do the LLIL SSA:

Screenshot 2024-12-23 at 12 46 57 PM

MLIL for completeness:

Screenshot 2024-12-23 at 12 47 51 PM

@xusheng6 xusheng6 changed the title Lifting error when an intrinsic returns two values and both get used later LLIL SSA Lifting error when an intrinsic returns two values and both get used later Dec 23, 2024
@xusheng6 xusheng6 changed the title LLIL SSA Lifting error when an intrinsic returns two values and both get used later LLIL SSA Lifting error when an intrinsic returns two values which both get used later Dec 23, 2024
@xusheng6 xusheng6 added Type: Bug Issue is a non-crashing bug with repro steps Component: Core Issue needs changes to the core Effort: Low Issue should take < 1 week Impact: Medium Issue is impactful with a bad, or no, workaround labels Dec 24, 2024
@xusheng6 xusheng6 self-assigned this Dec 25, 2024
@xusheng6
Copy link
Member

xusheng6 commented Dec 25, 2024

I made a local fix that produces the following output:

sub_0:
   0 @ 00000000  int0_t tsc
   1 @ 00000000  int32_t temp0
   2 @ 00000000  int32_t temp1
   3 @ 00000000  temp0, temp1 = _rdtsc(tsc)
   4 @ 0000000e  return zx.q(temp0) | zx.q(temp1) << 0x20

While this is semantically correct and it serves as an acceptable fix, I am not sure the usage of the temporary variables is the best. Currently, for the rdtsc instruction, we generate the following LLIL(SSA):

   0 @ 00000000  temp0#1, temp1#1 = _rdtsc(tsc#0)
   1 @ 00000000  rax#1 = zx.q(temp0#1.d)
   2 @ 00000000  rdx#1 = zx.q(temp1#1.d)

I am wondering if it can be changed to something like:

   0 @ 00000000  rax#1.eax, rdx#1.edx = _rdtsc(tsc#0)
   1 @ 00000000  rax#2 = zx.q(rax#1.eax)
   2 @ 00000000  rdx#2 = zx.q(rdx#1.edx)

Unfortunately, the no. 1 and 2 instruction are necessary, otherwise, the semantic of the lifting is wrong (it only writes the low bits of rax, but do not clear the high bits)

The benefit is the output would no longer involve the usage of temporary variables, they would use rax and rdx instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Core Issue needs changes to the core Effort: Low Issue should take < 1 week Impact: Medium Issue is impactful with a bad, or no, workaround Type: Bug Issue is a non-crashing bug with repro steps
Projects
None yet
Development

No branches or pull requests

2 participants