Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type issue in function.instructions generator #6386

Open
sagittarius-a opened this issue Feb 3, 2025 · 2 comments
Open

Type issue in function.instructions generator #6386

sagittarius-a opened this issue Feb 3, 2025 · 2 comments
Labels
Component: API Issue needs changes to the API Effort: Trivial Issue should take < 1 day Impact: Low Issue is a papercut or has a good, supported workaround

Comments

@sagittarius-a
Copy link

Version and Platform (required):

  • Binary Ninja Version: 4.3.6766-dev (c559a23d)
  • OS: macOS
  • OS Version: Sequoia 15.1.1
  • CPU Architecture: M1

Bug Description:

The function.instructions property generator is not properly typed.

It seems that the second item of the tuple have been previously the address of the instruction, as stated in the documentation. According to my tests, it is now the size of the instruction.


Steps To Reproduce:

  1. Open /bin/ls (x86_64 or aarch64, it does not matter).
  2. Wait for the analysis to complete
  3. Type the following code in the Python console:
function = bv.get_function_at(here)

for block in function:
    for instruction in block:
        print(instruction)
  1. Observe the output:
(['push', '    ', 'rbp'], 1)
(['mov', '     ', 'rbp', ', ', 'rsp'], 3)
(['push', '    ', 'r15'], 2)
(['push', '    ', 'r14'], 2)
...

Here is the expected behavior, according to the documentation (addresses are shown as hex here for convenience):

(['push', '    ', 'rbp'], 0x100000b7c)
(['mov', '     ', 'rbp', ', ', 'rsp'], 0x100000b7d)
(['push', '    ', 'r15'], 0x100000b80)
(['push', '    ', 'r14'], 0x100000b82)
...

Note

IMHO, having both the size of the instruction and its address when iterating would be handy.

@xusheng6
Copy link
Member

xusheng6 commented Feb 4, 2025

To start with, function.instruction actually return a tuple of tokens and instruction address, as can be seen below:

>>> list(current_function.instructions)
[(['endbr64', ' '], 4519), (['push', '    ', 'rbp'], 4523), (['mov', '     ', 'rbp', ', ', 'rsp'], 4524), (['sub', '     ', 'rsp', ', ', '0x30'], 4527), (['mov', '     ', '', 'dword ', '[', 'rbp', '-', '0x24', '', ']', ', ', 'edi'], 4531), (['mov', '     ', '', 'qword ', '[', 'rbp', '-', '0x30', '', ']', ', ', 'rsi'], 4534), (['mov', '     ', 'edi', ', ', '0x64'], 4538), (['call', '    ', '0x1070'], 4543), (['mov', '     ', '', 'qword ', '[', 'rbp', '-', '0x18', '', ']', ', ', 'rax'], 4548), (['mov', '     ', 'edi', ', ', '0x12c'], 4552), (['call', '    ', '0x1070'], 4557), (['mov', '     ', '', 'qword ', '[', 'rbp', '-', '0x10', '', ']', ', ', 'rax'], 4562), (['mov', '     ', 'edi', ', ', '0x190'], 4566), (['call', '    ', '0x1070'], 4571), (['mov', '     ', '', 'qword ', '[', 'rbp', '-', '0x8', '', ']', ', ', 'rax'], 4576), (['mov', '     ', 'eax', ', ', '0x0'], 4580), (['leave', '   '], 4585), (['retn', '    '], 4586)]

The code snippet you showed is actually NOT iterating over function.instruction -- it first gets each basic block and then iterates over the instructions in it. There is a subtle difference though, that the iter of BasicBlock is returning the tokens along with the length of the instruction:

def __iter__(self) -> Generator[Tuple[List['_function.InstructionTextToken'], int], None, None]:
. And this agrees with what you are seeing.

I am not entirely sure whether there is a problem, and if so, what is the correct fix here, we could fix the iter of the BasicBlock to also return the address of the instruction, or maybe do something as you suggested, e.g., providing both the length and address of the instruction. I will bring this up with the team later

@xusheng6 xusheng6 added Component: API Issue needs changes to the API Impact: Low Issue is a papercut or has a good, supported workaround Effort: Trivial Issue should take < 1 day labels Feb 4, 2025
@sagittarius-a
Copy link
Author

Thank you Xusheng for your answer.

Indeed I was not iterating over the function so the issue I described is invalid.
I guess it was a bit too late in the night on this side of the planet.

However, I still wish I had both the address and the size of the instruction 😇

I let you close the issue if need be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: API Issue needs changes to the API Effort: Trivial Issue should take < 1 day Impact: Low Issue is a papercut or has a good, supported workaround
Projects
None yet
Development

No branches or pull requests

2 participants