Diaphora 3.2.1 is much slower than diaphora 2.1.0 when exporting large binaries #305

Open
ddf8196 opened this issue Jun 14, 2024 · 5 comments

ddf8196 commented Jun 14, 2024

I'm trying to export a large binary with about 180,000 functions using the latest diaphora, and the export time has become very long compared to diaphora 2.1.0.
I also noticed that the export speed seems to slow down as more functions are exported, so I ran the following tests:

  1. Set DIAPHORA_PROFILE=1 to enable profiling.

  2. Set the export range to 0x140001000-0x140800000 to export only the first 30,000 functions. This took 32 minutes.
    Log: 2-0x140001000-0x140800000.log

  3. Set the export range to 0x140800000-0x141000000 to export the next ~25,000 functions. This took 39 minutes.
    Log: 3-0x140800000-0x141000000.log

  4. Set the export range to 0x140001000-0x141000000 to export the first 55,000 functions. This took over two hours, almost twice as slow as exporting the two parts separately.
    Log: 4-0x140001000-0x141000000.log

  5. For comparison, exporting the first 55,000 functions with diaphora 2.1.0 took only 15 minutes, faster than any of the runs above.
    Log: 5-0x140001000-0x141000000-diaphora2.1.0.log

The above tests basically confirm that diaphora 3.2.1's export speed slows down as the number of exported functions increases, and that it is significantly slower than diaphora 2.1.0 overall.
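
If each exported function performs a lookup whose cost grows with the number of rows already in the table (for example, a full-table scan per query), the total export time grows roughly quadratically, which would explain why exporting the two halves separately is almost twice as fast as exporting them together. A toy cost model (a sketch, not diaphora code) illustrates this:

```python
# Toy model: assume each exported function issues one lookup whose cost is
# proportional to the number of rows already in the table (no index -> scan).
def total_cost(count, per_row=1e-6):
    # When function i is exported, roughly i rows are already present.
    return sum(per_row * i for i in range(count))

part1 = total_cost(30_000)     # first range, exported into a fresh database
part2 = total_cost(25_000)     # second range, also exported into a fresh database
combined = total_cost(55_000)  # both ranges exported into one database

print(f"separate: {part1 + part2:.1f}s  combined: {combined:.1f}s")
# combined is roughly twice the sum of the two separate runs
```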


ddf8196 commented Jun 14, 2024

Looking at the profiler logs, the main difference between diaphora 3.2.1 and diaphora 2.1.0 is that sqlite3.Cursor.execute is taking a lot of time, which may be the cause of this performance issue.
Judging by the cumtime column in the log, the culprit seems to be the cur.execute call in the get_bb_id function: it takes much more time than in 2.1.0, and the time per call grows as the number of exported functions increases.
diaphora 3.2.1 (first 30,000 functions): [profiler screenshot]
diaphora 3.2.1 (25,000 functions): [profiler screenshot]
diaphora 3.2.1 (first 55,000 functions): [profiler screenshot]
diaphora 2.1.0 (first 55,000 functions): [profiler screenshot]

Looking into this function, there is only one SQL query, which reads from the basic_blocks table.
[screenshot of the get_bb_id source]

Comparing the basic_blocks table in diaphora 3.2.1 with the one in diaphora 2.1.0, there is a new asm_type column and the address column no longer has a UNIQUE constraint. Could this be the cause of the problem? If so, is there a way to optimize it? Such a long export time (more than 24 hours) makes the latest diaphora almost unusable for me.
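
To illustrate why losing the UNIQUE constraint (and the implicit index that comes with it) on address could produce exactly this pattern, here is a self-contained sqlite3 micro-benchmark. The two-column table is a deliberately simplified stand-in for basic_blocks, not diaphora's actual schema, and the numbers are arbitrary:

```python
import sqlite3
import time

def bench(indexed: bool, rows: int = 100_000, lookups: int = 1_000) -> float:
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    # Simplified stand-in for the basic_blocks table: just an id and an address.
    cur.execute("CREATE TABLE basic_blocks (id INTEGER PRIMARY KEY, address INTEGER)")
    if indexed:
        # A UNIQUE constraint (or a plain index) lets SQLite do a B-tree lookup
        # instead of scanning the whole table for every get_bb_id-style query.
        cur.execute("CREATE UNIQUE INDEX idx_bb_address ON basic_blocks (address)")
    cur.executemany(
        "INSERT INTO basic_blocks (address) VALUES (?)",
        ((0x140001000 + i * 16,) for i in range(rows)),
    )
    con.commit()
    t0 = time.perf_counter()
    for i in range(lookups):
        cur.execute(
            "SELECT id FROM basic_blocks WHERE address = ?",
            (0x140001000 + (i * 97 % rows) * 16,),
        ).fetchone()
    return time.perf_counter() - t0

print("no index :", bench(indexed=False))  # per-lookup cost grows with table size
print("indexed  :", bench(indexed=True))   # per-lookup cost stays roughly constant
```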

joxeankoret (Owner) commented

Uhm... it sounds weird. Let me take a look at it. If you can share your binaries, that would be cool, but I guess I can work on this issue without them anyway. Thanks for letting me know!


ddf8196 commented Jun 14, 2024

The binary is the Bedrock Dedicated Server for Windows and can be downloaded from this link: https://minecraft.azureedge.net/bin-win/bedrock-server-1.20.81.01.zip
Btw, I'm using IDA 8.3 with Python 3.11.6 on Windows.


Programatic commented Oct 3, 2024

Hey @joxeankoret, have you by any chance had time to look at this? If not, I will probably take a stab at it, because I am exporting a Go binary with 75k functions and it is taking hours.

EDIT: I ran a version that adds an index on the column being selected; it took about the same amount of time, but doubled the size of the resulting sqlite file.
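
One way to sanity-check whether an added index is actually being used by that query (rather than the time being spent elsewhere) is to ask SQLite for the query plan. This is a generic check; the database path and the exact SELECT below are paraphrased guesses, not copied from diaphora:

```python
import sqlite3

con = sqlite3.connect("exported.sqlite")  # placeholder path to an exported database
plan = con.execute(
    # Paraphrase of the get_bb_id lookup: one query against basic_blocks by address.
    "EXPLAIN QUERY PLAN SELECT id FROM basic_blocks WHERE address = ?",
    (0x140001000,),
).fetchall()
for row in plan:
    # 'SEARCH basic_blocks USING INDEX ...' means the index is being used;
    # 'SCAN basic_blocks' means SQLite is still doing a full-table scan.
    print(row)
con.close()
```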

joxeankoret (Owner) commented

I'm sorry, I haven't had enough time. Usually, indices work like this: they will probably reduce query times (how significantly depends on the queries themselves), but they greatly increase database size because, naturally, the index needs storage too.
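
The size side of that trade-off is easy to measure on any exported database: create the index, then compare file sizes. A minimal sketch (the path and index name below are placeholders):

```python
import os
import sqlite3

db = "exported.sqlite"  # placeholder path to an exported database
before = os.path.getsize(db)

con = sqlite3.connect(db)
con.execute("CREATE INDEX IF NOT EXISTS idx_bb_address ON basic_blocks (address)")
con.commit()
con.close()

after = os.path.getsize(db)
print(f"{before} -> {after} bytes ({after / before:.2f}x)")
```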
