
Line map between output and input text #29

Open
gratianlup opened this issue Sep 29, 2024 · 1 comment

Comments

@gratianlup

Hi,

Would it be possible to also compute the mapping between the LLM output and the input from the Ghidra decompiler as a line map? Something like LLM_OUT_LINES[line_number] = {one or more line numbers from the Ghidra input}.

In your Colab example, the output line:
```c
if (fabs(a[i] - a[j]) < eps)
```

would be mapped to the 3 input lines:

```c
if ((float)(DAT_001020d0 &
                 (uint)(*(float *)(param_2 + (long)local_10 * 4) -
                       *(float *)(param_2 + (long)local_c * 4))) < param_1) {
```

I'm not sure whether something like this can be done with LLMs at all. If it is doable, though, this project would be really useful for tools like profilers: one could map assembly instructions to lines with the help of debug info and then mark the source lines where most of the time is spent.

@albertan017
Owner

Aligning the input and output of a large language model line by line isn't achievable unless we tailor the training process for it (similar to how objdump -d -S pairs one line of source code with a few lines of assembly). We plan to explore this line-by-line training approach (asm-src, not Ghidra) in a future update for a more versatile chat model. It might take a few months to develop, but we hope it will be beneficial.

We've also noticed that another group of researchers has done work that may help in your situation; you might want to explore their models:

https://arxiv.org/pdf/2406.17233
