How to Obtain O0-O3 Assembly Code from ExeBench Dataset? #37
Comments
For compilable data, you can follow the compilation script for AnghaBench, with small modifications to handle the function source (exebench_data['func_def']) and its dependencies (exebench_data['synth_deps']). Executable data is considerably more complicated: refer to the ExeBench GitHub repository or the discussion in the other issues. You can modify _DefaultAssembler in exebench/__init__.py.
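As a rough illustration of the compilable-data route described above, the sketch below builds one C file from `synth_deps` and `func_def`, compiles it at O0 to O3, and disassembles a single function. The helper names (`build_source`, `gcc_command`, `objdump_command`, `dump_all_levels`) and the output file layout are assumptions made for this example, not the project's actual script; `function_name` is supplied by the caller.

```python
import subprocess
from pathlib import Path

OPT_LEVELS = ["O0", "O1", "O2", "O3"]


def build_source(row):
    """Concatenate the dependencies and the function definition into one C file."""
    deps = row.get("synth_deps") or ""
    return deps + "\n" + row["func_def"] + "\n"


def gcc_command(src_path, obj_path, opt_level):
    """Compile to an object file only (-c); nothing is linked, so no main() is needed."""
    return ["gcc", "-c", "-" + opt_level, "-o", str(obj_path), str(src_path)]


def objdump_command(obj_path, function_name):
    """Disassemble a single function from the object file."""
    return ["objdump", "-d", f"--disassemble={function_name}", str(obj_path)]


def dump_all_levels(row, function_name, out_dir):
    """Write <name>_<level>.s for every level that compiles; return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    src_path = out_dir / f"{function_name}.c"
    src_path.write_text(build_source(row))
    written = {}
    for level in OPT_LEVELS:
        obj_path = out_dir / f"{function_name}_{level}.o"
        asm_path = out_dir / f"{function_name}_{level}.s"
        try:
            subprocess.run(gcc_command(src_path, obj_path, level),
                           check=True, capture_output=True)
            # Redirect objdump's stdout to the file directly instead of using shell=True.
            with open(asm_path, "w") as asm_file:
                subprocess.run(objdump_command(obj_path, function_name),
                               check=True, stdout=asm_file, stderr=subprocess.DEVNULL)
        except (subprocess.CalledProcessError, FileNotFoundError):
            continue  # skip levels that fail to compile or disassemble
        written[level] = asm_path
    return written
```

Passing the commands as argument lists avoids shell quoting problems with unusual function names. Rows that fail to compile are skipped rather than aborting the whole run.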
So, to reproduce the ExeBench re-executability experiment, I understand that the steps are as follows:
Is my understanding correct?
Yes, in theory that should work. However, we had difficulty generating suitable assembly for execution, so we adjusted the input to the Wrapper and altered the optimization stage in _DefaultAssembler to compile at the various optimization levels.
Thanks! I'll give it a try.
Hi,
Optimization O0: Run Rate: 0.1417
Optimization O1: Run Rate: 0.1288
Optimization O2: Run Rate: 0.1280
Optimization O3: Run Rate: 0.1277
c_source_code = (
    row["synth_deps"]
    + "\n"
    + row["synth_io_pairs"]["dummy_funcs"][0]
    + "\n"
    + row["func_def"]
)

subprocess.run(
    ["gcc", "-c", "-o", obj_output, input_file_name, "-" + opt_state],
    check=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
# Generate assembly code from object file using objdump
subprocess.run(
    f"objdump -d --disassemble={function_name} {obj_output} > {asm_output}",
    shell=True,
    check=True,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

gen_results = llm.generate(inputs, sampling_params)
gen_results = [[output.outputs[0].text] for output in gen_results]
if args.debug:
    print(f"gen_results: \n{gen_results}")
gen_results_repeat.append(gen_results)

synth_wrapper = Wrapper(
    c_deps=dataset_row["synth_deps"]
    + "\n"
    + dataset_row["synth_io_pairs"]["dummy_funcs"][0]
    + "\n"
    + decompiled_c_func,
    func_c_signature=dataset_row["func_head_types"].replace("extern", ""),
    func_assembly=None,
    cpp_wrapper=dataset_row["synth_exe_wrapper"],
)
# Check if the decompiled function can be compiled and run correctly
test_output = synth_wrapper(
    exebench_dict_to_dict(dataset_row["synth_io_pairs"]["input"][0])
)
if diff_io(
    test_output,
    exebench_dict_to_dict(dataset_row["synth_io_pairs"]["output"][0]),
):
    flag_run = 1

Can you tell me a little more about the changes you made to
As highlighted in our paper, we first eliminate functions that cannot be executed, by testing the executability of the original function (i.e., in step 4 use dataset_row['func_def'] rather than decompiled_c_func, and consider only functions that can be executed under O0-O3). We find that approximately 2,500 functions can be compiled and executed in our environment; this number can fluctuate between 2,000 and 3,000 depending on the system. We therefore test only the executable functions, and your results closely align with our original findings (0.14 over 5,000 functions corresponds to 0.28 over 2,500).
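The rescaling in parentheses can be checked with a few lines of arithmetic. This is only a sketch: the counts 5,000 and 2,500 are the approximate figures quoted in the reply, `rate_over_executable` is a hypothetical helper name, and it assumes every function that passed belongs to the executable subset.

```python
def rate_over_executable(rate_over_all, n_all, n_executable):
    """Re-express a run rate measured over all functions as a rate over the executable subset.

    Assumes every function that passed is part of the executable subset.
    """
    n_passed = rate_over_all * n_all
    return n_passed / n_executable


# Run rates reported earlier in the thread, measured over the full set.
reported = {"O0": 0.1417, "O1": 0.1288, "O2": 0.1280, "O3": 0.1277}

# Approximate counts from the reply: about 5,000 functions in total, about 2,500 executable.
rescaled = {level: rate_over_executable(rate, 5000, 2500)
            for level, rate in reported.items()}

for level, rate in rescaled.items():
    print(f"{level}: {rate:.4f}")
```

Because the executable count varies between roughly 2,000 and 3,000 across systems, the rescaled rate on another machine will shift accordingly.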
I see. Thanks for the detailed explanation!
Hello,
I am currently working on reproducing the evaluation described in your paper and have a question regarding the evaluation using ExeBench.
From my understanding, the dataset provided by ExeBench includes assembly code for each function at optimization levels O0, Os, and O3. Could you kindly clarify how you obtained the assembly code for all optimization levels (O0, O1, O2, and O3)? Did you extract and recompile each function individually?
If you have a script or specific instructions for generating the dataset you used, would it be possible for you to share it?
Thank you.