Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPC: Display data values on hover for pools as well #140

Merged
merged 10 commits into from
Dec 4, 2024

Conversation

LagoLunatic
Copy link
Contributor

This expands on #108 in order to allow the data display in the tooltip to also work for references to symbols that have been pooled in the current function, instead of just direct references where the relocation is on the same line as the load instruction itself.

Here are some examples of what this supports:


It allows viewing pooled float literal loads so that you can more easily see if the value is the same or different, regardless of whether the offset specified by the instruction is different or not (these screenshots are from the target and base of a fully decompiled function, but other functions in the TU haven't been started yet, so the .rodata section is unfinished):
2024-12-01_17_16_35_805_objdiff
2024-12-01_17_16_44_904_objdiff


It allows viewing any string that gets loaded from @stringBase, not just the first string like in the current version of objdiff:
2024-12-01_17_39_35_772_objdiff


It allows you to see the array symbol name that gets loaded into a particular register when it's done via an addi from a source register that has the pool start address in it (example is at the start of a loop):
2024-12-01_16_56_29_163_objdiff

And then, when one of the elements of that array symbol gets loaded with a lwzx/lbzx/etc instruction, it shows the value of the first element of the array (same function as above screenshot, inside the body of the loop):
2024-12-01_16_57_10_452_objdiff

Note that it does not support showing all the other elements of the array besides the first. So #124 is related but still not fully resolved.
Fixing that issue would be tricky because arrays might contains structs and not just single primitive data type, so you would need to at least know the array's stride.


When a value in the middle of a symbol is accessed, it now shows the offset from the start of the symbol that is being accessed in the tooltip:
2024-12-01_18_08_23_818_objdiff

The above example is showing a load of the v_offset field at offset 0x4 in a PTMF struct. The PTMF is located at offset 0x200 from the start of the .data section. So it displays 200+4 rather than 204, and @1373+4 instead of @1373.


I explain the exact implementation details in the comments so I won't repeat that here.

But I will mention what I didn't implement: Following branches.
I determine what registers contain what pools just by looking through each instruction in the function from top to bottom, ignoring control flow. So there could easily be false positives or false negatives.

I was actually originally planning to implement following branches, but was surprised that the naive branchless implementation I wrote as a temporary placeholder just seemed to... work on all the functions I tested it on. So I didn't implement branches because I didn't want to add a bunch of complexity without even one example of where it would help.
But I only checked maybe a dozen functions in total, so it's definitely possible for weird edge cases to exist that I didn't catch. Let me know if functions like that are found.

Floats and doubles will now always be displayed with a decimal point and one digit after it, even if they are whole numbers. Floats will also have the f suffix. This is so you can tell the data type just by glancing at the value.
@LagoLunatic
Copy link
Contributor Author

Just discovered a false positive where the heuristic was detecting string uses that don't actually exist:
2024-12-02_17_56_19_993_objdiff

Fixed by simply clearing out the volatile registers when a function call occurs:
2024-12-02_17_56_59_546_objdiff

Because we no longer have access to the actual symbol name via sections, guess_data_type can no longer detect the String data type for pooled references.
@@ -106,6 +106,7 @@ pub struct ObjIns {
pub mnemonic: Cow<'static, str>,
pub args: Vec<ObjInsArg>,
pub reloc: Option<ObjReloc>,
pub fake_pool_reloc: Option<ObjReloc>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't store this in reloc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, there's no scenario where reloc and fake_pool_reloc need to both be Some, so it might work. The reason I didn't do that initially is because reloc is accessed in many more places in the code than just the hover tooltip and I wasn't sure if I should be messing with all those without understanding them.

I just checked all the references to reloc now:

  1. The JSON output of objdiff-cli --output (in Instruction::new, might also be used by wasm?):
    image

  2. The context menu when right-clicking on an instruction (in ins_context_menu):
    image

There are other places besides those 2 that reference the field, but I think they rely on the relocation's type being a supported one, not R_PPC_NONE, so they don't actually show up in a user-visible place in practice.

Notice that if we go with this route there's no name, size, etc for the symbol being referenced here because of the hack where I replaced the target symbol with an empty placeholder.

Without the placeholder hack, the actual symbol is correctly displayed:
image

If you want I can merge the two fields and add checks to Instruction::new and ins_context_menu to skip showing the symbol if its name is empty. If this is done, any more references to ObjIns.reloc added in the future would also need similar checks to prevent the placeholder from being used.

Alternative, maybe less hacky approach: instead of the hack where I resolve the placeholder to the real symbol in the GUI code for showing the tooltip every frame, I could do it just once diff/code.rs process_code_symbol and mutate the result before returning it.
I didn't think of this before, but unlike ObjArch::process_code, process_code_symbol already has access to obj.sections without changing the function signature.
This would allow the actual symbol name to be displayed everywhere more consistently. It would also avoid needing to add repetitive checks on !target.name.is_empty() every time ins.reloc is used anywhere in the future to avoid bugs.
Basically, anything that calls process_code_symbol would see reloc as having the correct pooled reference target symbol. And it seems like objdiff only calls ObjArch::process_code via process_code_symbol.

Do other projects that rely on objdiff like dtk use this reloc field for anything? Would it be bad for it to refer to an indirect pooled reference there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a couple commits with the changes I described above if you want to see for yourself if they're okay. From my testing it seems to work fine for both objdiff-gui and objdiff-cli and causes the fake relocation symbol to display everywhere the real one would:
2024-12-03_19_06_43_312_Code

I didn't test with other things like wasm/dtk.

Copy link
Owner

@encounter encounter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the continued work on this. It's shaping up quite nicely.

.and_then(|ty| obj.arch.display_data_type(ty, &reloc.target.bytes))
{
ui.colored_label(appearance.highlight_color, s);
if reloc.addend >= 0 && reloc.target.bytes.len() > reloc.addend as usize {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the logic behind this exactly? I'm confused by the check against target.bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two lines down, I slice into target.bytes with reloc.addend being the start index in order to support displaying values that are in the middle of the symbol.
reloc.addend should always be 0 <= reloc.addend < symbol.bytes.len() to avoid a panic on slice. I'm pretty sure the fake addends I create in this PR always are.

But for real addends read from object files, I think theoretically the addend could be garbage if there's something wrong with the file? Or a bug in the compiler that produced the object file, or something? I figured displaying nothing would be better than objdiff crashing entirely if something funky ever did happen, but I haven't encountered an example of this in reality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait I forgot there actually is one situation it crashes without this check, .bss symbols. They have a size and I give them an addend if a field in the middle of the symbol is accessed, but target.bytes is always an empty vec, because .bss variables don't have bytes stored in the object file.

This crashes if I remove the reloc.target.bytes.len() > reloc.addend check:
image

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I missed that it was pulling the data from the symbol's contents.

@encounter
Copy link
Owner

🚀

@encounter encounter merged commit 10b2a9c into encounter:main Dec 4, 2024
18 checks passed
@LagoLunatic LagoLunatic deleted the pooled-data branch December 4, 2024 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants