Consider demangling symbols #67

Open
HadrienG2 opened this issue Oct 5, 2024 · 6 comments
Labels: effort: medium (Something that can be done quickly with some effort), enhancement (New feature or request)

Comments

@HadrienG2

HadrienG2 commented Oct 5, 2024

Is your feature request related to a problem? Please describe.

Symbols from C++ and Rust programs (or any other AoT-compiled programming language that uses the Itanium name mangling ABI) can be quite hard to map back to source code without demangling, especially in the presence of generics.

Describe the solution you'd like

It would be nice if binsider had an on-by-default option to demangle Itanium ABI symbols. The option could be toggled either via CLI or via a TUI shortcut. I've been using cpp_demangle to this end in crofiler as a pure-rust solution and it worked pretty well, although it does have a few edge cases where it does not perfectly match libiberty.

Describe alternatives you've considered

Demangling can also be done in many other ways, such as by binding to libiberty or calling the c++filt utility. Alternatively, you may also decide that demangling is not worth the code complexity cost. Or you may not want to provide an option to disable it for UI simplicity. I've seen a few demangling hiccups in tools I use (especially perf), which is why I think it's good to have a way to turn it off.

If you do want to have demangling, another UI design option besides an on/off TUI shortcut would be to have two columns in the symbol table, one with the mangled name and one with the non-mangled name, but I think this table is already a bit crowded for that...

Additional context

Prior art includes common ELF-wrangling tools that perform demangling by default, with an option to disable it, such as the perf profiler and the GDB debugger.

@HadrienG2 HadrienG2 added the enhancement New feature or request label Oct 5, 2024
@orhun orhun added the effort: medium Something that can be done quickly with some effort label Oct 6, 2024
@orhun
Owner

orhun commented Oct 6, 2024

Hello 👋🏼 Thanks for the suggestion! I think it very much makes sense :)

It would be nice if binsider had an on-by-default option to demangle Itanium ABI symbols. The option could be toggled either via CLI or via a TUI shortcut.

Yup, that's what I was thinking. Simply add another key binding (maybe m) for enabling/disabling demangling.
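A minimal sketch of the on-by-default toggle, assuming a hypothetical `SymbolView` struct, a hypothetical `--no-demangle` CLI flag and the `m` key suggested above. None of these names come from binsider itself, and the demangled string is assumed to be produced elsewhere (for example by cpp_demangle or rustc_demangle):

```rust
/// Hypothetical view settings for a symbol table (not binsider's real types).
struct SymbolView {
    /// Whether to show demangled names; on by default.
    demangle: bool,
}

impl SymbolView {
    /// Demangling stays on unless the hypothetical `--no-demangle` flag is present.
    fn from_args(args: &[String]) -> Self {
        let demangle = !args.iter().any(|a| a == "--no-demangle");
        SymbolView { demangle }
    }

    /// Flip the setting when the (hypothetical) `m` key is pressed.
    fn handle_key(&mut self, key: char) {
        if key == 'm' {
            self.demangle = !self.demangle;
        }
    }

    /// Choose which form of the name to show. `demangled` is whatever the
    /// demangler produced, or `None` if the symbol could not be demangled.
    fn display<'a>(&self, raw: &'a str, demangled: Option<&'a str>) -> &'a str {
        match (self.demangle, demangled) {
            (true, Some(pretty)) => pretty,
            _ => raw,
        }
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let mut view = SymbolView::from_args(&args);
    let raw = "_ZN3foo3barEv";
    let pretty = Some("foo::bar()");
    println!("{}", view.display(raw, pretty));
    view.handle_key('m'); // toggle demangling off (or back on)
    println!("{}", view.display(raw, pretty));
}
```

Keeping the raw name around and only choosing at display time means toggling never has to re-read the binary.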

I've been using cpp_demangle to this end in crofiler as a pure-rust solution and it worked pretty well, although it does have a few edge cases where it does not perfectly match libiberty.

I haven't worked extensively with mangling libraries/tooling so I have some questions:

  • Does it only work with IA-64 name mangling ABI? How was your experience with other binaries (i.e. non-C++ stuff)?
  • Is there a way to detect the mangling type earlier in the program? In other words, are there cases where we should check if something is demangle-able?

If you do want to have demangling, another UI design option besides an on/off TUI shortcut would be to have two columns in the symbol table, one with the mangled name and one with the non-mangled name, but I think this table is already a bit crowded for that...

Agreed.

@orhun
Owner

orhun commented Oct 6, 2024

Also, I found rustc_demangle.

@HadrienG2
Author

HadrienG2 commented Oct 6, 2024

I think you may find the name mangling Wikipedia page worth a read. To summarize its key points:

  • Itanium name mangling has been used by GCC, clang and icc for C++ for a while, and its use extends far beyond IA-64 these days. As far as I know, it's used on pretty much every CPU arch by these compilers.
  • Itanium name mangling is not quite universal, though: the main modern-day exception is MSVC++ on Windows. But if you're targeting ELF binaries only, that may not be a major concern?
  • Other languages use several different schemes. I thought that Rust used Itanium mangling too, but actually according to this Wikipedia page modern rustc uses a close cousin of the Itanium rules that has been tweaked to account for C++/Rust differences, e.g. lack of function overloading. For this, you're probably better off using rustc_demangle indeed.
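To make the Itanium scheme less abstract, here is a toy decoder for a tiny subset of its grammar: `_Z` followed either by one `<length><identifier>` pair, or by `N`, several such pairs, and `E`. It is only a sketch of how mangled names are structured; templates, substitutions (`S_`, `St`), qualifiers and parameter types are not handled, and a real tool would rely on a crate such as cpp_demangle instead. The function name `demangle_qualified_name` is hypothetical.

```rust
/// Toy decoder for a tiny subset of the Itanium C++ ABI mangling grammar:
///   _Z <source-name>        e.g. _Z3foov       -> foo
///   _Z N <source-name>+ E   e.g. _ZN3foo3barEv -> foo::bar
/// where <source-name> ::= <positive length> <identifier>.
/// Anything after the name (parameter types such as `v`) is ignored, and any
/// construct outside this subset makes the function return `None`.
fn demangle_qualified_name(sym: &str) -> Option<String> {
    let rest = sym.strip_prefix("_Z")?;
    let (nested, mut rest) = match rest.strip_prefix('N') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let mut parts: Vec<&str> = Vec::new();
    loop {
        if nested && rest.starts_with('E') {
            break; // end of the nested name
        }
        if !nested && !parts.is_empty() {
            break; // an unnested name has exactly one <source-name>
        }
        // Read the decimal length prefix of the next identifier.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None; // e.g. substitutions like `St`, unsupported here
        }
        let len: usize = rest[..digits].parse().ok()?;
        if len == 0 {
            return None;
        }
        let end = digits.checked_add(len)?;
        let ident = rest.get(digits..end)?;
        parts.push(ident);
        rest = &rest[end..];
    }
    if parts.is_empty() {
        return None;
    }
    Some(parts.join("::"))
}

fn main() {
    for sym in ["_ZN3foo3barEv", "_Z3foov", "main", "_ZNSt6vectorIiE9push_backEv"] {
        println!("{sym} -> {:?}", demangle_qualified_name(sym));
    }
}
```

For example, `_ZN3foo3barEv` decodes to `foo::bar` (the trailing `v` encodes an empty parameter list, which this sketch ignores), while `main` yields `None` because it is not mangled.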

To this, I can add that in the Real World, programs will have a mixture of mangled and non-mangled symbols, because you need non-mangled symbols for interop with C-minded infrastructure, e.g. linkers and loaders. You can handle this in various ways.

In crofiler I went with dumb trial and error: try to demangle the symbol, and if the demangler errors out, keep the name as is. But crofiler only needed to support C++. In your case, since you're building a cross-language tool, it may be better to identify the prefixes associated with various mangling schemes (e.g. _Z for C++ and _R for modern rustc) and dispatch to the appropriate demangler accordingly.
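A sketch of prefix-based dispatch combined with the trial-and-error fallback, assuming the well-known prefixes `_Z` (Itanium C++, also legacy Rust) and `_R` (Rust v0), plus the extra leading underscore that Mach-O adds on macOS. The `Scheme` enum and `display_name` function are hypothetical, and the demanglers are passed in as closures standing in for real crates such as cpp_demangle or rustc_demangle, whose APIs are not reproduced here:

```rust
/// Mangling schemes recognised purely by prefix.
#[derive(Debug, PartialEq)]
enum Scheme {
    /// `_Z...`: Itanium C++ ABI (also used by legacy Rust symbols).
    Itanium,
    /// `_R...`: Rust v0 mangling.
    RustV0,
    /// No known prefix: treat as a plain (e.g. C) symbol.
    Unmangled,
}

fn detect_scheme(sym: &str) -> Scheme {
    // The `__` variants cover Mach-O, which prefixes symbols with one extra `_`.
    if sym.starts_with("_Z") || sym.starts_with("__Z") {
        Scheme::Itanium
    } else if sym.starts_with("_R") || sym.starts_with("__R") {
        Scheme::RustV0
    } else {
        Scheme::Unmangled
    }
}

/// Dispatch to the demangler for the detected scheme and fall back to the raw
/// name if it fails ("trial and error"). The closures are stand-ins for real
/// demanglers; they return `None` when they cannot parse the symbol.
fn display_name<F, G>(sym: &str, itanium: F, rust_v0: G) -> String
where
    F: Fn(&str) -> Option<String>,
    G: Fn(&str) -> Option<String>,
{
    let demangled = match detect_scheme(sym) {
        Scheme::Itanium => itanium(sym),
        Scheme::RustV0 => rust_v0(sym),
        Scheme::Unmangled => None,
    };
    demangled.unwrap_or_else(|| sym.to_string())
}

fn main() {
    // Stub demanglers for illustration only.
    let itanium = |s: &str| -> Option<String> {
        if s == "_ZN3foo3barEv" { Some("foo::bar()".to_string()) } else { None }
    };
    let rust_v0 = |_: &str| -> Option<String> { None };
    for sym in ["main", "_ZN3foo3barEv", "_ZGARBAGE", "_RNvC5mycrate3foo"] {
        println!("{sym} -> {}", display_name(sym, itanium, rust_v0));
    }
}
```

The prefix check keeps plain C symbols away from the demanglers entirely, while the fallback still covers symbols that look mangled but fail to parse.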

@orhun
Owner

orhun commented Oct 7, 2024

Thanks for the summary, it made everything more clear :)

But if you're targeting ELF binaries only, that may not be a major concern?

I was actually planning to support more formats in the future, but that shouldn't be a concern for now.

See #26 - I'd love to get your opinion on it as well.

Wikipedia page modern rustc uses a close cousin of the Itanium rules that has been tweaked to account for C++/Rust differences

Hmm, interesting. I found this RFC but not quite sure about the latest status of it.

I went with dumb trial and error: try to demangle the symbol, and if the demangler errors out, keep the name as is. But crofiler only needed to support C++. In your case, since you're building a cross-language tool, it may be better to identify the prefixes associated with various mangling schemes (e.g. _Z for C++ and _R for modern rustc) and dispatch to the appropriate demangler accordingly.

Yeah, sounds reasonable and I think that's the path that I will be taking.

@HadrienG2
Author

HadrienG2 commented Oct 7, 2024

See #26 - I'd love to get your opinion on it as well.

I'm afraid I'm not knowledgeable enough about binary file formats to evaluate how good this abstraction layer is :) If I knew more, my first questions would be...

  • Does this library take a common subset or optional superset approach to inter-format abstraction?
    • If it's a common subset approach (only include features that every format supports), do you think that will be good enough for your target audience? And how much functionality will you lose with respect to your current ELF-only approach?
    • If it's an optional superset approach (support all features from every binary format, but some features will not be present for a particular binary format), are you ready to handle the extra (optional) complexity in the UI?
  • From a Rust perspective, adding a C++ dependency adds some complexity to the build. Have you evaluated how much and are you fine with that?
  • Are there other competing libraries that do a similar job? If so, how do they compare? A quick crates.io search suggests having a look at goblin, object and maybe vivisect (worse documentation, but port of a python library that has some nice docs).
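The two abstraction approaches from the first question could look roughly like this in Rust. This is a sketch with hypothetical names (`BinaryInfo`, `ElfStub`, `PeStub`): required trait methods form the common subset, while methods returning `Option` with a default of `None` form the optional superset that the UI must handle. The interpreter path is used as an example of an ELF-only feature (the `PT_INTERP` program header) with no PE counterpart.

```rust
/// Hypothetical format-agnostic view of a binary.
trait BinaryInfo {
    // Common subset: every supported format must provide these.
    fn format_name(&self) -> &str;
    fn symbols(&self) -> Vec<String>;

    // Optional superset: formats lacking the feature keep the default `None`,
    // and the UI has to handle the missing case.
    fn dynamic_interpreter(&self) -> Option<String> {
        None
    }
}

/// Stand-in for an ELF backend (hard-coded data, for illustration only).
struct ElfStub;
/// Stand-in for a PE backend (hard-coded data, for illustration only).
struct PeStub;

impl BinaryInfo for ElfStub {
    fn format_name(&self) -> &str {
        "ELF"
    }
    fn symbols(&self) -> Vec<String> {
        vec!["main".into()]
    }
    fn dynamic_interpreter(&self) -> Option<String> {
        Some("/lib64/ld-linux-x86-64.so.2".into())
    }
}

impl BinaryInfo for PeStub {
    fn format_name(&self) -> &str {
        "PE"
    }
    fn symbols(&self) -> Vec<String> {
        vec!["main".into()]
    }
    // No `dynamic_interpreter` override: PE has no equivalent.
}

/// The UI cost of the superset approach: every optional field needs a fallback.
fn render(bin: &dyn BinaryInfo) -> String {
    match bin.dynamic_interpreter() {
        Some(interp) => format!("{} (interpreter: {interp})", bin.format_name()),
        None => format!("{} (interpreter: n/a)", bin.format_name()),
    }
}

fn main() {
    let binaries: Vec<Box<dyn BinaryInfo>> = vec![Box::new(ElfStub), Box::new(PeStub)];
    for bin in &binaries {
        println!("{} ({} symbols)", render(&**bin), bin.symbols().len());
    }
}
```

Dropping `dynamic_interpreter` from the trait would turn this into a pure common-subset design, which is simpler to render but loses the ELF-specific data.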

All that being said, the Quarkslab company is quite reputable in the French security community, so it does give a good first impression from a future maintenance and expected feature-completeness perspective.

Wikipedia page modern rustc uses a close cousin of the Itanium rules that has been tweaked to account for C++/Rust differences

Hmm, interesting. I found this RFC but not quite sure about the latest status of it.

Indeed, I've just cross-checked a Rust binary that I've built recently and the mangled symbols still start with _Z, so it seems to me that if this is merged into rustc, it may not be on by default yet. The Wikipedia page may need some amending...
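A hedged sketch of why those symbols still start with `_Z`: rustc's legacy mangling scheme reuses the Itanium `_ZN ... E` shape and appends a hash component of the form `17h` followed by 16 hex digits, whereas v0 symbols start with `_R`. The heuristic below (the function name `is_legacy_rust_symbol` is hypothetical) only checks that suffix; it is not a full parser, and rustc_demangle performs a stricter check.

```rust
/// Heuristic: does `sym` look like a legacy-scheme Rust symbol, i.e.
/// `_ZN ... 17h<16 lowercase hex digits> E`? (`__ZN` covers Mach-O.)
fn is_legacy_rust_symbol(sym: &str) -> bool {
    let Some(body) = sym.strip_prefix("_ZN").or_else(|| sym.strip_prefix("__ZN")) else {
        return false;
    };
    let Some(body) = body.strip_suffix('E') else {
        return false;
    };
    // The final path component should be exactly `17h` + 16 hex digits (19 bytes).
    match body.len().checked_sub(19).and_then(|start| body.get(start..)) {
        Some(tail) => {
            tail.starts_with("17h")
                && tail[3..]
                    .bytes()
                    .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
        }
        None => false,
    }
}

fn main() {
    for sym in [
        "_ZN4core3fmt5write17h0123456789abcdefE", // legacy Rust
        "_ZN3foo3barEv",                          // C++ function foo::bar()
        "_RNvC5mycrate3foo",                      // Rust v0
    ] {
        println!("{sym}: legacy Rust? {}", is_legacy_rust_symbol(sym));
    }
}
```

Such a check could let a `_Z` symbol be routed to a Rust-aware demangler instead of a C++ one, although both would produce a readable path for simple cases.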

@orhun
Owner

orhun commented Oct 8, 2024

Those are good questions - I think we'll be able to answer them better after starting the implementation.

If it's a common subset approach (only include features that every format supports), do you think that will be good enough for your target audience? And how much functionality will you lose with respect to your current ELF-only approach?

That's my biggest concern, losing some data in the TUI due to the abstraction...

are you ready to handle the extra (optional) complexity in the UI?

Yeah.. or that..

From a Rust perspective, adding a C++ dependency adds some complexity to the build. Have you evaluated how much and are you fine with that?

It can't be worse than the issues that I'm having with Linux-specific dependencies (e.g. lurk-cli) :D

Are there other competing libraries that do a similar job? If so, how do they compare?

I will look into them later on. But either way, doing this for other file formats will require some abstraction. Thanks for sharing the links!
