ELF relocatable object exporter #4922
-
Hi boricj,
-
I've opened a PR to upstream some of my modifications to Ghidra's ELF support code outlined in this discussion (#4938). Given the lack of feedback here (in hindsight, I should probably have included more explicit actionable items), I've tried to keep the changes as limited as possible to be on the safe side. While I can still make my own code run on this light version, most of the refactoring described here has been left out of it. Unless I get some feedback here, I won't be able to open PRs for these changes because I don't know what, if anything, would be eligible for upstreaming.
-
We do not currently intend to support complex exports (e.g., ELF, PE) other than via the original file bytes export for simple patching, which relies on restoring the original FileBytes where relocations have been applied. Likewise, our header classes (e.g., ElfHeader) are intended to facilitate parsing only, not the build-up or modification of such headers. Our simplistic header classes and processor-specific extensions are not well suited to the general case of header creation/modification such as a compiler/linker would produce.
-
A change is forthcoming on the master branch which will remove the mutability of ELF headers and the ability to write them (i.e., serialization). The ELF API at this point is focused entirely on parsing/loading of ELF binaries.
-
It's been over a year since the last comment in this thread; how's the progress going? =)
-
TL;DR
I've implemented an ELF relocatable object exporter inside Ghidra and now I'm in trouble because the diff is over 8,000 lines:
https://github.com/boricj/ghidra/tree/elfobjectexporter
Motivation & context
I'm tooling up for reverse-engineering with a specific focus on recreating the source code of programs from binary artifacts. The ones I'm currently interested in happen to be in archaic file formats (either a.out or proprietary equivalents) and lack debugging symbols.
Since recreating the source code of an entire program is hard, I'd rather divide and conquer the task into smaller, more manageable chunks. So I got the crazy idea to _un_link programs back into object files and rewrite those one at a time instead.
Given that an object file is basically:
If I can recreate each element for each part of the original program, then I should be able to recreate a set of object files that can then be relinked into a working program.
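To make the idea concrete, here is a minimal sketch of "un-applying" a single relocation. It assumes the simplest possible case, an absolute 32-bit little-endian relocation where the linker stored symbol address + addend; real targets (for example MIPS HI16/LO16 pairs) are more involved. The class and method names are hypothetical and are not taken from the exporter branch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class UnlinkSketch {
    /**
     * Hypothetical helper: rewrites the linked 32-bit word at {@code offset}
     * so that it holds only the addend again, as it would in an object file.
     * Assumes an absolute relocation where linked = symbolAddress + addend.
     */
    public static int unapplyAbs32(byte[] section, int offset, long symbolAddress) {
        ByteBuffer buf = ByteBuffer.wrap(section).order(ByteOrder.LITTLE_ENDIAN);
        int linked = buf.getInt(offset);
        // Subtract the symbol's final address to recover the original addend.
        int addend = (int) (linked - symbolAddress);
        buf.putInt(offset, addend);
        return addend;
    }

    public static void main(String[] args) {
        // A section whose word at offset 4 points 8 bytes past a symbol
        // that the linker placed at 0x00401000.
        byte[] section = new byte[8];
        ByteBuffer.wrap(section).order(ByteOrder.LITTLE_ENDIAN).putInt(4, 0x00401008);

        int addend = unapplyAbs32(section, 4, 0x00401000L);
        System.out.println("addend = " + addend);
    }
}
```

Once the bytes are position-independent again, the symbol name and the relocation entry carry the information a linker needs to patch the word at its new location.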
I then prototyped a bunch of Jython scripts that demonstrated that the process of unlinking programs back into ELF object files is possible in practice (https://github.com/boricj/ghidra-unlinker), but they suck and I need a much better implementation for my needs. So I rewrote the ELF relocatable object file exporter part, this time within Ghidra itself, leaving the analysis part of the unlinking process for later.
And then I ended up with a branch that has a diff of over 8,000 lines (https://github.com/boricj/ghidra/tree/elfobjectexporter), because I refactored half of Ghidra's ELF code in the process. The exporter itself does appear to work well enough to unlink the "hello world" sample from my Jython prototype test suite (https://github.com/boricj/ghidra-unlinker/tree/master/tests/reference/libunlinker/mips32el/linux): after manually creating the two needed entries in Ghidra's relocation table, I can relink the result and run it successfully. That said, it's very much WIP, mostly untested, and I just got it working yesterday.
The ELF code refactoring that got out of hand
The problem I've encountered with Ghidra's ELF code is that while it works well enough to import ELF files from the file system, it does not really handle creating ELF files in memory from scratch, nor exporting them to a file. I also find it hard to work with, but maybe that's because I'm not a Java programmer by trade.
The incomplete list of changes I've made in this branch (https://github.com/boricj/ghidra/tree/elfobjectexporter):
- [...] ElfFileSection from an interface that ELF table classes implement to an interface that ElfSectionHeader and ElfProgramHeader implement.
- [...] ElfFileSection instead of implementing ElfFileSection
- [...] ElfSectionHeader and ElfProgramHeader into ElfSection and ElfSegment because that's what they are (they contain not just the header part, they also contain the data itself) and terminology within Ghidra's ELF code is quite inconsistent
- [...] ElfHeader to simplify down to an API like getSections(), getSections(Predicate<ElfSection> predicate), getSection(Predicate<ElfSection> predicate)
- [...] ElfFile from ElfHeader (which was probably a mistake in hindsight)
- [...] readelf written in Java using Ghidra's ELF code (useful to check ELF parsing code for correctness after so much refactoring, but I doubt that the Ghidra project would be interested in that particular piece of code)
I would not say that all of the changes I've made are for the better, but I do think that the end result is a bit easier to work with and has fewer cobwebs overall.
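For readers unfamiliar with the predicate-based lookup style mentioned in the list above, here is a rough, self-contained sketch of what such an API could look like. Only the three method names and the Predicate<ElfSection> parameter come from the post; the ElfFileSketch class, the fields of ElfSection and the Optional return type are assumptions made for illustration and do not reflect the actual branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ElfFileSketch {
    // Section type values as defined by the ELF specification.
    public static final int SHT_PROGBITS = 1;
    public static final int SHT_SYMTAB = 2;

    /** Hypothetical stand-in for a section: header fields plus the data itself. */
    public static final class ElfSection {
        private final String name;
        private final int type;
        private final byte[] data;

        public ElfSection(String name, int type, byte[] data) {
            this.name = name;
            this.type = type;
            this.data = data;
        }

        public String getName() { return name; }
        public int getType() { return type; }
        public byte[] getData() { return data; }
    }

    private final List<ElfSection> sections;

    public ElfFileSketch(List<ElfSection> sections) {
        this.sections = Collections.unmodifiableList(new ArrayList<>(sections));
    }

    /** All sections, in file order. */
    public List<ElfSection> getSections() {
        return sections;
    }

    /** All sections matching the predicate, in file order. */
    public List<ElfSection> getSections(Predicate<ElfSection> predicate) {
        return sections.stream().filter(predicate).collect(Collectors.toList());
    }

    /** First section matching the predicate, if any (return type is an assumption). */
    public Optional<ElfSection> getSection(Predicate<ElfSection> predicate) {
        return sections.stream().filter(predicate).findFirst();
    }

    public static void main(String[] args) {
        List<ElfSection> list = new ArrayList<>();
        list.add(new ElfSection(".text", SHT_PROGBITS, new byte[16]));
        list.add(new ElfSection(".symtab", SHT_SYMTAB, new byte[32]));
        list.add(new ElfSection(".data", SHT_PROGBITS, new byte[8]));
        ElfFileSketch elf = new ElfFileSketch(list);

        // Callers express what they want instead of looping over raw header arrays.
        int progbits = elf.getSections(s -> s.getType() == SHT_PROGBITS).size();
        String symtab = elf.getSection(s -> s.getType() == SHT_SYMTAB)
                .map(ElfSection::getName)
                .orElse("<none>");
        System.out.println(progbits + " PROGBITS sections, symbol table: " + symtab);
    }
}
```

The appeal of this shape is that one generic lookup replaces a family of specialized getters (by name, by type, by flags), at the cost of pushing the filtering logic to the call site.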
How do I get out of this mess?
I really don't want to maintain such a large diff on my own which also happens to be a nightmare to rebase, so I'm looking to upstream the various bits and pieces up to and including the ELF object file exporter itself. I do understand that the large-scale ELF code refactoring (not all of which was required in hindsight) is going to be an issue, which will probably require yet more refactorings or even a complete rewrite before this is eligible for upstreaming.
In my defense, I was so preoccupied by demonstrating a valid use case with a working prototype before engaging with upstream that I was already far too deep in refactored land by the time I noticed. I decided to at least finish a working example as-is so that I have something to show for it and deal with the consequences later.
I figured that starting a discussion here first about this would be more polite than opening a comically large PR rewriting half of Ghidra's ELF code coming out of nowhere. So, here I am.
How is this useful?
I can foresee a number of use cases for unlinking in general and object exporters in particular:
I have also noticed a couple of issues that could be solved with an object exporter, such as #3880.
What's out of scope
The unlinking process requires symbols, relocations and sections to be reconstructed from a program before it can work. The ELF object exporter therefore assumes that the user has already filled in all the blanks in the Symbol Table, the Relocation Table and the Program Tree. Implementing the various scripts/analyzers/plugins that recreate this data, as well as improving Ghidra's UI for this particular use case (especially making the Relocation Table editable), are not the ELF object exporter's problem. While the prototype written in Jython has some (badly hacked-together) code to deal with this, I currently have nothing for this iteration written in Java except for using the Python console to call currentProgram.getRelocationTable().add(...) by hand.
I do not need object exporters for PE, Mach-O, XCOFF or other formats and I do not expect to need them in the foreseeable future, so I probably won't do them (and if I do, I'll make sure to engage with upstream first before going off on another refactoring rampage).
PS: I hope I didn't miss existing Ghidra functionality that already does this, because that would be extremely embarrassing...