ELF relocatable object exporter #4922

boricj · 2023-01-23T18:22:25Z

boricj
Jan 23, 2023

TL;DR

I've implemented an ELF relocatable object exporter inside Ghidra and now I'm in trouble because the diff is over 8,000 lines:

https://github.com/boricj/ghidra/tree/elfobjectexporter

Motivation & context

I'm tooling up for reverse-engineering with a specific focus on recreating the source code of programs from binary artifacts. The ones I'm currently interested in happen to be in archaic file formats (either a.out or proprietary equivalents) and lack debugging symbols.

Since recreating the source code of an entire program is hard, I'd rather divide and conquer the task with smaller, more manageable chunks. So I got the crazy idea to _un_link programs back into object files and rewrite those one at a time instead.

Given that an object file is basically:

A symbol table
A relocation table
A bunch of sections with relocatable contents

If I can recreate each element for each part of the original program, then I should be able to recreate a set of object files that can then be relinked into a working program.
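To make the three ingredients concrete, here is a minimal Python sketch of such an object file model. It is not code from the branch or from the Jython prototype; every class, field and function name below (`Symbol`, `Relocation`, `ObjectFile`, `hello_object`) is hypothetical and chosen only for illustration, and the section contents are placeholder bytes rather than real machine code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Symbol:
    name: str
    section: str  # name of the defining section, "" when undefined (external)
    offset: int   # offset of the symbol inside its section


@dataclass
class Relocation:
    section: str  # section whose bytes need patching at link time
    offset: int   # where in that section the patch goes
    symbol: str   # name of the symbol the patched value refers to
    addend: int = 0


@dataclass
class ObjectFile:
    sections: Dict[str, bytes] = field(default_factory=dict)
    symbols: List[Symbol] = field(default_factory=list)
    relocations: List[Relocation] = field(default_factory=list)

    def undefined_symbols(self) -> List[str]:
        """Symbols that the linker must resolve from other object files."""
        return sorted(s.name for s in self.symbols if not s.section)

    def dangling_relocations(self) -> List[Relocation]:
        """Relocations that refer to a symbol missing from the symbol table."""
        known = {s.name for s in self.symbols}
        return [r for r in self.relocations if r.symbol not in known]


def hello_object() -> ObjectFile:
    """A made-up 'hello world' object: main() calls puts() on a string."""
    return ObjectFile(
        sections={".text": bytes(16), ".rodata": b"hello world\x00"},
        symbols=[
            Symbol("main", ".text", 0),
            Symbol("msg", ".rodata", 0),
            Symbol("puts", "", 0),  # provided by libc at link time
        ],
        relocations=[
            Relocation(".text", 4, "msg"),
            Relocation(".text", 8, "puts"),
        ],
    )
```

In this model, unlinking amounts to filling in all three fields for a slice of the program so that no relocation is left dangling and every cross-slice reference shows up as an undefined symbol.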

I then prototyped a bunch of Jython scripts that demonstrated that the process of unlinking programs back into ELF object files is possible in practice (https://github.com/boricj/ghidra-unlinker), but they suck and I need a much better implementation for my needs. So I rewrote the ELF relocatable object file exporter part, this time within Ghidra itself, leaving the analysis part of the unlinking process for later.

And then I ended up with a branch that has a diff of over 8,000 lines (https://github.com/boricj/ghidra/tree/elfobjectexporter), because I refactored half of Ghidra's ELF code in the process. The exporter itself does appear to work well enough to unlink the "hello world" sample from my Jython prototype test suite (https://github.com/boricj/ghidra-unlinker/tree/master/tests/reference/libunlinker/mips32el/linux): after manually creating the two needed entries in Ghidra's relocation table, I can relink it and run it successfully. That said, it's very much WIP, mostly untested and I just got it working yesterday.

The ELF code refactoring that got out of hand

The problem I've encountered with Ghidra's ELF code is that while it works well enough to import ELF files from the file system, it handles neither the creation of ELF files in memory from scratch nor their export to a file particularly well. I also find it hard to work with, but maybe that's because I'm not a Java programmer by trade.

The incomplete list of changes I've made in this branch (https://github.com/boricj/ghidra/tree/elfobjectexporter):

Reworked ElfFileSection from an interface that ELF table classes implement to an interface that ElfSectionHeader and ElfProgramHeader implement.
Refactored the various ELF table classes to use ElfFileSection instead of implementing it
Renamed ElfSectionHeader and ElfProgramHeader to ElfSection and ElfSegment because that's what they are (they contain not just the header part, they also contain the data itself) and terminology within Ghidra's ELF code is quite inconsistent
Rewrote the various section/segment getters of ElfHeader to simplify down to an API like getSections(), getSections(Predicate<ElfSection> predicate), getSection(Predicate<ElfSection> predicate)
Refactored the ELF parsing code to use all of these new facilities and some more
Split off ElfFile from ElfHeader (which was probably a mistake in hindsight)
Various fixes and missing functionality that the ELF object file exporter requires
The ELF relocatable object exporter itself
A partial re-implementation of readelf written in Java using Ghidra's ELF code (useful to check ELF parsing code for correctness after so much refactoring, but I doubt that the Ghidra project would be interested in that particular piece of code)

I would not say that all of the changes I've made are for the better, but I do think that the end result is a bit easier to work with and has fewer cobwebs overall.
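As an illustration of the predicate-based getter shape described in the list above, here is a small Python mirror of that Java-style API. It is only a sketch: `ElfFileModel`, `Section` and `sample_elf` are hypothetical names, not classes from Ghidra or from the branch, and returning the first match (or `None`) from the single-section getter is an assumption, since the post does not say how that case is handled. The section type constants are the standard ELF values.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Standard ELF section type values.
SHT_PROGBITS = 1
SHT_SYMTAB = 2
SHT_REL = 9


@dataclass
class Section:
    name: str
    type: int


class ElfFileModel:
    """Hypothetical container exposing one generic, predicate-based lookup."""

    def __init__(self, sections: Iterable[Section]) -> None:
        self._sections = list(sections)

    def get_sections(
        self, predicate: Optional[Callable[[Section], bool]] = None
    ) -> List[Section]:
        # No predicate: return a copy of every section, in file order.
        if predicate is None:
            return list(self._sections)
        return [s for s in self._sections if predicate(s)]

    def get_section(
        self, predicate: Callable[[Section], bool]
    ) -> Optional[Section]:
        # Assumption: first match wins, None when nothing matches.
        return next((s for s in self._sections if predicate(s)), None)


def sample_elf() -> ElfFileModel:
    return ElfFileModel(
        [
            Section(".text", SHT_PROGBITS),
            Section(".rel.text", SHT_REL),
            Section(".symtab", SHT_SYMTAB),
        ]
    )
```

With this shape, a lookup by type becomes `elf.get_sections(lambda s: s.type == SHT_REL)` and a lookup by name becomes `elf.get_section(lambda s: s.name == ".symtab")`, so one generic getter stands in for a family of special-purpose ones.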

How do I get out of this mess?

I really don't want to maintain such a large diff on my own which also happens to be a nightmare to rebase, so I'm looking to upstream the various bits and pieces up to and including the ELF object file exporter itself. I do understand that the large-scale ELF code refactoring (not all of which was required in hindsight) is going to be an issue, which will probably require yet more refactorings or even a complete rewrite before this is eligible for upstreaming.

In my defense, I was so preoccupied with demonstrating a valid use case with a working prototype before engaging with upstream that I was already far too deep in refactored land by the time I noticed. I decided to at least finish a working example as-is so that I have something to show for it and deal with the consequences later.

I figured that starting a discussion here first about this would be more polite than opening a comically large PR rewriting half of Ghidra's ELF code coming out of nowhere. So, here I am.

How is this useful?

I can foresee a number of use cases for unlinking in general and object exporters in particular:

As noted above, I expect that this will make my source code recreation goals easier to accomplish
Large-scale program patching to a far greater extent than practical for traditional binary patching (at a significantly larger upfront cost due to the various table recreations required, but one could in theory skip parts by relinking stuff at the same addresses)
Creation of libraries from bits and pieces of one program, to be reused in another program
Porting a program to another executable format and/or operating system by swapping out relevant pieces as needed and relinking, although I expect emulators and compatibility layers to be a far more practical option than this in most cases

I have also noticed a couple of issues that could be solved with an object exporter, such as #3880.

What's out of scope

The unlinking process only works once symbols, relocations and sections have been reconstructed from a program. The ELF object exporter therefore assumes that the user already filled in all the blanks in the Symbol Table, the Relocation Table and the Program Tree. Implementing the various scripts/analyzers/plugins that recreate this data, as well as improving Ghidra's UI for this particular use case (especially making the Relocation Table editable), are not the ELF object exporter's problem. While the prototype written in Jython has some (badly hacked-together) code to deal with this, I currently have nothing for this iteration written in Java except for using the Python console to call currentProgram.getRelocationTable().add(...) by hand.

I do not need object exporters for PE, Mach-O, xCOFF or other formats and I do not expect to need them in the foreseeable future, so I probably won't do them (and if I do, I'll make sure to engage with upstream first before going off on another refactoring rampage).

PS: I hope I didn't miss an already existing, similar Ghidra functionality that already does this, because that would be extremely embarrassing...

graps01 · 2023-01-25T13:06:16Z

graps01
Jan 25, 2023

Hi boricj,
As a first-time commenter in this discussion thread, it looks to me like you would benefit from reading a little bit about binary analysis techniques. For the ELF specification, there are books available on Amazon or whichever online bookstore suits you. One book that I am reading is Learning Linux Binary Analysis by Ryan O'Neill. Packt Publishing. 2022-present. Lots of ELF fundamentals and exercises.

1 reply

boricj Jan 26, 2023
Author

Thanks, but while I don't have extensive reverse-engineering experience with Ghidra, after:

writing two separate implementations of ELF relocatable object exporters,
writing one set of scripts that recreates a fair amount of MIPS relocations from symbol references inside a Ghidra program,
creating my own canonical form of ELF files called DROW and associated tooling (Prekernel+Kernel+Lagom: Restore netboot functionality SerenityOS/serenity#11368),
contributing about 200 commits to two separate open-source operating systems and a bunch of other stuff visible on my GitHub profile,

I believe I have a reasonably decent understanding of the Executable and Linkable Format and associated toolchains. Also, although the process of unlinking executable code seems to be rather niche, there is some prior art out there about it on the Internet. I therefore assume it can be made to work (whether it is practical or not is another story).

I was looking more for feedback on how much, if at all, the Ghidra project is interested in my modifications. I understand if the prototype-quality ELF relocatable object exporter is deemed out of scope for upstreaming, but I would prefer not to maintain an extensive set of patches to Ghidra's ELF code handling in my own fork to make it work if I can help it.

boricj · 2023-01-29T20:04:46Z

boricj
Jan 29, 2023
Author

I've opened a PR to upstream some of my modifications in Ghidra's ELF support code outlined in this discussion (#4938). Given the lack of feedback here (I should probably have included more explicit actionable items in hindsight), to be on the safe side I've tried to keep the changes as limited as possible.

While I can still make my own code run on this light version, most of the refactoring described here has been left out of it. Unless I get some feedback here, I won't be able to open PRs for these changes because I don't know what, if anything, would be eligible for upstreaming.

3 replies

graps01 Feb 1, 2023

Hi Jean,
I am willing to look at your PRs and some code if you want me to follow up on usability and potential use for Ghidra. I come from a computer science background and have Java and C++ experience, both past and present. While the Ghidra software is primarily written in Java, Jython is a worthwhile programming language, and I am curious to see what it looks like. If it's anything like JavaScript and Java, then reading it would be fairly straightforward for me.

Brian

graps01 Feb 1, 2023

Additional notes: Hi boricj,
I am wondering if you are planning to release the code for multiple platforms. Do you have a preference? Linux or macOS? Windows? Other platforms? The reason I am asking is software portability: the ability to have the code work as designed on multiple target platforms with minimal alterations.
I am applying some software engineering concepts already, looking into other compilers for Ghidra, IDA Pro, and other reverse engineering software (incl. Binary Ninja).

graps01 Feb 2, 2023

Nota bene: I have some details from the Jython website for your immediate consideration. According to them, only Python 2 will be supported. Therefore, you would have to make your code forward-compatible by hand, instead of using scripts or automated processes. Neither 2to3 nor any other script guarantees upward compatibility.
https://www.jython.org/

ghidra1 · 2023-02-21T15:21:38Z

ghidra1
Feb 21, 2023
Collaborator

We do not currently intend on supporting complex exports (e.g., ELF, PE) other than via the original file bytes export for simple patching which relies on restoring the original FileBytes where relocations have been applied. Likewise, our header classes are intended to facilitate parsing only and not for build-up or modification of such headers (e.g., ElfHeader). Our simplistic header classes and processor-specific extensions are not well suited for the general case of header creation/modification such as a compiler/linker would produce.

2 replies

boricj Feb 27, 2023
Author

I understand that the Ghidra team is not currently interested in upstreaming my modifications to enable my admittedly unusual and unproven workflow. I still believe this approach to be workable, so I'll keep working on this on my own in my fork. If I manage to demonstrate a real-world application of the unlinking technique, I'll report back to add more data points to this topic.

In the meantime, I do recommend removing the existing ELF serialization code inside the Ghidra source tree. It's severely broken without my fixes; without any use case that justifies repairing it, this code will just keep bitrotting.

ghidra1 Mar 1, 2023
Collaborator

I agree, but will have to investigate what dependencies may exist.

ghidra1 · 2023-03-01T20:42:57Z

ghidra1
Mar 1, 2023
Collaborator

A change is forthcoming on the master branch which will remove the mutability of ELF headers and the ability to write them (i.e., serialization). The ELF API at this point is focused entirely on parsing/loading of ELF binaries.

0 replies

SamuelMarks · 2024-05-24T01:04:26Z

SamuelMarks
May 24, 2024

It's been over a year since the last comment in this thread; how's the progress going? =)

2 replies

ryanmkurtz May 24, 2024
Maintainer

What specific progress are you asking about?

boricj May 24, 2024
Author

I've refactored my fork into a Ghidra extension that can be found here: https://github.com/boricj/ghidra-delinker-extension.

Right now it has analyzers for 32-bit MIPS and 32-bit x86, as well as an object file exporter for the ELF format. With it, I've managed to port a program from Linux to Windows and run PlayStation code on Linux, among other things. I'm using it as part of my own reverse-engineering/decompilation project of a PlayStation game with promising results, which I'm documenting on my personal blog.

I've probably picked the worst possible platform to attempt this on, as MIPS has proven to be an extraordinarily difficult architecture to delink in an automated manner, but I've made it work through sheer stubbornness and algorithmic wizardry. Contrast this with delinking x86 code, where every code relocation spot is a pointer-sized immediate of an instruction that's trivial to process... and Ghidra's analyzers do a better job at annotating references properly for it, too.
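To illustrate why the x86 case is comparatively simple, here is a minimal Python sketch of "un-applying" a relocation on a 32-bit immediate. It is not taken from the extension; the function names are hypothetical, and it only assumes the standard i386 formulas for R_386_32 (S + A) and R_386_PC32 (S + A - P) with the addend stored in place, as REL-style relocations do.

```python
import struct

MASK32 = 0xFFFFFFFF


def unapply_abs32(data: bytearray, offset: int, symbol_address: int) -> int:
    """Undo an absolute 32-bit relocation (value = S + A).

    Replaces the linked address stored at `offset` with the addend A,
    which is what a REL-style object file keeps in place.
    """
    (value,) = struct.unpack_from("<I", data, offset)
    addend = (value - symbol_address) & MASK32
    struct.pack_into("<I", data, offset, addend)
    return addend


def unapply_pc32(
    data: bytearray, offset: int, place_address: int, symbol_address: int
) -> int:
    """Undo a PC-relative 32-bit relocation (value = S + A - P).

    `place_address` is the address the immediate itself was linked at.
    """
    (value,) = struct.unpack_from("<I", data, offset)
    addend = (value + place_address - symbol_address) & MASK32
    struct.pack_into("<I", data, offset, addend)
    return addend


# mov eax, [0x0804A010], where a symbol is assumed to live at 0x0804A000.
load = bytearray(b"\xA1\x10\xA0\x04\x08")
load_addend = unapply_abs32(load, 1, 0x0804A000)

# call 0x08048100, encoded at 0x08048000 (so the immediate sits at 0x08048001).
call = bytearray(b"\xE8\xFB\x00\x00\x00")
call_addend = unapply_pc32(call, 1, 0x08048001, 0x08048100)
```

After these calls the load keeps an addend of 0x10 (an offset into the symbol) and the call keeps 0xFFFFFFFC, i.e. the usual -4 for an x86 call. In both cases the whole patch is a single pointer-sized field inside one instruction.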

People seem to have a lot of trouble wrapping their heads around this delinking concept, probably because it sounds like heresy to anyone who attended a CS101 course... and what I'm doing with it probably doesn't help either, since building programs out of pieces of binary code pilfered from incompatible platforms in spite of ABIs sounds like Hollywood hacking. Which is too bad really, because I've streamlined my extension to the point where delinking something takes two mouse clicks and no special reverse-engineering skills to pull off.

So yeah, it's real and it works. I'm not aware of any actual users besides me, but a couple of other people have expressed some interest in my tooling, in particular about COFF object file support.
