Translation unit detection #65

ghost · 2021-07-24T21:42:48Z

Most of the unresearched code currently sits in a handful of large assembly blobs.
Each blob contains many unrelated pieces of code, so we need to structure them better.

A basic improvement is to recover the original translation unit (TU) slices and generate a C file with inline ASM for each TU.

The CodeWarrior build system leaks some information on TU structure.
Examples:

Data sections of a TU (especially small data) are aligned and padded. Hint: unreferenced padding (i.e. no xrefs) followed by an aligned piece of data suggests a TU boundary.
Strings and floating-point literals are deduplicated within a TU. Hint: if the same literal appears twice, a TU boundary has to lie between the two copies.
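The padding hint could be sketched roughly like this. This is only an illustration: `DataItem`, `candidate_boundaries` and the 8-byte default alignment are assumptions made for the example, not something the CodeWarrior linker or any existing tool is known to use.

```python
from typing import List, NamedTuple


class DataItem(NamedTuple):
    addr: int        # start address of the item
    size: int        # size in bytes
    has_xrefs: bool  # True if anything references this item


def candidate_boundaries(items: List[DataItem], align: int = 8) -> List[int]:
    """Return addresses where a new TU may start.

    `items` must be sorted by address. A candidate is an aligned item that
    directly follows an unreferenced item (presumed to be padding).
    The alignment value is an assumption and may need tuning per section.
    """
    out = []
    for prev, cur in zip(items, items[1:]):
        looks_like_padding = not prev.has_xrefs
        contiguous = prev.addr + prev.size == cur.addr
        if looks_like_padding and contiguous and cur.addr % align == 0:
            out.append(cur.addr)
    return out
```

The result is a list of candidates only; an unreferenced item is not necessarily padding, so these would still need to be cross-checked against other hints.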

riidefi · 2021-07-24T21:54:53Z

Some more clues:

The majority of data is not shared across TUs.
Non-SDA data loads are typically done as first_tu_data + (data - first_tu_data), i.e. as an offset from the first data item of the TU. Example:
.rel.text1:806DD3A8 addi r30, r30, aMashballoongc@l # "MashBalloonGC"
.rel.text1:806DD3AC addi r4, r30, (aHeyhoshipgba_0 - 0x808A0420) # "HeyhoShipGBA"
.rel.text1:806DD3B0 bl strcmp
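If such base + offset references were collected from the disassembly, they could be grouped per base to get a minimum data span for each TU. A rough sketch, assuming the references are already available as (base_addr, offset) pairs; the function name and the input format are hypothetical, and the offsets in the usage example are made up:

```python
from typing import Dict, Iterable, Tuple


def data_spans(refs: Iterable[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Group base+offset data references by base address.

    `refs` yields (base_addr, offset) pairs, where base_addr plays the role
    of first_tu_data. Returns {base_addr: (lowest, highest)} covering every
    referenced address; the TU's data spans at least that range.
    """
    spans: Dict[int, Tuple[int, int]] = {}
    for base, offset in refs:
        target = base + offset
        lo, hi = spans.get(base, (base, base))
        spans[base] = (min(lo, target), max(hi, target))
    return spans


# Hypothetical offsets, for illustration only.
example = data_spans([(0x808A0420, 0x0), (0x808A0420, 0x40), (0x808A0420, 0x10)])
```

The span only covers the start address of the last referenced item, so the real end of the TU's data lies somewhere beyond it.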

riptl · 2022-03-19T22:30:07Z

Resuming work on this. To begin with, I'm going to export all symbols, XREFs, etc. from @stblr's Ghidra using https://github.com/r0metheus/GhiDump
This should get us off the ground with the sdata2 float dedup heuristic.
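For reference, a minimal sketch of what a dedup-based split could look like. It is not the actual script; it assumes the sdata2 constants are available as (address, raw bytes) pairs sorted by address, and `forced_splits` is a hypothetical name.

```python
import struct
from typing import List, Tuple


def forced_splits(constants: List[Tuple[int, bytes]]) -> List[Tuple[int, int]]:
    """Find TU boundaries forced by duplicated constants.

    `constants` is a list of (addr, raw_bytes) sorted by addr. Because
    literals are deduplicated within a TU, two copies of the same value
    must live in different TUs. Each returned (lo, hi) pair means: some TU
    starts at an address b with lo < b <= hi. The walk is greedy, so the
    number of pairs is only a lower bound on the number of boundaries.
    """
    seen = {}  # value -> address of its copy in the current TU
    splits = []
    for addr, value in constants:
        if value in seen:
            splits.append((seen[value], addr))
            seen = {}  # start a new TU at this constant
        seen[value] = addr
    return splits


# Raw bytes are compared instead of floats so that 0.0 / -0.0 and NaN
# payloads are not conflated. Addresses below are made up.
f = lambda x: struct.pack(">f", x)
demo = forced_splits([(0x0, f(1.0)), (0x4, f(2.0)), (0x8, f(1.0)),
                      (0xC, f(3.0)), (0x10, f(3.0))])
```

Comparing raw bytes also keeps 4-byte floats and 8-byte doubles apart without any extra bookkeeping.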

riptl · 2022-03-27T09:21:18Z

First attempt at translation unit detection using the sdata2 heuristic has been successful (well, kinda?).

File format is

<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>

Please note that the detected text ranges are only a minimum span for each TU. The actual TUs are always larger in practice.
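A small parser for lines in this format might look like the following. It assumes the addresses are written in hex (with or without a 0x prefix) and that the two ranges are separated by whitespace; the actual attachment may differ, and `TuGuess` / `parse_line` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class TuGuess(NamedTuple):
    sdata2_start: int
    sdata2_stop: int
    text_start: int  # lower bound only, see the note above
    text_stop: int   # lower bound only, see the note above


# One hex number, optionally prefixed with 0x (assumed notation).
_HEX = r"((?:0[xX])?[0-9A-Fa-f]+)"
_LINE = re.compile(rf"^\s*{_HEX}\.\.{_HEX}\s+{_HEX}\.\.{_HEX}\s*$")


def parse_line(line: str) -> Optional[TuGuess]:
    """Parse '<SDATA2_START>..<SDATA2_STOP> <TEXT_START>..<TEXT_STOP>'.

    Returns None for lines that do not match the assumed layout.
    """
    m = _LINE.match(line)
    if m is None:
        return None
    return TuGuess(*(int(g, 16) for g in m.groups()))


# Made-up addresses, for illustration only.
guess = parse_line("0x80890000..0x80890010 0x80510000..0x80510200")
```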

sdata_detect_attempt.txt

riidefi · 2022-03-28T00:48:01Z

Nice work! I think for the time being, we can fairly easily do .text splits using the symbol map. If the script could then autogenerate the data splits, that would be really convenient.

ghost added the enhancement, p-high, and devops (Changes to the build system and CI) labels Jul 24, 2021

ghost self-assigned this Jul 24, 2021

ghost removed their assignment Jul 24, 2021

ghost added the decompiler (Improvements to decompiler tooling) label Jul 24, 2021

riptl removed the enhancement label Jul 14, 2022
