A transform will typically be implemented as a single Pass subclass, which
implements begin_module, and at least one Patch subclass (or a function
decorated with @patch_constraints, if using Patch.from_function). Inside the
begin_module callback, the pass registers any modifications with the passed
RewritingContext.
The example below defines a pass that adds a nop instruction at the entry of
every function.
    import gtirb_rewriting.driver
    from gtirb_rewriting import *

    class NopPass(Pass):
        """
        Inserts a nop at the start of every function.
        """

        def begin_module(self, module, functions, context):
            context.register_insert(
                AllFunctionsScope(FunctionPosition.ENTRY, BlockPosition.ENTRY),
                Patch.from_function(self.nop_patch),
            )

        @patch_constraints()
        def nop_patch(self, context):
            return "nop"

    if __name__ == "__main__":
        # Allow gtirb-rewriting to provide us a command line driver. See
        # docs/Drivers.md for details.
        gtirb_rewriting.driver.main(NopPass)
Creating a GTIRB IR file is accomplished with the ddisasm tool. A typical
invocation will look like:
    ddisasm binary --ir binary.gtirb
Additionally, specifying -j1 may speed up disassembly on some systems, as
ddisasm incurs extra overhead when running in parallel.
Reassembling the modified GTIRB IR is done with the gtirb-pprinter tool. For
Linux, the invocation typically looks like:
    gtirb-pprinter --policy=complete modified_binary.gtirb -b modified_binary
Specifying the "complete" printing policy is necessary to be able to rewrite
the code in _start (otherwise the pretty-printer will default to skipping it
and letting the compiler regenerate it).
gtirb-rewriting provides a CallPatch class that is able to insert function
calls using the platform's ABI. Its constructor takes the symbol to call,
along with the arguments to pass. Arguments currently must be either a Symbol,
an integer, or a callable that is passed an InsertionContext and returns
either a Symbol or an integer.
For example, inserting a call to exit with a fixed status code:
    # at the top of the file
    from gtirb_rewriting.patches import CallPatch

    # in a Pass's begin_module callback:
    exit_sym = context.get_or_insert_extern_symbol('exit', 'libc.so.6')
    context.register_insert(..., CallPatch(exit_sym, [42]))
Passes frequently need to insert initialization code that is executed before
any code in the program runs. This is accomplished by using a
SingleBlockScope with the module's entry_point as the block.
For example, inserting a call to initialize some supporting library:
    # at the top of the file
    from gtirb_rewriting.patches import CallPatch

    # in a Pass's begin_module callback:
    init_sym = context.get_or_insert_extern_symbol(
        'init_support_code', 'libsupport.so')
    context.register_insert(
        SingleBlockScope(module.entry_point, BlockPosition.ENTRY),
        CallPatch(init_sym))
Disassembly is done via gtirb_capstone's GtirbInstructionDecoder object,
like so:
    # at the top of the file:
    from gtirb_capstone.instructions import GtirbInstructionDecoder

    # in a Pass's begin_module callback:
    decoder = GtirbInstructionDecoder(module.isa)
    for function in functions:
        for block in function.get_all_blocks():
            # offset tracks where each instruction starts within the block
            offset = 0
            for instruction in decoder.get_instructions(block):
                pass  # do something with the instruction here
                offset += instruction.size
Patches can specify how many scratch general-purpose registers they require
by setting scratch_registers in their constraints object. gtirb-rewriting
will then provide those registers in the insertion context.
Register objects can be formatted into a string to get the register name, optionally using the format specifier to get the name of a subregister.
Additionally, gtirb-rewriting will implicitly generate code to spill/restore the scratch registers as needed around the patch.
For example, a patch that takes two scratch registers:
    @patch_constraints(scratch_registers=2)
    def sample_patch(self, context):
        reg1, reg2 = context.scratch_registers
        return f"""
            mov $0, %{reg1}
            mov $1, %{reg2:32}
        """
Assuming the registers chosen were rax and rbx on x86-64, this patch
would expand to:
    mov $0, %rax
    mov $1, %ebx
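The way a format specifier can select a subregister name is sketched below. ToyRegister and its subregisters table are hypothetical stand-ins written for this illustration; they are not gtirb-rewriting's register class, and the real set of accepted specifiers may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyRegister:
    """Hypothetical stand-in for a register object (not the library's class)."""

    name: str
    # Maps a width specifier such as "32" to the matching subregister name.
    subregisters: Dict[str, str] = field(default_factory=dict)

    def __format__(self, spec: str) -> str:
        # An empty specifier yields the full register name.
        if not spec:
            return self.name
        # Any other specifier is looked up as a subregister width.
        return self.subregisters[spec]


rax = ToyRegister("rax", {"32": "eax", "16": "ax"})
rbx = ToyRegister("rbx", {"32": "ebx", "16": "bx"})

# Mirrors the patch body above: one full name, one 32-bit subregister.
asm = f"mov $0, %{rax}\nmov $1, %{rbx:32}"
```

Python routes the text after the colon in an f-string field to the object's __format__ method, which is why a patch can ask for a narrower view of a register inline.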
Patches can also specify registers, written by preceding instructions, that
they would like to read. This tells gtirb-rewriting that although the patch
does not clobber the register, it must not be handed out as a scratch
register either, because its contents need to be preserved. For example, if
the inserted code assumes that preceding instructions write to rax and wants
to read that value, setting reads_registers={'rax'} allows the patch to read
from rax and use scratch_registers without worrying about an allocated
scratch register clobbering the value of rax.
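The effect of reads_registers on scratch register selection can be pictured with a toy model. The function pick_scratch_registers and the candidate pool are assumptions made up for this sketch; gtirb-rewriting's actual allocation strategy is not described here and may work differently.

```python
from typing import Iterable, List, Set


def pick_scratch_registers(
    candidates: Iterable[str], count: int, reads: Set[str]
) -> List[str]:
    """Toy model: choose `count` registers, skipping any the patch reads."""
    chosen = [reg for reg in candidates if reg not in reads][:count]
    if len(chosen) < count:
        raise ValueError("not enough registers available")
    return chosen


# Hypothetical candidate pool, in preference order.
pool = ["rax", "rbx", "rcx", "rdx"]

# Without reads_registers, rax is fair game as a scratch register.
without_reads = pick_scratch_registers(pool, 2, reads=set())

# Declaring that the patch reads rax keeps it out of the scratch set.
with_reads = pick_scratch_registers(pool, 2, reads={"rax"})
```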
A patch's constraints should describe what the patch's assembly will be doing in terms of what it clobbers. This allows gtirb-rewriting to spill/restore registers correctly.
Here is a summary of the current constraints:
- align_stack: aligns the stack to the ABI required alignment for calling a function
- clobbers_flags: preserves the flags register
- clobbers_registers: preserves specific registers by name
- preserve_caller_saved_registers: preserves the registers that are considered caller-saved during a function call by the ABI
- scratch_registers: see the above section on scratch registers
- x86_syntax: chooses between Intel and AT&T assembly syntax for the patch
- reads_registers: registers that the patch reads from incoming assembly instructions, so they should not be included as scratch registers or clobbered registers
Patches are free to refer to existing symbols in the program and to introduce
new labels, though the label names must not conflict with symbols already
present. To assist with this, gtirb-rewriting will automatically suffix
"temporary" labels, e.g. those starting with .L for ELF x86-64, with a
unique integer behind the scenes.
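A rough picture of that suffixing, as a toy model: the helper suffix_temporary_labels and its regular expression are assumptions written for this sketch, not the library's implementation, and the real numbering scheme may differ.

```python
import itertools
import re

# One shared counter so every call gets a fresh integer.
_counter = itertools.count(1)


def suffix_temporary_labels(asm: str, prefix: str = ".L") -> str:
    """Toy model: append one fresh integer to every temporary label in `asm`."""
    unique = next(_counter)
    pattern = re.compile(re.escape(prefix) + r"\w+")
    return pattern.sub(lambda m: f"{m.group(0)}_{unique}", asm)


patch = "jmp .Lmy_label\n.Lmy_label:\nnop\n"

# Applying the same patch text twice yields two non-conflicting label names.
first = suffix_temporary_labels(patch)
second = suffix_temporary_labels(patch)
```

Because the reference and the definition inside one patch receive the same suffix, the jump still resolves to its own label while separate insertions of the patch stay distinct.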
For example, the following patch will actually generate symbols like
.Lmy_label_1, etc:
    def get_asm(self, context):
        return """
            jmp .Lmy_label
            .Lmy_label:
            nop
        """
If your patch is intended to be portable across different ABIs, you can use
ABI.temporary_label_prefix to get the prefix needed for a temporary label or
InsertionContext.temporary_label to create an appropriate label. For example:
    def get_asm(self, context):
        label = context.temporary_label("my_label")
        return f"""
            jmp {label}
            {label}:
            nop
        """
For profiling and tracing transforms, it's common to want to know the original
address of a given block of code. While it is possible to access the block's
address from within a patch's get_asm method via the InsertionContext, this
will give you the wrong answer because the address gets modified in the
process of applying transforms.
Instead, transforms should create a dict from code block to original address in the Pass's begin_module callback and refer to that later on.
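A minimal sketch of that bookkeeping follows. StubBlock is a hypothetical stand-in so the example runs without a GTIRB file; in a real pass the blocks would presumably come from function.get_all_blocks() inside begin_module, and the helper name snapshot_addresses is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(eq=False)  # hash by identity, so instances can serve as dict keys
class StubBlock:
    """Hypothetical stand-in for a code block with an `address` attribute."""

    address: Optional[int]


def snapshot_addresses(
    blocks: Iterable[StubBlock],
) -> Dict[StubBlock, Optional[int]]:
    """Record each block's address before any rewriting moves it."""
    return {block: block.address for block in blocks}


blocks = [StubBlock(0x1000), StubBlock(0x1010)]
original = snapshot_addresses(blocks)

# Simulate a transform shifting the second block to a new address.
blocks[1].address = 0x1018
```

A patch can then look up original[block] instead of trusting the block's current address.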
Instructions can be replaced using the replace_at function, which takes
the block to modify, the offset in that block, the number of bytes to replace,
and the patch to replace them with. Both the offset and the number of bytes to
replace must fall on instruction boundaries.
Instructions can be deleted using the delete_at function, which takes
the block to modify, the offset in that block, and the number of bytes to
delete. Both the location and number of bytes to delete must fall on
instruction boundaries.
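One way to check the boundary requirement before calling replace_at or delete_at is sketched below. It assumes the requirement means that both the start offset and the end of the range (offset plus length) coincide with instruction boundaries; the helper names and the instruction sizes are hypothetical. In practice the sizes could be collected from a decoder loop that reads instruction.size.

```python
from itertools import accumulate
from typing import List, Set


def instruction_boundaries(sizes: List[int]) -> Set[int]:
    """Return every byte offset where an instruction starts or the block ends."""
    return {0, *accumulate(sizes)}


def is_valid_range(sizes: List[int], offset: int, length: int) -> bool:
    """Check that both ends of [offset, offset + length) sit on boundaries."""
    bounds = instruction_boundaries(sizes)
    return offset in bounds and (offset + length) in bounds


# Hypothetical instruction sizes, in bytes, for one block.
sizes = [1, 3, 5]

covers_second_instruction = is_valid_range(sizes, offset=1, length=3)
cuts_instruction_short = is_valid_range(sizes, offset=1, length=2)
starts_mid_instruction = is_valid_range(sizes, offset=2, length=2)
```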
Deleting whole functions can be done using the delete_function function.
Any references to the deleted blocks, e.g. symbols or control flow, will
be retargeted to reference a proxy block.
Patches can add data to non-text sections by switching section with the normal
assembler directives (.data, etc) and using directives like .byte to
specify the data.
For example, this patch would call __assert_fail with the assertion message,
file, line, and function arguments:
    @patch_constraints(x86_syntax=X86Syntax.INTEL)
    def assert_patch(insertion_context):
        return """
            lea rdi, [rip + .Lassertion]
            lea rsi, [rip + .Lunknown]
            xor rdx, rdx
            lea rcx, [rip + .Lunknown]
            call __assert_fail
            ud2

            .rodata
            .Lassertion:
            .string "something went wrong!"
            .Lunknown:
            .string "unknown"
        """
Directives like .byte can be used to emit instructions that the assembler
may not understand. Instructions added via .byte must not have an impact on
control flow.
gtirb-rewriting uses a simple heuristic to distinguish code from data at the block level: if the block has any incoming edges or contains other instructions, the entire block will be treated as code. Otherwise it is treated as a data block.
For example, the bytes in this patch will be treated as code:
    .byte 0x66
    .byte 0x90
While these bytes will be treated as data:
    jmp .L_end
    .byte 0x66
    .byte 0x90
    .L_end:
    nop
The RewritingContext object can be used to rewrite a Module directly,
without using PassManager. This is not recommended for a few reasons:
- Passes provide a mechanism for combining different transformations without running into the problem of transformation interference.
- GTIRB files can contain multiple modules. Implementing your transform as a
  Pass helps ensure that your transform handles this case by invoking the
  begin_module/end_module callbacks for each module in the GTIRB IR.
RewritingContext exposes two ways to insert code: register_insert and
insert_at. The difference is that insert_at is passed a single location in
the program, whereas register_insert is passed a scope that is later
resolved into any number of concrete locations. In the future, this will allow
gtirb-rewriting to select insertion locations that have the cheapest cost (as
defined by number of registers spilled, etc).
In general, it is recommended to use register_insert if one of the existing
Scopes meets your needs and insert_at for any other cases.
gtirb-rewriting will log each insertion it applies to its logger at the DEBUG level. Unless passed a different logger in the PassManager's (or RewritingContext's) constructor, it will log to the "gtirb_rewriting" logger.
This can be made visible by:
    import logging

    logging.basicConfig(format="%(message)s")
    logging.getLogger("gtirb_rewriting").setLevel(logging.DEBUG)
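When the insertion log needs to be inspected programmatically, for instance in a test, the same logger can be routed to an in-memory buffer instead of the console. This sketch uses only the standard logging module; the debug message is a made-up stand-in for whatever text the library actually emits.

```python
import io
import logging

# Collect log output in memory rather than printing it.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("gtirb_rewriting")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# Stand-in for a message the library would log; the text is invented.
logger.debug("example insertion message")

captured = buffer.getvalue()
```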