| 1 | +Modeling I/O Side Effects |
| 2 | +========================= |
| 3 | + |
| 4 | +Many programs that we wish to simulate produce side effects, and we would like those side effects to be available for comparison in our analysis. |
| 5 | +To enable this use case, cozy has a subsystem for modeling IO side effects. Common examples of IO side effects we have found in example programs |
| 6 | +include writing to stdout/stderr, writing to the network, and writing over a serial connection. |
| 7 | + |
| 8 | +Modeling IO side effects is typically straightforward: hook each function that produces a side effect and redirect |
| 9 | +the side effect payload to a list attached to the current state. When a child state is forked from its parent, it obtains a copy of the side effects |
| 10 | +from its parent. cozy keeps track of IO side effects over different channels (e.g., one channel for stdout and another for the network) and attempts to |
| 11 | +intelligently align side effects in the visualization interface. |
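
The bookkeeping described above can be pictured with a toy model in plain Python. This is only a sketch of the idea, not cozy's implementation: ``ToyState``, its ``perform`` method, and its ``fork`` method are hypothetical names invented for illustration, and the payloads are ordinary Python values rather than symbolic data.

```python
import copy


class ToyState:
    """Hypothetical stand-in for an execution state (not an angr or cozy class)."""

    def __init__(self):
        # Maps a channel name to the ordered list of payloads recorded on it.
        self.side_effects = {}

    def perform(self, channel, body):
        # Record the payload on the given channel instead of performing real IO.
        self.side_effects.setdefault(channel, []).append(body)

    def fork(self):
        # A child starts with a copy of its parent's side effects.
        child = ToyState()
        child.side_effects = copy.deepcopy(self.side_effects)
        return child


parent = ToyState()
parent.perform("stdout", "hello")

child = parent.fork()
child.perform("stdout", "world")
child.perform("network", b"\x01")
```

Because the child works on a copy, side effects recorded after the fork do not leak back into the parent, and each channel keeps its own ordered history.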
| 12 | + |
| 13 | +Note that by default, angr automatically concretizes data written to stdout/stderr. cozy's side effect subsystem keeps the data symbolic and avoids this concretization. |
| 14 | +In this respect cozy's side effect interface improves on the angr default: the payload stays symbolic when side effects are compared. |
| 15 | + |
| 16 | +========================== |
| 17 | +Performing a Side Effect |
| 18 | +========================== |
| 19 | + |
| 20 | +The primary function to look at is :py:func:`cozy.side_effect.perform`. The first argument is the :py:class:`angr.SimState` that the side effect |
| 21 | +will attach to. One way to obtain it is to hook the side effect function with an :py:class:`angr.SimProcedure`; inside :py:meth:`angr.SimProcedure.run`, the current |
| 22 | +:py:class:`angr.SimState` is available as ``self.state``. Alternatively, you can set a breakpoint using :py:class:`cozy.directive.Breakpoint` and obtain the :py:class:`angr.SimState` |
| 23 | +object in the breakpoint's ``breakpoint_fun`` callback. |
| 24 | + |
| 25 | +Here is an example of the use of :py:func:`cozy.side_effect.perform` in a custom :py:class:`angr.SimProcedure` hook:: |
| 26 | + |
| 27 | +    # Here we are hooking a function called process_command, |
| 28 | +    # so we need to make a class that inherits from SimProcedure |
| 29 | +    class process_command(angr.SimProcedure): |
| 30 | +        def run(self, cmd_str): |
| 31 | +            strlen = angr.SIM_PROCEDURES["libc"]["strlen"] |
| 32 | +            max_len = self.state.solver.max(self.inline_call(strlen, cmd_str).ret_expr) |
| 33 | +            # Construct the side effect payload: a list of symbolic bytes read from memory. |
| 34 | +            cmd = [self.state.memory.load(cmd_str + i, 1) for i in range(max_len)] |
| 35 | +            def concrete_post_processor(concrete_cmd): |
| 36 | +                return [chr(r.concrete_value) for r in concrete_cmd] |
| 37 | +            cozy.side_effect.perform(self.state, "process_command", cmd, concrete_post_processor=concrete_post_processor) |
| 38 | + |
| 39 | +The second argument is the side effect channel. Different types of side effects should be performed over different channels. For example, |
| 40 | +you may have a channel for networked output and a channel for stdout. |
| 41 | + |
| 42 | +The third argument is the side effect body. The body must be built from any combination of string-keyed Python dictionaries, Python lists, Python tuples, |
| 43 | +claripy concrete values, and claripy symbolic values. This should represent the payload of the side effect. |
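
To make the allowed body shapes concrete, here is a small checker sketch in plain Python. It is not part of cozy: ``is_valid_body`` and its ``is_leaf`` parameter are hypothetical names, and the example uses Python integers as stand-ins for claripy values so that it runs without angr installed. With claripy available, a leaf test such as ``lambda v: isinstance(v, claripy.ast.Base)`` would be the natural choice.

```python
def is_valid_body(body, is_leaf):
    """Return True if body is built only from string-keyed dicts, lists, tuples, and leaves."""
    if isinstance(body, dict):
        # Dictionaries must be keyed by strings; values are checked recursively.
        return all(
            isinstance(key, str) and is_valid_body(value, is_leaf)
            for key, value in body.items()
        )
    if isinstance(body, (list, tuple)):
        return all(is_valid_body(item, is_leaf) for item in body)
    # Anything else must be a leaf value (a claripy value in real use).
    return is_leaf(body)


def is_int(value):
    # Stand-in leaf test: plain integers play the role of claripy values here.
    return isinstance(value, int)


good_body = {"cmd": [1, 2, (3, 4)], "length": 4}
bad_body = {1: [1, 2]}  # rejected: the dictionary key is not a string

good_ok = is_valid_body(good_body, is_int)
bad_ok = is_valid_body(bad_body, is_int)
```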
| 44 | + |
| 45 | +The fourth argument, passed in the example as the ``concrete_post_processor`` keyword, is an optional function applied to concretized versions of the side effect's body when post processing is required. |
| 46 | +In this example we use the Python ``chr`` function to convert each concrete integer to a Python character, which will be shown in the visualization |
| 47 | +user interface. |
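
The effect of the post processor from the hook above can be tried in isolation. The sketch below assumes only that each concretized element exposes a ``concrete_value`` integer, as the hook's code does; ``ConcreteByte`` is a hypothetical stand-in for a concretized claripy value, used so the example runs without angr installed.

```python
from dataclasses import dataclass


@dataclass
class ConcreteByte:
    """Hypothetical stand-in for a concretized claripy value; only mimics concrete_value."""
    concrete_value: int


def concrete_post_processor(concrete_cmd):
    # Same conversion as in the hook: one character per concretized byte.
    return [chr(r.concrete_value) for r in concrete_cmd]


concrete_cmd = [ConcreteByte(0x6C), ConcreteByte(0x73)]
rendered = concrete_post_processor(concrete_cmd)
print(rendered)  # ['l', 's']
```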
| 48 | + |
| 49 | +The fifth argument is an optional label used to aid alignment in the user interface. For example, if multiple sites produce |
| 50 | +side effects on the same channel, you will want to give each site a different label. This helps the alignment algorithm |
| 51 | +compare the produced side effects. One possible label is the address of the code location at which the side effect is produced. |