This lists the major changes in angr. Tracking minor changes are left as an exercise for the reader :-)
Quite a few changes and improvements are made to CFGFast
and CFGAccurate
in order to have better and faster CFG recovery.
The two biggest changes in CFGFast
are jump table resolution and data references collection, respectively.
Now CFGFast
resolves indirect jumps by default.
You may get a list of indirect jumps recovered in CFGFast
by accessing the indirect_jumps
attribute.
For many cases, it resolves the jump table accurately.
Data references collection is still in alpha mode.
To test data references collection, just pass collect_data_references=True
when creating a fast CFG, and access the memory_data
attribute after the CFG is constructed.
CFG recovery on ARM binaries is also improved.
Fixed a bug in claripy where __div__
was not always doing unsigned division.
Added new methods SDiv
and SMod
for signed division and signed remainder, respectively.
Reads/writes to the x87 fpu registers now work correctly - there is special logic that rotates a pointer into part of the register file to simulate the x87 stack.
A new paradigm called an "otiegnqwvk", or an "exploration technique", allows the packaging of special logic related to path group stepping.
Syscalls are no longer handled by simuvex.procedures.syscalls.handler
.
Instead, syscalls are now handled by angr.SimOS.handle_syscall()
.
Previously, the address of a syscall SimProcedure is the address right after the syscall instruction (e.g. int 80h
), which collides with the real basic block starting at that address, and is very confusing.
Now each syscall SimProcedure has its own address, just as a normal SimProcedure.
To support this, there is another region mapped for the syscall addresses, Project._syscall_obj
.
Some refactoring and bug fixes in CFGFast
.
Claripy has been given the ability to handle annotations on ASTs.
An annotation can be used to customize the behavior of some backends without impacting others.
For more information, check the docstrings of claripy.Annotation
and claripy.Backend.apply_annotation
.
New state constructor - call_state
. Comes with a refactor to SimCC
, a refactor to callable
, and the removal of PathGroup.call
.
All these changes are thoroughly documented, in angr-doc/docs/structured_data.md
Refactor of SimType
to make it easier to use types - they can be instanciated without a SimState and one can be added later.
Comes with some usability improvements to SimMemView.
Also, there's a better wrapper around PyCParser for generating SimType instances from c declarations and definitions.
Again, thoroughly documented, still in the structured data doc.
CFG
is now an alias to CFGFast
instead of CFGAccurate
.
In general, CFGFast
should work under most cases, and it's way faster than CFGAccurate
.
We believe such a change is necessary, and will make angr more approachable to new users.
You will have to change your code from CFG
to CFGAccurate
if you are relying on specific functionalities that only exist in CFGAccurate
, for example, context-sensitivity and state-preserving.
An exception will be raised by angr if any parameter passed to CFG
is only supported by CFGAccurate
.
For more detailed explanation, please take a look at the documentation of angr.analyses.CFG
.
PyVEX has a structural overhaul. The IRExpr
, IRStmt
, and IRConst
modules no longer exist as submodules, and those module names are deprecated.
Use pyvex.expr
, pyvex.stmt
, and pyvex.const
if you need to access the members of those modules.
The names of the first three parameters to pyvex.IRSB
(the required ones) have been changed.
If you were passing the positional args to IRSB as keyword args, consider switching to positional args.
The order is data
, mem_addr
, arch
.
The optional parameter sargc
to the entry_state
and full_init_state
constructors has been removed and replaced with an argc
parameter.
sargc
predates being able to have claripy ASTs independent from a solver.
The new system is to pass in the exact value, ast or integer, that you'd like to have as the guest program's arg count.
CLE and angr can now accept file-like streams, that is, objects that support stream.read()
and stream.seek()
can be passed in wherever a filepath is expected.
Documentation is much more complete, especially for PyVEX and angr's symbolic execution control components.
There have been several improvements to claripy that should be transparent to users:
- There's been a refactoring of the VSA StridedInterval classes to fix cases where operations were not sound. Precision might suffer as a result, however.
- Some general speed improvements.
- We've introduced a new backend into claripy: the ReplacementBackend. This frontend generates replacement sets from constraints added to it, and uses these replacement sets to increase the precision of VSA. Additionally, we have introduced the HybridBackend, which combines this functionality with a constraint solver, allowing for memory index resolution using VSA.
angr itself has undergone some improvements, with API changes as a result:
- We are moving toward a new way to store information that angr has recovered about a program: the knowledge base. When an analysis recovers some truth about a program (i.e., "there's a basic block at 0x400400", or "the block at 0x400400 has a jump to 0x400500"), it gets stored in a knowledge-base. Analysis that used to store data (currently, the CFG) now store them in a knowledge base and can share the global knowledge base of the project, now accessible via
project.kb
. Over time, this knowledge base will be expanded in the course of any analysis or symbolic execution, so angr is constantly learning more information about the program it is analyzing. - A forward data-flow analysis framework (called ForwardAnalysis) has been introduced, and the CFG was rewritten on top of it. The framework is still in alpha stage - expect more changes to be made. Documentation and more details will arrive shortly. The goal is to refactor other data-flow analysis, like CFGFast, VFG, DDG, etc. to use ForwardAnalysis.
- We refactored the CFG to a) improve code readability, and b) eliminate some bad designs that linger due to historical reasons.
Claripy has a new manager for backends, allowing external backends (i.e., those implemented by other modules) to be used.
The result is that claripy.backend_concrete
is now claripy.backends.concrete
, claripy.backend_vsa
is now claripy.backends.vsa
, and so on.
Improved the ability to recover from failures in instruction decoding.
You can now hook specific addresses at which VEX fails to decode with project.hook
, even if those addresses are not the beginning of a basic block.
This is a pretty beefy release, with over half of claripy having been rewritten and major changes to other analyses. Internally, Claripy has been unified -- the VSA mode and symbolic mode now work on the same structures instead of requiring structures to be created differently. This opens the door for awesome capabilities in the future, but could also result in unexpected behavior if we failed to account for something.
Claripy has had some major interface changes:
- claripy.BV has been renamed to claripy.BVS (bit-vector symbol). It can now create bitvectors out of strings (i.e., claripy.BVS(0x41, 8) and claripy.BVS("A") are identical).
- state.BV and state.BVV are deprecated. Please use state.se.BVS and state.se.BVV.
- BV.model is deprecated. If you're using it, you're doing something wrong, anyways. If you really need a specific model, convert it with the appropriate backend (i.e., claripy.backend_concrete.convert(bv)).
There have also been some changes to analyses:
- Interface: CFG argument
keep_input_state
has been renamed tokeep_state
. With this option enabled, both input and final states are kept. - Interface: Two arguments
cfg_node
andstmt_id
ofBackwardSlicing
have been deprecated. Instead,BackwardSlicing
takes a single argument,targets
. This means that we now support slicing from multiple sources. - Performance: The speed of CFG recovery has been slightly improved. There is a noticeable speed improvement on MIPS binaries.
- Several bugs have been fixed in DDG, and some sanity checks were added to make it more usable.
And some general changes to angr itself:
- StringSpec is deprecated! You can now pass claripy bitvectors directly as arguments.