This lists the major changes in angr. Tracking minor changes are left as an exercise for the reader :-)
Building off of the engine changes from the last release, we have begun to extend angr to other architectures. AVR and MSP430 are in progress. In the meantime, subwire has created a reference implementation of BrainFuck support in angr, done two different ways! Check out angr-bf for more info!
- We have rebased our fork of VEX on the latest master branch from valgrind (as of 2 months ago, at least...). We have also submitted our patches to VEX to upstream, so we should be able to stop maintaining a fork pretty soon.
- The way we interact with VEX has changed substancially, and should speed things up a bit.
- Loading sets of binaries with many import symbols has been sped up
- Many, many improvements to angr-management, including the switch away from enaml to using pyside directly.
For the last month, we have been working on a major refactor of the angr to change the way that angr reasons about the code that it analyzes. Until now, angr has been bound to the VEX intermediate representation to lift native code, supporting a wide range of architectures but not being very expandable past them. This release represents the ground work for what we call translation and execution engines. These engines are independent backends, pluggable into the angr framework, that will allow angr to reason about a wide range of targets. For now, we have restructured the existing VEX and Unicorn Engine support into this engine paradigm, but as we discuss in our blog post, the plan is to create engines to enable angr's reasoning of Java bytecode and source code, and to augment angr's environment support through the use of external dynamic sandboxes.
For now, these changes are mostly internal. We have attempted to maintain compatibility for end-users, but those building systems atop angr will have to adapt to the modern codebase. The following are the major changes:
- simuvex: we have introduced SimEngine. SimEngine is a base class for abstractions over native code. For example, angr's VEX-specific functionality is now concentrated in SimEngineVEX, and new engines (such as SimEngineLLVM) can be implemented (even outside of simuvex itself) to support the analysis of new types of code.
- simuvex: as part of the engines refactor, the SimRun class has been eliminated. Instead of different subclasses of SimRun that would be instantiated from an input state, engines each have a
process
function that, from an input state, produces a SimSuccessors instance containing lists of different successor states (normal, unsat, unconstrained, etc) and any engine-specific artifacts (such as the VEX statements. Take a look atsuccessors.artifacts
). - simuvex:
state.mem[x:] = y
now requires a type for storage (for examplestate.mem[x:].dword = y
). - simuvex: the way of calling inline SimProcedures has been changed. Now you have to create a SimProcedure, and then call
execute()
on it and pass in a program state as well as the arguments. - simuvex: accessing registers through
SimRegNameView
(likestate.regs.eax
) always triggers SimInspect breakpoints and creates new actions. Now you can access a register by prefixing its name with an underscore (e.g.state.regs._eax
orstate._ip
) to avoid triggering breakpoints or creating actions. - angr: the way hooks work has slightly changed, though is backwards-compatible. The new angr.Hook class acts as a wrapper for hooks (SimProcedures and functions), keeping things cleaner in the
project._sim_procedures
dict. - angr: we have deprecated the keyword argument
max_size
and changed it to tosize
in theangr.Block
constructor (i.e., the argument toproject.factory.block
and more upstream methods (path.step
,path_group.step
, etc). - angr: we have deprecated
project.factory.sim_run
and changed it to toproject.factory.successors
, and it now generates aSimSuccessors
object. - angr:
project.factory.sim_block
has been deprecated and replaced withproject.factory.successors(default_engine=True)
. - angr: angr syscalls are no longer hooks. Instead, the syscall table is now in
project._simos.syscall_table
. This will be made "public" after a usability refactor. If you were usingproject.is_hooked(addr)
to see if an address has a related SimProcedure, now you probably want to check if there is a related syscall as well (usingproject._simos.syscall_table.get_by_addr(addr) is not None
). - pyvex: to support custom lifters to VEX, pyvex has introduced the concept of backend lifters. Lifters can be written in pure python to produce VEX IR, allowing for extendability of angr's VEX-based analyses to other hardware architectures.
As usual, there are many other improvements and minor bugfixes.
- claripy: support
unsat_core()
to get the core of unsatness of constraints. It is in fact a thin wrapper of theunsat_core()
function provided by Z3. Also a new state optionCONSTRAINT_TRACKING_IN_SOLVER
is added to SimuVEX. That state option must be enabled if you want to useunsat_core()
on any state. - simuvex:
SimMemory.load()
andSimMemory.store()
now takes a new parameterdisable_actions
. Setting it to True will prevent any SimAction creation. - angr: CFGFast has a better support for ARM binaries, especially for code in THUMB mode.
- angr: thanks to an improvement in SimuVEX, CFGAccurate now uses slightly less memory than before.
- angr:
len()
on pathtrace
oraddr_trace
is made much faster. - angr: Fix a crash during CFG generation or symbolic execution on platforms/architectures with no syscall defined.
- angr: as part of the refactor,
BackwardSlicing
is temporarily disabled. It will be re-enabled once all DDG-related refactor are merged to master.
Additionally, packaging and build-system improvements coordinated between the angr and Unicorn Engine projects have allowed angr's Unicorn support to be built on Windows. Because of this, unicorn
is now a dependency for simuvex
.
Looking forward, angr is poised to become a program analysis engine for binaries and more!
It has been over a month since the last release 5.6.10.12. Again, we’ve made some significant changes and improvements on the code base.
- angr: Labels are now stored in KnowledgeBase.
- angr: Add a new analysis:
Disassembly
. The new Disassembly analysis provides an easy-to-use interface to render assembly of functions. - angr: Fix the issue that
ForwardAnalysis
may prematurely terminate while there are still un-processed jobs. - angr: Many small improvements and bug fixes on
CFGFast
. - angr: Many small improvements and bug fixes on
VFG
. Bring back widening support. Fix the issue thatVFG
may not terminate under certain cases. Implement a new graph traversal algorithm to have an optimal traversal order. Allow state merging at non-merge-points, which allows faster convergence. - angr-management: Display a progress during initial CFG recovery.
- angr-management: Display a “Load binary” window upon binary loading. Some analysis options can be adjusted there.
- angr-management: Disassembly view: Edge routing on the graph is improved.
- angr-management: Disassembly view: Support starting a new symbolic execution task from an arbitrary address in the program.
- angr-management: Disassembly view: Support renaming of function names and labels.
- angr-management: Disassembly view: Support “Jump to address”.
- angr-management: Disassembly view: Display resolved and unresolved jump targets. All jump targets are double-clickable.
- SimuVEX: Move region mapping from
SimAbstractMemory
toSimMemory
. This will allow an easier conversion betweenSimAbstractMemory
andSimSymbolicMemory
, which is to say, conversion between symbolic states and static states is now possible. - SimuVEX & claripy: Provide support for
unsat_core
in Z3. It returns a set of constraints that led to unsatness of the constraint set on the current state. - archinfo: Add a new Boolean variable
branch_delay_slot
for each architecture. It is set to True on MIPS32.
Major point release! An incredible number of things have changed in the month run-up to the Cyber Grand Challenge.
- Integration with Unicorn Engine supported for concrete execution. A new SimRun type, SimUnicorn, may step through many basic blocks at once, so long as there is no operation on symbolic data. Please use our fork of unicorn engine, which has many patches applied. All these patches are pending merge into upstream.
- Lots of improvements and bug fixes to CFGFast. Rumors are angr’s CFG was only "optimized" for x86-64 binaries (which is really because most of our test cases are compiled as 64-bit ELFs). Now it is also “optimized” for x86 binaries :) (editor's note: angr is built with cross-architecture analysis in mind. CFG construction is pretty much the only component which has architecture-specific behavior.)
- Lots of improvements to the VFG analysis, including speed and accuracy. However, there is still a lot to be done.
- Lots of speed optimizations in general - CFGFast should be 3-6x faster under CPython with much less memory usage.
- Now data dependence graph gives you a real dependence graph between variable definitions. Try
data_graph
andsimplified_data_graph
on a DDG object! - New state option
simuvex.o.STRICT_PAGE_ACCESS
will cause aSimSegfaultError
to be raised whenever the guest reads/writes/executes memory that is either unmapped or doesn't have the appropriate permissions. - Merging of paths (as opposed to states) is performed in a much smarter way.
- The behavior of the
support_selfmodifying_code
project option is changed: Before, this would allow the state to be used as a fallback source of instruction bytes when no backer from CLE is available. Now, this option makes instruction lifting use the state as the source of bytes always. When the option is disabled and execution jumps outside the normal binary, the state will be used automatically. - Actually support self-modifying code - if a basic block of code modifies itself, the block will be re-lifted before the next instruction starts.
- Syscalls are handled differently now - Before you would see a SimRun for a syscall helper, now you'll just see a SimProcedure for the given syscall. Additionally, each syscall has its own address in a "syscalls segment", and syscalls are treated as jumps to this segment. This simplifies a lot of things analysis-wise.
- CFGAccurate accepts a
base_graph
keyword to its constructor, e.g.CFGFast().graph
, or even.graph
of a function, to use as a base for analysis. - New fast memory model for cases where symbolic-addressed reads and writes are unlikely.
- Conflicts between the
find
andavoid
parameters to the Explorer otiegnqwvk are resolved correctly. (credit clslgrnc) - New analysis
StaticHooker
which hooks library functions in unstripped statically linked binaries. Lifter
can be used without creating an angr Project. You must manually specify the architecture and bytestring in calls to.lift()
and.fresh_block()
. If you like, you can also specify the architecture as a parameter to the constructor and omit it from the lifting calls.- Add two new analyses developed for the CGC (mostly as examples of doing static analysis with angr): Reassembler and BinaryOptimizer.
In general, there have been enormous amounts of speed improvements in this release. Depending on the workload, angr should run about twice as fast. Aside from this, there have also been many submodule-specific changes:
Quite a few changes and improvements are made to CFGFast
and CFGAccurate
in order to have better and faster CFG recovery.
The two biggest changes in CFGFast
are jump table resolution and data references collection, respectively.
Now CFGFast
resolves indirect jumps by default.
You may get a list of indirect jumps recovered in CFGFast
by accessing the indirect_jumps
attribute.
For many cases, it resolves the jump table accurately.
Data references collection is still in alpha mode.
To test data references collection, just pass collect_data_references=True
when creating a fast CFG, and access the memory_data
attribute after the CFG is constructed.
CFG recovery on ARM binaries is also improved.
A new paradigm called an "otiegnqwvk", or an "exploration technique", allows the packaging of special logic related to path group stepping.
Reads/writes to the x87 fpu registers now work correctly - there is special logic that rotates a pointer into part of the register file to simulate the x87 stack.
With the recent changes to Claripy, we have configured SimuVEX to use the composite solver by default. This should be transparent, but should be considered if strange issues (or differences in behavior) arise during symbolic execution.
Fixed a bug in claripy where __div__
was not always doing unsigned division, and added new methods SDiv
and SMod
for signed division and signed remainder, respectively.
Claripy frontends have been completely rewritten into a mixin-centric solver design. Basic frontend functionality (i.e., calling into the solver or dealing with backends) is handled by frontends (in claripy.frontends
), and additional functionality (such as caching, deciding when to simplify, etc) is handled by frontend mixins (in claripy.frontend_mixins
). This makes it considerably easier to customize solvers to your specific needE. For examples, look at claripy/solver.py
.
Alongside the solver rewrite, the composite solver (which splits constraints into independent constraint sets for faster solving) has been immensely improved and is now functional and fast.
Syscalls are no longer handled by simuvex.procedures.syscalls.handler
.
Instead, syscalls are now handled by angr.SimOS.handle_syscall()
.
Previously, the address of a syscall SimProcedure is the address right after the syscall instruction (e.g. int 80h
), which collides with the real basic block starting at that address, and is very confusing.
Now each syscall SimProcedure has its own address, just as a normal SimProcedure.
To support this, there is another region mapped for the syscall addresses, Project._syscall_obj
.
Some refactoring and bug fixes in CFGFast
.
Claripy has been given the ability to handle annotations on ASTs.
An annotation can be used to customize the behavior of some backends without impacting others.
For more information, check the docstrings of claripy.Annotation
and claripy.Backend.apply_annotation
.
New state constructor - call_state
. Comes with a refactor to SimCC
, a refactor to callable
, and the removal of PathGroup.call
.
All these changes are thoroughly documented, in angr-doc/docs/structured_data.md
Refactor of SimType
to make it easier to use types - they can be instanciated without a SimState and one can be added later.
Comes with some usability improvements to SimMemView.
Also, there's a better wrapper around PyCParser for generating SimType instances from c declarations and definitions.
Again, thoroughly documented, still in the structured data doc.
CFG
is now an alias to CFGFast
instead of CFGAccurate
.
In general, CFGFast
should work under most cases, and it's way faster than CFGAccurate
.
We believe such a change is necessary, and will make angr more approachable to new users.
You will have to change your code from CFG
to CFGAccurate
if you are relying on specific functionalities that only exist in CFGAccurate
, for example, context-sensitivity and state-preserving.
An exception will be raised by angr if any parameter passed to CFG
is only supported by CFGAccurate
.
For more detailed explanation, please take a look at the documentation of angr.analyses.CFG
.
PyVEX has a structural overhaul. The IRExpr
, IRStmt
, and IRConst
modules no longer exist as submodules, and those module names are deprecated.
Use pyvex.expr
, pyvex.stmt
, and pyvex.const
if you need to access the members of those modules.
The names of the first three parameters to pyvex.IRSB
(the required ones) have been changed.
If you were passing the positional args to IRSB as keyword args, consider switching to positional args.
The order is data
, mem_addr
, arch
.
The optional parameter sargc
to the entry_state
and full_init_state
constructors has been removed and replaced with an argc
parameter.
sargc
predates being able to have claripy ASTs independent from a solver.
The new system is to pass in the exact value, ast or integer, that you'd like to have as the guest program's arg count.
CLE and angr can now accept file-like streams, that is, objects that support stream.read()
and stream.seek()
can be passed in wherever a filepath is expected.
Documentation is much more complete, especially for PyVEX and angr's symbolic execution control components.
There have been several improvements to claripy that should be transparent to users:
- There's been a refactoring of the VSA StridedInterval classes to fix cases where operations were not sound. Precision might suffer as a result, however.
- Some general speed improvements.
- We've introduced a new backend into claripy: the ReplacementBackend. This frontend generates replacement sets from constraints added to it, and uses these replacement sets to increase the precision of VSA. Additionally, we have introduced the HybridBackend, which combines this functionality with a constraint solver, allowing for memory index resolution using VSA.
angr itself has undergone some improvements, with API changes as a result:
- We are moving toward a new way to store information that angr has recovered about a program: the knowledge base. When an analysis recovers some truth about a program (i.e., "there's a basic block at 0x400400", or "the block at 0x400400 has a jump to 0x400500"), it gets stored in a knowledge-base. Analysis that used to store data (currently, the CFG) now store them in a knowledge base and can share the global knowledge base of the project, now accessible via
project.kb
. Over time, this knowledge base will be expanded in the course of any analysis or symbolic execution, so angr is constantly learning more information about the program it is analyzing. - A forward data-flow analysis framework (called ForwardAnalysis) has been introduced, and the CFG was rewritten on top of it. The framework is still in alpha stage - expect more changes to be made. Documentation and more details will arrive shortly. The goal is to refactor other data-flow analysis, like CFGFast, VFG, DDG, etc. to use ForwardAnalysis.
- We refactored the CFG to a) improve code readability, and b) eliminate some bad designs that linger due to historical reasons.
Claripy has a new manager for backends, allowing external backends (i.e., those implemented by other modules) to be used.
The result is that claripy.backend_concrete
is now claripy.backends.concrete
, claripy.backend_vsa
is now claripy.backends.vsa
, and so on.
Improved the ability to recover from failures in instruction decoding.
You can now hook specific addresses at which VEX fails to decode with project.hook
, even if those addresses are not the beginning of a basic block.
This is a pretty beefy release, with over half of claripy having been rewritten and major changes to other analyses. Internally, Claripy has been unified -- the VSA mode and symbolic mode now work on the same structures instead of requiring structures to be created differently. This opens the door for awesome capabilities in the future, but could also result in unexpected behavior if we failed to account for something.
Claripy has had some major interface changes:
- claripy.BV has been renamed to claripy.BVS (bit-vector symbol). It can now create bitvectors out of strings (i.e., claripy.BVS(0x41, 8) and claripy.BVS("A") are identical).
- state.BV and state.BVV are deprecated. Please use state.se.BVS and state.se.BVV.
- BV.model is deprecated. If you're using it, you're doing something wrong, anyways. If you really need a specific model, convert it with the appropriate backend (i.e., claripy.backend_concrete.convert(bv)).
There have also been some changes to analyses:
- Interface: CFG argument
keep_input_state
has been renamed tokeep_state
. With this option enabled, both input and final states are kept. - Interface: Two arguments
cfg_node
andstmt_id
ofBackwardSlicing
have been deprecated. Instead,BackwardSlicing
takes a single argument,targets
. This means that we now support slicing from multiple sources. - Performance: The speed of CFG recovery has been slightly improved. There is a noticeable speed improvement on MIPS binaries.
- Several bugs have been fixed in DDG, and some sanity checks were added to make it more usable.
And some general changes to angr itself:
- StringSpec is deprecated! You can now pass claripy bitvectors directly as arguments.