Skip to content
This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

Latest commit

 

History

History
247 lines (184 loc) · 20.9 KB

CHANGELOG.md

File metadata and controls

247 lines (184 loc) · 20.9 KB

Changelog

This lists the major changes in angr. Tracking minor changes are left as an exercise for the reader :-)

angr 6.7.3.26

Building off of the engine changes from the last release, we have begun to extend angr to other architectures. AVR and MSP430 are in progress. In the meantime, subwire has created a reference implementation of BrainFuck support in angr, done two different ways! Check out angr-bf for more info!

  • We have rebased our fork of VEX on the latest master branch from valgrind (as of 2 months ago, at least...). We have also submitted our patches to VEX to upstream, so we should be able to stop maintaining a fork pretty soon.
  • The way we interact with VEX has changed substancially, and should speed things up a bit.
  • Loading sets of binaries with many import symbols has been sped up
  • Many, many improvements to angr-management, including the switch away from enaml to using pyside directly.

angr 6.7.1.13

For the last month, we have been working on a major refactor of the angr to change the way that angr reasons about the code that it analyzes. Until now, angr has been bound to the VEX intermediate representation to lift native code, supporting a wide range of architectures but not being very expandable past them. This release represents the ground work for what we call translation and execution engines. These engines are independent backends, pluggable into the angr framework, that will allow angr to reason about a wide range of targets. For now, we have restructured the existing VEX and Unicorn Engine support into this engine paradigm, but as we discuss in our blog post, the plan is to create engines to enable angr's reasoning of Java bytecode and source code, and to augment angr's environment support through the use of external dynamic sandboxes.

For now, these changes are mostly internal. We have attempted to maintain compatibility for end-users, but those building systems atop angr will have to adapt to the modern codebase. The following are the major changes:

  • simuvex: we have introduced SimEngine. SimEngine is a base class for abstractions over native code. For example, angr's VEX-specific functionality is now concentrated in SimEngineVEX, and new engines (such as SimEngineLLVM) can be implemented (even outside of simuvex itself) to support the analysis of new types of code.
  • simuvex: as part of the engines refactor, the SimRun class has been eliminated. Instead of different subclasses of SimRun that would be instantiated from an input state, engines each have a process function that, from an input state, produces a SimSuccessors instance containing lists of different successor states (normal, unsat, unconstrained, etc) and any engine-specific artifacts (such as the VEX statements. Take a look at successors.artifacts).
  • simuvex: state.mem[x:] = y now requires a type for storage (for example state.mem[x:].dword = y).
  • simuvex: the way of calling inline SimProcedures has been changed. Now you have to create a SimProcedure, and then call execute() on it and pass in a program state as well as the arguments.
  • simuvex: accessing registers through SimRegNameView (like state.regs.eax) always triggers SimInspect breakpoints and creates new actions. Now you can access a register by prefixing its name with an underscore (e.g. state.regs._eax or state._ip) to avoid triggering breakpoints or creating actions.
  • angr: the way hooks work has slightly changed, though is backwards-compatible. The new angr.Hook class acts as a wrapper for hooks (SimProcedures and functions), keeping things cleaner in the project._sim_procedures dict.
  • angr: we have deprecated the keyword argument max_size and changed it to to size in the angr.Block constructor (i.e., the argument to project.factory.block and more upstream methods (path.step, path_group.step, etc).
  • angr: we have deprecated project.factory.sim_run and changed it to to project.factory.successors, and it now generates a SimSuccessors object.
  • angr: project.factory.sim_block has been deprecated and replaced with project.factory.successors(default_engine=True).
  • angr: angr syscalls are no longer hooks. Instead, the syscall table is now in project._simos.syscall_table. This will be made "public" after a usability refactor. If you were using project.is_hooked(addr) to see if an address has a related SimProcedure, now you probably want to check if there is a related syscall as well (using project._simos.syscall_table.get_by_addr(addr) is not None).
  • pyvex: to support custom lifters to VEX, pyvex has introduced the concept of backend lifters. Lifters can be written in pure python to produce VEX IR, allowing for extendability of angr's VEX-based analyses to other hardware architectures.

As usual, there are many other improvements and minor bugfixes.

  • claripy: support unsat_core() to get the core of unsatness of constraints. It is in fact a thin wrapper of the unsat_core() function provided by Z3. Also a new state option CONSTRAINT_TRACKING_IN_SOLVER is added to SimuVEX. That state option must be enabled if you want to use unsat_core() on any state.
  • simuvex: SimMemory.load() and SimMemory.store() now takes a new parameter disable_actions. Setting it to True will prevent any SimAction creation.
  • angr: CFGFast has a better support for ARM binaries, especially for code in THUMB mode.
  • angr: thanks to an improvement in SimuVEX, CFGAccurate now uses slightly less memory than before.
  • angr: len() on path trace or addr_trace is made much faster.
  • angr: Fix a crash during CFG generation or symbolic execution on platforms/architectures with no syscall defined.
  • angr: as part of the refactor, BackwardSlicing is temporarily disabled. It will be re-enabled once all DDG-related refactor are merged to master.

Additionally, packaging and build-system improvements coordinated between the angr and Unicorn Engine projects have allowed angr's Unicorn support to be built on Windows. Because of this, unicorn is now a dependency for simuvex.

Looking forward, angr is poised to become a program analysis engine for binaries and more!

angr 5.6.12.3

It has been over a month since the last release 5.6.10.12. Again, we’ve made some significant changes and improvements on the code base.

  • angr: Labels are now stored in KnowledgeBase.
  • angr: Add a new analysis: Disassembly. The new Disassembly analysis provides an easy-to-use interface to render assembly of functions.
  • angr: Fix the issue that ForwardAnalysis may prematurely terminate while there are still un-processed jobs.
  • angr: Many small improvements and bug fixes on CFGFast.
  • angr: Many small improvements and bug fixes on VFG. Bring back widening support. Fix the issue that VFG may not terminate under certain cases. Implement a new graph traversal algorithm to have an optimal traversal order. Allow state merging at non-merge-points, which allows faster convergence.
  • angr-management: Display a progress during initial CFG recovery.
  • angr-management: Display a “Load binary” window upon binary loading. Some analysis options can be adjusted there.
  • angr-management: Disassembly view: Edge routing on the graph is improved.
  • angr-management: Disassembly view: Support starting a new symbolic execution task from an arbitrary address in the program.
  • angr-management: Disassembly view: Support renaming of function names and labels.
  • angr-management: Disassembly view: Support “Jump to address”.
  • angr-management: Disassembly view: Display resolved and unresolved jump targets. All jump targets are double-clickable.
  • SimuVEX: Move region mapping from SimAbstractMemory to SimMemory. This will allow an easier conversion between SimAbstractMemory and SimSymbolicMemory, which is to say, conversion between symbolic states and static states is now possible.
  • SimuVEX & claripy: Provide support for unsat_core in Z3. It returns a set of constraints that led to unsatness of the constraint set on the current state.
  • archinfo: Add a new Boolean variable branch_delay_slot for each architecture. It is set to True on MIPS32.

angr 5.6.8.22

Major point release! An incredible number of things have changed in the month run-up to the Cyber Grand Challenge.

  • Integration with Unicorn Engine supported for concrete execution. A new SimRun type, SimUnicorn, may step through many basic blocks at once, so long as there is no operation on symbolic data. Please use our fork of unicorn engine, which has many patches applied. All these patches are pending merge into upstream.
  • Lots of improvements and bug fixes to CFGFast. Rumors are angr’s CFG was only "optimized" for x86-64 binaries (which is really because most of our test cases are compiled as 64-bit ELFs). Now it is also “optimized” for x86 binaries :) (editor's note: angr is built with cross-architecture analysis in mind. CFG construction is pretty much the only component which has architecture-specific behavior.)
  • Lots of improvements to the VFG analysis, including speed and accuracy. However, there is still a lot to be done.
  • Lots of speed optimizations in general - CFGFast should be 3-6x faster under CPython with much less memory usage.
  • Now data dependence graph gives you a real dependence graph between variable definitions. Try data_graph and simplified_data_graph on a DDG object!
  • New state option simuvex.o.STRICT_PAGE_ACCESS will cause a SimSegfaultError to be raised whenever the guest reads/writes/executes memory that is either unmapped or doesn't have the appropriate permissions.
  • Merging of paths (as opposed to states) is performed in a much smarter way.
  • The behavior of the support_selfmodifying_code project option is changed: Before, this would allow the state to be used as a fallback source of instruction bytes when no backer from CLE is available. Now, this option makes instruction lifting use the state as the source of bytes always. When the option is disabled and execution jumps outside the normal binary, the state will be used automatically.
  • Actually support self-modifying code - if a basic block of code modifies itself, the block will be re-lifted before the next instruction starts.
  • Syscalls are handled differently now - Before you would see a SimRun for a syscall helper, now you'll just see a SimProcedure for the given syscall. Additionally, each syscall has its own address in a "syscalls segment", and syscalls are treated as jumps to this segment. This simplifies a lot of things analysis-wise.
  • CFGAccurate accepts a base_graph keyword to its constructor, e.g. CFGFast().graph, or even .graph of a function, to use as a base for analysis.
  • New fast memory model for cases where symbolic-addressed reads and writes are unlikely.
  • Conflicts between the find and avoid parameters to the Explorer otiegnqwvk are resolved correctly. (credit clslgrnc)
  • New analysis StaticHooker which hooks library functions in unstripped statically linked binaries.
  • Lifter can be used without creating an angr Project. You must manually specify the architecture and bytestring in calls to .lift() and .fresh_block(). If you like, you can also specify the architecture as a parameter to the constructor and omit it from the lifting calls.
  • Add two new analyses developed for the CGC (mostly as examples of doing static analysis with angr): Reassembler and BinaryOptimizer.

angr 4.6.6.28

In general, there have been enormous amounts of speed improvements in this release. Depending on the workload, angr should run about twice as fast. Aside from this, there have also been many submodule-specific changes:

angr

Quite a few changes and improvements are made to CFGFast and CFGAccurate in order to have better and faster CFG recovery. The two biggest changes in CFGFast are jump table resolution and data references collection, respectively. Now CFGFast resolves indirect jumps by default. You may get a list of indirect jumps recovered in CFGFast by accessing the indirect_jumps attribute. For many cases, it resolves the jump table accurately. Data references collection is still in alpha mode. To test data references collection, just pass collect_data_references=True when creating a fast CFG, and access the memory_data attribute after the CFG is constructed.

CFG recovery on ARM binaries is also improved.

A new paradigm called an "otiegnqwvk", or an "exploration technique", allows the packaging of special logic related to path group stepping.

SimuVEX

Reads/writes to the x87 fpu registers now work correctly - there is special logic that rotates a pointer into part of the register file to simulate the x87 stack.

With the recent changes to Claripy, we have configured SimuVEX to use the composite solver by default. This should be transparent, but should be considered if strange issues (or differences in behavior) arise during symbolic execution.

Claripy

Fixed a bug in claripy where __div__ was not always doing unsigned division, and added new methods SDiv and SMod for signed division and signed remainder, respectively.

Claripy frontends have been completely rewritten into a mixin-centric solver design. Basic frontend functionality (i.e., calling into the solver or dealing with backends) is handled by frontends (in claripy.frontends), and additional functionality (such as caching, deciding when to simplify, etc) is handled by frontend mixins (in claripy.frontend_mixins). This makes it considerably easier to customize solvers to your specific needE. For examples, look at claripy/solver.py.

Alongside the solver rewrite, the composite solver (which splits constraints into independent constraint sets for faster solving) has been immensely improved and is now functional and fast.

angr 4.6.6.4

Syscalls are no longer handled by simuvex.procedures.syscalls.handler. Instead, syscalls are now handled by angr.SimOS.handle_syscall(). Previously, the address of a syscall SimProcedure is the address right after the syscall instruction (e.g. int 80h), which collides with the real basic block starting at that address, and is very confusing. Now each syscall SimProcedure has its own address, just as a normal SimProcedure. To support this, there is another region mapped for the syscall addresses, Project._syscall_obj.

Some refactoring and bug fixes in CFGFast.

Claripy has been given the ability to handle annotations on ASTs. An annotation can be used to customize the behavior of some backends without impacting others. For more information, check the docstrings of claripy.Annotation and claripy.Backend.apply_annotation.

angr 4.6.5.25

New state constructor - call_state. Comes with a refactor to SimCC, a refactor to callable, and the removal of PathGroup.call. All these changes are thoroughly documented, in angr-doc/docs/structured_data.md

Refactor of SimType to make it easier to use types - they can be instanciated without a SimState and one can be added later. Comes with some usability improvements to SimMemView. Also, there's a better wrapper around PyCParser for generating SimType instances from c declarations and definitions. Again, thoroughly documented, still in the structured data doc.

CFG is now an alias to CFGFast instead of CFGAccurate. In general, CFGFast should work under most cases, and it's way faster than CFGAccurate. We believe such a change is necessary, and will make angr more approachable to new users. You will have to change your code from CFG to CFGAccurate if you are relying on specific functionalities that only exist in CFGAccurate, for example, context-sensitivity and state-preserving. An exception will be raised by angr if any parameter passed to CFG is only supported by CFGAccurate. For more detailed explanation, please take a look at the documentation of angr.analyses.CFG.

angr 4.6.3.28

PyVEX has a structural overhaul. The IRExpr, IRStmt, and IRConst modules no longer exist as submodules, and those module names are deprecated. Use pyvex.expr, pyvex.stmt, and pyvex.const if you need to access the members of those modules.

The names of the first three parameters to pyvex.IRSB (the required ones) have been changed. If you were passing the positional args to IRSB as keyword args, consider switching to positional args. The order is data, mem_addr, arch.

The optional parameter sargc to the entry_state and full_init_state constructors has been removed and replaced with an argc parameter. sargc predates being able to have claripy ASTs independent from a solver. The new system is to pass in the exact value, ast or integer, that you'd like to have as the guest program's arg count.

CLE and angr can now accept file-like streams, that is, objects that support stream.read() and stream.seek() can be passed in wherever a filepath is expected.

Documentation is much more complete, especially for PyVEX and angr's symbolic execution control components.

angr 4.6.3.15

There have been several improvements to claripy that should be transparent to users:

  • There's been a refactoring of the VSA StridedInterval classes to fix cases where operations were not sound. Precision might suffer as a result, however.
  • Some general speed improvements.
  • We've introduced a new backend into claripy: the ReplacementBackend. This frontend generates replacement sets from constraints added to it, and uses these replacement sets to increase the precision of VSA. Additionally, we have introduced the HybridBackend, which combines this functionality with a constraint solver, allowing for memory index resolution using VSA.

angr itself has undergone some improvements, with API changes as a result:

  • We are moving toward a new way to store information that angr has recovered about a program: the knowledge base. When an analysis recovers some truth about a program (i.e., "there's a basic block at 0x400400", or "the block at 0x400400 has a jump to 0x400500"), it gets stored in a knowledge-base. Analysis that used to store data (currently, the CFG) now store them in a knowledge base and can share the global knowledge base of the project, now accessible via project.kb. Over time, this knowledge base will be expanded in the course of any analysis or symbolic execution, so angr is constantly learning more information about the program it is analyzing.
  • A forward data-flow analysis framework (called ForwardAnalysis) has been introduced, and the CFG was rewritten on top of it. The framework is still in alpha stage - expect more changes to be made. Documentation and more details will arrive shortly. The goal is to refactor other data-flow analysis, like CFGFast, VFG, DDG, etc. to use ForwardAnalysis.
  • We refactored the CFG to a) improve code readability, and b) eliminate some bad designs that linger due to historical reasons.

angr 4.5.12.?

Claripy has a new manager for backends, allowing external backends (i.e., those implemented by other modules) to be used. The result is that claripy.backend_concrete is now claripy.backends.concrete, claripy.backend_vsa is now claripy.backends.vsa, and so on.

angr 4.5.12.12

Improved the ability to recover from failures in instruction decoding. You can now hook specific addresses at which VEX fails to decode with project.hook, even if those addresses are not the beginning of a basic block.

angr 4.5.11.23

This is a pretty beefy release, with over half of claripy having been rewritten and major changes to other analyses. Internally, Claripy has been unified -- the VSA mode and symbolic mode now work on the same structures instead of requiring structures to be created differently. This opens the door for awesome capabilities in the future, but could also result in unexpected behavior if we failed to account for something.

Claripy has had some major interface changes:

  • claripy.BV has been renamed to claripy.BVS (bit-vector symbol). It can now create bitvectors out of strings (i.e., claripy.BVS(0x41, 8) and claripy.BVS("A") are identical).
  • state.BV and state.BVV are deprecated. Please use state.se.BVS and state.se.BVV.
  • BV.model is deprecated. If you're using it, you're doing something wrong, anyways. If you really need a specific model, convert it with the appropriate backend (i.e., claripy.backend_concrete.convert(bv)).

There have also been some changes to analyses:

  • Interface: CFG argument keep_input_state has been renamed to keep_state. With this option enabled, both input and final states are kept.
  • Interface: Two arguments cfg_node and stmt_id of BackwardSlicing have been deprecated. Instead, BackwardSlicing takes a single argument, targets. This means that we now support slicing from multiple sources.
  • Performance: The speed of CFG recovery has been slightly improved. There is a noticeable speed improvement on MIPS binaries.
  • Several bugs have been fixed in DDG, and some sanity checks were added to make it more usable.

And some general changes to angr itself:

  • StringSpec is deprecated! You can now pass claripy bitvectors directly as arguments.