Update docs and reorganize repo directory structure #322

Merged
merged 10 commits into from
Mar 13, 2024
10 changes: 4 additions & 6 deletions CMakeLists.txt
@@ -5,11 +5,9 @@ find_package(PkgConfig REQUIRED)

set(EXTERNAL_DIR ${PROJECT_SOURCE_DIR}/external)

# libia2 needs to be first so it defines ${libia2_BINARY_DIR}
add_subdirectory(libia2)
# runtime needs to be first so it defines libia2_BINARY_DIR
add_subdirectory(runtime)

add_subdirectory(examples)
add_subdirectory(rewriter)
add_subdirectory(partition-alloc)
add_subdirectory(pad-tls)
add_subdirectory(runtime)
add_subdirectory(tests)
add_subdirectory(tools)
82 changes: 11 additions & 71 deletions README.md
@@ -1,81 +1,21 @@
# IA2 Phase 2
# IA2 Sandboxing Runtime

This repo provides tools for compartmentalizing an application and its dependencies using Intel's memory protection keys (MPK) for userspace. The repo includes a tool that rewrites headers to create call gate wrappers as well as the runtime to initialize the protection keys for trusted compartments.
IA2 (or Intent-capturing Annotations for Isolation and Assurance) is a runtime and set of tools for compartmentalizing C/C++ applications to provide coarse spatial memory-safety.

## Setup
Applications typically use many third-party C/C++ libraries, which may introduce memory-safety vulnerabilities if developers don't have the resources to audit them exhaustively. Because everything runs in a single process, a vulnerability in one library can compromise another, which is a problem for programs that handle security-sensitive information. The IA2 sandbox splits applications into isolated compartments and uses CPU hardware features (Memory Protection Keys on x86-64) to forbid cross-compartment memory accesses. Compartments are delineated along pre-existing boundaries (at the shared library level), avoiding the need to rearchitect a codebase, and source-code annotations mark variables that are intentionally shared between compartments. The runtime also ensures that each set of shared libraries in a compartment uses a distinct region of memory for its stack, heap, static and thread-local variables.

Ubuntu 20.04 is used for testing. Other Linux distributions may or may not work. Adjust the commands below accordingly.
## Sandboxing workflow

### Install the package prerequisites.
IA2 uses source-code transformations as part of the build process to sandbox programs. Our [rewriter tool](docs/source_rewriter.md) processes a codebase's source files before each build to produce a set of intermediate sources with annotations for compartment transitions at cross-DSO calls. These intermediate sources are then passed on to a build system using off-the-shelf compilers with some additional flags and the IA2 runtime is linked in to create the compartmentalized program. See the [design doc](docs/design.md) for details.

```
sudo apt install -y libusb-1.0-0-dev libclang-dev llvm-dev \
ninja-build zlib1g-dev python3-pip cmake \
libavformat-dev libavutil-dev pcregrep patchelf
pip install lit
rustup install nightly
```
This workflow treats intermediate sources as build artifacts, so the only annotations that need to be checked in are those that can't be inferred. These are primarily annotations for shared variables and for cross-library indirect calls when round-tripping function pointers through `void *`. For the latter, the rewriter also does type-system transformations to turn missing annotations into compiler errors and avoid accidental miscompartmentalization. In compartmentalized programs, cross-compartment memory accesses kill the process, but these programs also support a permissive mode that just logs the accesses to aid developers adding shared variable annotations.

### Configure with CMake
## Features

*Note*: Adjust paths to your version of Clang/LLVM

```
mkdir build && pushd build
cmake .. \
-DClang_DIR=/usr/lib/cmake/clang-12 \
-DLLVM_DIR=/usr/lib/llvm-12/cmake \
-DLLVM_EXTERNAL_LIT=`which lit` \
-G Ninja
```

### Build and run the tests

*Note*: Pass `-v` to ninja to see build commands and output from failing tests.

```
ninja check-ia2
```
- **Hardware checks** Coarse spatial memory safety enforced using CPU hardware features to reduce the runtime cost of bounds checking. On x86-64 this means Memory Protection Keys ([`wrpkru`](https://www.kernel.org/doc/html/next/core-api/protection-keys.html)) for memory permissions and Control-flow Enforcement Technology (`endbr` and [SHSTK](https://docs.kernel.org/next/x86/shstk.html)) to prevent call gate misuse. Support for Aarch64 using Memory Tagging Extensions is in progress.
- **Minimal developer-facing annotations** Compartment transition annotations that can be inferred are automatically added by the rewriter and only appear in build artifacts. This reduces the developer-facing annotations while allowing developers to audit source-code changes made by the rewriter if necessary.
- **Toolchain-independent** Compartmentalized programs are built with off-the-shelf compilers and linkers.

## Usage

### Defining compartments

First invoke the [`INIT_RUNTIME`](https://github.com/immunant/IA2-Phase2/blob/5cdb743d3a42e8df8e4d8cf61fb3551656001c73/libia2/include/ia2.h#L204) macro once in any binary or shared library to define the number of protection keys that need to be allocated. The argument passed to `INIT_RUNTIME` must be a number between 1 and 15. Then the [`IA2_COMPARTMENT`](https://github.com/immunant/IA2-Phase2/blob/4a3a0c8d2a2b1881e0e41c89db070db3da187f9e/libia2/include/ia2_compartment_init.inc#L3) `#define` is used to define trusted compartments at the shared object level. This can be the main executable ELF or any dynamically-linked shared libraries. Memory belonging to a trusted compartment is assigned one of the [15 protection keys](https://man7.org/linux/man-pages/man7/pkeys.7.html) and can only be accessed by the shared object itself. Objects that don't explicitly define a compartment are treated as untrusted by default.

To assign a protection key to a trusted compartment, insert `#define IA2_COMPARTMENT n` with an argument between 1-15 specifying the index of the protection key, and include `ia2_compartment_init.inc`:

```c
#define IA2_COMPARTMENT 1
#include <ia2_compartment_init.inc>
```

This argument must differ from the other trusted compartments. Trusted compartments must also be aligned and padded properly by using the `padding.ld` script in `libia2/`. In CMake this is done automatically for executables built with `define_test` while libraries built with `define_shared_lib` must add `LINK_OPTS "-Wl,-T${libia2_BINARY_DIR}/padding.ld"`. To use in manual builds just include `-Wl,-T/path/to/padding.ld` in the final compilation step. Manual builds also require disabling lazy binding with `-Wl,-z,now`.

### Wrapping calls

Calls between compartments must have call gates to toggle the PKRU permissions at each transition. For direct calls, this is done by rewriting headers to generate the source for a wrapper that provides versions of every function with call gates. These wrappers are specific to both the wrapped library and caller. This means that the generated source must be compiled once per compartment that links against the wrapped library. Each caller's wrapper must define the `CALLER_PKEY` macro with the appropriate value for the caller.

#### From CMake

We provide a CMake rule to wrap a library or the main executable. This rule builds a wrapper and provides its dependency information to consumers of its outputs. Specifically, wrapper libs also depend on libia2 and have additional required compilation flags (-fno-omit-frame-pointer) for application code.

[Usage from CMake](https://github.com/immunant/IA2-Phase2/blob/main/cmake/define-ia2-wrapper.cmake#L10-L32) looks like this (wrapping `myunsafelib` which is used by your existing `my_prog` target):
```diff
+define_ia2_wrapper(
+ WRAPPER my_wrapper_target
+ WRAPPED_LIB myunsafelib-1.0
+ HEADERS myunsafelib.h myunsafelib_config.h
+ CALLER_PKEY 0
+)
+
add_executable(my_prog main.c)
-target_link_libraries(my_prog PRIVATE myunsafelib-1.0)
+target_link_libraries(my_prog PRIVATE my_wrapper_target)
```

Wrapped libraries are treated as untrusted by default. If the library being wrapped defined a trusted compartment, `COMPARTMENT_PKEY n` must be specified in define_ia2_wrapper. Here `n` is the argument used in `IA2_COMPARTMENT` to define the compartment. If the caller is an untrusted compartment, set `CALLER_PKEY UNTRUSTED`. To create a wrapper for the main binary (i.e. if shared libraries call it directly) the `WRAP_MAIN` option must be specified.

#### Manual usage

See [this doc](https://github.com/immunant/IA2-Phase2/blob/main/docs/usage.md) for notes on manual usage.
See [this doc](docs/build_instructions.md) for instructions on building the tools and tests in this repo. For more detailed instructions on the compartmentalization process see the [usage doc](docs/usage.md).
2 changes: 1 addition & 1 deletion cmake/define-test.cmake
@@ -192,7 +192,7 @@ function(define_test)
if (DEFINE_TEST_WITHOUT_SANDBOX)
add_test(${TEST_NAME} ${WRAPPED_MAIN})
else()
add_test(NAME ${TEST_NAME} COMMAND ${CMAKE_BINARY_DIR}/runtime/ia2-sandbox ${CMAKE_CURRENT_BINARY_DIR}/${WRAPPED_MAIN} WORKING_DIRECTORY ${CMAKE_BINARY_DIR})
add_test(NAME ${TEST_NAME} COMMAND ${CMAKE_BINARY_DIR}/runtime/tracer/ia2-sandbox ${CMAKE_CURRENT_BINARY_DIR}/${WRAPPED_MAIN} WORKING_DIRECTORY ${CMAKE_BINARY_DIR})
add_dependencies(${WRAPPED_MAIN} ia2-sandbox)
endif()
add_dependencies(check ${WRAPPED_MAIN})
43 changes: 43 additions & 0 deletions docs/build_instructions.md
@@ -0,0 +1,43 @@
# Setup

Ubuntu 20.04 is used for testing. Other Linux distributions may or may not work. Adjust the commands below accordingly.

## Install the package prerequisites

```
sudo apt install -y libusb-1.0-0-dev libclang-dev llvm-dev \
ninja-build zlib1g-dev python3-pip cmake \
libavformat-dev libavutil-dev pcregrep patchelf
pip install lit
rustup install nightly
```

## Configure with CMake

*Note*: Adjust the paths to match your version of Clang/LLVM, or use `llvm-config --cmakedir` to locate the LLVM CMake directory.

```
mkdir build && pushd build
cmake .. \
-DClang_DIR=/usr/lib/cmake/clang-15 \
-DLLVM_DIR=/usr/lib/cmake/llvm-15 \
-DLLVM_EXTERNAL_LIT=`which lit` \
-G Ninja
```

### Notable CMake variables

- `LIBIA2_DEBUG` - Adds additional runtime assertions to validate control-flow.
- `LIBIA2_AARCH64` - Builds the runtime and tests for Aarch64 using MTE instead of x86-64 with MPK. Tools are still built for the host.
- `CMAKE_TOOLCHAIN_FILE` - Typically set to `cmake/aarch64-toolchain.cmake` to build for Aarch64 using GCC. This also sets LIBIA2_AARCH64.

## CMake targets

- `check` - builds and runs the test suite. Pass `-v` to ninja to see build commands and output from failing tests.
- `ia2-rewriter` - builds the source-code rewriter. Depends on libclang-dev and llvm-dev.
- `pad-tls` - builds a script for padding ELF headers for TLS segments. Only required for compartmentalized DSOs that use thread-local storage.
- `libia2` - builds the runtime as a static library. This does not include call gate transitions as those are program-specific and generated by the rewriter.
- `partition-alloc` - builds the compartment-aware shim for Chromium's PartitionAlloc allocator.
- `ia2-sandbox` - builds the syscall tracer.

Tests are enumerated in `tests/CMakeLists.txt`. To build a specific test use `$TEST_main_wrapped` as the target. See the [`directory structure doc`](docs/directory_structure.md) for an overview of the rest of the repo's contents.
152 changes: 84 additions & 68 deletions docs/design.md
@@ -1,68 +1,84 @@
# Source-to-Source Header Rewriting Design

In Phase 1 we relied on compiler instrumentation to insert call gates at
inter-compartment calls and for rewriting object allocations for shared values.
For Phase II we plan to instead use source rewriting and standard linking to
interpose between compartments, removing the requirement for a customized
compiler.

## Design Structure

Our goal is to compartmentalize library(s) at the dynamic linking interface
between shared libraries. These libraries generally use the C ABI and declare
their API using C headers. Our rewriter will take these header declarations as
input and produce a new, drop-in replacement library that exposes the same API
but includes compartment transitions when entering and exiting the original
library code. This replacement library will only contain call-gate wrappers for
each exported function and will dynamically link against the original library
for the actual implementations of the API.

By replacing the original library with our wrapper in the application build
system, users can be sure that they cannot inadvertently call functions in the
compartmentalized library directly, bypassing the compartment transition. This
also handles `dlopen` calls, as long as the application `dlopen`s the wrapper
library rather than the original.

The rewriter will produce C files with inline call gates and stack transitions
that users can audit if desired and can build using their existing toolchain.
When we change types of function declarations (e.g. char* -> SharedCharPtr) to
indicate that a type needs to be allocated in the shared memory region, we will
need to produce a replacement header with these rewritten types for users to
include instead of the original library's header. Again, these headers will be
standard C and auditable after creation.

Using a C API is the lowest common denominator for FFI interop between
languages. Even C++ and Rust usually use the C ABI to interop with other
external libraries. We may need a different design if we want to support
compartmentalizing, e.g., pure Rust crates rather than external C libraries.

## Compartment Transitions

### Trusted -> Untrusted direct calls

Our wrapper library will contain wrappers for every exported function. Each
wrapper will save and clear callee-saved registers to the caller stack, save an
identifier or address to the stack to validate the return site, copy any
parameters on the stack to the callee stack, switch compartments, and make a direct
call to the callee. On returning, the wrapper will transition back to the
original compartment, check the stack cookie/return address to ensure that the
program returned to the corresponding wrapper, switch to the caller stack, and
return.

### Untrusted -> Trusted direct calls

Rare case. During wrapper creation we may be able to scan the target library for
unresolved symbols and provide reverse wrappers for these symbols.

### Trusted -> Untrusted indirect calls

Indirect calls to exported functions will go through our wrappers, as taking the
address of an exported function in the target library will give the address of
the wrapper. Private callbacks that aren't exported are trickier. We will
probably need to replace function pointer types in the headers with special
opaque structures that capture the function pointer and redirect it to a call
gate transition.

### Untrusted -> Trusted indirect calls

Same as trusted -> untrusted in the wrapper paradigm.
# Compartmentalization design

IA2 is a sandboxing framework with the following goals:

1. Allow sandboxing individual processes at the dynamic shared object (DSO) granularity
2. Allow inspection of code inserted by the framework
3. Avoid changes to existing compilers and linkers
4. Avoid changes to the operating system or dynamic linker

IA2 can be used in conjunction with multi-process sandboxing and is particularly
suitable for processes that can't feasibly be split into more processes. The x86
implementation relies on Memory Protection Keys (MPK) for protecting pages and
control-flow integrity (e.g. as provided by Intel's CET). It currently only
supports Linux, but its design does not preclude porting it to other operating
systems that provide access to the previously mentioned hardware primitives.

# Protected memory

Compartmentalization protects applications against spatial memory-safety
vulnerabilities in dependencies by placing sets of DSOs in separate
compartments. Memory belonging to each set of DSOs can only be accessed from its
compartment by default. This includes stack variables, static and
dynamically-allocated data and thread-local storage. The on-disk application
and libraries are assumed to be accessible to attackers, so read-only static
data is not protected from other compartments since it can just be read from the
binaries.

# Building compartmentalized applications

The following diagram shows the workflow for building compartmentalized
applications.

![compartmentalization workflow](img/workflow.png)

The build process works by adding a source code rewriting step before each
build. This creates new source files which are then passed on to an existing
build system with some additional standard compiler flags. The rewriter also
generates a source file with application-specific call-gate code and the
framework provides a static library that must also be linked in.

## Supported compilers and linkers

The framework is routinely tested with gcc and clang as compilers and LLVM's lld
and GNU ld as linkers. The gold linker is currently not supported due to its
minimal support for linker scripts. Other compilers may be added as the need
arises.

# Runtime Initialization

The runtime initialization happens by interposing `main` using `ld --wrap=main`.
The `main` wrapper switches from the stack initialized by the loader to a
protected stack for the main binary's compartment, initializes the PKRU
register to set memory access permissions for the initial compartment, then calls
the real `main` provided by the application. Once the real `main` returns the
wrapper undoes these operations before returning control to the C runtime. This
implies protecting against vulnerabilities in the C runtime is out of scope.
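
As a rough illustration of what initializing the PKRU register involves, the sketch below computes a PKRU value for a compartment. The bit layout (two bits per protection key, access-disable and write-disable) is the x86 one. The helper name `pkru_for_compartment` and the policy of leaving key 0 accessible to every compartment are assumptions made for this example, not the runtime's actual code. Writing the value with `wrpkru` is omitted so the sketch also builds and runs on hardware without MPK.

```c
#include <assert.h>
#include <stdint.h>

/* Each protection key owns two PKRU bits: bit 2*key is access-disable (AD)
 * and bit 2*key+1 is write-disable (WD). */
#define PKRU_AD(key) (UINT32_C(1) << (2 * (key)))
#define PKRU_WD(key) (UINT32_C(1) << (2 * (key) + 1))

/* Hypothetical helper: build a PKRU value that allows reads and writes to
 * key 0 and to `pkey`, and denies all access to every other key.
 * Assumption: key 0 is left open to all compartments. */
uint32_t pkru_for_compartment(unsigned pkey) {
    assert(pkey < 16); /* x86 provides 16 protection keys, 0 through 15 */
    uint32_t pkru = 0;
    for (unsigned key = 1; key < 16; key++) {
        if (key != pkey) {
            pkru |= PKRU_AD(key) | PKRU_WD(key);
        }
    }
    return pkru;
}
```

Under these assumptions, compartment 1 gets `0xFFFFFFF0` and compartment 2 gets `0xFFFFFFCC`: only the two-bit fields for key 0 and the compartment's own key are cleared.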

The `INIT_RUNTIME` macro must also be invoked to initialize the stacks and
thread-local storage used by each compartment. Applications require one stack
per (compartment * thread), so to minimize memory usage only one stack per
compartment is initially created and further sets of compartment stacks are
allocated on-demand as new threads are created.
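
The one-stack-per-(compartment, thread) layout can be pictured as a thread-local table indexed by compartment. The sketch below is a simplification under stated assumptions: `NUM_COMPARTMENTS`, `STACK_SIZE` and `get_compartment_stack` are hypothetical names, `malloc` stands in for whatever page-level allocation and protection the runtime performs, and entries are filled on first use rather than at thread creation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_COMPARTMENTS 3     /* hypothetical number of compartments */
#define STACK_SIZE (64 * 1024) /* hypothetical per-compartment stack size */

/* One slot per compartment. _Thread_local gives every thread its own copy
 * of the table, so each (compartment, thread) pair has a distinct slot. */
static _Thread_local void *compartment_stacks[NUM_COMPARTMENTS];

/* Return the calling thread's stack allocation for `compartment`,
 * allocating it the first time this thread asks for it. */
void *get_compartment_stack(size_t compartment) {
    assert(compartment < NUM_COMPARTMENTS);
    if (compartment_stacks[compartment] == NULL) {
        /* Stand-in allocation; a real runtime would map pages and tag them
         * with the compartment's protection key. */
        compartment_stacks[compartment] = malloc(STACK_SIZE);
    }
    return compartment_stacks[compartment];
}
```

A thread that never enters a given compartment never pays for that compartment's stack in this scheme, which is the memory-usage argument made above.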

# Compartment initialization

When DSOs are loaded, their statically-allocated memory is protected using
`pkey_mprotect`. This happens by including `ia2_compartment_init.inc` in one DSO
per compartment. This inserts a constructor (called automatically) that uses
`dl_iterate_phdr` to find the writeable ELF segments for the DSO and its
dependencies declared using `IA2_COMPARTMENT_LIBRARIES`.
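
To make the `dl_iterate_phdr` step concrete, here is a minimal sketch that walks the program headers of every loaded object and counts the writable `PT_LOAD` segments. It is not the code in `ia2_compartment_init.inc`: the function names are hypothetical, it visits all loaded objects instead of only one compartment's DSOs, and it only counts segments where a real constructor would call `pkey_mprotect`. It assumes Linux with glibc.

```c
#define _GNU_SOURCE /* needed for dl_iterate_phdr in <link.h> */
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Callback run once per loaded object. Counts PT_LOAD segments that are
 * mapped writable. */
static int count_writable(struct dl_phdr_info *info, size_t size, void *data) {
    (void)size;
    assert(data != NULL);
    size_t *count = data;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *phdr = &info->dlpi_phdr[i];
        if (phdr->p_type == PT_LOAD && (phdr->p_flags & PF_W)) {
            /* The segment occupies p_memsz bytes starting at
             * info->dlpi_addr + phdr->p_vaddr; this is the range a
             * compartment constructor would protect. */
            (*count)++;
        }
    }
    return 0; /* returning zero continues the iteration */
}

/* Hypothetical helper: total writable PT_LOAD segments in the process. */
size_t count_writable_segments(void) {
    size_t count = 0;
    dl_iterate_phdr(count_writable, &count);
    return count;
}
```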

# Framework code interposition

Calls to DSOs in different compartments are interposed using call gates. To
provide build-time assurance that call gates cannot be misused, they are
application-specific and generated by the rewriter. Direct cross-compartment
calls are identified by the `__wrap_` prefix. To ensure that compartments can be
mutually distrusting, indirect cross-compartment calls are split into sets of
two half call gates that use an intermediate PKRU value without access to any
compartment. For each potential indirect call, the rewriter inserts the first
half call gate at the callsite and replaces the function pointer expression with
the second half call gate. To ensure that roundtrip casts between `void *` and
function pointers do not lead to missing call gates, the rewriter also changes
function pointer types in the rewritten sources to ABI-compatible structs.
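
The effect of the function pointer type change can be sketched in plain C. The example below is an illustration only: `wrapped_binary_op`, `make_add_op` and `call_binary_op` are hypothetical names, not identifiers the rewriter emits, and the call goes straight to the target where a generated call gate would otherwise run. The point is that a single-member struct keeps the pointer's size (and, on the x86-64 System V ABI, is passed the same way) while no longer converting implicitly to or from `void *`.

```c
#include <assert.h>

typedef int (*binary_op)(int, int);

/* Hypothetical stand-in for a rewritten function pointer type. */
struct wrapped_binary_op {
    binary_op ptr;
};

/* The wrapper must not change the size of the value it replaces. */
static_assert(sizeof(struct wrapped_binary_op) == sizeof(binary_op),
              "wrapper must keep the pointer's size");

int add(int a, int b) { return a + b; }

/* Wrapping is explicit, so every place a function's address is taken for an
 * indirect call is visible in the source. */
struct wrapped_binary_op make_add_op(void) {
    return (struct wrapped_binary_op){ add };
}

/* Stand-in for a callsite: a call gate would switch compartments here
 * before and after invoking the target. */
int call_binary_op(struct wrapped_binary_op op, int a, int b) {
    return op.ptr(a, b);
}

/* void *p = make_add_op();   would not compile: a struct does not convert
 *                            to void *, so a missing annotation becomes a
 *                            compiler error instead of a silent bypass. */
```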