Syscall categorization #231
We can go through the whole list in more detail while allowlisting each syscall we need; these just stuck out to me.
We can (I think the ChromeOS folks investigated this, though they didn't end up going with it because they were able to just modify the kernel), but after reading about it I think the Landlock LSM is probably more appropriate, if we can reasonably depend on it. Landlock gives us unprivileged control over access to kernel objects without the system-wide changes that SELinux requires. This means we'll end up allowing the three landlock syscalls.
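Whether Landlock is a reasonable dependency depends on the running kernel, so the sandbox would likely need to probe for it first. The sketch below relies on the documented behavior of `landlock_create_ruleset`: called with a NULL attribute, size 0 and the `LANDLOCK_CREATE_RULESET_VERSION` flag, it returns the supported Landlock ABI version instead of creating a ruleset. It assumes the libc headers expose `SYS_landlock_create_ruleset` (and otherwise reports `ENOSYS`); the wrapper name `landlock_abi_version` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag value from <linux/landlock.h>, defined here so older headers still compile. */
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Hypothetical helper: returns the Landlock ABI version (>= 1), or -1 with
 * errno set (ENOSYS: kernel built without Landlock; EOPNOTSUPP: Landlock
 * disabled at boot; EPERM is also possible under an outer seccomp policy). */
static int landlock_abi_version(void)
{
#ifdef SYS_landlock_create_ruleset
    return (int)syscall(SYS_landlock_create_ruleset, NULL, (size_t)0,
                        LANDLOCK_CREATE_RULESET_VERSION);
#else
    errno = ENOSYS; /* headers predate Landlock (Linux 5.13) */
    return -1;
#endif
}
```

The version number matters beyond "supported or not": later ABIs add access rights (ABI 2 added `LANDLOCK_ACCESS_FS_REFER`, for example), so a policy built on Landlock would need a fallback for older kernels.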
In my tests (see #233) we do need to allow
I lumped this in with other shm operations without actually knowing its precise semantics. Thanks for catching it. EDIT: I've updated the table to forbid
👍
For

```c
// Verify that brk()/sbrk() will not stomp on existing mappings.
// Exits with status 0 if this property holds.
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // Determine the current program break (char * so pointer arithmetic is standard C).
    char *prog_break = sbrk(0);
    char *target = prog_break + 4096;
    // Allocate some memory on the next page.
    void *alloc = mmap(target, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (alloc != target) {
        printf("mmap of prog_break + 4096 failed\n");
        return 1;
    }
    // Try to get more memory via sbrk().
    void *new_prog_break = sbrk(4096);
    // If sbrk() failed, it's because our allocation prevented it.
    if (new_prog_break == (void *)-1) {
        printf("sbrk(4096) failed\n");
        return 0;
    }
    // Otherwise, our allocation is presumably stomped-on.
    printf("%p -> %p\n", (void *)prog_break, new_prog_break);
    return 1;
}
```

Output:

```
./sbrk-test
sbrk(4096) failed
```
We also need to trace

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SIZE 4096

/* circumvent pkeys for debugging: /proc/self/mem ignores page protections */
unsigned char read_proc_self_mem_byte(void *ptr)
{
    unsigned char buf[32] = {0};
    int fd = open("/proc/self/mem", O_RDONLY);
    if (fd < 0) {
        perror("open(/proc/self/mem)");
        return 0;
    }
    if (pread(fd, buf, sizeof(buf), (off_t)(uintptr_t)ptr) < 0)
        perror("pread(/proc/self/mem)");
    close(fd);
    return buf[0];
}

int main(void)
{
    /* allocate memory to protect with pkey */
    void *mem = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memset(mem, 0x5a, PAGE_SIZE);
    int pkey = pkey_alloc(0 /* reserved */, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
    if (pkey < 0) {
        perror("pkey_alloc");
        return 1;
    }
    /* use pkey to forbid access */
    if (pkey_mprotect(mem, PAGE_SIZE, PROT_NONE, pkey) < 0)
        perror("pkey_mprotect");
    if (madvise(mem, PAGE_SIZE, MADV_DONTNEED) < 0)
        perror("madvise(MADV_DONTNEED)");
    printf("[0]=%02x\n", read_proc_self_mem_byte(mem));
    /* this assertion fails because madvise(MADV_DONTNEED) bypasses pkeys */
    assert(read_proc_self_mem_byte(mem) == 0x5a);
    return 0;
}
```
Controlling system calls isn't as simple as filtering them on syscall identity, but identity is the first criterion we filter on. I've gone through the x86_64 syscalls and categorized them based on what I think we ultimately want or need to do to support them and how they interact with our sandbox.
In general, we will disallow forbidden syscalls with seccomp-bpf, discriminating on the syscall number. For syscalls that require more complex filtering, some argument inspection will be done directly by logic in the seccomp-bpf filter and some by a userspace helper.
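The number-first, arguments-second structure can be sketched as a classic BPF seccomp filter. This is a minimal illustration under assumptions, not this project's filter: the tiny allowlist, the choice to reject `madvise(..., MADV_DONTNEED)` (motivated by the pkey test earlier in this thread) and the helper names `install_sandbox_filter` and `run_madvise_policy_demo` are all hypothetical. The filter checks the architecture, rejects x32 syscall numbers on x86_64, allows a few syscalls by number, inspects `madvise`'s advice argument inside the filter, and returns `EPERM` for everything else.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

#ifndef SECCOMP_RET_KILL_PROCESS
#define SECCOMP_RET_KILL_PROCESS SECCOMP_RET_KILL /* pre-4.14 headers */
#endif

/* Allow syscall `nr` outright; otherwise fall through to the next check. */
#define ALLOW_NR(nr)                                  \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Hypothetical policy: a tiny allowlist plus one argument check. */
static int install_sandbox_filter(void)
{
    struct sock_filter prog[] = {
        /* Kill on any other syscall ABI (e.g. i386 via int 0x80). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Step 1: discriminate on the syscall number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#if defined(__x86_64__)
        /* x32 syscalls share AUDIT_ARCH_X86_64 but set bit 30 of nr. */
        BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
#endif
        ALLOW_NR(__NR_read),
        ALLOW_NR(__NR_write),
        ALLOW_NR(__NR_exit),
        ALLOW_NR(__NR_exit_group),
        ALLOW_NR(__NR_rt_sigreturn),
        /* Step 2: madvise is allowed unless advice (args[2]) is MADV_DONTNEED.
         * The kernel takes advice as an int, so only the low 32-bit word
         * (the first word on little-endian) matters. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_madvise, 0, 4),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[2])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, MADV_DONTNEED, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        /* Default: everything not listed is forbidden. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
    };
    struct sock_fprog fprog = {
        .len = (unsigned short)(sizeof(prog) / sizeof(prog[0])),
        .filter = prog,
    };
    /* no_new_privs lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
}

/* Fork a child, sandbox it, and report: 0 = policy held, 1 = it did not,
 * 2 = seccomp unavailable here, -1 = fork/wait problem. */
static int run_madvise_policy_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED || install_sandbox_filter() != 0)
            _exit(2);
        int denied = madvise(p, 4096, MADV_DONTNEED) == -1 && errno == EPERM;
        int allowed = madvise(p, 4096, MADV_NORMAL) == 0;
        _exit(denied && allowed ? 0 : 1); /* _exit uses exit_group, which is allowed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Syscalls that belong to the userspace-helper category would instead return `SECCOMP_RET_USER_NOTIF` (Linux 5.0+), which suspends the call and forwards it to a supervisor holding the notification file descriptor.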
In some cases we can give the userspace helper more latitude for inspecting arguments (e.g. data in buffers pointed to by syscall arguments) by wrapping syscalls in a shim that "freezes" these arguments by copying them to a read-only memory region. This lets us avoid TOCTTOU races by checking pointers against a known map of frozen memory regions, and lets us inspect pointed-to data without races if it lies within a frozen region.
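The freezing idea can be illustrated with a small in-process sketch. Everything here is hypothetical: `freeze`, `thaw`, `is_frozen` and `frozen_openat` are invented names, a single frozen region stands in for the "known map", and `is_frozen` runs in the same process only for illustration (a real helper would run outside the sandboxed process). The race-freedom also assumes the sandbox separately prevents the process from `mprotect`-ing or remapping frozen pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define FROZEN_SIZE 4096

/* A real shim would track many frozen regions; one keeps the sketch short. */
static struct { uintptr_t start; size_t len; } frozen;

/* Copy len bytes into a fresh page, then make the page read-only. */
static const void *freeze(const void *src, size_t len)
{
    if (len > FROZEN_SIZE)
        return NULL;
    void *page = mmap(NULL, FROZEN_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, src, len);
    if (mprotect(page, FROZEN_SIZE, PROT_READ) != 0) {
        munmap(page, FROZEN_SIZE);
        return NULL;
    }
    frozen.start = (uintptr_t)page;
    frozen.len = FROZEN_SIZE;
    return page;
}

static void thaw(const void *page)
{
    munmap((void *)page, FROZEN_SIZE);
    frozen.start = 0;
    frozen.len = 0;
}

/* Helper-side check: does [ptr, ptr + len) lie entirely in the frozen region?
 * Written to avoid overflow in ptr + len. */
static int is_frozen(const void *ptr, size_t len)
{
    uintptr_t p = (uintptr_t)ptr;
    return frozen.len != 0 && p >= frozen.start && len <= frozen.len &&
           p - frozen.start <= frozen.len - len;
}

/* Shim: freeze the path string, then pass only the frozen copy to the kernel. */
static int frozen_openat(int dirfd, const char *path, int flags)
{
    size_t len = strlen(path) + 1;
    if (len > FROZEN_SIZE) {
        errno = ENAMETOOLONG;
        return -1;
    }
    const char *copy = freeze(path, len);
    if (copy == NULL)
        return -1;
    /* A helper would verify is_frozen(copy, len) and inspect the path here;
     * other threads cannot rewrite it between that check and the syscall. */
    int fd = (int)syscall(SYS_openat, dirfd, copy, flags, 0);
    int saved = errno;
    thaw(copy);
    errno = saved;
    return fd;
}
```

The design choice is that the helper never has to trust a pointer into writable memory: if `is_frozen` fails, the helper can simply deny the call.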
I've attempted to categorize all syscalls for x86_64, but we should implement filtering as an allowlist prioritized based on the syscalls used by common programs, our tests, and nginx. I'll file a separate issue on syscall usage characterization and which subsets of the below table to prioritize.
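An allowlist like this could be represented as a table keyed by syscall number, with anything not listed defaulting to forbidden. The category names below are hypothetical, modeled on the treatments described above (allowed or forbidden by number, arguments checked inside seccomp-bpf, pointed-to data checked by the userspace helper), and the entries are illustrative placeholders rather than the categorization in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Hypothetical categories, one per treatment described above. */
enum sc_policy {
    SC_FORBID,   /* rejected by seccomp-bpf on syscall number alone */
    SC_ALLOW,    /* allowed by syscall number alone */
    SC_BPF_ARGS, /* register arguments checked inside the BPF filter */
    SC_HELPER,   /* pointed-to data checked by the userspace helper */
};

struct sc_entry {
    long nr;
    const char *name;
    enum sc_policy policy;
};

/* Illustrative placeholder entries only. */
static const struct sc_entry sc_table[] = {
    { SYS_read,    "read",    SC_ALLOW },
    { SYS_write,   "write",   SC_ALLOW },
    { SYS_brk,     "brk",     SC_ALLOW },
    { SYS_madvise, "madvise", SC_BPF_ARGS },
    { SYS_openat,  "openat",  SC_HELPER },
};

/* Allowlist semantics: a syscall that is not listed is forbidden. */
static enum sc_policy sc_lookup(long nr)
{
    for (size_t i = 0; i < sizeof(sc_table) / sizeof(sc_table[0]); i++)
        if (sc_table[i].nr == nr)
            return sc_table[i].policy;
    return SC_FORBID;
}
```

Growing such a table incrementally, starting from the syscalls our tests and nginx actually use, keeps the default-deny behavior intact while coverage expands.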
The list of x86_64 syscalls is from here: https://syscalls.mebeim.net/?table=x86/64/x64/v6.3
The table (editable here) is inlined below, as converted with this tool followed by

```
sed -r -e 's/ +/ /g'
```

to collapse spaces and make it fit in GH's comment character limit:

syscalls, categorization, and policy