verify that the write caused by set_tid_address is constrained by pkeys #292

Closed
Tracked by #233
fw-immunant opened this issue Sep 25, 2023 · 2 comments


@fw-immunant

fw-immunant commented Sep 25, 2023

Split out of #233.

DESCRIPTION
For each thread, the kernel maintains two attributes (addresses) called set_child_tid and clear_child_tid.
These two attributes contain the value NULL by default.

   set_child_tid
          If a thread is started using clone(2) with the CLONE_CHILD_SETTID flag, set_child_tid is set to the
          value passed in the ctid argument of that system call.

          When set_child_tid is set, the very first thing the new thread does is to write its thread ID at
          this address.

My concern is that if a hostile compartment A starts a thread with clone() passing the address of memory owned by victim compartment B as the set_child_tid address, the write to this address may succeed even though the clone() syscall was issued by compartment A. I believe this write is performed by the kernel inside the implementation of clone(), which means that it may or may not respect pkeys depending on how it is implemented.

We should test this; if the write does ignore pkeys, we need to filter calls to clone() to either ensure that the set_child_tid and clear_child_tid addresses are owned by the compartment (and ensure that the latter address stays owned by it until the thread ends) or simply forbid the relevant flags (CLONE_CHILD_SETTID/CLONE_CHILD_CLEARTID).
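The two filtering options above could be sketched roughly as follows. This is only an illustration: `struct compartment_region`, `region_contains`, `clone_flags_allowed_strict`, and `clone_allowed_owned` are hypothetical names, not part of any existing code, and the sketch assumes a compartment owns a single contiguous range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical description of the memory a compartment owns. */
struct compartment_region {
	uintptr_t start;
	size_t len;
};

/* True if [addr, addr + size) lies entirely inside the region. */
bool region_contains(const struct compartment_region *r, uintptr_t addr, size_t size)
{
	assert(r != NULL);
	return addr >= r->start && size <= r->len
		&& addr - r->start <= r->len - size;
}

/* Option 1: forbid any clone() that asks the kernel to write a child TID. */
bool clone_flags_allowed_strict(unsigned long flags)
{
	return (flags & (CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID)) == 0;
}

/* Option 2: permit those flags only if the ctid address is owned by the caller. */
bool clone_allowed_owned(const struct compartment_region *r,
			 unsigned long flags, const void *ctid)
{
	if (clone_flags_allowed_strict(flags))
		return true;
	return region_contains(r, (uintptr_t)ctid, sizeof(pid_t));
}
```

Note that option 2 as written only checks ownership at the time of the call; it does not ensure that the clear_child_tid address stays owned by the compartment until the thread ends.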

@fw-immunant

It looks like these writes are performed by put_user; grep the kernel for {set,clear}_child_tid, e.g.: https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L5314

Does anyone know whether put_user respects pkeys? Since it isn't doing a complex dance to circumvent the MMU the way /proc/self/mem does (described here), I think it likely does respect them.
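For reference when reasoning about what "respecting pkeys" means on x86: the per-thread PKRU register holds two bits per key, access-disable at bit 2k and write-disable at bit 2k+1. The sketch below only models that bit layout. The function names are made up for illustration, and it says nothing about whether a particular kernel path such as put_user consults the register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKRU_BITS_PER_KEY 2
#define PKRU_AD 0x1u	/* access-disable bit within a key's 2-bit field */
#define PKRU_WD 0x2u	/* write-disable bit within a key's 2-bit field */

/* Extract the two permission bits for one of the 16 keys. */
uint32_t pkru_key_bits(uint32_t pkru, unsigned pkey)
{
	assert(pkey < 16);
	return (pkru >> (pkey * PKRU_BITS_PER_KEY)) & 0x3u;
}

/* Reads are blocked only by the access-disable bit. */
bool pkru_permits_read(uint32_t pkru, unsigned pkey)
{
	return (pkru_key_bits(pkru, pkey) & PKRU_AD) == 0;
}

/* Writes are blocked by either the access-disable or write-disable bit. */
bool pkru_permits_write(uint32_t pkru, unsigned pkey)
{
	return (pkru_key_bits(pkru, pkey) & (PKRU_AD | PKRU_WD)) == 0;
}
```

For example, a PKRU value of 0x8 sets only the write-disable bit for key 1, so pages tagged with key 1 stay readable but not writable, while key 0 is unaffected.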

@fw-immunant

I just wrote a test program (below):

#define _GNU_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

void print_unix(char* s) {
	write(1, s, strlen(s));
}

static int thread_body(void* arg)
{
	print_unix("child thread ran\n");
	return 0;
}

#define PAGE_SIZE 4096
#define STACK_SIZE (PAGE_SIZE * 1024)	/* Stack size for cloned child */

/* circumvent pkeys for debugging */
unsigned char read_proc_self_mem_byte(void* ptr)
{
	unsigned char buf[32] = {0};
	int fd = open("/proc/self/mem", O_RDWR);
	pread(fd, buf, sizeof(buf), (uint64_t)ptr);
	close(fd);	/* don't leak one descriptor per byte read */
	return buf[0];
}

int read_proc_self_mem_int(void* ptr) {
	char out[sizeof(int)];
	for(size_t i=0; i<sizeof(out); i++) {
		out[i] = read_proc_self_mem_byte((char*)ptr+i);
	}
	int read = -1;
	memcpy(&read, &out, sizeof(int));
	return read;
}

int main(int argc, char** argv)
{
	/* allocate memory to protect with pkey */
	void *mem = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
	memset(mem, 0x5a, PAGE_SIZE);
	int pkey = pkey_alloc(0 /* reserved */, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
	/* comment out this line to see the program fail its assertions;
	 * note that PROT_NONE also revokes access at the page level,
	 * independently of the pkey */
	pkey_mprotect(mem, PAGE_SIZE, PROT_NONE, pkey);

	/* allocate thread stack */
	char* stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
	if (stack == MAP_FAILED)
		return 1;

	char* stack_top = stack + STACK_SIZE;

	int clone_flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
		| CLONE_SIGHAND | CLONE_THREAD
		| CLONE_SETTLS | CLONE_PARENT_SETTID
		| CLONE_CHILD_CLEARTID;

	int tid = 500;
	int* tid_clear_addr = (int*)mem;
	int* tid_addr = (int*)((char*)mem+8);
	char* tls = malloc(4096 * 64);

	pid_t pid = clone(thread_body, stack_top, clone_flags, argv[1], tid_addr, tls, tid_clear_addr);
	printf("clone() pid %d\n", (int)pid);
	if (pid < 0)
		return 1;

	usleep(1000);

	printf("*tid_clear_addr=%08x\n", read_proc_self_mem_int(tid_clear_addr));
	assert(read_proc_self_mem_int(tid_clear_addr) == 0x5a5a5a5a);
	printf("*tid_addr=%08x\n", read_proc_self_mem_int(tid_addr));
	assert(read_proc_self_mem_int(tid_addr) == 0x5a5a5a5a);

	return 0;
}

Looks like we're safe; these writes silently fail. Comment out the pkey_mprotect call and see the program fail its assertions due to the writes succeeding.
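The test program reads the protected page back one byte at a time through /proc/self/mem. A more compact variant of that debugging read might look like the sketch below, which does a single pread and reports failure instead of returning stale data. `read_int_via_proc_mem` is a hypothetical name, and the sketch assumes /proc is mounted and that addresses fit in off_t (true on 64-bit Linux).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one int from our own address space through
 * /proc/self/mem. Returns 0 on success and -1 on any failure.
 */
int read_int_via_proc_mem(const void *ptr, int *out)
{
	assert(out != NULL);

	int fd = open("/proc/self/mem", O_RDONLY);
	if (fd < 0)
		return -1;

	/* The file offset is the virtual address being read. */
	ssize_t n = pread(fd, out, sizeof(*out), (off_t)(uintptr_t)ptr);
	close(fd);

	return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

Checking the return value lets the caller tell a failed read apart from memory that genuinely still holds the 0x5a fill pattern.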
