Dynamic Allocator Misuse

Created On 19. Sep 2021

Updated: 2021-09-19 14:47:49.258751000 +0000

Created By: acidghost

The Dynamic Memory Allocator also termed as the Heap, is the region in a program's memory which is dynamically allocated and freed upon use. (also see Memory Segments).

Table of contents

  • Dynamic Allocators
    • mmap
    • The Heap
    • Reminisce the Data Segment
    • malloc
    • What can go wrong?
    • How to detect issues?
  • Dangers
    • Memory Leaks
    • Heap Exhaustion
    • Use After Free
  • tcache
    • Linked List
    • tcache Freeing
    • tcache Allocation
    • How do malloc() and free() work in a Linked List
    • Double-Free
    • tcache Poisoning
  • Heap Metadata
    • What's a chunk?
    • Chunk Size
    • Overlapping Metadata
    • Freed Chunks
    • Different Caches
    • tcache Chunks
    • largebin Chunks
    • The Wilderness
  • Heap Metadata Corruption
    • The Unlink Attack
    • Poison Null Byte
    • House of Force
    • House of Spirit
    • Miscellaneous Heapery

Note: This information is constantly changing (see: tcache) and PTMALLOC IS VERY COMPLEX. This is an approximation. The information below refers to the ptmalloc cache design in libc 2.31.

Dynamic Allocators

Doug Lea released dlmalloc into public domain in 1987:

  • Linux: ptmalloc (Posix Thread aware fork of dlmalloc)
  • FreeBSD: jemalloc (also used in Firefox, Android)
  • Windows: Segment Heap, NT Heap
  • Kernel allocators: kmalloc (Linux kernel memory allocator), kalloc (iOS kernel memory allocator)

This post describes specifically ptmalloc, the allocator in libc on Linux. Other allocators work differently where jemalloc for example has no inline metadata. The information below is enough for understanding how they can be implemented and misused.

mmap

mmap (mmap(0, num_pages*0x1000, ...) creates pages of memory on demand and interacts with them. It allows dynamic allocation/deallocation of how the program needs and that memory survives across functions. However, there are few issues. The allocation has to be a multiple of 4096 bytes which is the size of a page. It is also very slow requiring kernel involvement.

Imagine a library that has mmap()ed lots of memory and hands out small chunks of it on demand, similar to:

char *firstname = allocate_memory(128);
char *lastname = allocate_memory(256);
scanf("%s %s", firstname, lastname);
printf("Hello %s %s!", firstname, lastname);
free_memory(firstname);
free_memory(lastname);

To handle the memory allocation there already is a special function that works and does similar stuff mentioned in the code above.

The Heap

The heap, as implemented by ptmalloc/glibc (and analogues), provides:

  • malloc() - allocate some memory
  • free() - free a prior allocated chunk

And some auxiliary functions:

  • realloc() - change the size of an allocation
  • calloc() - allocate and zero-out memory

These functions are used, extensively, by practically every single non-trivial piece of software. ptmalloc actually does not use mmap.

Reminisce the Data Segment

  • historic oddity from segmented memory spaces of yore
  • with ASLR, placed randomly into memory near-ish the PIE base
  • starts out with a size of 0
  • managed by the brk and sbrk system calls:
    • sbrk(NULL) returns the end of the data segment
    • sbrk(delta) expands the end of the data segment by delta bytes
    • brk(addr) expands the end of the data segment to addr

Under the hood, this is managed just like mmap(). ptmalloc slices off bits of the data segment for small allocations, and uses mmap() for large allocations.

What can go wrong?

The heap is:

  • used by imperfect human programmers
  • humans forget to free memory
  • humans forget all the spots where they store pointers to data
  • humans forget what they've freed
  • a library that strives for performance
  • allocation and deallocation needs to be fast, or programs will slow down
  • optimizations often leave security as an afterthought

Bugs caused by #1 become security issues due to #2 if not caught!

How to detect issues?

Valgrind can detect heap misuse, if your testcases can trigger it. glibc itself has some hardening techniques:

  • export MALLOC_CHECK_=1
  • export MALLOC_PERTURB_=1
  • export MALLOC_MMAP_THRESHOLD_=1

There are various "more secure" allocators being developed (but not really deployed) Like many other issues, no general techniques exist for detecting dynamic allocation errors.

Dangers

To see more why the heap can have leaks, let's check the lifecycle of allocator security:

---> Allocator ->
|     app developers: it should be faster! ->
|     alloc developers: here is another optimization layer to speed it up! ->
|     security researchers: oh no, this optimization layer is big security issue! ->
|     allocator developers: FORGET SECURITY, WE NEED SPEED! ->
|     security researchers: exploit it and pass it to media ->
<--- alloc developers: ok, here is a security fix!

So, there is more than one way to misuse the heap!

  • Forgetting to free memory
    - leads to resource exhaustion

  • Forgetting that we have freed memory
    - using free memory
    - freeing free memory

  • Corrupting metadata used by the allocator to keep track of heap state
    conceptually similar to corruption internal function state on the stack. Take for example a buffer overflow. While it can leak up the stack by overflowing the values there, in the heap we can clobber up its metadata which will make it go crazy.

Memory Leaks

Problem: Allocated memory must be explicitly freed.

    int foo()
    {
        char *blah = malloc(1024);
        // use blah in safe ways
        return 1;
    }

If you malloc something and then you forget to free it before losing references to it, as in the example above, it will remain there until the process dies. The memory pointed by blah is never freed and will just remain there in the data segment. This could lead to more issues, including data leaks and heap exhaustion.

Heap Exhaustion

In the code below, after some thousands of allocations, it will return nil.

int main()
{
        int i = 0;
        char *a = 1;
        while (a)
        {
                i++;
                a = malloc(0x1000000);
                printf("Allocated: %p\n", a);
        }
        printf("%d\n", i);
}

This might not always be an issue, but when the return value of malloc is not checked, this can lead to such issues as memory access at 0 or kernel corruption.

Use After Free

int main() {
    char *user_input = malloc(8);

    printf("Name? ");
    scanf("%7s", user_input);
    printf("Hello %s!\n", user_input);
    free(user_input);

    long *authenticated = malloc(8);
    *authenticated = 0;

    printf("Password? ");
    scanf("%7s", user_input);

    if (getuid() == 0 || strcmp(user_input, "hunter2") == 0) *authenticated = 1;
    if (*authenticated) sendfile(0, open("/flag", 0), 0, 128);
}

In the example, if we call free on a pointer on line 7 and then use it on line 13, this could lead to more issues. The memory that gets freed at first, stays valid and might be reused. When we are allocate a variable that tells us if we are authenticated or not in line 10 and 11, and then writing with scanf() on line 13, the variable will be overwritten with user input.

tcache

tcache stands for "Thread Local Caching" in ptmalloc, to speed up repeated (small) allocations in a single thread. It is quite a recent feature as of the time this post was written, introduced in glibc 2.26 in 2017. It is implemented as a singly-linked list, with each thread having a list header for different-sized allocations.

typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

In each thread there is one instance of tcache_perthread_struct and the struct contains an array of tcache entry pointers, each pointing to a beginning of a list, of tcache entries - allocations. These allocations are called bins, where different sized allocations go into different lists. tcache is:

  • a caching layer for "small" allocations (<1032 bytes on amd64)
  • makes a singly-linked-list using the first word of the free chunk
  • has few security checks

Linked List

A tcache entry points to an array with 2 pointers, next and key. tcache entry is the allocation itself, where next will point to the next allocation and key will point to the original structure.

typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

typedef struct tcache_entry
{
  struct tcache_entry *next;
  struct tcache_perthread_struct *key;
} tcache_entry;

In the figure below, we can see how the tcache_perthread_struct Bën has different tcache_entrys that point to different allocations. With different allocation (malloc(x size)), entries will get retained and the count of sizes of corresponding allocations will be tracked upon free()ing an entry.

We can see a single-linked list, where there is just one forward pointer per entry. In a double-linked list, for example, b will point back to a. Read more in detail how this struct works here.

tcache freeing

Each tcache_entry is actually the exact allocation that was freed! On free(), the following happens:

  1. Select the right "bin" based on the size:
idx = (freed_allocation_size - 1) / 16;
  1. Check to make sure the entry hasn't already been freed (double-free):
((unsigned long*)freed_allocation)[1] == &our_tcache_perthread_struct;
  1. Push the freed allocation to the front of the list!
((unsigned long*)freed_allocation)[0] = our_tcache_perthread_struct.entries[idx];
our_tcache_perthread_struct.entries[idx] = freed_allocation;
our_tcache_perthread_struct.count[idx]++;
  1. Record the tcache_perthread_struct associated with the freed allocation (for checking against double-frees)
((unsigned long*)freed_allocation)[1] = &our_tcache_perthread_struct

tcache allocation

On allocation, the following happens:

  1. Select the bin number based on the requested size:
idx = (requested_size - 1) / 16;
  1. Check the appropriate cache for available entries:
if our_tcache_perthread_struct.count[idx] > 0;
  1. Reuse the allocation in the front of the list if available:
unsigned long *to_return = our_tcache_perthread_struct.entries[idx];
tcache_perthread_struct.entries[idx] = to_return[0];
tcache_perthread_struct.count[idx]--;
return to_return;

Things that are not done:

  • clearing all sensitive pointers (only key is cleared for some reason).
  • checking if the next (return[0]) address makes sense

How do malloc() and free() work in a Linked List

when malloc requests a size of 16 bytes in memory (malloc(16)) it tracks the pointer between the previous size and what to return:

--------------------------------------------------------
prev_chunk | what malloc gives to you back 16 bytes | metadata
--------------------------------------------------------
          / \
           |
           |
           |

the linked list tracks the size of the allocated memory and this is also how free() knows what should be freed. In a linked list the first malloc however doesn't have a previous size to track, so it will get that memory differently. Imagine a tcache operation with a size 16:

--------------------------------------------------------
prev_chunk | aaaaaaaakeykeyky | metadata
--------------------------------------------------------

the first 8 bytes will be what malloc requested and the next 8 will be the key it tracks. When free() is called it goes back to the previous 8 bytes and they will be ready for the next malloc:

malloc(16)
tcache size 16 = 00000000
--------------------------------------------------------
prev_chunk | aaaaaaaakeykeyky | metadata
--------------------------------------------------------
          / \
           |
           |
           

malloc(16)
tcache size 16 = b
--------------------------------------------------------
prev_chunk | bbbbbbbbkeykeyky | metadata
--------------------------------------------------------
          / \
           |
           |
           a

free(a)
tcache size 16 = aaaaaaaa
--------------------------------------------------------
prev_chunk | 00000000keykeyky | metadata
--------------------------------------------------------
          / \
           |
           |
           a

free(b)
tcache size 16 = bbbbbbbbb
--------------------------------------------------------
prev_chunk | aaaaaaaakeykeyky | metadata
--------------------------------------------------------
          / \
           |
           |
           b

The following way tcache tracks the linked list operations, by maintaining the recursive path of pointers where the requested size is referenced for the next operation. In glibc 2.27 used in Ubuntu 18.04 for example, another free would produce:

free(b)
tcache size 16 = bbbbbbbbb
--------------------------------------------------------
prev_chunk | bbbbbbbbbkeykeyky | metadata
--------------------------------------------------------
          / \
           |
           |
           b

This is an issue, because now all the future mallocs will return bbbbbbbbb. This was handled in the more actual versions of glibc and if someone will try such operation, the program will throw an exception that is triggered by the referenced above keykeyky key. When this is allowed, a double free is produced.

Double-Free

Above we saw how a double free can occur. Let's check how it can still occur:

int main()
{
        unsigned long long *a;

        a = malloc(64);
        printf("Original malloc: %p\n", a);

        printf("Freeing: %p\n", a);
        free(a);

        a[1] = 12345678;

        printf("Double-freeing: %p\n", a);
        free(a);

        printf("malloc 1: %p\n", malloc(64));
        printf("malloc 2: %p\n", malloc(64));
}

By corrupting the available space after a free(), it is possible to trigger a double free without the exception. The next 2 pointers will point to the same address:

Original malloc: 0x56209a2f6260
Freeing: 0x56209a2f6260
Double-freeing: 0x56209a2f6260
malloc 1: 0x56209a2f6260
malloc 2: 0x56209a2f6260

tcache poisoning

What happens if we corrupt tcache_entry->next?

int main()
{
        char stack_buffer[16];
        unsigned long long *a;
        unsigned long long *b;

        //warm up the tcache
        a = malloc(16);
        b = malloc(16);

        free(b);
        free(a);

        // corrupt the next pointer
        a[0] = &stack_buffer;

        printf("Stack buffer: %p\n", &stack_buffer);
        printf("First malloc: %p\n", malloc(16));
        printf("Second buffer: %p\n", malloc(16));

}

We create a stack buffer, free it and then corrupt the next pointer of the freed entry that used to point to b (the last thing that was freed). With the first malloc(x), it will return a and with the second, we will get what a was pointing to next, which is our stack buffer:

Stack buffer: 0x7ffd65b11830
First malloc: 0x558a5c019260
Second buffer: 0x7ffd65b11830

We see our stack buffer, because we corrupted the next pointer to which a was pointing. This way, we can return anywhere we want, as long it is a valid memory where malloc() can return.

Heap Metadata

As we saw with tcache, the ptmalloc uses a bunch of metadata to track its operation. It keeps them in:

  • global metadata (i.e., the tcache structure)
  • per-chunk metadata

When an allocation is freed, that area (the pointer that is returned by malloc) is reused for tcache metadata to create a tcache single used list. This is called per-chunk metadata.

What's a chunk?

malloc(x) returns mem_addr, but in actuality, ptmalloc tracks chunk_addr:

                       ----------------------------------           
chunk_addr ----------> | unsigned long mchunk_prev_size; |
                       |    unsigned long mchunk_size;   |
                       ---------------------------------- 
mem_addr ------------> | USABLE MEMORY (at least size x) |
                       ---------------------------------- 

Size

malloc(n) guarantees at least n usable space, but chunks sizes are multiples of 0x10 (multiple of 16).

                       ----------------------------------           
chunk_addr ----------> | unsigned long mchunk_prev_size; | <--- not used for tcache
                       |    unsigned long mchunk_size;   | <- last 3 bits are flags: 0 - PREV_IN_USE; 1 - IS_MAPPED; 2 - NON_MAIN_AREA
                       ---------------------------------- 
mem_addr ------------> | USABLE MEMORY (at least size x) |
                       ---------------------------------- 

As seen in tcache the USABLE MEMORY is freed by the allocation and then used by the corresponding implementation to store its metadata. Before that in chunk_addr is stored the size of the allocations. If the chunk before unsigned long mchunk_size is not in use then in unsigned long mchunk_prev_size will be the size of the previous chunk. unsigned long mchunk_prev_size will be valid only while the chunk is in use. Let's check an example:

#include <stdio.h>

int main(int argc, char **argv)
{
        // disable buffering in libc
        // to prevent printf and scanf use it internally
        // to allow us to see the sizes
        setbuf(stdout, NULL);
        //allocate our chunks
        int size = strtoll(argv[1], 0, 0);
        unsigned long *a = malloc(size);
        unsigned long *b = malloc(size);

        free(a); // free the earlier one
        malloc(size*2);

        printf("a_chunk: %#p\n", (a-2));
        printf("b_chunk: %#p\n", (b-2));
        puts("");
        printf("a_chunk->prev_size: %#llx\n", *(a-2));
        printf("a_chunk->size: %#p\n, %#llx\n", *(a-1));
        printf("b_chunk->prev_size: %#llx\n", *(b-2));
        printf("b_chunk->size: %#llx\n", *(b-1));
}

We allocate a and b and then a is freed. Afterwards there is the output of previous chunk and the size. In C, when a number is added to a pointer, it will shift that pointer by that number times size of that pointer. By subtracting *(a-2) for instance, can be seen where the pointer is pointing 2 times 8 bytes (the size of pointer) behind. In this way the chunk addresses of unsigned long mchunk_prev_size (a) and unsigned long mchunk_size (b) can be seen.

The output will yield:

a_chunk: 0x55f1fe624250
b_chunk: 0x55f1fe624680

a_chunk->prev_size: 0
a_chunk->size: 0x431
b_chunk->prev_size: 0x430
b_chunk->size: 0x430

In a_chunk->size is the chunk size. 0x431 - 1 at the end indicates that the previous chunk is in use. Since a was freed, b has a different size - 0x430, where the 0 bit at the end indicates that the chunk is free. Since a was freed, b has the previous size of the chunk - of a.

Overlapping metadata

To save memory, the prev_size field of a chunk whose PREV_IN_USE flag is set (i.e., the previous chunk is not free) is used by the previous chunk.

----------------------------------------  -----------------------------------------
| chunk1: unsigned long *a = malloc(0x10) |  chunk1: unsigned long *b = malloc(0x10) |
________________________________________  ________________________________________
|   prev_size  |  size  |  a[0]  |  a[1]  |   prev_size  |  size  |  b[0]  |  b[1]   |
----------------------------------------  -----------------------------------------

-------------------------------------------
| chunk1: unsigned long *a = malloc(0x18)    |
____________________________________________
|   prev size  |  size  |  a[0]  |    a[1]   |
---------------------------------------------------------------------------
                                  |  chunk1: unsigned long *b = malloc(0x18) |
                                  ___________________________________________
                                  | prev_size |  size  |  b[0]  |  b[1]      |
                                  --------------------------------------------

In ptmalloc the prev_size is checked only when a is free. When a is allocated, it can use the previous size field of b as storage. As an optimization measure, it saves 8 bytes of RAM per allocation when the such cases occur. Let's check an example:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        // disable buffering in libc
        // to prevent printf and scanf use it internally
        // to allow us to see the sizes
        setbuf(stdout, NULL);
        //allocate our chunks
        int size = strtoll(argv[1], 0, 0);
        unsigned long *a = malloc(size);
        unsigned long *b = malloc(size);

        puts("Filling a with 'A'");
        memset(a, 'A', size);

        printf("a_chunk: %#p\n", (a-2));
        printf("b_chunk: %#p\n", (b-2));
        puts("");
        printf("a_chunk->prev_size: %#lx\n", *(a-2));
        printf("a_chunk->size: %#p\n, %#lx\n", *(a-1));
        printf("b_chunk->prev_size: %#lx\n", *(b-2));
        printf("b_chunk->size: %#lx\n", *(b-1));
}

16 bytes is the usable space in a chunk. If we allocate more, 24 for example, we will see upon running the program heap_overlap 24:

Filling a with 'A'
a_chunk: 0x557deb68d250
b_chunk: 0x557deb68d270

a_chunk->prev_size: 0
a_chunk->size: 0x21
b_chunk->prev_size: 0x4141414141414141
b_chunk->size: 0x21

The previous size of b is filled with A's. The metadata of next chunk overlaps with the current one if it is in use. This is the intended use in ptmalloc to save some space and from a security perspective, this is insane.

Freed Chunks

As we saw with tcache, a free()d chunk has additional metadata about the location of other chunks:

                       ----------------------------------           
chunk_addr ----------> | unsigned long mchunk_prev_size; |
                       |    unsigned long mchunk_size;   |
                       ---------------------------------- 
mem_addr ------------> |  CACHE-SPECIFIC METADATA  |
                       ---------------------------------- 

Different Caches

Current design in ptmalloc:

  1. 64 singly-linked tcache bins for allocations of size 16 to 1032 (functionally "covers" fastbins and smallbins)
  2. 10 singly-linked "fast" bins for allocations of size up to 160 bytes
  3. 1 doubly-linked "unsorted" bin to quickly stash free()d chunks that don't fit into tcache or fastbins
  4. 64 doubly-linked "small" bins for allocations up to 512 bytes
  5. doubly-linked "large" bins (anything over 512 bytes) that contain different-sized chunks

tcache Chunks

Free tcache-cached chunks have a pointer to the allocated space of the next chunk and a pointer to the per-thread struct.

                       ----------------------------------           
chunk_addr ----------> | unsigned long mchunk_prev_size; |
                       |    unsigned long mchunk_size;   |
                       ---------------------------------- 
mem_addr ------------> |      struct tcache_entry *next;      |
                       | struct tcache_perthread_struct *key; |
                       ---------------------------------- 

largebin Chunks

When free()d, large are:

  1. Consolidated with adjacent free chunks.
  2. Put into an "unsorted" bin regardless of size.
  3. Properly put into a doubly-linked list later, during the next allocation that they fail to "satisfy".
                       -------------------------------------           
chunk_addr ----------> |  unsigned long mchunk_prev_size;    |
                       |     unsigned long mchunk_size;      |
                       ------------------------------------- 
mem_addr ------------> |       struct malloc_chunk*fd;       |
                       |       struct malloc_chunk*fd;       |
                       |   struct malloc_chunk* fd_nextsize; |
                       |   struct malloc_chunk*bk_next_size; |
                       ---------------------------------------- 

In the struct we can see the forward (*fd) and backwards (*bk) pointers pointing to corresponding chunks:

struct malloc_chunk {
  unsigned long mchunk_prev_size;
  unsigned long mchunk_size;
  struct malloc_chunk* fd;
  struct malloc_chunk* bk;
  struct malloc_chunk* fd_nextsize;
  struct malloc_chunk* bk_nextsize;
};

typedef struct malloc_chunk* mchunkptr;

The Wilderness

The heap is a finite-sized allocation that needs to be manually expanded. During allocation, malloc() will (simplified view):

  1. Look for a free chunk that will satisfy the allocation, and return it.
  2. Otherwise, check the "available" space left at the end of the heap. If there is enough there, return that and reduce the "available" space.
  3. If there isn't enough "available" space, malloc() will mmap() certain large allocations.
  4. Otherwise, malloc will grow the heap with brk() and go to #2.

How does malloc() store the available space? In the "Wilderness", a fake chunk at the end of the heap that stores the available space. Let's check this closer:

#include <stdlib.h>
#include <stdio.h>\

int main(int argc, char **argv)
{
        printf("Warming up the heap: %p\n", malloc(16));
        unsigned long *a;

        a = malloc(16);
        printf("malloc(16): %p\n", a);
        printf("Wilderness size after 1 allocations: %#llx\n", a[3]);

        a = malloc(16);
        printf("malloc(16): %p\n", a);
        printf("Wilderness size after 2 allocations: %#llx\n", a[3]);

        a = malloc(16);
        printf("malloc(16): %p\n", a);
        printf("Wilderness size after 3 allocations: %#llx\n", a[3]);

        a = malloc(0x20890);
        printf("malloc(0x20890): %p\n", a);
        a = malloc(16);
        printf("malloc(16): %p\n", a);
        printf("Wilderness size after giant allocation: %#llx\n", a[3]);
}

We allocate 16 bytes printing a[3] which is the size metadata of the next chunk. Let's recap:

  • a[0] - first 8 bytes, which are allocated at the beginning to set the chunk to be in use/previous chunk in use.
  • a[1] - next 8 bytes, the chunk in use
  • a[2] - reserved 8 bytes or the previous size which is used when the chunk is allocated
  • a[3] - the size metadata of the next chunk or the wilderness.
  • a[4] - first 8 bytes of the tcache metadata of the allocated memory in the next chunk

After that we keep allocating memory to a. Let's examine the wilderness in the output:

Warming up the heap: 0x55d0e9d1c260
malloc(16): 0x55d0e9d1c690
Wilderness size after 1 allocations: 0x20961
malloc(16): 0x55d0e9d1c6b0
Wilderness size after 2 allocations: 0x20941
malloc(16): 0x55d0e9d1c6d0
Wilderness size after 3 allocations: 0x20921
malloc(0x20890): 0x55d0e9d1c6f0
malloc(16): 0x55d0e9d3cf90
Wilderness size after giant allocation: 0x61

We can see how the wilderness keeps decreasing by 20 bytes and with the giant allocation at the end it gets very tiny. In the older versions of libc (before 2.31), the wilderness goes to 0, however in newer ones, libc regrows it by 0x1000.

Heap Metadata Corruption

What might we want to achieve with heap metadata corruption?

  • Modify arbitrary memory - modifying the next pointer in tcache
  • Achieve an overlapping allocation (with other heap structures, stack, etc) - to return the same pointer twice, an allocation inside an allocation, or an allocation pointing to the stack
  • Use either of those capabilities for further control.

The Unlink Attack

Recall the largebin Chunks mentioned above. When a chunk (bigger than the tcache size) is allocated, it is removed from the doubly-linked list. This looks like:

chunk->fd->bk = chunk->bk;
chunk->bk->fd = chunk->fd;

If you control chunk->fd and chunk->bk, you can overwrite an arbitrary location in memory with an arbitrary (but valid) pointer. In other words, when you remove a chunk from a largebin, you will look at the chunk forward of the previous chunk (chunk->fd->bk). You will set its back address to the chunk behind your chunk (chunk->bk) + the forward address (->fd) of the chunk behind your chunk (= chunk->bk;) to the chunk in front of your chunk (chunk->fd;).

Extra checks have been introduced in libc 2.31. (chunk->fd->bk == chunk && chunk->bk->fd == chunk) have heavily weakened this attack, (though you can still pass this check if you inject a chunk!).

Poison Null Byte

What if all we have is a single 0-byte write? This is common, with off-by-one string operations!

buf = malloc(0x1008);
int read_length = read(0, 0x1008, 128);
buf[read_length] = 0;

In the program above, the extra allocated 8 bytes, will be the previous size of the next chunk. In this example we have a one byte overflow which overflows the size of the previous chunk. Imagine 3 allocations beside each other where the one in the middle is freed. After overflowing it in the middle with a 0 ending byte, it will get truncated to 0x1000 from 0x1008. However, C will count the size as 0x1008. In this case, the previous size won't be properly updated anymore. Upon freeing the chunk on the right, the overflowed chunk in the middle will be consolided with the one in the front. This will lead to further allocation over the overflowed chunk in the middle and it will get overlapped. This vulnerability was however patched in the past, but bypasses do exist as well. Read more in detail on this attack here https://heap-exploitation.dhavalkapil.com/attacks/shrinking_free_chunks

House of Force

This is another patched but interesting attack, where the size of the wilderness gets corrupted. Here the wilderness gets overwritten with a gigantic negative number and confuse the heap in such way that the next allocation would be the previous one. Another fun thing, you can allocate a lot of memory in front and have that memory extra onto the stack in front. See https://github.com/shellphish/how2heap/blob/master/glibc_2.27/house_of_force.c

House of Spirit

This is one of the first House exploits that appeared and since over 15 years it is still actual! It exploits the fact that there are very few checks done at free() time. This is how it works:

  1. Forge something that looks like a chunk.
  2. free() it.
  3. The next malloc() will return that chunk to you!

With a pointer overwrite, can be used to later malloc() a stack pointer. Can be done with or without tcache. Let's check an example:

struct malloc_chunk {
        unsigned long prev_size;
        unsigned long size;
        struct malloc_chunk* fd;
        struct malloc_chunk* bk;
        struct malloc_chunk* fd_nextsize;
        struct malloc_chunk* bk_nextsize;
};

int main(int argc, char **argv)
{
        printf("Warming up the heap: %p\n", malloc(16));
        unsigned long *a;
        struct malloc_chunk stack_chunk = { 0 };

        a = malloc(16);
        printf("Pre-poison malloc: %p\n", a);

        // fake our chunk
        stack_chunk.prev_size = 0;
        stack_chunk.size = 0x20; // create a chunk of size of 16 bytes
        free(&stack_chunk.fd); // we free the chunk (which is actually the one found in the struct above)

        // on the next malloc, our fake chunk will be returned
        a = malloc(16);
        printf("Post-poison malloc: %p\n", a);
}

In the output we can see, that after poisoning the previous allocation, a stack address is returned:

Warming up the heap: 0x557c163a6260
Pre-poison malloc: 0x557c163a6690
Post-poison malloc: 0x7ffe517bc480

Also see https://github.com/shellphish/how2heap/blob/master/glibc_2.31/tcache_house_of_spirit.c

Miscellaneous Heapery

How do you trigger a malloc() in an uncooperative program? In fact printf(), scanf(), and friends will use malloc() to allocate space during operation. With the right setup, can lead straight to an overwrite. For example, initiating the House of Spirit and then calling printf() can lead to a straight buffer overflow. If you want to disable this in your own programs (for example when using printf() for debugging), disable libc's internal buffers with:

setbuf(stdout, NULL);
setbuf(stdin, NULL);

Special and Amazing Thanks to Professor Zardus for this Awesome Module!

Extra Literature and References:

https://pwn.college/modules/heap
malloc internals: https://sourceware.org/glibc/wiki/MallocInternals
A guidebook on the heap: https://heap-exploitation.dhavalkapil.com
Automated heap security analysis engine: https://github.com/angr/heaphopper
One of the earliest heap exploits from 2000's: https://www.openwall.com/articles/JPEG-COM-Marker-Vulnerability.
Vudo malloc tricks, by MaXX: http://phrack.org/issues/57/8.html
Once upon a free(), by anonymous: http://phrack.org/issues/57/9.html
Phantasmal Phantasmagoria developed a "lore" around heap metadata corruption: https://seclists.org/bugtraq/2005/Oct/118
Heap Automation: https://www.usenix.org/conference/usenixsecurity18/presentation/heelan
Repository of heap misuse examples https://github.com/shellphish/how2heap

Section: Binary Exploitation (PWN)

Back