Binary Files and Processes in Linux

Created On 17. Apr 2021

Updated: 2021-06-06 01:55:00.938153000 +0000

Created By: acidghost

On Linux, executable files use the Executable and Linkable Format (ELF).
Let's read the information in an ELF file. Here I'm running readelf against hexdump:

$ readelf -a /usr/bin/hexdump
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x18f0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          24936 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         28
  Section header string table index: 27

Looks cool, but what's in there? The ELF header starts with the magic bytes. To see what those bytes contain, we can decode them with Python's built-in bytes.fromhex, here in an IPython session:

In [1]: bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")
Out[1]: b'\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Or in the plain Python REPL, with a list comprehension:

>>> print(''.join([chr(int(c, 16)) for c in "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00".split()]))
ELF

Or just echo it out:

$  echo -e $"\x7f\x45\x4c\x46"
ELF

Isn't this amazing? The magic starts with the byte 7f followed by 45 4c 46, which spell ELF in ASCII. The magic at the start of the header identifies the file as an ELF file, and that is how the Linux kernel knows to hand it to its ELF loader.
This can also be seen with the file command. Just run file /usr/bin/hexdump.
Further, the header records the file's class (64-bit) and its data encoding (2's complement, little endian). The entry point address is the address where execution of the program begins.
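The header fields discussed above can also be decoded by hand. The following is a minimal Python sketch, assuming a little-endian ELF64 header and the field layout from the ELF specification. The function name parse_elf_ident and its dictionary keys are made up for this illustration and are not part of any library.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def parse_elf_ident(header: bytes) -> dict:
    """Decode the identification bytes and the first ELF64 header fields."""
    if header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    result = {
        "class": {1: "ELF32", 2: "ELF64"}.get(ei_class, "unknown"),
        "data": {1: "little endian", 2: "big endian"}.get(ei_data, "unknown"),
    }
    if ei_class == 2 and ei_data == 1:
        # After the 16 identification bytes come e_type (2 bytes),
        # e_machine (2 bytes), e_version (4 bytes) and e_entry (8 bytes).
        e_type, e_machine, _version, e_entry = struct.unpack_from("<HHIQ", header, 16)
        result.update(type=e_type, machine=e_machine, entry=e_entry)
    return result
```

Passing it the first 32 bytes of /usr/bin/hexdump should report ELF64, little endian, type 3 (DYN), and the same entry point that readelf prints.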

Program Header

The program headers (also called segment headers) describe how the program should be loaded. An ELF file has several types of program header entries, of which the most important are:

  • INTERP: defines the library that should be used to load this ELF into memory
  • LOAD: defines a part of the file that should be loaded into memory

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R E    0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000005860 0x0000000000005860  R E    0x200000
  LOAD           0x0000000000005b30 0x0000000000205b30 0x0000000000205b30
                 0x00000000000004fa 0x00000000000005b0  RW     0x200000
  DYNAMIC        0x0000000000005c40 0x0000000000205c40 0x0000000000205c40
                 0x0000000000000200 0x0000000000000200  RW     0x8
  NOTE           0x0000000000000254 0x0000000000000254 0x0000000000000254
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x0000000000005190 0x0000000000005190 0x0000000000005190
                 0x00000000000000f4 0x00000000000000f4  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000005b30 0x0000000000205b30 0x0000000000205b30
                 0x00000000000004d0 0x00000000000004d0  R      0x1

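The INTERP entry gives the offset and size of the path string that names the interpreter (loader) which will load the program. VirtAddr specifies the virtual memory address of a segment; for a position-independent executable like this one it is relative to a base address that is usually randomized for better security. FileSiz is how many bytes are taken from the file and MemSiz is how many bytes the segment occupies in memory. Each LOAD entry gives the offset and file size of a part of the file that is mapped into memory.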
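To make the table above more concrete, here is a hedged Python sketch that decodes one 56-byte program header entry and pulls out the interpreter path. It assumes the little-endian Elf64_Phdr layout from the ELF specification. ProgramHeader, parse_phdr and read_interp are hypothetical helper names, and only a few entry types are named.

```python
import struct
from typing import NamedTuple

PHDR_FORMAT = "<IIQQQQQQ"  # Elf64_Phdr, little endian, 56 bytes
PT_NAMES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR"}


class ProgramHeader(NamedTuple):
    type: str
    flags: str
    offset: int
    vaddr: int
    paddr: int
    filesz: int
    memsz: int
    align: int


def parse_phdr(entry: bytes) -> ProgramHeader:
    """Decode a single 56-byte program header entry."""
    p_type, p_flags, *rest = struct.unpack(PHDR_FORMAT, entry)
    # Flag bits: 4 = readable, 2 = writable, 1 = executable.
    flags = "".join(ch if p_flags & bit else " " for ch, bit in (("R", 4), ("W", 2), ("E", 1)))
    return ProgramHeader(PT_NAMES.get(p_type, hex(p_type)), flags, *rest)


def read_interp(image: bytes, phdr: ProgramHeader) -> str:
    """Return the NUL-terminated path stored in an INTERP segment."""
    raw = image[phdr.offset:phdr.offset + phdr.filesz]
    return raw.rstrip(b"\0").decode()
```

For the INTERP entry shown above (offset 0x238, size 0x1c), read_interp would return the 27-character path plus its terminating NUL stripped, i.e. /lib64/ld-linux-x86-64.so.2.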

Section Headers

Section headers hold the metadata that describes the program's components. They will look familiar if you have written an x86 assembly program. The important ones are:

  • .text - the executable code of the program
  • .plt, .got - for resolving dynamically linked functions/variables
  • .data - initialized data (e.g. global arrays with initial values)
  • .rodata - initialized read-only data (e.g. string constants)
  • .bss - uninitialized data (e.g. global arrays without initial values)

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000000254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000000274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000000298  00000298
       0000000000000060  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000000002f8  000002f8
       0000000000000600  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           00000000000008f8  000008f8
       00000000000002ac  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           0000000000000ba4  00000ba4
       0000000000000080  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000000c28  00000c28
       0000000000000080  0000000000000000   A       6     2     8
  [ 9] .rela.dyn         RELA             0000000000000ca8  00000ca8
       0000000000000420  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             00000000000010c8  000010c8
       0000000000000438  0000000000000018  AI       5    23     8
  [11] .init             PROGBITS         0000000000001500  00001500
       0000000000000017  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         0000000000001520  00001520
       00000000000002e0  0000000000000010  AX       0     0     16
  [13] .plt.got          PROGBITS         0000000000001800  00001800
       0000000000000008  0000000000000008  AX       0     0     8
  [14] .text             PROGBITS         0000000000001810  00001810
       0000000000002fa2  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         00000000000047b4  000047b4
       0000000000000009  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         00000000000047c0  000047c0
       00000000000009d0  0000000000000000   A       0     0     8
  [17] .eh_frame_hdr     PROGBITS         0000000000005190  00005190
       00000000000000f4  0000000000000000   A       0     0     4
  [18] .eh_frame         PROGBITS         0000000000005288  00005288
       00000000000005d8  0000000000000000   A       0     0     8
  [19] .init_array       INIT_ARRAY       0000000000205b30  00005b30
       0000000000000008  0000000000000008  WA       0     0     8
  [20] .fini_array       FINI_ARRAY       0000000000205b38  00005b38
       0000000000000008  0000000000000008  WA       0     0     8
  [21] .data.rel.ro      PROGBITS         0000000000205b40  00005b40
       0000000000000100  0000000000000000  WA       0     0     32
  [22] .dynamic          DYNAMIC          0000000000205c40  00005c40
       0000000000000200  0000000000000010  WA       6     0     8
  [23] .got              PROGBITS         0000000000205e40  00005e40
       00000000000001a8  0000000000000008  WA       0     0     8
  [24] .data             PROGBITS         0000000000206000  00006000
       000000000000002a  0000000000000000  WA       0     0     8
  [25] .bss              NOBITS           0000000000206040  0000602a
       00000000000000a0  0000000000000000  WA       0     0     32
  [26] .gnu_debuglink    PROGBITS         0000000000000000  0000602c
       0000000000000034  0000000000000000           0     0     4
  [27] .shstrtab         STRTAB           0000000000000000  00006060
       0000000000000101  0000000000000000           0     0     1

How do we interact with ELF files?

  • gcc - compile source code into an ELF file
  • patchelf - change the libraries, interpreter, etc. of an ELF file
  • objcopy - extract and replace sections
  • objdump - disassemble the ELF file
  • ldd - list the shared objects that a file depends on
  • kaitai struct (https://ide.kaitai.io/) - an interactive tool to inspect ELF files

Processes

A process is created by the fork or clone system call, which produces a copy of the calling process. To launch a program, the caller forks itself into a parent and a child. The child then calls the execve system call, which replaces it with the program being launched.
Every process has:

  • state - running, waiting, stopped, zombie
  • priority and other scheduling information
  • parent, siblings, children
  • shared resources - files, pipes, sockets
  • virtual memory space
  • security context
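The fork-then-execve sequence described above can be sketched with Python's thin wrappers around the same system calls. This is an illustration under simple assumptions (Linux, an absolute program path), not how any particular shell is implemented. spawn_and_wait is a hypothetical name.

```python
import os


def spawn_and_wait(argv, env=None):
    """Fork, replace the child with argv[0] via execve, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: execve only returns (by raising) if it fails.
        try:
            os.execve(argv[0], argv, dict(os.environ) if env is None else env)
        finally:
            os._exit(127)
    # Parent: wait for the child so that it is reaped.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

For example, spawn_and_wait(["/bin/sh", "-c", "exit 3"]) returns 3, and a path that cannot be executed yields 127, the value this sketch picks for a failed execve.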

When the process is launched, it will usually call __libc_start_main() in libc, which in turn calls the program's main() function. libc.so, the standard C library, is almost always loaded into Linux processes. It provides functionality such as printf(), scanf(), malloc() and much more.

Dynamically and Statically Linked ELF Files

When loading a file, the Linux kernel checks whether it is a dynamically or statically linked ELF file. If it is dynamically linked, like the hexdump binary analysed above, the kernel loads the interpreter named in the ELF and lets it take control. Dynamically linked means that the ELF file depends on specific libraries that must be loaded at run time. ELF files can also be statically linked, which makes them self-contained, with no need to load external libraries. On Linux systems, ELF files are usually dynamically linked. Static linking can be the more secure practice, but statically linked files are larger and can put more strain on the system, so dynamic linking remains the preferred choice.

Interpreter

Before the program runs, the kernel checks the beginning of the file. It first looks for the ELF interpreter, which is also known as the "loader". We can check which interpreter a file requests by running:

$ readelf -a /usr/bin/hexdump | grep interpreter
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

The interpreter can be overridden for a single run by invoking it directly and passing it the program to launch:

$ /lib64/ld-linux-x86-64.so.2 /usr/bin/hexdump -n 100
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0000000 6161 6161 6161 6161 6161 6161 6161 6161

It can also be changed permanently with patchelf --set-interpreter:

$ patchelf --set-interpreter /great_interpreter /usr/bin/hexdump

Since this interpreter doesn't exist, hexdump will no longer execute. If we query the interpreter again:

$ readelf -a /usr/bin/hexdump | grep interpreter

We should see /great_interpreter set as the interpreter. Why does this happen? When bash starts a program, it forks a child that calls execve; the kernel then loads the interpreter that is set in the ELF file and hands control to it. If we strace hexdump, we will see the system calls made by the program during execution. At the beginning we see the execve call made in the child process of bash:

execve("/lib64/ld-linux-x86-64.so.2", ["/lib64/ld-linux-x86-64.so.2", "/usr/bin/hexdump", "-n", "100"], 0x7fff36ff9f48 /* 60 vars */) = 0

The loader then tries each library search path in turn. When an openat call succeeds, it reads the beginning of the library, starting with its ELF header:

openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libbsd.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P4\0\0\0\0\0\0"..., 832) = 832

Scripts start with #!. The kernel extracts the interpreter from the rest of that line and executes it with the script file as an argument. Text after the interpreter path is passed along as well; note that Linux hands everything after the interpreter path to it as a single argument:

#! /bin/echo argument1 argument2
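How such a line turns into an argument vector can be modelled in a few lines of Python. This is a simplified model of the Linux behaviour (one interpreter path, then at most one argument string), not the kernel's actual code. parse_shebang and build_argv are hypothetical names.

```python
def parse_shebang(first_line: bytes):
    """Split a '#!' line into (interpreter, argument-or-None)."""
    if not first_line.startswith(b"#!"):
        return None
    rest = first_line[2:].strip()
    if not rest:
        return None
    # Only the first whitespace separates interpreter and argument;
    # everything after it stays together as one string.
    parts = rest.split(None, 1)
    interpreter = parts[0].decode()
    argument = parts[1].decode() if len(parts) > 1 else None
    return interpreter, argument


def build_argv(script_path, first_line, script_args=()):
    """Build the argument vector the interpreter would be started with."""
    interpreter, argument = parse_shebang(first_line)
    argv = [interpreter]
    if argument is not None:
        argv.append(argument)
    return argv + [script_path, *script_args]
```

For the example line above, build_argv("./script", b"#! /bin/echo argument1 argument2\n") gives ["/bin/echo", "argument1 argument2", "./script"], so echo would print both words followed by the script's path.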

Libraries

LD_PRELOAD and LD_LIBRARY_PATH are both environment variables that influence which libraries are used when the program is launched. LD_PRELOAD lists shared objects that the dynamic loader loads before all others, while LD_LIBRARY_PATH lists directories that the loader searches before the default ones. Both are handled by the loader, not by the kernel. LD_LIBRARY_PATH can be set for a traced command the following way:

$ strace -E LD_LIBRARY_PATH=/some/library/path /usr/bin/hexdump 2>&1 | head -n 20

And then we can see that the loader searches /some/library/path:

openat(AT_FDCWD, "/some/library/path/tls/x86_64/x86_64/libbsd.so.0", O_RDONLY|O_CLOEXEC) = -1 ENOENT

Both variables can be set together like this:

$ LD_LIBRARY_PATH=/some/library/path LD_PRELOAD=somefile.so /usr/bin/hexdump

With patchelf we can also set the DT_RPATH, a library search path stored in the file itself:

$ patchelf --set-rpath /some/path /usr/bin/hexdump

The loader will then also look for libraries in the specified path.

Virtual Memory Space

The virtual memory space is where the program and all its libraries get loaded. It also contains the heap, the stack, memory mapped by the program, helper regions and kernel code. A process can inspect its own layout in /proc/self/maps. Virtual memory is dedicated to the process, while physical memory is shared among the whole system.
To capture the memory layout of a running hexdump, I launch it against another file and suspend it with CTRL + Z. Then I can run ps aux | grep hexdump to find the PID and look up the layout with cat /proc/PID/maps.

56223c2b1000-56223c2b7000 r-xp 00000000 08:01 265426                     /usr/bin/hexdump
56223c4b6000-56223c4b7000 r--p 00005000 08:01 265426                     /usr/bin/hexdump
56223c4b7000-56223c4b8000 rw-p 00006000 08:01 265426                     /usr/bin/hexdump
56223dcb2000-56223dcd3000 rw-p 00000000 00:00 0                          [heap]
7f2799a0e000-7f2799d3e000 r--p 00000000 08:01 265519                     /usr/lib/locale/locale-archive
7f2799d3e000-7f2799d58000 r-xp 00000000 08:01 1076370                    /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799d58000-7f2799f57000 ---p 0001a000 08:01 1076370                    /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799f57000-7f2799f58000 r--p 00019000 08:01 1076370                    /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799f58000-7f2799f59000 rw-p 0001a000 08:01 1076370                    /lib/x86_64-linux-gnu/libpthread-2.27.so
7f2799f59000-7f2799f5d000 rw-p 00000000 00:00 0 
7f2799f5d000-7f2799f64000 r-xp 00000000 08:01 1076373                    /lib/x86_64-linux-gnu/librt-2.27.so
7f2799f64000-7f279a163000 ---p 00007000 08:01 1076373                    /lib/x86_64-linux-gnu/librt-2.27.so
7f279a163000-7f279a164000 r--p 00006000 08:01 1076373                    /lib/x86_64-linux-gnu/librt-2.27.so
7f279a164000-7f279a165000 rw-p 00007000 08:01 1076373                    /lib/x86_64-linux-gnu/librt-2.27.so
7f279a165000-7f279a34c000 r-xp 00000000 08:01 1050956                    /lib/x86_64-linux-gnu/libc-2.27.so
7f279a34c000-7f279a54c000 ---p 001e7000 08:01 1050956                    /lib/x86_64-linux-gnu/libc-2.27.so
7f279a54c000-7f279a550000 r--p 001e7000 08:01 1050956                    /lib/x86_64-linux-gnu/libc-2.27.so
7f279a550000-7f279a552000 rw-p 001eb000 08:01 1050956                    /lib/x86_64-linux-gnu/libc-2.27.so
7f279a552000-7f279a556000 rw-p 00000000 00:00 0 
7f279a556000-7f279a569000 r-xp 00000000 08:01 1048661                    /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a569000-7f279a768000 ---p 00013000 08:01 1048661                    /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a768000-7f279a769000 r--p 00012000 08:01 1048661                    /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a769000-7f279a76a000 rw-p 00013000 08:01 1048661                    /lib/x86_64-linux-gnu/libbsd.so.0.8.7
7f279a76a000-7f279a76b000 rw-p 00000000 00:00 0 
7f279a76b000-7f279a792000 r-xp 00000000 08:01 1050952                    /lib/x86_64-linux-gnu/ld-2.27.so
7f279a975000-7f279a979000 rw-p 00000000 00:00 0 
7f279a992000-7f279a993000 r--p 00027000 08:01 1050952                    /lib/x86_64-linux-gnu/ld-2.27.so
7f279a993000-7f279a994000 rw-p 00028000 08:01 1050952                    /lib/x86_64-linux-gnu/ld-2.27.so
7f279a994000-7f279a995000 rw-p 00000000 00:00 0 
7fff84bca000-7fff84beb000 rw-p 00000000 00:00 0                          [stack]
7fff84bee000-7fff84bf1000 r--p 00000000 00:00 0                          [vvar]
7fff84bf1000-7fff84bf3000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Each line contains:

  • the starting and ending address of the region in the process's address space
  • permissions - how the process may access the region (read, write, execute, and p for private or s for shared)
  • offset - where the region begins in the mapped file
  • dev - the major:minor device number, in hex, of the device holding the mapped file
  • inode - the inode number of the mapped file
  • pathname - the file the region was mapped from. Regions without a path name are anonymous mappings, and bracketed names mark special regions such as [heap], [stack] and [vdso], the virtual dynamic shared object that lets some system calls run without switching to kernel mode
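The fields listed above are easy to pull apart programmatically. Below is a small Python sketch that parses one line of a maps file; Mapping, parse_maps_line and read_maps are hypothetical names, and the parser assumes the whitespace-separated layout shown in the listing.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str


def parse_maps_line(line: str) -> Mapping:
    """Parse a single line of /proc/PID/maps."""
    # The pathname is optional and may contain spaces, so split at most 5 times.
    fields = line.split(None, 5)
    addr_range, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr_range.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


def read_maps(pid="self"):
    """Parse every region of a process (requires a mounted /proc)."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return [parse_maps_line(line) for line in maps_file if line.strip()]
```

Applied to the first line of the listing, it yields a 0x6000-byte r-xp region backed by /usr/bin/hexdump; anonymous regions come back with an empty pathname.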

Initialization

Each program can have functions that run before main() is called. These are called constructors. For example, libc can use one to initialize memory regions for dynamic allocation when the program launches. A constructor can also be declared manually in a C program:

#include <stdio.h>

/* runs before main() */
__attribute__((constructor)) void yey()
{
        puts("yey!");
}

Then it can be compiled like this:

$ gcc -static-pie -o program-static program.c

When the program is launched, it prints yey! at the very start. Combined with LD_PRELOAD, constructors can be used to inject library code into a process, which is useful for custom configs and debugging.

Environmental Variables

Environment variables can be used to change the behavior of some utilities. Below is a nice program that loops through all environment variables and prints them.

#include <stdio.h>

int main(int argc, char **argv, char **envp)
{
        /* envp is a NULL-terminated array of "NAME=value" strings */
        for (int i = 0; envp[i] != NULL; i++) puts(envp[i]);
        return 0;
}

Compile it:

$ gcc -o env env.c 

Variables can be added for a single run like this: ENV=VAR ./env. The program can then be used directly as a debugging aid to see exactly which environment a process receives.
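The same experiment can be driven from Python. The sketch below starts a child with one extra variable on top of the inherited environment, which mirrors what the shell does for ENV=VAR ./env. It assumes /usr/bin/env is available as a stand-in for the compiled env program, and run_with_env is a hypothetical name.

```python
import os
import subprocess


def run_with_env(argv, extra_env):
    """Run argv with extra_env added to the inherited environment."""
    child_env = {**os.environ, **extra_env}
    result = subprocess.run(argv, env=child_env, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_with_env(["/usr/bin/env"], {"ENV": "VAR"}):
        print(line)
```

The output lists every NAME=value pair the child received, including the added ENV=VAR.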

System Calls

There are over 300 system calls in Linux. A few examples are:

  • int execve(const char *filename, char **argv, char **envp) - replaces the current process with a new program
  • ssize_t write(int fd, const void *buf, size_t nbytes) - writes to a file descriptor

Frequently used syscalls are open, read and write (used by cat, for example) and fork, execve and wait (used by a shell).
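To see how little a tool like cat needs, here is a sketch built only from Python's direct wrappers for open, read, write and close. It illustrates the syscall pattern and is not the actual implementation of cat; the function name and the chunk size are arbitrary choices.

```python
import os


def cat(path, out_fd=1, chunk_size=4096):
    """Copy a file to a file descriptor using open/read/write/close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break  # read returns no data at end of file
            while chunk:
                # write may accept fewer bytes than requested
                written = os.write(out_fd, chunk)
                chunk = chunk[written:]
    finally:
        os.close(fd)
```

Calling cat("/etc/hostname") writes the file to standard output (file descriptor 1); running the script under strace should show the corresponding openat, read and write calls.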

Shared Memory

By sharing memory between processes, it is possible to set up communication that requires no system calls once it is established. The easy way to do this is a shared memory-mapped file in /dev/shm (see man mmap).
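A minimal sketch of the idea in Python: two independent mappings of the same file see each other's writes. In practice the two mappings would live in two different processes. The helper name open_shared and the file name used in the usage note are assumptions made for this example.

```python
import mmap
import os


def open_shared(path, size):
    """Map `size` bytes of the file at `path` as shared, writable memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # The mapping stays valid after the descriptor is closed.
        os.close(fd)
```

With writer = open_shared("/dev/shm/demo", 4096) and reader = open_shared("/dev/shm/demo", 4096), assigning writer[:5] = b"hello" makes reader[:5] read back the same bytes without any further system call. Real use would still need some form of synchronization between the processes.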

Signals

Signals pause the process's execution and invoke a handler. Handlers are functions that take one argument, the signal number. The signals that can't be handled are SIGKILL and SIGSTOP. For the most useful signals, see man 7 signal. We encounter signals very often; for example, a 'segmentation fault' is delivered as a signal (SIGSEGV).
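Installing a handler can be tried out from Python, with the caveat that Python handlers receive two arguments (the signal number and the interrupted stack frame) rather than the single argument of a C handler. The sketch assumes it runs in the main thread; handler, received and can_handle are names chosen for this example.

```python
import signal

received = []


def handler(signum, frame):
    # Record which signal interrupted the program.
    received.append(signum)


def can_handle(sig):
    """Return True if a handler can be installed for this signal."""
    try:
        previous = signal.signal(sig, handler)
    except (OSError, ValueError):
        return False
    if previous is not None:
        signal.signal(sig, previous)  # restore the earlier disposition
    return True


signal.signal(signal.SIGUSR1, handler)
signal.raise_signal(signal.SIGUSR1)  # send SIGUSR1 to ourselves
```

After this runs, received contains SIGUSR1, while can_handle(signal.SIGKILL) and can_handle(signal.SIGSTOP) both report False because the kernel refuses handlers for those two signals.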

Termination

A process terminates either by receiving an unhandled signal or by calling the exit() system call. Every process must be "reaped": after termination it stays in the zombie state until its parent wait()s on it. When that happens, its exit code is returned to the parent and the process is freed. If the parent dies without waiting on its children, they are re-parented to PID 1 and stay there until they are cleaned up.
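The zombie state can be observed directly. The following sketch forks a child that exits at once, polls /proc until the kernel reports it as a zombie, and only then reaps it. It assumes Linux with /proc mounted; process_state and observe_zombie are hypothetical names, and the timeout is an arbitrary safety limit.

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, e.g. 'S' or 'Z'."""
    with open(f"/proc/{pid}/stat") as stat_file:
        data = stat_file.read()
    # The state field follows the command name, which is wrapped in parentheses.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie(exit_code=7, timeout=2.0):
    """Fork a child that exits at once; report its state before reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child terminates immediately
    deadline = time.monotonic() + timeout
    state = process_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = process_state(pid)
    _, status = os.waitpid(pid, 0)  # reap the child
    return state, os.WEXITSTATUS(status), os.path.exists(f"/proc/{pid}")
```

The returned tuple shows the child sitting in state Z until waitpid collects its exit code, after which its /proc entry is gone.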

References

https://www.intezer.com/blog/research/executable-linkable-format-101-part1-sections-segments/
https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4
https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-1.html
https://pwn.college/modules/intro
