Analyzing Malware with Ghidra

Created On 01. Dec 2021

Updated: 2021-12-01 23:50:22.107369000 +0000

Created By: acidghost

Ghidra is a reverse engineering tool created by the National Security Agency. It is quite versatile, and among its strongest advantages are its powerful decompiler, which is also used in the open source project Radare2, and the ability to script it in Python and Java. That is not all, though. Other use cases include mass malware analysis, vulnerability auditing of binary files and development of custom plugins. In this post, however, I will mainly go through Ghidra's decompilation. If you are new to Ghidra, you can use this bash script to install it on Debian Linux. Replace the name of the release with the latest one (the one in the script was the latest available at the time of writing) and make sure to have at least 700MB free, since it requires the Java JDK, which takes up some space.

When I first saw the animations that play when the project is opening (see header image), I associated them with the dragons from HOMM3. Somewhat less with the Hydra itself, and more with the Red and Green Dragons.

In this post I will show a few approaches to analyzing malware with Ghidra by reversing this keylogger. It is a nice example, since it uses the WinAPI and we have the source code. Having the source code is great, since we can compare Ghidra's interpretation with the source side by side.

The Keylogger

It is an advanced standalone console application that can log every keystroke within specific time intervals and send the logs to a remote email server. When I previously tested it on Windows 7 and 10, there wasn't any alert about its presence; however, Windows Defender seems to recognize it since August 2021. I can't tell for sure whether the detection is related to the CodeBlocks build or to a signature, but I had a similar alert after exporting the project.
A good fork of the project can be found here. The comments within the project, which describe explicitly what each function does, are highly appreciated.
When compiling, take care to set the corresponding flags as mentioned here, and in case you experience an error during the build, try this. After building it in CodeBlocks, you can find the file in bin>Debug>"your Keylogger". You can also directly download the one I compiled from here and unzip it with the password infected.
Interesting headers:

sendmail.h - uses the local SMTP client, configured through a PowerShell script, to send the keystrokes to the attacker. It invokes PowerShell with -ExecutionPolicy Bypass -File.. so that the script runs regardless of the configured execution policy and sends the logs to the server. See more on this here
base64.h - obfuscates the locally saved keystroke logs (Base64 is an encoding rather than real encryption), so that the victim does not immediately realize from their contents what is happening.
keybhook.h - keeps track of the keystrokes and notifies the attacker of what happened while the program was running.

Reversing the Keylogger

Upon loading in and analyzing the program, we get an overview of what it does. Checking the main function in the decompiler, we can immediately see the subroutines and functions that the program calls when it runs.

Strings

Every malware reversing process usually starts with inspecting the strings. In CTFs it is rare for the contained strings to be offered as generously as in this keylogger.

Since the keylogger is compiled as a standalone application, there is less room for optimizations, which reveals more about its functionality. We can see that the keystroke names and the other strings passed around in the code are recognized. The keylogger could make our job harder by obfuscating the important strings. Even if there weren't any other indicators, the PowerShell script could be identified from the visible strings. Looking for external resources, we can see the email address and password that are left as plain strings in the sendmail header.
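
To make the idea of string inspection concrete outside of Ghidra, here is a minimal sketch of what a strings pass does: it scans raw bytes for runs of printable ASCII. The function name, the minimum length and the sample bytes are my own assumptions for illustration; they are not taken from the keylogger binary, and wide (UTF-16) strings, which are common on Windows, are ignored here.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return every run of printable ASCII that is at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical bytes standing in for a slice of a binary's data section.
sample = b"\x00\x01powershell.exe\x00\xff-ExecutionPolicy Bypass\x00ab\x00"
print(extract_strings(sample))
```

Short runs such as "ab" are dropped by the length threshold, which is what keeps the output readable on a real binary.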

Imports

The imports show which external functions, libraries and system calls the program relies on to do its work.

From the USER32.DLL imports alone, we can tell that the program uses Windows hooks and the Windows message mechanism.
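
A quick way to triage an import list is to match it against a small table of APIs that hint at a behaviour. The sketch below is only an illustration: the table and the import list are hypothetical and are not the actual imports of this sample, although the API names themselves are real USER32 functions.

```python
from typing import Dict, List

# Assumed, hand-written hints; extend as needed.
API_HINTS = {
    "SetWindowsHookEx": "installs a hook, for example on keyboard events",
    "CallNextHookEx": "passes a hooked event on to the next hook",
    "UnhookWindowsHookEx": "removes a previously installed hook",
    "GetMessage": "retrieves messages from the thread's message queue",
    "GetAsyncKeyState": "polls the state of a key",
}


def base_name(api: str) -> str:
    """Strip the ANSI/Unicode suffix (A or W) when the remainder is a known API."""
    if api in API_HINTS:
        return api
    if api[-1:] in ("A", "W") and api[:-1] in API_HINTS:
        return api[:-1]
    return api


def flag_imports(imports: List[str]) -> Dict[str, str]:
    """Map each import that has a hint to a short description."""
    return {
        api: API_HINTS[base_name(api)]
        for api in imports
        if base_name(api) in API_HINTS
    }


# Hypothetical import list for demonstration only.
for api, hint in flag_imports(["SetWindowsHookExA", "GetMessageA", "MessageBoxA"]).items():
    print(api, "->", hint)
```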

Functions

Further in the Symbol Tree, we can see a few functions such as main and WinMainCRTStartup, and more under tls_callback_ and FUN_0041. main is the entry point of any C/C++ program, while the mainCRTStartup functions are the entry points of the C runtime library. TLS (thread-local storage) callbacks are Windows specific and run before the application entry point; this is where additional initialization and termination of per-thread data structures happens.
When the program starts, it calls WinMainCRTStartup, which in turn runs a function that uses GetSystemTimeAsFileTime in FUN_004151f0 to get the system time; this is likely required by the timer.h header. After that it invokes __tmainCRTStartup, which initializes, via the WinAPI, the other components needed for the program's execution.
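
GetSystemTimeAsFileTime fills a FILETIME structure, which holds the number of 100-nanosecond intervals since January 1, 1601 (UTC), split into a low and a high 32-bit half. If such a value shows up during analysis, it can be converted as in the sketch below; the helper name is my own, and the example value is simply the FILETIME of the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100 ns ticks starting at this date.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(low: int, high: int) -> datetime:
    """Combine dwLowDateTime and dwHighDateTime into a UTC datetime."""
    ticks = (high << 32) | low  # 64-bit count of 100 ns intervals
    # 10 ticks make one microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


unix_epoch_ticks = 116444736000000000
print(filetime_to_datetime(unix_epoch_ticks & 0xFFFFFFFF, unix_epoch_ticks >> 32))
```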

Into `main`

As seen above and from what we know, main uses a few imports such as GetMessage that won't reveal much if we try to follow them, since these are external WinAPI functions whose code lives in system DLLs rather than in the binary. The same can happen with other functions that Ghidra cannot fully resolve, since they may depend on values that only change during execution, about which static analysis like this cannot say much. Following main into the Hook function, we can find familiar parts in the decompiled code, matching the original keybhook.h.

Base64

It is a good idea to rename the functions while keeping Ghidra's numbers in the names, as seen prepended in the functions below, so we can keep track of the decompiled structure.
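
One possible naming scheme, sketched below, keeps the address part of Ghidra's default FUN_ name and attaches a descriptive label to it. The exact scheme I used in the project is not reproduced here; the helper, the label and the ordering of label and address are assumptions for illustration. Inside Ghidra the resulting name would then be applied by hand or through a script.

```python
import re

# Ghidra's default function names look like FUN_ followed by a hex address.
_DEFAULT_NAME = re.compile(r"^FUN_([0-9a-fA-F]+)$")


def annotated_name(default_name: str, label: str) -> str:
    """Build '<label>_<address>' from a default name such as 'FUN_004151f0'."""
    match = _DEFAULT_NAME.match(default_name)
    if match is None:
        raise ValueError("not a default Ghidra function name: " + default_name)
    return "{}_{}".format(label, match.group(1))


# The label is hypothetical; the address is the one mentioned earlier in the post.
print(annotated_name("FUN_004151f0", "get_system_time"))
```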
Looking into the Base64 encode function, we can find familiar parts.

While the strings were a given, we cannot see exactly which parameters are passed to obfuscate the log file. We could possibly find out more if we were reversing dynamically with multiple test cases, but that would be quite tedious for the scope of this post. It could make a rather hard reversing challenge, though.
By looking up one of the known salts manually, hovering over its reference and then clicking it, we land in a massive function where all the initialization seems to occur when the program launches.
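
To show why recovering the salts matters, here is a minimal sketch of salted Base64 obfuscation. It is not the keylogger's actual routine: the salts, the single encoding round and the function names are assumptions. The point is that Base64 is an encoding, so anyone who knows the salts and the order of operations can reverse it without a key.

```python
import base64

# Hypothetical salts; the real ones live in the keylogger's base64.h.
SALT_PREFIX = "hypothetical-salt-1::"
SALT_SUFFIX = "::hypothetical-salt-2"


def obfuscate(text: str) -> str:
    """Wrap the text in salts and Base64-encode the result."""
    salted = SALT_PREFIX + text + SALT_SUFFIX
    return base64.b64encode(salted.encode("utf-8")).decode("ascii")


def deobfuscate(blob: str) -> str:
    """Reverse obfuscate(): decode, check the salts, then strip them."""
    salted = base64.b64decode(blob).decode("utf-8")
    if not (salted.startswith(SALT_PREFIX) and salted.endswith(SALT_SUFFIX)):
        raise ValueError("salts do not match")
    return salted[len(SALT_PREFIX):len(salted) - len(SALT_SUFFIX)]


encoded = obfuscate("[SHIFT]hello")
print(encoded)
print(deobfuscate(encoded))
```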

Sendmail

Digging further into the functions, we can also find the activity from the sendmail.h header, where we can clearly see how PowerShell's execution policy is bypassed and PowerShell is misused.
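
The same indicator can be searched for automatically once the strings are extracted. The sketch below checks a command line for the -ExecutionPolicy Bypass and -File arguments; the function name and the sample command, including the script path, are hypothetical and not copied from the binary.

```python
import re
from typing import Dict, Optional, Union

BYPASS = re.compile(r"-ExecutionPolicy\s+Bypass\b", re.IGNORECASE)
# The script path after -File is either quoted or a single token.
SCRIPT = re.compile(r"-File\s+(\"[^\"]+\"|\S+)", re.IGNORECASE)


def inspect_command(command: str) -> Dict[str, Union[bool, Optional[str]]]:
    """Report whether the policy is bypassed and which script file is run."""
    script = SCRIPT.search(command)
    return {
        "bypasses_policy": bool(BYPASS.search(command)),
        "script": script.group(1).strip('"') if script else None,
    }


# Hypothetical command line for demonstration only.
print(inspect_command('powershell.exe -ExecutionPolicy Bypass -File "C:\\temp\\sm.ps1"'))
```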

Fini

By analyzing the keylogger, we've seen how Ghidra recognizes standalone compiled C++ code and how the decompiler reflects it. Such an analysis can be done quickly to understand what a program does, and since Ghidra's UI is very clean, it is also comfortable when we need to work with raw bytes or assembly code. The tool doesn't have its own debugging engine for dynamic analysis; instead, an external debugger can optionally be set up, and on Linux, gdb is used by default. Since Ghidra is a free tool, the number of custom loaders, plugins and extensions is growing, which extends its binary format support and automated analysis capabilities. Without any doubt, it is a nice one to have in the toolbelt of any security engineer.

Section: Reverse Engineering


'For every complex problem there is an answer that is clear, simple, and wrong.' - H.L. Mencken