# Analyzing Malicious Windows Programs

## The Windows API

* What is the Windows API?
  * A broad set of functionality that governs the way that malware interacts with the Microsoft libraries
  * Uses its own names to represent C types
* ***Hungarian Notation***
  * Used for API function identifiers
  * Uses a prefix naming scheme that makes it easy to identify a variable's type
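To make the prefix scheme concrete, here is a minimal sketch that maps a parameter name to the type its prefix suggests. It assumes only four common prefixes (`w`, `dw`, `h`, `lp`); the helper name `describe_parameter` is made up for this illustration, and real Windows code uses many more prefixes, including compound ones that this sketch does not handle.

```python
# Assumed, deliberately small prefix table; not a complete list.
HUNGARIAN_PREFIXES = {
    "dw": "DWORD (32-bit unsigned integer)",
    "lp": "long pointer",
    "w": "WORD (16-bit unsigned integer)",
    "h": "handle",
}


def describe_parameter(name):
    """Guess a parameter's type from its Hungarian prefix (hypothetical helper)."""
    # Check longer prefixes first so "dw" is not mistaken for "w".
    for prefix in sorted(HUNGARIAN_PREFIXES, key=len, reverse=True):
        rest = name[len(prefix):]
        # The prefix must be followed by a capitalized name, as in dwSize.
        if name.startswith(prefix) and rest[:1].isupper():
            return HUNGARIAN_PREFIXES[prefix]
    return "unknown"


if __name__ == "__main__":
    for param in ("dwCreationDisposition", "hFile", "lpBuffer", "count"):
        print(param, "->", describe_parameter(param))
```

When reading disassembly or API documentation, the same mental lookup tells you at a glance that `dwCreationDisposition` is a 32-bit value and `hFile` is a handle.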

### Handles

* What are handles?
  * Items that have been opened or created in the OS:
    * Window
    * Process
    * Module
    * Menu
    * File
  * Cannot be used in arithmetic operations
  * Do not always represent the object's address
  * The only thing you can do with a handle is store it and use it in a later function call to refer to the same object
* Example:
  * `CreateWindowEx` function - returns an `HWND`, which is a handle to a window

### File System Functions

* Common ways that malware interacts with the system:
  * Creating or modifying files
    * Distinct filenames
    * Changes to existing filenames
* Functions for accessing the file system
  * `CreateFile`
    * Used to create and open files
    * Can open existing files, pipes, streams, and I/O devices
    * Can also create new files
    * `dwCreationDisposition` parameter controls whether the function creates a new file or opens an existing one
  * `ReadFile` and `WriteFile`
    * Used for reading and writing to files
    * Operate on files as a stream
  * `CreateFileMapping` and `MapViewOfFile`
    * ***File mappings*** are commonly used by malware writers because they allow a file to be loaded into memory and manipulated easily
    * `CreateFileMapping` - loads a file from disk into memory
    * `MapViewOfFile` - returns a pointer to the base address of the mapping, which can be used to access the file in memory
    * Malware calling these functions can use the pointer returned from `MapViewOfFile` to read and write anywhere in the file
    * Handy when parsing a file format
    * Malware can obtain a mapping of a file, make changes in memory, and execute the PE file as if it had been loaded by the OS loader

### Special Files

* Not accessed by their drive letter and folder
* Stealthier than regular files because they don't show up in directory listings
* Provide greater access to system hardware and internal data
* Can be passed as strings to any of the file-manipulation functions, which then operate on them as if they were normal files

### Shared Files

* Special files with names that start with `\\serverName\share`
* Access directories or files in a shared folder stored on a network
* The `\\?\` prefix tells the OS to disable all string parsing and allows access to longer filenames

### Files Accessible via Namespaces

* ***Namespaces***
  * Can be thought of as a fixed number of folders, each storing different types of objects
* **NT Namespace**
  * The lowest-level namespace, with the prefix `\`
  * Has access to all devices; all other namespaces exist within the NT namespace
* **Win32 device namespace**
  * Prefix `\\.\`
  * Often used by malware to access physical devices directly and to read and write to them like a file
  * Example: `\\.\PhysicalDisk1` to directly access Disk1 (ignoring the file system), allowing malware to modify the disk in ways not possible using the normal API
  * Malware might be able to read and write data to an unallocated sector without creating or accessing files, which allows it to avoid detection by AV and security programs
  * Example:
    * Witty worm
      * Accessed `\Device\PhysicalDisk1` via the NT namespace to corrupt its victim's file system
      * Would open it and write to a random space on the drive at regular intervals, eventually corrupting the victim's OS and rendering it unable to boot
  * Malware can also access physical memory directly, which allows user-space programs to write to kernel space
    * This technique is used by malware to modify the kernel and hide programs in user space

### Alternate Data Streams

* Allow additional data to be added to an existing file within NTFS, essentially adding one file to another
* Extra data does not show up in a directory listing and is not shown when displaying the contents of the file; it is visible only when you access the stream
* Named according to the convention ***normalFile.txt:Stream:$DATA***
  * Allows a program to read and write to a stream
* Malware authors like ADS because it can be used to hide data

## The Windows Registry

* Malware often uses the registry for persistence or configuration data
* Malware adds entries to the registry that allow it to run automatically when the computer boots
* Writing entries to the `Run` subkey sets up software to run automatically, which makes it a common way for malware to launch itself

### Common Registry Functions

* Malware uses registry functions that are part of the Windows API to modify the registry so that it runs automatically when the system boots
* Common functions:
  * `RegOpenKeyEx` - opens a registry key for editing and querying
  * `RegSetValueEx` - adds a new value to the registry and sets its data
  * `RegGetValue` - returns the data for a value entry in the registry
* If you see these in malware, you need to identify the registry keys they are accessing

### Registry Scripting with .reg Files

* Files with a ***.reg*** extension contain human-readable registry data and act like scripts for changing the registry
* When a user double-clicks a ***.reg*** file, the information it contains is automatically merged into the registry
* Malware uses ***.reg*** files to modify the registry
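Because ***.reg*** files are plain text, an analyst can triage them without touching a live registry. The following is a minimal sketch, assuming the usual `[key]` / `"name"="data"` text layout and handling only string values. The sample content, the value name `MaliciousValue`, the path `C:\Windows\evil.exe`, and both helper functions are made up for illustration.

```python
import re

# Hypothetical .reg content that adds an autorun entry under the Run subkey.
SAMPLE_REG = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run]
"MaliciousValue"="C:\\Windows\\evil.exe"
'''


def parse_reg(text):
    """Return (key, value_name, data) tuples for string values in .reg text."""
    entries = []
    current_key = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]
            continue
        match = re.fullmatch(r'"([^"]+)"="(.*)"', line)
        if match and current_key is not None:
            # String data in .reg files escapes each backslash as two.
            data = match.group(2).replace("\\\\", "\\")
            entries.append((current_key, match.group(1), data))
    return entries


def autorun_entries(text):
    """Keep only the entries written to a ...\\CurrentVersion\\Run key."""
    suffix = r"\currentversion\run"
    return [entry for entry in parse_reg(text) if entry[0].lower().endswith(suffix)]


if __name__ == "__main__":
    for key, name, data in autorun_entries(SAMPLE_REG):
        print(f"{key}: {name} -> {data}")
```

Binary, DWORD, and multi-line values are ignored here; a real triage script would need to handle them as well.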

### Networking APIs

* Malware relies on network functions to do its dirty work
* Malware most commonly uses Berkeley-compatible sockets (primarily implemented in ***ws2\_32.dll***)

* `WSAStartup` has to be called before any other networking function to allocate resources for the networking libraries
  * While debugging code, set a breakpoint on `WSAStartup` to find where network-related functionality begins

#### Server and Client Sides

* ***Server side*** - maintains an open socket waiting for incoming connections
  * Steps:
    * `socket`
    * `bind`
    * `listen`
    * `accept`
    * `send`/`recv`
* ***Client side*** - connects to a waiting socket
  * Steps:
    * `socket` call
    * `connect` call
    * `send`/`recv` calls
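The two call sequences above can be sketched with Python's `socket` module, which exposes the same Berkeley-style calls. This is only an illustration on the loopback interface, not Winsock code: the function names `run_server` and `demo` and the `ack:` reply are made up, and Python handles library initialization itself, so there is no explicit `WSAStartup` step.

```python
import socket
import threading


def run_server(server_sock, received):
    # accept: block until a client connects, returning a per-connection socket
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)            # recv
        received.append(data)
        conn.sendall(b"ack:" + data)      # send


def demo():
    # Server side: socket -> bind -> listen
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))    # port 0 lets the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    received = []
    worker = threading.Thread(target=run_server, args=(server_sock, received))
    worker.start()

    # Client side: socket -> connect -> send/recv
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(("127.0.0.1", port))
        client.sendall(b"hello")
        reply = client.recv(1024)

    worker.join()
    server_sock.close()
    return received[0], reply


if __name__ == "__main__":
    print(demo())
```

Seeing the `bind`/`listen`/`accept` trio in a sample's imports suggests it listens for connections, while `connect` alone suggests it reaches out to a remote host.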

### The WinINet API

* A higher-level API
* Functions are stored in ***Wininet.dll***
* Implements protocols like HTTP and FTP at the application layer
* You can gain an understanding of what malware is doing based on the connections it opens
* Functions
  * `InternetOpen` - used to initialize a connection to the Internet
  * `InternetOpenUrl` - used to connect to a URL
  * `InternetReadFile` - allows the program to read the data from a file downloaded from the Internet
* Malware can use this to connect to a remote server and get further instructions for execution

## Following Running Malware

* The first and most common way to access code outside a single file is through the use of DLLs

### DLLs

* ***Dynamic Link Libraries (DLLs)***
  * Windows' way to use libraries to share code among multiple applications
  * An executable file that does not run alone, but exports functions that can be used by other applications
* Main advantages
  * Memory used by the DLLs can be shared among running processes
  * When distributing an executable, you can use DLLs that are known to be on the host Windows system without needing to redistribute them
* DLLs are a useful code-reuse mechanism
  * Maintain a single library of common code and distribute it only when needed
#### How Malware Authors use DLLs

* **To store malicious code**
  * Store malicious code in a DLL rather than in an `.exe` file
  * Malware sometimes uses DLLs to load itself into another process
* **By using Windows DLLs**
  * Provide the functionality needed to interact with the OS
* **By using third-party DLLs**
  * Malware can use third-party DLLs to interact with other programs
  * Example - use the Mozilla Firefox DLL to connect back to a server, rather than connecting directly through the Windows API

#### Basic DLL Structure

* DLLs use the PE file format
* Only a single flag indicates that the file is a DLL
* Often have more exports and fewer imports
* Other than these, there is no real difference between a DLL and an `.exe`
* `DllMain`
  * Main DLL function
  * It has no label
  * Is not an export in the DLL, but it is specified in the PE header as the file's entry point
  * Called to notify the DLL whenever a process
    * Loads or unloads the library
    * Creates a new thread
    * Finishes an existing thread
  * This notification allows the DLL to manage any per-process or per-thread resources

### Processes

* Malware can execute code outside the current program by creating a new process or modifying an existing one
* Windows uses processes as containers to manage resources and keep separate programs from interfering with each other
* Each process is given a memory space that is separate from all other processes: the set of memory addresses that the process can use
* When the process requires memory, the OS allocates memory and gives the process an address that it can use to access the memory
* Different processes can use the same memory addresses
  * The addresses are the same, but the physical memory that stores the data is not
* A malicious program that accesses a memory address will affect only what is stored at that address for the process that contains the malicious code

#### **Creating a New Process**

* `CreateProcess` - the function most commonly used by malware to create a new process
  * Malware could call this function to create a process that executes its malicious code, in order to bypass host-based firewalls and other security mechanisms
  * Commonly used by malware to create a simple remote shell with just a single function call
  * `STARTUPINFO` parameter
    * Includes handles to the standard input, standard output, and standard error streams for a process
    * Malicious programs can set these values to a socket, so that when the program writes to standard output, it is really writing to the socket, allowing an attacker to execute a shell remotely without running anything other than the call to `CreateProcess`

* Call to `CreateProcess`
  * Creates a new process so that all input and output are redirected to a socket
* Malware often creates a new process by storing one program inside another in the resource section
  * When the program runs, it extracts the additional executable from the resource section, writes it to disk, and then calls `CreateProcess` to run it

### Threads

* Processes contain threads
* Threads are what the Windows OS executes
* Threads are independent sequences of instructions that are executed by the CPU without waiting for other threads
* Threads within a process all share the same memory space, but each has its own processor registers and stack

#### Thread Context

* A running thread has complete control of the CPU
* When the OS switches between threads, all values in the CPU are saved in a structure (***thread context***)

#### Creating a Thread

* `CreateThread` function
  * Used to create new threads
  * Caller specifies a start address, often called the `start` function
  * Execution begins at the start address and continues until the function returns
  * Caller of `CreateThread` can specify the function where the thread starts and a single parameter to be passed to the `start` function
* Malware can use `CreateThread` in multiple ways
  * To load a new malicious library into a process
    * The address of `LoadLibrary` is specified as the start address
    * The argument passed to `CreateThread` is the name of the library to be loaded
    * The new DLL is loaded into memory in the process and `DllMain` is called
  * To create two new threads for input and output
    * One listens on a socket or pipe and then outputs that to standard input of a process
    * The other reads from standard output and sends that to a socket or pipe
    * Goal is to send all information to a single socket or pipe in order to communicate seamlessly with the running application

* Fibers are like threads, but are managed by a thread rather than by the OS

#### Interprocess Coordination with Mutexes

* ***Mutexes***
  * Also called ***mutants*** when in the kernel
  * Are global objects that coordinate multiple processes and threads
  * Mainly used to control access to shared resources
  * Example
    * If two threads must access a memory structure, but only one can safely access it at a time, a mutex can be used to control access
  * Only one thread can own a mutex at a time
  * Important to malware analysis because they often use hard-coded names, making them good host-based indicators
    * Hard-coded names are common because a mutex's name must be consistent if it is used by two processes
  * A thread gains access to the mutex with a call to `WaitForSingleObject`
  * When a thread is done using a mutex, it calls `ReleaseMutex`
* `CreateMutex` function
  * Creates a mutex
  * Malware will commonly create a mutex and try to open an existing mutex with the same name to make sure that only one instance of the malware is running at a time
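The thread and mutex ideas above can be tried out with a small Python analogy. This sketch uses `threading.Thread` in place of `CreateThread` (a start function plus a single argument) and `threading.Lock` in place of a mutex. It is an assumption-laden stand-in: a Python lock is unnamed and local to one process, so it does not model the named, system-wide mutexes that serve as host-based indicators. The function names are made up for the example.

```python
import threading


def run_workers(worker_count=4, increments=1000):
    counter = {"value": 0}           # memory structure shared by all threads
    mutex = threading.Lock()         # only one thread can own it at a time

    def start(n):                    # the start function; n is its single parameter
        for _ in range(n):
            mutex.acquire()          # roughly the role of WaitForSingleObject
            try:
                counter["value"] += 1
            finally:
                mutex.release()      # roughly the role of ReleaseMutex

    threads = [
        threading.Thread(target=start, args=(increments,))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

Every thread sees the same `counter` because threads in a process share one memory space; the lock is what keeps their updates from interleaving.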

### Services

* ***Service***
  * Another way for malware to execute additional code
  * Services run as background applications
  * Scheduled and run by the Windows service manager without user input
* Advantages for malware writers
  * Services are normally run as `SYSTEM` or another privileged account
    * `SYSTEM` account has more access than administrator or user accounts
  * Provide another way to maintain persistence on a system
  * A user looking through running applications wouldn't find anything suspicious, because the malware is not running in a separate process
* Key Windows API functions related to services:
  * `OpenSCManager`
    * Returns a handle to the service control manager
    * The handle is used for all subsequent service-related function calls
    * Any code that interacts with services will call this function
  * `CreateService`
    * Adds a new service to the service control manager
    * The caller can specify whether the service will start automatically at boot time or has to be started manually
  * `StartService`
    * Starts a service
    * Used only if the service is set to be started manually
* Most common service types used by malware
  * `WIN32_SHARE_PROCESS`
    * Stores the code for the service in a DLL
    * Combines several different services in a single, shared process
  * `WIN32_OWN_PROCESS`
    * Stores the code in an ***.exe*** file and runs as an independent process
  * `KERNEL_DRIVER`
    * Used for loading code into the kernel
* Information about services is stored in the registry under `HKLM\SYSTEM\CurrentControlSet\Services`

#### **SC Program**

* Command-line tool used to investigate and manipulate services
* Has commands for adding, deleting, starting, stopping, and querying services

### The Component Object Model

* An interface standard that makes it possible for different software components to call each other's code without knowledge of specifics about each other
* Works with any programming language
* Designed to support reusable software components
* Implemented as a client/server framework
* Each ***thread*** that uses COM has to call the `OleInitialize` or `CoInitializeEx` function at least once prior to calling any other COM library functions

#### CLSIDs, IIDs, and the Use of COM Objects

* COM objects are accessed via
  * ***Globally Unique Identifiers (GUIDs)***
  * ***Class Identifiers (CLSIDs)***
  * ***Interface Identifiers (IIDs)***
* `CoCreateInstance` function
  * Used to get access to COM functionality
* `Navigate` function
  * Common function used by malware
  * Allows a program to launch Internet Explorer and access a web address
* Interfaces are identified with a GUID called an IID, and classes are identified with a GUID called a CLSID

* The OS uses information in the registry to determine which file contains the requested COM code when a program calls `CoCreateInstance`
* To identify what a malicious program is doing when it calls a COM function, malware analysts have to determine which offset a function is stored at
  * One strategy for identifying the function called by a COM client is to check the header files for the interface specified in the call to `CoCreateInstance`
* Some COM objects are implemented as DLLs - loaded into the process space of the COM client executable
  * When a COM object is set up to be loaded as a DLL, the registry entry for the CLSID includes the `InprocServer32` subkey rather than `LocalServer32`

#### COM Server Malware

* Malware can implement a malicious COM server that can then be used by other applications
* `Browser Helper Objects (BHOs)`
  * Provide common COM server functionality for malware
  * Third-party plug-ins for Internet Explorer
  * Have no restrictions, so malware authors use them to run code inside the IE process
  * This allows them to monitor Internet traffic, track browser usage, and communicate with the Internet without running their own process
* COM server malware is usually easy to detect because it exports several functions
  * `DllCanUnloadNow`
  * `DllGetClassObject`
  * `DllInstall`
  * `DllRegisterServer`
  * `DllUnregisterServer`

### Exceptions: When Things Go Wrong

* Exceptions
  * Allow a program to handle events outside the flow of normal execution
  * Most often caused by errors
  * When they happen, execution transfers to a special routine that resolves the exception
  * When an exception occurs, Windows looks in `fs:0` for the stack location that stores the exception information and then the exception handler is called
  * After the exception is handled, execution returns to the main thread
* `Structured Exception Handling (SEH)`
  * Windows mechanism for handling exceptions
  * SEH information is stored on the stack

  * If the exception handler for the current frame does not handle an exception, it's passed to the exception handler for the caller's frame
  * If none of the exception handlers responds to an exception, the top-level exception handler crashes the application
* Exception handlers can be used in exploit code to gain execution
  * A pointer to exception-handling information is stored on the stack
  * During a stack overflow, an attacker can overwrite the pointer
  * By specifying a new exception handler, the attacker gains execution when an exception happens

### Kernel vs User Mode

* **User Mode**
  * Each process has its own memory, security permissions, and resources
  * When a program executes an invalid instruction and crashes, Windows can reclaim all the resources and terminate the program
  * Cannot access hardware directly
  * Restricted to only a subset of all the registers and instructions available on the CPU
  * Relies on the Windows API to manipulate hardware or change the state in the kernel
  * Presence of `SYSENTER`, `SYSCALL`, or `INT 0x2E` instructions in disassembly indicates that a call is being made into the kernel
* **Kernel Mode**
  * All processes running in the kernel share resources and memory addresses
  * Kernel code has fewer security checks
  * If the code contains invalid instructions, the OS cannot continue running, resulting in the famous Windows BSoD
  * Code running in the kernel can manipulate code running in user space, but code running in user space can affect the kernel only through well-defined interfaces
  * Most security programs (AV and firewalls) run in kernel mode
  * Malware running in kernel mode can more easily interfere with security programs or bypass firewalls
  * The OS's auditing features don't apply to the kernel
  * Nearly all rootkits use code running in the kernel
  * Generally only sophisticated malware runs in the kernel; most malware has no kernel component

### The Native API

* Lower-level interface for interacting with Windows that is rarely used by non-malicious programs
* Bypasses the normal Windows API

* User applications get access to user APIs like `kernel32.dll` and other DLLs, which call `ntdll.dll`
* `ntdll.dll`
  * A special DLL that manages interactions between user space and the kernel
  * `ntdll` functions use APIs and structures just like the ones used in the kernel
  * These functions make up the Native API
* Programs are **not** supposed to call the Native API, but nothing in the OS prevents them from doing so
* Calling the Native API is attractive for malware because
  * It allows malware to do things that might not otherwise be possible
  * It offers additional functionality that is not exposed in the regular Windows API
  * It is stealthier

* Native API calls that provide information about the system, processes, threads, handles, and other items
  * `NtQuerySystemInformation`
  * `NtQueryInformationProcess`
  * `NtQueryInformationThread`
  * `NtQueryInformationFile`
  * `NtQueryInformationKey`
* `NtContinue`
  * Native API function popular with malware authors
  * Meant to transfer execution back to the main thread of a program after an exception has been handled
  * Location to return to is specified in the exception context, and it can be changed
  * Malware often uses this function to transfer execution in complicated ways to confuse an analyst and make a program more difficult to debug
* ***Native applications***
  * Applications that do not use the Win32 subsystem
  * Issue calls to the Native API only
  * Rare for malware but almost nonexistent for non-malicious software, so native applications are likely malicious
  * The Subsystem field in the PE header indicates whether a program is a native application
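Two header checks mentioned in these notes, the single flag that marks a DLL and the Subsystem field that marks a native application, can be read straight from a file's bytes. The sketch below assumes the standard PE layout (`e_lfanew` at offset `0x3C`, a 20-byte file header after the `PE\0\0` signature, and `Subsystem` at offset 68 of the optional header). `inspect_pe` and `build_fake_pe` are hypothetical helpers, and the fake image is a zero-filled layout for demonstration, not a loadable program.

```python
import struct

IMAGE_FILE_DLL = 0x2000          # Characteristics flag that marks a DLL
IMAGE_SUBSYSTEM_NATIVE = 1       # Subsystem value used by native applications


def inspect_pe(data):
    """Return (is_dll, is_native) for the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)        # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    file_header = pe_offset + 4
    (characteristics,) = struct.unpack_from("<H", data, file_header + 18)
    optional_header = file_header + 20
    (subsystem,) = struct.unpack_from("<H", data, optional_header + 68)
    is_dll = bool(characteristics & IMAGE_FILE_DLL)
    return is_dll, subsystem == IMAGE_SUBSYSTEM_NATIVE


def build_fake_pe(characteristics, subsystem):
    """Build a zero-filled header layout with only the fields read above set."""
    pe_offset = 0x80
    data = bytearray(pe_offset + 4 + 20 + 96)
    data[0:2] = b"MZ"
    struct.pack_into("<I", data, 0x3C, pe_offset)
    data[pe_offset:pe_offset + 4] = b"PE\0\0"
    struct.pack_into("<H", data, pe_offset + 4 + 18, characteristics)
    struct.pack_into("<H", data, pe_offset + 4 + 20 + 68, subsystem)
    return bytes(data)


if __name__ == "__main__":
    print(inspect_pe(build_fake_pe(IMAGE_FILE_DLL, 2)))            # DLL, GUI subsystem
    print(inspect_pe(build_fake_pe(0x0002, IMAGE_SUBSYSTEM_NATIVE)))
```

On a real sample you would pass the file's bytes to `inspect_pe`; a dedicated PE parsing tool is the better choice for anything beyond a quick check.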