# Analyzing Malicious Windows Programs

## The Windows API

* What is the Windows API?
  * A broad set of functionality that governs the way that malware interacts with the Microsoft libraries
  * Uses its own names to represent C types
  * ***Hungarian Notation***
    * used for API function identifiers
    * Uses a prefix naming scheme that makes it easy to identify a variable's type

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2F4zjeIAUl5Pn6rCr5su7U%2Fimage.png?alt=media&#x26;token=4726da37-10ce-4282-a5d5-51d2e141dbde" alt=""><figcaption><p>Common Windows API Types</p></figcaption></figure>
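
When reading API documentation or disassembly, the Hungarian prefix on a parameter name is a quick hint to its type. The following is a minimal sketch of that idea; the prefix table is a small illustrative subset, and `guess_type` is a hypothetical helper, not part of any Windows tooling.

```python
# Illustrative subset of Hungarian-notation prefixes; not an exhaustive list.
HUNGARIAN_PREFIXES = {
    "lp": "long pointer",
    "dw": "DWORD (32-bit unsigned integer)",
    "w": "WORD (16-bit unsigned integer)",
    "h": "handle",
}


def guess_type(identifier):
    """Guess a type from the lowercase prefix that precedes the first capital."""
    run = ""
    for ch in identifier:
        if not ch.islower():
            break
        run += ch
    if run == identifier:
        return "unknown"  # no capitalized name part, so no prefix to read
    # Try longer prefixes first so "lp" is checked before single letters.
    for prefix in sorted(HUNGARIAN_PREFIXES, key=len, reverse=True):
        if run.startswith(prefix):
            return HUNGARIAN_PREFIXES[prefix]
    return "unknown"
```

For example, `guess_type("dwCreationDisposition")` reports a DWORD and `guess_type("hFile")` reports a handle. This is only a heuristic: any name that happens to start with one of these letters will also match.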

### Handles

* What are handles?
  * Items that have been opened or created in the OS:
    * Window
    * Process
    * Module
    * Menu
    * File
  * Cannot be used in arithmetic operations
  * Do not always represent the object's address
  * The only thing you can do with a handle is store it and pass it to a later function call to refer to the same object
  * Example:
    * `CreateWindowEx` function - returns an `HWND`, which is a handle to a window

### File System Functions

* A common way that malware interacts with the system is by creating or modifying files
  * Distinct filenames and changes to existing filenames make good host-based indicators
* Functions for accessing the file system
  * `CreateFile`
    * Used to create and open files
    * Can open existing files, pipes, streams, and I/O devices
    * Can also create new files
    * `dwCreationDisposition` parameter controls whether the function creates a new file or opens an existing one
  * `ReadFile` and `WriteFile`
    * Used for reading and writing to files
    * Operate on files as a stream
  * `CreateFileMapping` and `MapViewOfFile`
    * ***File mappings*** are commonly used by malware writers because they allow a file to be loaded into memory and manipulated easily
    * `CreateFileMapping` - loads a file from disk into memory
    * `MapViewOfFile` - returns a pointer to the base address of the mapping, which can be used to access the file in memory
    * Malware calling these functions could use the pointer returned from `MapViewOfFile` to read and write anywhere in the file
    * Handy when parsing a file format
    * Malware can obtain a map of a PE file, make the necessary changes in memory, and execute the file as if it had been loaded by the OS loader
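
The "read and write anywhere in the file" idea can be sketched with Python's `mmap` module, which exposes a mapped file as a mutable byte buffer. As far as I know, CPython builds this on `CreateFileMapping` and `MapViewOfFile` on Windows, but treat that as an assumption. `patch_mapped_file` is a hypothetical helper name.

```python
import mmap


def patch_mapped_file(path, offset, new_bytes):
    """Overwrite bytes at `offset` through a memory mapping; return the old bytes."""
    with open(path, "r+b") as handle:
        # Length 0 maps the whole file (the file must not be empty).
        with mmap.mmap(handle.fileno(), 0) as view:
            end = offset + len(new_bytes)
            old = bytes(view[offset:end])
            view[offset:end] = new_bytes  # index directly, no seek/write loop
            view.flush()
    return old
```

Because the view behaves like an array, a parser can jump between header fields by offset instead of issuing a `ReadFile` call for each one, which is why mappings are handy for file-format parsing.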

### Special Files

* Not accessed by their drive letter and folder
* Stealthier than regular ones because they don't show up in directory listings
* Provide greater access to system hardware and internal data
* Can be passed as strings to any of the file-manipulation functions and operate on a file as if it were a normal file

### Shared Files

* Special files with names that start with `\\serverName\share` or `\\?\serverName\share`
* Access directories or files in a shared folder stored on a network
* The `\\?\` prefix tells the OS to disable all string parsing and allows access to longer filenames

### Files Accessible via Namespaces

* ***Namespaces***
  * Thought of as a fixed number of folders, each storing different types of objects.
  * **NT Namespace**
    * Lowest level namespace is the NT namespace with the `\` prefix
    * The `NT namespace` has access to all devices and all other namespaces exist within the NT namespace
  * **Win32 device namespace**
    * Prefix `\\.\`
    * Often used by malware to access physical devices directly, and read and write to them like a file
    * Example: `\\.\PhysicalDisk1` to directly access Disk1 (ignoring the file system) allowing it to modify it in ways not possible using the API
    * Malware might be able to read and write data to an unallocated sector without creating or accessing files, allows it to avoid detection by AV and security programs
      * Example:
        * Witty worm
          * accessed `\Device\PhysicalDisk1` via the NT namespace to corrupt its victim's file system
          * Would open it and write to a random space on the drive at regular intervals, eventually corrupting the victim's OS and rendering it unable to boot
    * Malware can also access physical memory directly, which allows user-space programs to write to kernel space
      * This technique is used by malware to modify the kernel and hide programs in user space
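
When triaging strings extracted from a sample, it helps to sort paths by the prefixes described above. The sketch below is a rough heuristic, and `classify_path` is a hypothetical helper. Note that a single leading backslash can also mean a drive-relative path in Win32 calls, so the "NT namespace" label is only a hint.

```python
def classify_path(path):
    """Label a path string by its namespace prefix (heuristic only)."""
    if path.startswith("\\\\?\\"):
        return "long-path prefix"        # \\?\ disables string parsing
    if path.startswith("\\\\.\\"):
        return "Win32 device namespace"  # \\.\ reaches devices directly
    if path.startswith("\\\\"):
        return "shared (UNC) path"       # \\serverName\share
    if path.startswith("\\"):
        return "NT namespace"            # e.g. \Device\...
    return "ordinary path"
```

The order of the checks matters: the two special prefixes also begin with a double backslash, so they must be tested before the generic shared-path case.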

### Alternate Data Streams

* Allows additional data to be added to an existing file within NTFS, essentially adding one file to another
* Extra data does not show up in a directory listing and it is not shown when displaying the contents of the file; only visible when you access the stream
* Named according to the convention ***normalFile.txt:Stream:$DATA***
  * Allows a program to read and write to a stream
* Malware authors like ADS because it can be used to hide data
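
A path that follows the `file:stream:type` convention can be split into its parts to spot stream references in strings or API arguments. This is a minimal sketch; `split_stream_name` is a hypothetical helper, and it assumes that a single letter followed by a colon at the start is a drive letter.

```python
def split_stream_name(path):
    """Split 'file:stream:type' into (file, stream, stream_type)."""
    drive, rest = "", path
    # Assumption: a leading "X:" is a drive letter, not a one-letter filename.
    if len(path) >= 2 and path[0].isalpha() and path[1] == ":":
        drive, rest = path[:2], path[2:]
    parts = rest.split(":")
    file_part = drive + parts[0]
    stream = parts[1] if len(parts) > 1 else ""          # "" = default stream
    stream_type = parts[2] if len(parts) > 2 else "$DATA"
    return file_part, stream, stream_type
```

A non-empty `stream` in the result means the string refers to an alternate data stream rather than the file's normal contents.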

## The Windows Registry

* Malware often uses the Registry for persistence or configuration data
* Malware adds entries into the registry that will allow it to run automatically when the computer boots
* Writing entries to the `Run` subkey sets up software to run automatically, which malware often uses to launch itself at startup

### Common Registry Functions

* Malware uses registry functions that are part of the Windows API to modify the registry to run automatically when the system boots
* Common Functions:
  * `RegOpenKeyEx` - opens a registry key for editing and querying
  * `RegSetValueEx` - adds a new value to the registry and sets its data
  * `RegGetValue` - returns the data for a value entry in the registry
* If you see these in malware, you need to identify the registry keys they are accessing

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2Fb8jrR09SwjwgtKHNDzdd%2Fimage.png?alt=media&#x26;token=5cad713c-9be7-4b2a-b7e0-44e0567900fe" alt=""><figcaption></figcaption></figure>

### Registry Scripting with .reg Files

* They are like scripts for changing the registry
* Files with a ***.reg*** extension contain human-readable registry data.
* When a user double-clicks a ***.reg*** file, it automatically modifies the registry by merging the information the file contains into the registry
* Malware uses ***.reg*** files to modify the registry

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FuhSdpOXtL4yvhhIOlpmC%2Fimage.png?alt=media&#x26;token=493e8dea-66c8-435b-96ba-b685d64f9fd4" alt=""><figcaption></figcaption></figure>
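
Because ***.reg*** files are plain text, an analyst can inspect one without merging it. The sketch below parses key headers and simple `name=data` lines and then picks out anything written under a `Run` key. Both function names are hypothetical, and the parser deliberately ignores continuation lines and value names that contain `=`.

```python
def parse_reg_text(text):
    """Parse .reg text into {key_path: {value_name: raw_data}}."""
    entries = {}
    current_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current_key = line[1:-1]
            entries[current_key] = {}
        elif current_key is not None and "=" in line:
            name, _, data = line.partition("=")
            entries[current_key][name.strip('"')] = data  # data kept as written
    return entries


def autorun_entries(entries):
    """Keep only keys that end in ...\\CurrentVersion\\Run (case-insensitive)."""
    suffix = "\\currentversion\\run"
    return {k: v for k, v in entries.items() if k.lower().endswith(suffix)}
```

Running `autorun_entries(parse_reg_text(text))` on a suspicious file gives a quick list of the values it would add for automatic startup.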

### Networking APIs

* Malware relies on network functions to do its dirty work
* Malware most commonly uses Berkeley-compatible sockets (primarily implemented in ***ws2\_32.dll***)

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FXFLzQt60zWVHXwgAKaxW%2Fimage.png?alt=media&#x26;token=cc94cf00-6f98-420b-a804-90cf962a8b97" alt=""><figcaption></figcaption></figure>

* `WSAStartup` function has to be called before any other networking functions to allocate resources for the networking libraries.
  * While debugging code, set a breakpoint on `WSAStartup`

#### Server and Client Sides

* ***Server side*** - maintains an open socket waiting for incoming connections
  * Steps:
    * socket
    * bind
    * listen
    * accept
    * send/recv
* ***Client side*** - connects to a waiting socket
  * Steps:
    * socket call
    * connect call
    * send/recv calls
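
The two call sequences above can be seen side by side in a small loopback sketch. Python's `socket` module follows the same Berkeley call names; it is used here only as an analogy, and it takes care of library initialization itself, so there is no explicit `WSAStartup` step. `run_echo_once` is a hypothetical helper that echoes one short message.

```python
import socket
import threading


def run_echo_once(message):
    """Start a one-shot echo server on loopback and talk to it as a client."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    server.bind(("127.0.0.1", 0))                               # bind (any free port)
    server.listen(1)                                            # listen
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()                               # accept
        with conn:
            conn.sendall(conn.recv(1024))                       # recv, then send

    worker = threading.Thread(target=serve)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    client.connect(("127.0.0.1", port))                         # connect
    client.sendall(message)                                     # send
    reply = client.recv(1024)                                   # recv
    client.close()
    worker.join()
    server.close()
    return reply
```

In disassembly, seeing `bind`, `listen`, and `accept` together suggests the sample is acting as a server, while `connect` without them suggests a client.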

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FuPvaJrHdAtP7PiPU1wSZ%2Fimage.png?alt=media&#x26;token=ee6ae18b-32ca-42c4-a58b-bc0dfaebe011" alt=""><figcaption></figcaption></figure>

### The WinINet API

* A higher-level API
* Functions are stored in ***Wininet.dll***
* Implements protocols like HTTP and FTP at the application layer
* You can gain an understanding of what malware is doing based on connections it opens
* Functions
  * `InternetOpen` - used to initialize a connection to the Internet
  * `InternetOpenUrl` - used to connect to a URL
  * `InternetReadFile` - allows the program to read the data from a file downloaded from the Internet
* Malware can use this to connect to a remote server and get further instructions for execution

## Following Running Malware

* First and most common way to access code outside a single file is through the use of DLLs

### DLLs

* ***Dynamic Link Libraries (DLLs)***
  * Windows' way to use libraries to share code among multiple applications
  * An executable file that does not run alone, but exports functions that can be used by other applications.
  * Main advantages
    * Memory used by the DLLs can be shared among running processes
    * When distributing an executable, you can use DLLs that are known to be on the host Windows system without needing to redistribute them
    * DLLs are a useful code-reuse mechanism
    * Maintain a single library of common code and distribute it only when needed.

#### How Malware Authors use DLLs

* **To store malicious code**
  * Store malicious code in a DLL rather than in an `.exe` file
  * Malware sometimes uses DLLs to load itself into another process
* **By using Windows DLLs**
  * Functionality needed to interact with the OS
* **By using third-party DLLs**
  * Malware can use third-party DLLs to interact with other programs
  * Example - use the Mozilla Firefox DLL to connect back to a server, rather than connecting directly through the Windows API

#### Basic DLL Structure

* DLLs use the PE file format
* Only a single flag indicates that the file is a DLL
* Often have more exports and fewer imports
* Other than these there is no real difference between a DLL and an `.exe`
* `DllMain`
  * Main DLL function
  * It has no label
  * Is not an export in the DLL, but it is specified in the PE header as the file's entry point
  * Function is called to notify the DLL whenever a process
    * Loads or unloads the library
    * Creates a new thread
    * Finishes an existing thread
  * This notification allows the DLL to manage any per-process or per-thread resources

### Processes

* Malware can execute code outside the current program by creating a new process or modifying an existing one
* Windows uses processes as containers to manage resources and keep separate programs from interfering with each other
* Each process is given its own memory space, separate from all other processes: the set of memory addresses that the process can use
* When the process requires memory, the OS allocates it and gives the process an address that it can use to access the memory
* Different processes can use the same memory addresses
  * The addresses are the same, but the physical memory that stores the data is not
* A malicious program that accesses a memory address affects only what is stored at that address for the process that contains the malicious code

#### **Creating a New Process**

* `CreateProcess` - most commonly used function by malware to create a new process
  * Malware could call this function to create a process to execute its malicious code, bypassing host-based firewalls and other security mechanisms
  * Commonly used by malware to create a simple remote shell with just a single function call
  * `STARTUPINFO` parameter
    * includes a handle to the standard input, standard output and standard error streams for a process
    * malicious programs could set these values to a socket, so that when the program writes to standard output, it is really writing to the socket, allowing an attacker to execute a shell remotely without running anything other than the call to `CreateProcess`
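
The underlying mechanism is that a child's standard streams are whatever handles the parent supplies at creation time. The benign sketch below shows this with pipes instead of a socket, using Python's `subprocess` module. My understanding is that on Windows it passes these handles through `STARTUPINFO`, but treat that as an assumption. `run_with_redirected_streams` is a hypothetical helper.

```python
import subprocess
import sys


def run_with_redirected_streams(child_code, input_bytes=b""):
    """Run a child Python process whose stdin/stdout/stderr are parent-owned pipes."""
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=input_bytes,        # becomes the child's standard input
        stdout=subprocess.PIPE,   # child's standard output lands in a pipe
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout, result.stderr
```

The child simply reads and writes its standard streams and has no idea where they lead. When analyzing a sample, check which handles are stored in the `STARTUPINFO` structure before the `CreateProcess` call to see where the new process's input and output really go.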

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2Fgq7wT0u16nvkyz2sZwrr%2Fimage.png?alt=media&#x26;token=14128165-f70d-4dd6-960c-7fc7682e6401" alt=""><figcaption></figcaption></figure>

* Call to `CreateProcess`
  * creates a new process so that all input and output are redirected to a socket
  * Malware often creates a new process by storing one program inside another in the resource section
  * When the program runs
    * It extracts the additional executable from the resource section of the PE file, writes it to disk, and then calls `CreateProcess` to run it

### Threads

* Processes contain threads
* Threads are what the Windows OS executes
* Threads are independent sequences of instructions that are executed by the CPU without waiting for other threads
* Threads within a process all share the same memory space, but each has its own processor registers and stack

#### Thread Context

* Running threads have complete control of the CPU
* When an OS switches between threads, all values in the CPU are saved in a structure (***thread context***)

#### Creating a Thread

* `CreateThread` function
  * Used to create new threads
  * Caller specifies a start address, often called the `start` function
  * Execution begins at the start address and continues until the function returns
  * Caller of `CreateThread` can specify the function where the thread starts and a single parameter to be passed to the `start` function
* Malware can use `CreateThread` in multiple ways
  * Used to load a new malicious library into a process
    * The address of `LoadLibrary` specified as the start address
    * Argument passed to `CreateThread` is the name of the library to be loaded
    * The new DLL is loaded into memory in the process and `DllMain` is called
  * Used to create two new threads for input and output
    * One to listen on a socket or pipe and then output that to standard input of a process
    * The other to read from standard output and send that to a socket or pipe
    * Goal is to send all information to a single socket or pipe in order to communicate seamlessly with the running application

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FQfkvPPDtMTrdwAxAJ95W%2Fimage.png?alt=media&#x26;token=bcf33075-ad4e-4be8-88a4-d3229a199e7f" alt=""><figcaption></figcaption></figure>

* Fibers are like threads, but are managed by a thread, rather than by the OS

#### Interprocess Coordination with Mutexes

* ***Mutexes***
  * Also called ***mutants*** when in the kernel
  * Are global objects that coordinate multiple processes and threads
  * Mainly used to control access to shared resources
  * Example
    * If two threads must access a memory structure, but only one can safely access it at a time, a mutex can be used to control access
  * Only one thread can own a mutex at a time
  * Important to malware analysis because they often use hard-coded names, making them good host-based indicators
    * Hard-coded names are common because a mutex's name must be consistent if it is used by two processes
* A thread gains access to the mutex with a call to `WaitForSingleObject`
* When a thread is done using a mutex it uses `ReleaseMutex`
* `CreateMutex` function
  * Creates a mutex
* Malware will commonly create a mutex and try to open an existing mutex with the same name to make sure that only one instance of the malware is running at a time
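
The ownership rule (only one owner at a time, with everyone else refused or made to wait) can be illustrated with an in-process lock. This is only an analogy: Python's `threading.Lock` is unnamed and is not visible to other processes, unlike a Windows mutex created with `CreateMutex`. `try_acquire` is a hypothetical helper.

```python
import threading

mutex = threading.Lock()


def try_acquire(lock):
    """Non-blocking attempt to take ownership; True only if the lock was free."""
    return lock.acquire(blocking=False)
```

A first call to `try_acquire(mutex)` succeeds, a second call fails until `mutex.release()` is called, which mirrors the pattern of a second malware instance finding the mutex already taken and exiting.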

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2Fb3r6qeaRqpZEnrVB2ShL%2Fimage.png?alt=media&#x26;token=d7198374-c24e-4e48-8b25-4e45ca713ffc" alt=""><figcaption></figcaption></figure>

### Services

* ***Service***
  * Another way for malware to execute additional code
  * Services run as background applications
  * Scheduled and run by the Windows service manager without user input
  * Advantages for malware writers
    * Services are normally run as `SYSTEM` or another privileged account
    * `SYSTEM` account has more access than administrator or user accounts
    * Provide another way to maintain persistence on a system
    * Users wouldn't find anything suspicious, because malware is not running in a separate process
* Key Windows API functions related to services:
  * `OpenSCManager`
    * Returns a handle to the service control manager
    * Used for all subsequent service-related function calls
    * Any code that interacts with services will call this function
  * `CreateService`
    * Adds a new service to the service control manager
    * The caller can specify whether the service will start automatically at boot time or has to be started manually
  * `StartService`
    * Starts a service
    * Used only if the service is set to be started manually
* Most common service types used by malware
  * `WIN32_SHARE_PROCESS`
    * Stores the code for the service in a DLL
      * Combines several different services in a single, shared process.
  * `WIN32_OWN_PROCESS`
    * Stores the code in an ***.exe*** file and runs as an independent process
  * `KERNEL_DRIVER`
    * Used for loading code into the kernel
* Information about services is stored in the registry under `HKLM\SYSTEM\CurrentControlSet\Services`
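
Each service's registry key holds numeric `Type` and `Start` values, so decoding them tells you how a suspicious service runs and when it starts. The sketch below uses the documented values for these fields. `describe_service` is a hypothetical helper, and only a subset of type bits is listed.

```python
SERVICE_TYPES = {
    0x01: "KERNEL_DRIVER",
    0x02: "FILE_SYSTEM_DRIVER",
    0x10: "WIN32_OWN_PROCESS",
    0x20: "WIN32_SHARE_PROCESS",
}
START_TYPES = {0: "boot", 1: "system", 2: "automatic", 3: "manual", 4: "disabled"}


def describe_service(type_value, start_value):
    """Decode the Type bit field and Start value read from a service's key."""
    kinds = [name for bit, name in SERVICE_TYPES.items() if type_value & bit]
    return kinds, START_TYPES.get(start_value, "unknown")
```

For instance, a `Type` of `0x20` with a `Start` of `2` describes a DLL-hosted service that launches automatically at boot.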

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2Fi7pRO8clPvbPMjRgBoqV%2Fimage.png?alt=media&#x26;token=61f1a0d9-b1db-46b3-853e-0dc6895418ce" alt=""><figcaption></figcaption></figure>

#### **SC Program**

* Used to investigate and manipulate services
* Commands for adding, deleting, starting, stopping and querying services

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FKmd6bqPF5C4oHycfH3bQ%2Fimage.png?alt=media&#x26;token=a19f83bb-2a2a-4059-8417-628a9ce0679c" alt=""><figcaption></figcaption></figure>

### The Component Object Model

* An interface standard that makes it possible for different software components to call each other's code without knowledge of specifics about each other
* Works with any programming language
* Designed to support reusable software components
* Implemented as a client/server framework
* Each ***thread*** that uses COM has to call the `OleInitialize` or `CoInitializeEx` function at least once prior to calling any other COM library functions

#### CLSIDs, IIDs, and the Use of COM Objects

* COM objects are accessed via
  * ***Globally Unique Identifiers (GUIDs)***
  * ***Class Identifiers (CLSIDs)***
  * ***Interface Identifiers (IIDs)***
* `CoCreateInstance` function
  * Used to get access to COM functionality
* `Navigate` function
  * Common function used by malware
  * Allows a program to launch Internet Explorer and access a web address
* Interfaces are identified with a GUID called an IID, and classes are identified with a GUID called a CLSID

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FREtMYbFiZwvJhynPy4WM%2Fimage.png?alt=media&#x26;token=94eef16c-c573-4a52-8658-323003b6300b" alt=""><figcaption></figcaption></figure>

* The OS uses information in the registry to determine which file contains the requested COM code when a program calls `CoCreateInstance`
* To identify what a malicious program is doing when it calls a COM function, malware analysts have to determine which offset a function is stored at
* One strategy for identifying the function called by a COM client is to check the header files for the interface specified in the call to `CoCreateInstance`
* Some COM objects are implemented as DLLs, which are loaded into the process space of the COM client executable
* When a COM object is set up to be loaded as a DLL, the registry entry for the CLSID includes the subkey `InprocServer32` rather than `LocalServer32`
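
In a binary, the CLSID and IID passed to `CoCreateInstance` appear as 16 raw bytes, with the first three fields stored little-endian. Converting them to the braced string form lets you search the registry for the class. A minimal sketch, where `guid_from_memory` is a hypothetical helper:

```python
import uuid


def guid_from_memory(raw16):
    """Format 16 bytes of an in-memory GUID structure as a registry-style string."""
    # bytes_le handles the little-endian layout of the first three fields.
    return "{%s}" % str(uuid.UUID(bytes_le=bytes(raw16))).upper()
```

For example, the bytes `01 DF 02 00 00 00 00 00 C0 00 00 00 00 00 00 46` become `{0002DF01-0000-0000-C000-000000000046}`, which to my knowledge is the Internet Explorer class that samples calling `Navigate` typically request.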

#### COM Server Malware

* Malware can implement a malicious COM server that can then be used by other applications
* `Browser Helper Objects (BHOs)`
  * provide common COM server functionality for malware
  * Third-party plug-ins for Internet Explorer
  * No restrictions, so malware authors use them to run code inside the IE process
  * This allows them to monitor Internet traffic, track browser usage, communicate with the Internet, without running their own process
* Malware that implements a COM server is usually easy to detect because it exports several functions that COM servers must export
  * `DllCanUnloadNow`
  * `DllGetClassObject`
  * `DllInstall`
  * `DllRegisterServer`
  * `DllUnregisterServer`

### Exceptions: When Things Go Wrong

* Exceptions
  * Allow a program to handle events outside the flow of normal execution
  * Caused by errors
  * When they happen, execution transfers to a special routine that resolves the exception
  * When an exception occurs, Windows looks in `fs:0` for the stack location that stores the exception information and then the exception handler is called
  * After the exception is handled, execution returns to the main thread
* `Structured Exception Handling (SEH)`
  * Windows mechanism for handling exceptions
  * SEH information is stored on the stack

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FmX81WjFKhgpqObkeXQMZ%2Fimage.png?alt=media&#x26;token=5935425d-8949-40ce-a4b6-dee3a4de8752" alt=""><figcaption></figcaption></figure>

* If the exception handler for the current frame does not handle an exception, it's passed to the exception handler for the caller's frame
* If none of the exception handlers responds to an exception, the top-level exception handler crashes the application
* Exception handlers can be used in exploit code to gain execution
  * A pointer to exception-handling information is stored on the stack
  * During a stack overflow, an attacker can overwrite the pointer
  * By specifying a new exception handler, the attacker gains execution when an exception happens
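
The frame-by-frame search for a handler can be modeled in a few lines. This is a simplified model of the dispatch order only, not of the real SEH records on the stack; `dispatch_exception` and the frame names are hypothetical.

```python
def dispatch_exception(frames, exception):
    """Walk handler frames from the innermost outward, as a simplified SEH model.

    `frames` is a list of (frame_name, handler) pairs, innermost frame first.
    A handler returns True if it handled the exception.
    """
    for frame_name, handler in frames:
        if handler(exception):
            return frame_name
    # No frame handled it: the top-level handler terminates the application.
    return "top-level handler"
```

The model also shows why overwriting a handler pointer is attractive to an attacker: whichever handler sits first in the chain gets control as soon as any exception occurs.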

### Kernel vs User Mode

* **User Mode**
  * Each process has its own memory, security permissions, and resources
  * When a program executes an invalid instruction and crashes, Windows can reclaim all the resources and terminate the program.
  * Cannot access hardware directly
  * Restricted to only a subset of all the registers and instructions available on the CPU
  * Relies on the Windows API to manipulate hardware or change the state in the kernel
    * Presence of `SYSENTER`, `SYSCALL`, `INT 0x2E` instructions in disassembly indicates that a call is being made into the kernel
* **Kernel Mode**
  * All processes running in the kernel share resources and memory addresses
  * Kernel code has fewer security checks
    * If the code contains invalid instructions, then the OS cannot continue running, resulting in the famous Windows BSoD
  * Code running in kernel can manipulate code running in user space, but code running in user space can affect the kernel only through well-defined interfaces
  * Most security programs (AV and Firewalls) run in kernel mode
  * Malware running in kernel mode can more easily interfere with security programs or bypass firewalls
  * OS's auditing features don't apply to the kernel
  * Nearly all rootkits use code running in the kernel
    * Only sophisticated malware runs in the kernel
    * Most malware has no kernel component

### The Native API

* Lower-level interface for interacting with Windows that is rarely used by non-malicious programs
* Bypasses the normal Windows API

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FOKWlqVEoM0h1e349fmh0%2Fimage.png?alt=media&#x26;token=823b15ca-a32b-4328-97a8-3f875092a1ac" alt=""><figcaption></figcaption></figure>

* User applications get access to user APIs like `kernel32.dll` and other DLLs which call `ntdll.dll`
* `ntdll.dll`
  * a special DLL that manages interactions between user space and the kernel
  * `ntdll` functions use APIs and structures just like the ones used in the kernel
  * functions make up the Native API
  * Programs are **not** supposed to call the Native API but nothing in the OS prevents them from doing so
* Calling the Native API is attractive for malware because
  * it allows them to do things that might not otherwise be possible
  * Additional functionality that is not exposed in the regular Windows API
  * Stealthier

<figure><img src="https://3699533910-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F4f68Nvlg17lsGgMp9UGy%2Fuploads%2FqjYePNV71HePM1m76ini%2Fimage.png?alt=media&#x26;token=2ee314d6-f9ae-4d96-a34f-f3c9dbf81b83" alt=""><figcaption></figcaption></figure>

* Native API calls that provide information about the system, processes, threads, handles and other items
  * `NtQuerySystemInformation`
  * `NtQueryInformationProcess`
  * `NtQueryInformationThread`
  * `NtQueryInformationFile`
  * `NtQueryInformationKey`
* `NtContinue`
  * Native API function popular with malware authors
  * Meant to transfer execution back to the main thread of a program after an exception has been handled
  * Location to return to is specified in the exception context and it can be changed
  * Malware often uses this function to transfer execution in complicated ways to confuse an analyst and make a program more difficult to debug
* ***Native applications***
  * Applications that do not use the Win32 subsystem
  * Issue calls to the Native API only
  * Rare for malware but almost nonexistent for non-malicious software, so native applications are likely malicious
  * Subsystem in the PE header indicates if a program is a native application
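
Checking the subsystem field is a quick triage step. The sketch below reads it from the optional header, assuming the standard PE layout, where the `Subsystem` field sits 68 bytes into the optional header for both 32-bit and 64-bit images. `pe_subsystem` is a hypothetical helper, and only three subsystem values are named.

```python
import struct

SUBSYSTEMS = {1: "native", 2: "Windows GUI", 3: "Windows console"}


def pe_subsystem(data):
    """Return a label for the Subsystem field of a PE file's optional header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Optional header starts 24 bytes after the PE signature offset.
    (subsystem,) = struct.unpack_from("<H", data, pe_offset + 24 + 68)
    return SUBSYSTEMS.get(subsystem, "other (%d)" % subsystem)
```

A result of `"native"` on a file that is not a driver or a known system component is a strong reason to look closer.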
