Malware Analysis Collection & IOC Extraction System

Overview

A production-ready malware analysis system that safely collects and analyzes malware behavior in an isolated VirtualBox sandbox. It captures system activity with Procmon and network traffic with tshark, then automatically extracts Indicators of Compromise (IOCs).

Key Features

✅ Automated malware execution in isolated VirtualBox sandbox
✅ VirtualBox snapshot restore after each analysis
✅ Procmon + tshark concurrent data collection
✅ IOC extraction (IPs, domains, registry keys, file paths, hashes)
✅ Behavior classification (Ransomware, Trojan, Botnet, Worm, etc.)
✅ Structured data parsing (CSV from Procmon, PCAP from tshark)
✅ Command-line interface with flexible options
✅ Comprehensive error handling and safety features

Quick Start

30-Second Setup

# 1. Install dependencies
pip install -r requirements.txt

# 2. Enable sandbox (once VM is configured)
# In config.yaml, set enabled: true under the sandbox section

# 3. Collect malware data
python collect_malware.py --sample samples/test.exe

Basic Command

# Analyze a sample
python collect_malware.py --sample samples/malware.exe

# Outputs to: logs/ (procmon.csv, capture.pcap)

Installation

Requirements

Hardware

8GB+ RAM (16GB recommended)
20GB+ free disk (VM snapshots + logs)
Windows 10+ host OS

Software - Host Machine

Python 3.8+
VirtualBox 6.1+
Wireshark (for tshark)

Software - Guest VM

Windows 7/10/11
Process Monitor (Procmon.exe)

Python Packages

pip install -r requirements.txt
# Includes: pyyaml, pandas, scapy, reportlab, pyshark

Installation Steps

Step 1: Setup Virtual Environment

python -m venv venv
# Windows PowerShell:
.\venv\Scripts\Activate.ps1
# Windows CMD:
venv\Scripts\activate.bat

Step 2: Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Step 3: Configure VirtualBox VM

Open VirtualBox
Create Windows VM:
- Name: analysis_vm (match config.yaml)
- RAM: 4GB+
- Disk: 50GB (dynamic)
Disable:
- Network (or use FakeNet-NG)
- Shared folders
- Clipboard/Drag-drop
Install Procmon.exe on guest
Take snapshot: Name it clean

Step 4: Create Directories

mkdir samples logs

Usage

Basic Analysis

Execute malware and collect system/network activity:

python collect_malware.py --sample samples/test.exe

With Custom Config

python collect_malware.py --sample samples/malware.exe --config custom_config.yaml

Process Flow

When you run python collect_malware.py --sample samples/malware.exe:

✅ Load Configuration → reads config.yaml
✅ Create Directories → ensures logs/ exists
✅ Snapshot Restore → reverts VM to clean state
✅ Start VM → powers on analysis_vm
✅ Start Collectors → launches Procmon + tshark
✅ Copy Sample → transfers sample to guest VM
✅ Execute Sample → runs binary inside VM for execution_timeout seconds
✅ Stop Collectors → terminates Procmon + tshark
✅ Parse Logs → converts PML → CSV, reads PCAP
✅ Extract IOCs → finds IPs, domains, registry keys, file paths
✅ Cleanup → stops VM, restores snapshot
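
The ordering above, and in particular the guarantee that cleanup runs even when a step fails, can be sketched with nested try/finally blocks. This is a minimal illustration rather than the actual collect_malware.py: FakeSandbox, FakeCollectors, and run_pipeline are hypothetical names, and the stand-in classes only record calls so the control flow can be exercised without a VM.

```python
from typing import List


class FakeSandbox:
    """Hypothetical stand-in for the VM wrapper; records calls only."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def restore_snapshot(self) -> None:
        self.log.append("restore")

    def start(self) -> None:
        self.log.append("start")

    def copy_sample(self, sample: str) -> None:
        self.log.append(f"copy:{sample}")

    def execute(self, timeout: int) -> None:
        self.log.append(f"execute:{timeout}")

    def stop(self) -> None:
        self.log.append("stop")


class FakeCollectors:
    """Hypothetical stand-in for the Procmon + tshark manager."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def start(self) -> None:
        self.log.append("collectors_start")

    def stop(self) -> None:
        self.log.append("collectors_stop")


def run_pipeline(sandbox, collectors, sample: str, timeout: int) -> None:
    """Run one analysis; always stop collectors and revert the VM afterwards."""
    sandbox.restore_snapshot()  # begin from the clean state
    try:
        sandbox.start()
        collectors.start()
        try:
            sandbox.copy_sample(sample)
            sandbox.execute(timeout)
        finally:
            collectors.stop()  # runs even if copy or execute raised
    finally:
        sandbox.stop()
        sandbox.restore_snapshot()  # never leave an infected VM behind
```

Parsing and IOC extraction would follow once run_pipeline returns, working only on the files left in logs/.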

Output

logs/procmon.csv - System activity (file, registry, process operations)
logs/capture.pcap - Network traffic (packets)
Console output showing collected IOCs and behavior
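
As a quick way to inspect logs/procmon.csv, the events can be counted per operation type. The sketch below assumes the export uses Procmon's default column set, which includes an "Operation" column; summarize_procmon and load_procmon_csv are hypothetical helper names and not part of parser.py.

```python
import csv
import io
from collections import Counter


def summarize_procmon(csv_text: str) -> Counter:
    """Count events per Operation value in Procmon CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Operation"] for row in reader if row.get("Operation"))


def load_procmon_csv(path: str) -> Counter:
    """Read a Procmon CSV export from disk and summarize it."""
    # utf-8-sig tolerates a leading byte-order mark if the export has one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return summarize_procmon(handle.read())
```

Calling load_procmon_csv("logs/procmon.csv") returns a Counter keyed by operation name (for example RegSetValue or CreateFile), which gives a rough picture of registry and file activity before any IOC extraction.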

Configuration

config.yaml Reference

sandbox:
  vm_name: "analysis_vm"              # VirtualBox VM name
  snapshot: "clean"                   # Snapshot to restore
  vbox_path: "C:\\Program Files\\Oracle\\VirtualBox\\VBoxManage.exe"
  enabled: false                      # Set true when VM ready
  vm_user: "Administrator"
  vm_password: "Password123!"
  guest_sample_path: "C:/Windows/Temp/sample.exe"
  procmon_export_timeout: 120         # Max seconds to wait for CSV export

tools:
  procmon_path: "tools/Procmon.exe"
  tshark_path: "tshark"               # Must be in PATH
  execution_timeout: 300              # Malware execution timeout (seconds)

paths:
  sample_dir: "samples/"
  logs_dir: "logs/"

analysis:
  network_simulation: false           # Use FakeNet-NG simulation
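
Because a missing or mistyped key in config.yaml would otherwise surface only partway through a run, it may help to validate the file up front. The following is a sketch under assumptions: validate_config, load_config, and the REQUIRED_KEYS table are hypothetical and simply mirror the reference above; yaml.safe_load comes from the pyyaml package listed in requirements.txt.

```python
from typing import Any, Dict, List

# Keys taken from the config.yaml reference; treating them as required is an assumption.
REQUIRED_KEYS = {
    "sandbox": ["vm_name", "snapshot", "vbox_path", "enabled"],
    "tools": ["procmon_path", "tshark_path", "execution_timeout"],
    "paths": ["sample_dir", "logs_dir"],
}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for section, keys in REQUIRED_KEYS.items():
        block = config.get(section)
        if not isinstance(block, dict):
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in block:
                problems.append(f"missing key: {section}.{key}")

    tools = config.get("tools")
    if isinstance(tools, dict) and "execution_timeout" in tools:
        timeout = tools["execution_timeout"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(timeout, bool) or not isinstance(timeout, int) or timeout <= 0:
            problems.append("tools.execution_timeout must be a positive integer")
    return problems


def load_config(path: str = "config.yaml") -> Dict[str, Any]:
    """Load the YAML file and raise if validation finds problems."""
    import yaml  # pyyaml; imported lazily so validate_config works without it

    with open(path, encoding="utf-8") as handle:
        config = yaml.safe_load(handle) or {}
    problems = validate_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```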

Architecture

Data Collection Pipeline

VirtualBox VM
│
├─ Procmon.exe ──→ procmon.pml ──→ procmon.csv (system activity)
│
└─ Network ──→ tshark ──→ capture.pcap (network packets)
   │                              │
   └──────────────────┬───────────┘
                      │
            parser.py (parsing)
                      │
         ┌────────────┼────────────┐
         │            │            │
    ioc_extractor   analyzer    (console output)

├─ sandbox.py          → VM lifecycle (restore/start/stop)
├─ execution.py        → Malware execution with timeout
├─ collector.py        → Procmon + tshark management
├─ parser.py           → CSV/PCAP parsing
├─ ioc_extractor.py    → IOC extraction (IP, domain, file, registry)
└─ analyzer.py         → Behavior classification
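
The restore/start/stop lifecycle attributed to sandbox.py can be expressed as VBoxManage invocations. This is a sketch, not the module's actual code: the helper names are hypothetical, while the subcommands themselves (snapshot ... restore, startvm --type headless, controlvm ... poweroff) are standard VBoxManage commands.

```python
import subprocess
from typing import List


def restore_cmd(vbox_path: str, vm_name: str, snapshot: str) -> List[str]:
    """Command that reverts the VM to a named snapshot (VM must be powered off)."""
    return [vbox_path, "snapshot", vm_name, "restore", snapshot]


def start_cmd(vbox_path: str, vm_name: str, headless: bool = True) -> List[str]:
    """Command that boots the VM, without a window by default."""
    return [vbox_path, "startvm", vm_name, "--type", "headless" if headless else "gui"]


def poweroff_cmd(vbox_path: str, vm_name: str) -> List[str]:
    """Command that hard-stops the VM, like pulling the power plug."""
    return [vbox_path, "controlvm", vm_name, "poweroff"]


def run_vbox(cmd: List[str], timeout: int = 120) -> str:
    """Run a VBoxManage command and return stdout; raises on non-zero exit."""
    result = subprocess.run(
        cmd, check=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout
```

Passing the command as a list avoids shell quoting problems with the space in "Program Files". Since a snapshot can only be restored while the VM is not running, cleanup would issue poweroff_cmd before restore_cmd.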

Module Responsibilities

Module	Purpose
sandbox.py	VirtualBox VM lifecycle: snapshot/start/stop/copy/execute
execution.py	Executes sample in sandbox with timeout
collector.py	Manages Procmon + tshark data collection
parser.py	Converts raw logs (PML, PCAP) to structured formats
ioc_extractor.py	Regex-based IOC detection (IPs, domains, hashes, registry, files)
analyzer.py	Behavioral classification (Ransomware, Trojan, Botnet, etc.)
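
To illustrate what regex-based IOC detection might look like, here is a minimal sketch. The patterns and the extract_iocs function are assumptions for illustration, not the contents of ioc_extractor.py. IPv4 candidates are validated with the standard ipaddress module so that strings such as 999.1.1.1 are dropped.

```python
import ipaddress
import re
from typing import Dict, List

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")
# Registry hive abbreviations as they appear in Procmon paths.
REGISTRY_RE = re.compile(r"\bHK(?:LM|CU|CR|U)\\[^\s\"']+")


def _valid_ipv4(candidate: str) -> bool:
    """Reject out-of-range octets that the loose regex lets through."""
    try:
        ipaddress.IPv4Address(candidate)
        return True
    except ValueError:
        return False


def extract_iocs(text: str) -> Dict[str, List[str]]:
    """Return sorted, de-duplicated IOC candidates found in free text."""
    ips = {m for m in IPV4_RE.findall(text) if _valid_ipv4(m)}
    domains = {m.lower() for m in DOMAIN_RE.findall(text)}
    return {
        "ips": sorted(ips),
        "domains": sorted(domains),
        "sha256": sorted({m.lower() for m in SHA256_RE.findall(text)}),
        "registry_keys": sorted(set(REGISTRY_RE.findall(text))),
    }
```

A pattern-only approach produces false positives. The domain pattern, for instance, also matches file names such as sample.exe, so real use would need an allowlist of top-level domains or a filter against known file extensions.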

Output

Console Output

Malware Analysis Collection - IOC Extraction System
Starting analysis: samples/malware.exe

Collected Data:
- Procmon CSV: logs/procmon.csv (542 entries)
- Network PCAP: logs/capture.pcap (1,234 packets)

IOCs Extracted:

IP Addresses:
  192.168.1.100
  10.0.0.5
  172.16.0.50

Domains:
  malicious.net
  c2-server.com
  attacker.io

Registry Keys Modified: 15
Files Created: 42

Classification: Trojan/Backdoor
Risk Level: CRITICAL

Detected Behaviors:
  - Persistence mechanism
  - Process injection
  - C2 communication
  - Credential harvesting

Analysis Complete. Data saved to logs/
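
A classification and risk level like those in the sample output could come from simple rule matching over the detected behaviors. The sketch below is an assumption about how such logic might look, not a description of analyzer.py: the behavior identifiers, rules, and risk levels are invented for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical rules, checked in order: (required behaviors, label, risk level).
RULES: List[Tuple[Set[str], str, str]] = [
    ({"mass_file_encryption"}, "Ransomware", "CRITICAL"),
    ({"persistence", "c2_communication"}, "Trojan/Backdoor", "CRITICAL"),
    ({"network_propagation"}, "Worm", "HIGH"),
]


def classify(behaviors: Iterable[str]) -> Tuple[str, str]:
    """Return (label, risk) for the first rule whose behaviors are all present."""
    observed = set(behaviors)
    for required, label, risk in RULES:
        if required <= observed:  # subset test: every required behavior was seen
            return label, risk
    return "Unknown", "LOW"
```

Rule order matters here: a sample that matches several rules gets the label of the first match, so more specific or more severe rules belong at the top.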

Generated Logs

logs/procmon.csv - System activity parsed from Procmon
logs/capture.pcap - Raw network packets from tshark
logs/analysis_results.json - Extracted IOCs and analysis (if exported)

File Structure

malware-analyzer/
├── README.md                      # This file
├── config.yaml                    # Configuration
├── requirements.txt               # Python dependencies
├── TODO.md                        # Progress tracking
│
├── collect_malware.py       (200 LOC) # Main collection script
│
├── src/
│   ├── sandbox.py           (200 LOC) # VM control
│   ├── execution.py         (150 LOC) # Malware execution
│   ├── collector.py         (200 LOC) # Data collection
│   ├── parser.py            (200 LOC) # Log parsing
│   ├── ioc_extractor.py     (250 LOC) # IOC extraction
│   ├── analyzer.py          (200 LOC) # Behavior analysis
│   └── __init__.py
│
├── tools/                         # External utilities
│   └── Procmon.exe                # (download from Sysinternals)
│
├── samples/                       # Benign test samples
│   └── test.exe                   # (user-provided)
│
└── logs/                          # Generated during analysis
    ├── procmon.csv
    ├── procmon.pml
    ├── capture.pcap
    └── analysis_results.json      # Optional: IOC export

Safety & Ethics

⚠️ Critical Guidelines

Isolation Requirements
- Run malware ONLY inside the snapshotted VM
- Never expose host to untrusted binaries
- Disable: shared folders, network (unless intentional), clipboard, drag-drop
Snapshot Restore
- ALWAYS restore VM after each analysis
- Prevents malware persistence
- Recreate clean snapshot monthly
Storage Best Practices
- Never store actual malware in repository
- Use separate encrypted drive for samples
- Version control only metadata/hashes
Network Safety
- Run with network disabled by default
- Use FakeNet-NG for network-based malware analysis
- Monitor all connections without exposing real network
Legal Compliance
- Only analyze authorized samples
- Use in controlled lab only
- Never run on production systems
- Comply with local cybersecurity laws
Access Control
- Dedicated analysis machine only
- Restrict unauthorized access
- Log all analyses with timestamps
- Document analysis intentions
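
Two of the guidelines above, tracking hashes instead of samples and logging every analysis with a timestamp, can be combined in a small audit helper. This is a sketch under assumptions: the function names, the JSON-lines format, and the field names are hypothetical choices and not part of the project.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large samples are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def append_audit_entry(sample: Path, audit_log: Path, purpose: str) -> dict:
    """Append one JSON line recording what was analyzed, when, and why."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sample_name": sample.name,
        "sha256": sha256_of(sample),
        "purpose": purpose,
    }
    with audit_log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

The resulting log contains only metadata, so it can be kept under version control while the samples themselves stay on the separate encrypted drive.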

Recommended Practices

✅ Track only configuration changes in Git
✅ Maintain audit trail with timestamps
✅ Test benign files before malware
✅ Update tools monthly
✅ Use hardware virtualization (AMD-V/Intel VT-x)
✅ Disable VM acceleration if concerned about bypasses

Troubleshooting

Problem	Solution
VBoxManage not found	Verify VirtualBox path in config.yaml
tshark not found	Add Wireshark to PATH or update config.yaml
VM not starting	Check VM name matches config.yaml; verify snapshot exists
Procmon not capturing	Run Procmon manually once to accept EULA
Pipeline timeout	Increase tools.execution_timeout in config.yaml
CSV parsing fails	Verify that the Procmon export succeeded; check the file encoding
Python errors	Verify all dependencies: `pip install -r requirements.txt`
Permission denied	Check write permissions for logs/ directory
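
Several rows in the table above (missing VBoxManage, missing tshark, unwritable logs/) can be caught before any VM is started. The following preflight check is a sketch only; the preflight function is hypothetical, and its parameters correspond to sandbox.vbox_path, tools.tshark_path, and paths.logs_dir in config.yaml.

```python
import os
import shutil
from typing import List


def preflight(vbox_path: str, tshark_path: str, logs_dir: str) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    problems: List[str] = []

    if not os.path.isfile(vbox_path):
        problems.append(f"VBoxManage not found at: {vbox_path}")

    # shutil.which resolves bare names via PATH and also accepts full paths.
    if shutil.which(tshark_path) is None:
        problems.append(f"tshark not found: {tshark_path}")

    if not os.path.isdir(logs_dir):
        problems.append(f"logs directory does not exist: {logs_dir}")
    elif not os.access(logs_dir, os.W_OK):
        problems.append(f"logs directory is not writable: {logs_dir}")

    return problems
```

Printing the returned list and exiting early gives clearer feedback than a failure in the middle of a collection run.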

Command Reference

Collect Malware

# --sample SAMPLE_PATH   Malware to execute
# --config CONFIG_PATH   Optional: custom config
python collect_malware.py --sample SAMPLE_PATH --config CONFIG_PATH
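
For reference, the two options above could be declared with argparse as follows. This is a sketch rather than the script's actual parser: build_parser is a hypothetical name, and treating --sample as required with --config defaulting to config.yaml is an assumption based on the usage shown in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        prog="collect_malware.py",
        description="Run a sample in the sandbox and collect Procmon/tshark data.",
    )
    parser.add_argument(
        "--sample",
        required=True,  # assumption: a run without a sample makes no sense
        help="Path to the sample to execute inside the VM",
    )
    parser.add_argument(
        "--config",
        default="config.yaml",  # assumption: matches the default named in Process Flow
        help="Path to a custom configuration file",
    )
    return parser
```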

Support

Getting Started

Follow Installation section
Configure config.yaml (update VM name, paths)
Test with benign sample: python collect_malware.py --sample samples/test.exe
Review logs/ directory for output files

For Questions

Check Troubleshooting section
Review config.yaml comments
Verify Procmon/tshark setup
Test with benign file first

License

This project is for educational and authorized malware analysis only. Users must comply with all applicable laws and regulations in their jurisdiction.
