A production-quality Python framework for automating malware analysis by integrating multiple external sandbox APIs, normalizing results, and generating comprehensive intelligence reports.
- Multi-Sandbox Integration: Parallel submission to Hybrid Analysis, VirusTotal, and Triage
- Unified Schema: Normalizes responses from different APIs into a canonical data model
- Report Merging: Intelligently combines results from multiple sources with deduplication
- IOC Intelligence: Extracts and deduplicates Indicators of Compromise (IOCs)
- MITRE ATT&CK Mapping: Correlates observed behaviors with MITRE techniques
- Professional Reports: Generates Bootstrap-styled HTML and structured JSON reports
- Resilient: Handles API failures, timeouts, and network errors gracefully
- Extensible: Plugin-style architecture for adding new sandbox integrations
- Proper Logging: Clean, structured logging for debugging and monitoring
```text
ENTRY LAYER
└── analyzer.py (CLI orchestrator)

API LAYER
└── clients/
    ├── base.py (abstract interface)
    ├── hybrid_analysis.py (HybridAnalysisClient)
    ├── virustotal.py (VirusTotalClient)
    └── triage.py (TriageClient - optional)

NORMALIZATION LAYER
└── normalizers/
    ├── base.py (BaseNormalizer)
    ├── hybrid_analysis.py
    ├── virustotal.py
    └── triage.py

CORE SCHEMA
└── schema.py (UnifiedReport, IOC, MergedReport, etc.)

MERGER & CORRELATION
├── merger.py (combines multiple reports)
└── correlator/
    ├── correlation_engine.py (enrichment)
    └── ioc_extractor.py (IOC extraction)

REPORTING
├── report_generator/
│   ├── html_generator.py (Bootstrap HTML reports)
│   └── json_export.py (Structured JSON)
└── templates/ (HTML templates)

ARTIFACTS
└── artifacts/{SHA256}/ (raw API responses, final reports)
```
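Artifacts are keyed by the sample's SHA256, and the reports list MD5, SHA1 and SHA256. A minimal sketch of computing all three hashes in one streaming pass, assuming nothing about the project's own helper; `compute_hashes` and `artifact_dir` are hypothetical names:

```python
import hashlib
from pathlib import Path


def compute_hashes(file_path: str, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute MD5, SHA1 and SHA256 in a single streaming pass over the file."""
    # usedforsecurity=False (Python 3.9+) marks MD5/SHA1 as identifiers, not
    # security primitives, which also quiets bandit's weak-hash warning.
    hashers = {
        "md5": hashlib.md5(usedforsecurity=False),
        "sha1": hashlib.sha1(usedforsecurity=False),
        "sha256": hashlib.sha256(),
    }
    with open(file_path, "rb") as fh:
        # Fixed-size chunks keep memory flat even for large samples.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def artifact_dir(file_path: str, root: str = "artifacts") -> Path:
    """Hypothetical helper returning artifacts/{SHA256}/ for a sample."""
    return Path(root) / compute_hashes(file_path)["sha256"]
```

Reading the file once and feeding every hasher avoids three separate passes over potentially large binaries.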
```bash
cd SOAR
pip install -r requirements.txt
```

Obtain API keys from:
- Hybrid Analysis: https://hybrid-analysis.com/api
- VirusTotal: https://www.virustotal.com/gui/my-apikey
- Triage (optional): https://triage.com/
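Copy the configuration template and add your keys:

```bash
cp .env.example .env
# Edit .env and add your API keys
nano .env
```

Verify the setup:

```bash
python -c "from analyzer import MalwareAnalyzer; MalwareAnalyzer(); print('✓ Setup OK')"
```

Run an analysis:

```bash
# Basic analysis
python analyzer.py samples/malware.exe

# Custom output directory
python analyzer.py samples/malware.exe --output results/

# Verbose logging
python analyzer.py samples/malware.exe --verbose

# Combined options
python analyzer.py samples/malware.exe \
  --output artifacts/analysis_2024 \
  --verbose
```

Analysis generates two reports in the output directory: an HTML report and a JSON report.

HTML report: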
- Executive summary with risk score
- File hashes (MD5, SHA1, SHA256)
- Verdict from all sandbox sources
- IOC table (IPs, domains, URLs, hashes)
- Network activity timeline
- Process execution details
- MITRE ATT&CK technique mapping
- AV engine detections
- Bootstrap 5 styling
JSON report: structured, machine-readable data with:
- File metadata
- Verdict analysis by source
- Risk assessment
- Complete IOC list
- Network activity details
- Process tree
- File operations
- MITRE techniques
- Consolidated metadata
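The exact output of `json_export.py` is not specified here. One practical detail any JSON export of dataclass-based reports must handle is that `json.dumps` cannot serialize enums or datetimes by default. A minimal sketch, where `Verdict`, `ReportSummary` and `export_json` are hypothetical stand-ins rather than the project's schema:

```python
import json
from dataclasses import asdict, dataclass, field, is_dataclass
from datetime import datetime
from enum import Enum
from typing import Any


class Verdict(Enum):  # hypothetical; the real schema's verdict type may differ
    MALICIOUS = "malicious"
    SUSPICIOUS = "suspicious"
    CLEAN = "clean"


@dataclass
class ReportSummary:  # hypothetical stand-in for a merged report
    sha256: str
    verdict: Verdict
    risk_score: float
    generated_at: datetime
    verdicts_by_source: dict[str, str] = field(default_factory=dict)


def _json_default(obj: Any) -> Any:
    """Tell json.dumps how to serialize enums and datetimes."""
    if isinstance(obj, Enum):
        return obj.value
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def export_json(report: Any, indent: int = 2) -> str:
    """Serialize a dataclass report (or plain dict) to a JSON string."""
    data = asdict(report) if is_dataclass(report) else report
    return json.dumps(data, default=_json_default, indent=indent, sort_keys=True)
```

`sort_keys=True` keeps the output stable between runs, which makes reports easier to diff.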
```text
SOAR/
├── analyzer.py               # Main entry point
├── schema.py                 # Unified data models
├── merger.py                 # Report merging logic
│
├── clients/                  # Sandbox API clients
│   ├── base.py
│   ├── hybrid_analysis.py
│   ├── virustotal.py
│   └── triage.py
│
├── normalizers/              # API response normalization
│   ├── base.py
│   ├── hybrid_analysis.py
│   ├── virustotal.py
│   └── triage.py
│
├── correlator/               # Intelligence correlation
│   ├── correlation_engine.py # Engine & MITRE mapping
│   └── ioc_extractor.py      # IOC extraction
│
├── report_generator/         # Report generation
│   ├── html_generator.py     # HTML reports
│   └── json_export.py        # JSON export
│
├── artifacts/                # Analysis output
│
├── requirements.txt          # Dependencies
├── .env.example              # Configuration template
└── README.md                 # This file
```
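`clients/base.py` is described only as an abstract interface. A minimal sketch of what it could look like using `abc`, assuming `submit_file()` (the one method named in the extension example) plus a hypothetical `get_report()`; `source_name`, `analyze()` and `DummyClient` are illustrative only:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class BaseSandboxClient(ABC):
    """Hypothetical abstract interface shared by all sandbox clients."""

    source_name: str = "base"  # illustrative identifier for log messages

    def __init__(self, api_key: str, timeout: int = 30) -> None:
        if not api_key:
            raise ValueError(f"{self.source_name}: API key is required")
        self.api_key = api_key
        self.timeout = timeout  # seconds, mirroring API_TIMEOUT

    @abstractmethod
    def submit_file(self, file_path: str) -> str:
        """Upload a sample and return a sandbox-specific job ID."""

    @abstractmethod
    def get_report(self, job_id: str) -> dict[str, Any]:
        """Fetch the raw, un-normalized report for a submitted job."""

    def analyze(self, file_path: str) -> dict[str, Any]:
        """Validate the path, submit the file, then fetch its raw report."""
        path = Path(file_path)
        if not path.is_file():
            raise FileNotFoundError(file_path)
        return self.get_report(self.submit_file(str(path)))


class DummyClient(BaseSandboxClient):
    """Offline stand-in showing how a concrete client plugs in."""

    source_name = "dummy"

    def submit_file(self, file_path: str) -> str:
        return f"job-{Path(file_path).name}"

    def get_report(self, job_id: str) -> dict[str, Any]:
        return {"job_id": job_id, "verdict": "unknown"}
```

Because the abstract methods must be overridden, Python refuses to instantiate a client that forgets one, which catches incomplete integrations early.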
`analyzer.py` (`MalwareAnalyzer`) is the main orchestrator. It:
- Initializes all sandbox clients
- Submits files in parallel
- Merges reports
- Generates output
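The parallel-submission step can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is a simplified sketch, not the project's code: each "client" is modeled as a plain callable, `submit_all` is a hypothetical name, and the "All sandbox submissions failed" message is illustrative. It shows how one failing sandbox is logged and skipped while the others still contribute:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Simplification: a client is any callable taking a file path and returning a raw report.
ClientFn = Callable[[str], dict[str, Any]]


def submit_all(
    file_path: str, clients: dict[str, ClientFn], max_workers: int = 3
) -> dict[str, dict[str, Any]]:
    """Submit one file to every configured sandbox concurrently.

    Failing sandboxes are logged and skipped (partial-failure tolerance).
    Raises RuntimeError if no clients are configured or every submission fails.
    """
    if not clients:
        raise RuntimeError("No sandbox clients initialized. Check API keys in .env")

    results: dict[str, dict[str, Any]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, file_path): name for name, fn in clients.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one sandbox must not sink the whole run
                logger.warning("%s analysis failed: %s", name, exc)

    if not results:
        raise RuntimeError("All sandbox submissions failed")
    return results
```

Threads suit this workload because submissions spend nearly all their time waiting on network I/O.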
```python
from analyzer import MalwareAnalyzer

analyzer = MalwareAnalyzer()
output_files = analyzer.analyze("samples/malware.exe", "results/")
```

`UnifiedReport` (`schema.py`) is the canonical report format, containing:
- File metadata (hashes, size, type)
- Analysis results (verdict, detections)
- Behavioral data (processes, network, files)
- IOCs and MITRE techniques
- Risk assessment
`MergedReport` holds the consolidated result from multiple sources, with:
- Deduplicated IOCs
- Merged network activity
- A consensus verdict determined by voting
- A weighted risk score
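The schema models themselves are not reproduced in this README. A minimal dataclass sketch of `IOCType`, `SandboxSource` and `IOC`, assuming the field names from the IOC example in this README; the enum string values, the default confidence, the lower-casing rule and the `key` property are assumptions:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class IOCType(Enum):  # member names from the README; values are assumptions
    IP = "ip"
    DOMAIN = "domain"
    URL = "url"
    MD5 = "md5"
    SHA1 = "sha1"
    SHA256 = "sha256"


class SandboxSource(Enum):
    HYBRID_ANALYSIS = "hybrid_analysis"
    VIRUSTOTAL = "virustotal"
    TRIAGE = "triage"


@dataclass
class IOC:
    ioc_type: IOCType
    value: str
    source: list[SandboxSource] = field(default_factory=list)
    confidence: float = 0.5  # assumed default
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce the documented 0.0 - 1.0 confidence range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0.0, 1.0], got {self.confidence}")
        # Domains and hex hashes are case-insensitive; normalize so dedup keys match.
        if self.ioc_type in (IOCType.DOMAIN, IOCType.MD5, IOCType.SHA1, IOCType.SHA256):
            self.value = self.value.strip().lower()

    @property
    def key(self) -> tuple[IOCType, str]:
        """Identity used for deduplication (type + value)."""
        return (self.ioc_type, self.value)
```

URLs are deliberately left as-is because URL paths can be case-sensitive.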
Normalizers (`normalizers/`) convert sandbox-specific formats to `UnifiedReport`:
- `HybridAnalysisNormalizer`: converts Hybrid Analysis API responses
- `VirusTotalNormalizer`: converts VirusTotal API responses
- `TriageNormalizer`: converts Triage API responses
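The normalizer internals are not shown here. The following sketch maps a VirusTotal API v3 file object (`data.attributes` with `last_analysis_stats`, `last_analysis_results`, `md5`, `sha256`, `size`) onto a small hypothetical `NormalizedReport`. Check the field names against the VirusTotal API reference. The verdict rule is an illustrative assumption, not the project's logic:

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class NormalizedReport:
    """Hypothetical subset of UnifiedReport, used only for this sketch."""

    source: str
    sha256: str
    md5: Optional[str]
    size: Optional[int]
    verdict: str
    detections: dict[str, str] = field(default_factory=dict)  # engine -> signature


def normalize_virustotal(raw: dict[str, Any]) -> NormalizedReport:
    """Map a VirusTotal API v3 file object onto the sketch schema above."""
    data = raw.get("data", {})
    attrs = data.get("attributes", {})
    stats = attrs.get("last_analysis_stats", {})

    # Illustrative verdict rule; the project's real thresholds are not documented.
    if stats.get("malicious", 0) > 0:
        verdict = "malicious"
    elif stats.get("suspicious", 0) > 0:
        verdict = "suspicious"
    elif stats:
        verdict = "undetected"
    else:
        verdict = "unknown"

    # Keep only engines that flagged the file, with their signature names.
    detections = {
        engine: result.get("result") or ""
        for engine, result in attrs.get("last_analysis_results", {}).items()
        if result.get("category") == "malicious"
    }

    return NormalizedReport(
        source="virustotal",
        sha256=attrs.get("sha256", data.get("id", "")),
        md5=attrs.get("md5"),
        size=attrs.get("size"),
        verdict=verdict,
        detections=detections,
    )
```

Using `.get()` with defaults throughout lets a partial or error response degrade to an `unknown` verdict instead of raising.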
`merger.py` combines multiple reports:
- IOC deduplication (by type+value)
- Process merging (by name+PID)
- Network activity merging
- Consensus verdict determination
- Weighted risk score calculation
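The three merge rules above can be sketched as follows. This is not the merger's actual code. `SimpleIOC`, the severity tie-break order, the 0-100 risk scale and the default weight of 1.0 are all assumptions for illustration:

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SimpleIOC:  # hypothetical stand-in for schema.IOC
    ioc_type: str
    value: str
    sources: set[str] = field(default_factory=set)
    confidence: float = 0.0


def dedupe_iocs(iocs: list[SimpleIOC]) -> list[SimpleIOC]:
    """Merge IOCs sharing (type, value): union sources, keep the highest confidence."""
    merged: dict[tuple[str, str], SimpleIOC] = {}
    for ioc in iocs:
        key = (ioc.ioc_type, ioc.value)
        if key not in merged:
            # Copy the source set so the caller's objects are never mutated.
            merged[key] = SimpleIOC(ioc.ioc_type, ioc.value, set(ioc.sources), ioc.confidence)
        else:
            existing = merged[key]
            existing.sources |= ioc.sources
            existing.confidence = max(existing.confidence, ioc.confidence)
    return list(merged.values())


SEVERITY = {"malicious": 3, "suspicious": 2, "clean": 1, "unknown": 0}


def consensus_verdict(verdicts: dict[str, str]) -> str:
    """Majority vote across sources; ties go to the more severe verdict."""
    if not verdicts:
        return "unknown"
    counts = Counter(verdicts.values())
    return max(counts, key=lambda v: (counts[v], SEVERITY.get(v, 0)))


def weighted_risk(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-source 0-100 risk scores (unlisted sources weigh 1.0)."""
    total_weight = sum(weights.get(src, 1.0) for src in scores)
    if total_weight == 0:
        return 0.0
    return sum(s * weights.get(src, 1.0) for src, s in scores.items()) / total_weight
```

Breaking ties toward the more severe verdict is a conservative choice for triage, where a missed detection usually costs more than a false alarm.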
The correlator (`correlator/`) enriches merged reports:
- Extracts additional IOCs from behavior
- Maps processes to MITRE ATT&CK techniques
- Performs threat intelligence enrichment
- Builds attack narratives
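This README does not document how the correlator extracts IOCs or maps behavior to MITRE ATT&CK. One simple approach, sketched here as an assumption, is regex extraction of IPv4 addresses and URLs from process command lines, plus keyword rules that map command lines to technique IDs. The rule set is illustrative and far from complete. The technique IDs and names are real ATT&CK entries:

```python
import re

# Illustrative rules only: (pattern over the command line, technique ID, technique name).
MITRE_RULES = [
    (re.compile(r"\bpowershell(\.exe)?\b", re.I),
     "T1059.001", "Command and Scripting Interpreter: PowerShell"),
    (re.compile(r"\bschtasks(\.exe)?\b.*\s/create\b", re.I),
     "T1053.005", "Scheduled Task/Job: Scheduled Task"),
    (re.compile(r"\\CurrentVersion\\Run\b", re.I),
     "T1547.001", "Boot or Logon Autostart Execution: Registry Run Keys / Startup Folder"),
    (re.compile(r"\bvssadmin(\.exe)?\b.*\bdelete\s+shadows\b", re.I),
     "T1490", "Inhibit System Recovery"),
]

# Each IPv4 octet is limited to 0-255; the group is non-capturing so findall returns full matches.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b")
URL_RE = re.compile(r"\bhttps?://[^\s\"'<>]+", re.I)


def map_to_mitre(command_line: str) -> list[tuple[str, str]]:
    """Return (technique ID, name) for every rule matching the command line."""
    return [(tid, name) for pattern, tid, name in MITRE_RULES if pattern.search(command_line)]


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Extract unique IPv4 addresses and http(s) URLs from free text."""
    return {
        "ip": sorted(set(IPV4_RE.findall(text))),
        "url": sorted(set(URL_RE.findall(text))),
    }
```

Keyword rules are easy to audit but produce false positives (PowerShell is also used legitimately), so matches are best treated as hints whose confidence depends on surrounding behavior.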
Schema model examples:

```python
from schema import IOC, IOCType, NetworkActivity, Process, SandboxSource

IOC(
    ioc_type=IOCType.DOMAIN,  # enum: IP, DOMAIN, URL, MD5, SHA1, SHA256, etc.
    value="malicious.com",
    source=[SandboxSource.HYBRID_ANALYSIS, SandboxSource.VIRUSTOTAL],
    confidence=0.95,  # 0.0 - 1.0
    metadata={},
)

NetworkActivity(
    protocol="https",  # tcp, udp, dns, http, https
    direction="outbound",
    destination_ip="192.0.2.1",
    destination_port=443,
    domain="c2.malicious.com",
    url="https://c2.malicious.com/beacon",
    user_agent="Mozilla/5.0...",
)

Process(
    name="malware.exe",
    pid=1234,
    parent_pid=456,
    command_line="malware.exe /c evil.ps1",
    user="admin",
)
```

Configuration (`.env`):

```bash
# API Keys (required; TRIAGE_API_KEY is optional)
HYBRID_ANALYSIS_API_KEY=xxx
VIRUSTOTAL_API_KEY=yyy
TRIAGE_API_KEY=zzz

# Optional Settings
API_TIMEOUT=30   # seconds
MAX_WORKERS=3    # parallel submissions
LOG_LEVEL=INFO
OUTPUT_DIR=artifacts
```

The pipeline is designed to be resilient:
- Partial Failures: If one sandbox fails, analysis continues with others
- Retry Logic: Automatic retries with exponential backoff
- Rate Limiting: Handles API rate limits gracefully
- Timeouts: Configurable timeouts with warnings
- Graceful Degradation: Works with partial data
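The retry behavior can be sketched as a small wrapper that retries transient errors with exponential backoff. This is a sketch under assumptions: `retry_with_backoff` and `RateLimitError` are hypothetical names, and the delay schedule (1 s, 2 s, 4 s, ... capped at 30 s) is illustrative:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception a client could raise on an HTTP 429 response."""


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError, RateLimitError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient failures with exponential backoff.

    The wait before retry n (1-based) is base_delay * 2**(n - 1), capped at max_delay.
    Exceptions not listed in retry_on propagate immediately; the last retryable
    exception is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

With `requests`, the relevant errors are `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`, which do not subclass the built-in `ConnectionError`/`TimeoutError`, so they would need to be passed in `retry_on`. Injecting `sleep` keeps the function testable, and adding random jitter to each delay is a common refinement.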
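Handling errors when calling the analyzer from Python:

```python
from analyzer import MalwareAnalyzer

try:
    # The constructor raises RuntimeError when no sandbox clients can be initialized.
    analyzer = MalwareAnalyzer()
    analyzer.analyze("malware.exe")
except FileNotFoundError:
    print("File not found")
except RuntimeError as e:
    print(f"Analysis failed: {e}")
```

To add a new sandbox integration:

- Create a new client in `clients/newsandbox.py`: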
```python
from clients.base import BaseSandboxClient


class NewSandboxClient(BaseSandboxClient):
    def submit_file(self, file_path):
        # Implementation
        pass

    # ... implement other methods
```

- Create a normalizer in `normalizers/newsandbox.py`:
```python
from normalizers.base import BaseNormalizer


class NewSandboxNormalizer(BaseNormalizer):
    def normalize(self, raw_response):
        # Convert raw_response to a UnifiedReport
        pass
```

- Register the client in `analyzer.py`:
```python
from clients.newsandbox import NewSandboxClient

# In _initialize_clients():
# (SandboxSource in schema.py also needs a NEWSANDBOX member.)
if newsandbox_key:
    self.clients[SandboxSource.NEWSANDBOX] = NewSandboxClient(...)
```

Run the test suite:

```bash
pytest tests/ -v --cov
```

Code quality checks:

```bash
# Format code
black .
# Lint
flake8 .
# Type checking
mypy .
# Security scan
bandit -r .
```

Performance:
- Parallel Submissions: Uses ThreadPoolExecutor for concurrent API calls
- Typical Analysis Time: 2-5 minutes (depends on sandbox queue)
- Memory Usage: ~50-100MB per analysis
- Report Generation: <1 second
Troubleshooting:

RuntimeError: No sandbox clients initialized. Check API keys in .env
Solution: Ensure the .env file exists and contains valid API keys.
WARNING: {sandbox} analysis timeout
Solution: Increase API_TIMEOUT in .env or use --verbose to debug.
RequestException: Connection refused
Solution: Check internet connection and API endpoint availability.
Security:
- API Key Storage: Never commit `.env` with real keys
- Input Validation: File paths are validated before submission
- Output Sanitization: Reports are HTML-escaped to prevent XSS
- Logging: Sensitive data is not logged
- Rate Limiting: Respects API rate limits
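Sandbox output such as domains, URLs, command lines and user agents is attacker-influenced, so it must be escaped before it is placed in HTML. With Jinja2 this is normally done by enabling autoescaping (for example `Environment(autoescape=select_autoescape())`). The stdlib-only sketch below shows the same principle with `html.escape`. `render_ioc_rows` is a hypothetical helper, not part of the project:

```python
import html


def render_ioc_rows(iocs: list[tuple[str, str]]) -> str:
    """Render (type, value) IOC pairs as HTML table rows with all text escaped."""
    # html.escape converts &, <, > and, with quote=True (the default), " and '
    # to entities, so a hostile URL or domain cannot inject markup or script.
    return "\n".join(
        f"<tr><td>{html.escape(ioc_type)}</td><td>{html.escape(value)}</td></tr>"
        for ioc_type, value in iocs
    )
```

Escaping at render time, rather than when data is stored, keeps the JSON report faithful to the raw values.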
Limitations:
- No local sandbox environment (uses external APIs only)
- Dependent on third-party sandbox availability
- Analysis quality varies by sandbox sophistication
- Large files may not be supported by all sandboxes
- Real-time API failures affect analysis
Future enhancements:
- Caching layer for duplicate file hashes
- Machine learning-based risk scoring
- Real-time alerting integration
- Database backend for historical analysis
- Web UI dashboard
- CLI configuration file support
- Batch analysis mode
- Integration with SIEM platforms
This project is provided as-is for security research and analysis purposes.
For issues or questions, refer to the architecture documentation and code comments within each module.
Built with: Python 3.10+, Requests, Jinja2, Bootstrap 5, JSON
Version: 1.0.0
Last Updated: 2024