calebevans/mulder

mulder

Mulder takes a directory of forensic evidence (disk images, memory dumps, PCAPs, event logs) and runs a five-phase autonomous investigation with hard quality gates between each phase. It produces structured incident reports with MITRE ATT&CK mappings, IOC exports, and a full audit trail. An adversarial "Alternative Narrative" phase challenges every finding before the report is generated. All tool invocations go through typed MCP interfaces - never through a shell - and an append-only audit log validates every evidence citation at the API boundary, making findings with fabricated evidence citations structurally impossible to submit.

Results

Four autonomous investigations against real forensic datasets. The reports are published exactly as the tool generated them, with no manual edits. Each case has an interactive HTML report on GitHub Pages (sidebar navigation, dark/light theme, audit trail). See the examples index for all report links.

| Case | Systems | Evidence | Sources | Tool Calls | Findings | Runtime | Tokens | Report |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rocba | 1 | ~8 GB | 67 | 292 | 7 (1 high) | 66 min | 313K | HTML |
| SRL-2015 | 4 | ~30 GB | 159 | 610 | 29 (4 crit, 9 high) | 126 min | 300K | HTML |
| SRL-2018 | 11 | ~120 GB | 457 | 1,508 | 55 (11 crit, 19 high) | 336 min | 698K | HTML |
| NIST Data Leakage | 4 | ~8 GB | 88 | 723 | 33 (15 high) | 102 min | 330K | HTML |

The NIST Data Leakage case has a detailed accuracy report validated against published NIST ground truth: 60% full match, 90% detection rate, 5% false positive rate. The single false positive involved incorrect causal attribution (blaming CCleaner for artifact destruction when the answer key confirms it was launched and closed without action).

How It Works

[Figure: Mulder Architecture and Security Boundaries]

Each investigation runs through five phases with quality gates between them. Phases 2-4 use a plan-and-execute pipeline with three specialized roles (planner, executor, analyst) that can optionally be assigned to different models for cost optimization.

  1. Catalog - scan evidence directory, classify file types, identify distinct systems
  2. Extraction - run applicable forensic tools per system, index results into FTS5 database
  3. Cross-System Analysis - correlate events across systems, map MITRE ATT&CK techniques, deduplicate findings
  4. Alternative Narrative - challenge the primary narrative with counter-evidence, test alternative hypotheses, audit for tool and evidence coverage gaps
  5. Report - write the investigation narrative, generate Markdown/HTML reports, export IOCs and ATT&CK Navigator layers
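As a rough illustration of the indexing mentioned in step 2, the sketch below loads a few lines of tool output into an SQLite FTS5 table and queries it. The table name, column names, and sample rows are invented for this example and are not Mulder's actual schema; it also assumes the local SQLite build includes FTS5, which is the case for most Python distributions.

```python
import sqlite3

# In-memory database for illustration; the README describes one SQLite file per case.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per indexed line of tool output.
conn.execute("CREATE VIRTUAL TABLE evidence USING fts5(system, tool, content)")

# Made-up sample rows standing in for parsed tool output.
rows = [
    ("WKSTN-01", "hayabusa", "Suspicious PowerShell download cradle detected"),
    ("WKSTN-01", "prefetch", "PSEXESVC.EXE executed 3 times"),
    ("DC-01", "evtx", "Logon type 3 for account jsmith from 10.0.0.5"),
]
conn.executemany("INSERT INTO evidence VALUES (?, ?, ?)", rows)


def search(query: str) -> list[tuple[str, str]]:
    """Return (system, tool) pairs for rows matching an FTS5 query, best match first."""
    cur = conn.execute(
        "SELECT system, tool FROM evidence WHERE evidence MATCH ? ORDER BY rank",
        (query,),
    )
    return cur.fetchall()


print(search("jsmith"))
```

Because every system's output lands in the same table, a single query can surface the same account or hostname across machines, which is the kind of lookup cross-system correlation depends on.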

Each gate validates structural criteria (minimum sources indexed, findings submitted, MITRE mappings present, audit tools invoked). Failed gates trigger retries with escalating turn budgets and gap-specific remediation instructions. See Architecture for the full pipeline design.
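The gate-and-retry loop described above might look roughly like the following. This is a minimal sketch, not Mulder's implementation: `PhaseState`, `extraction_gate`, `run_with_gate`, the linear budget escalation, and the default numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhaseState:
    """Hypothetical snapshot of what a phase has produced so far."""
    sources_indexed: int = 0
    findings: int = 0
    mitre_mappings: int = 0


def extraction_gate(state: PhaseState, min_sources: int = 1) -> list[str]:
    """Return a list of gaps; an empty list means the gate passes."""
    gaps = []
    if state.sources_indexed < min_sources:
        gaps.append(f"index at least {min_sources} evidence source(s)")
    return gaps


def run_with_gate(run_phase, gate, base_turns=20, max_attempts=3):
    """Run a phase until its gate passes, raising the turn budget on each retry."""
    remediation: list[str] = []
    for attempt in range(max_attempts):
        budget = base_turns * (attempt + 1)  # assumed linear escalation
        # The previous attempt's gaps are passed back in as remediation instructions.
        state = run_phase(budget, remediation)
        remediation = gate(state)
        if not remediation:
            return state, attempt + 1
    raise RuntimeError(f"gate failed after {max_attempts} attempts: {remediation}")


def fake_phase(budget, remediation):
    # Stand-in phase: it only manages to index a source once it has 40+ turns.
    return PhaseState(sources_indexed=1 if budget >= 40 else 0)


state, attempts = run_with_gate(fake_phase, extraction_gate)
print(attempts, state.sources_indexed)
```

The gate checks only structural facts (counts, presence of mappings), so it can be evaluated deterministically without asking a model whether the phase "went well".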

Key Design Decisions

No shell access. Every tool invocation goes through one of 140+ typed MCP operations with validated parameters. The agent never gets a shell. Every action is auditable and every parameter is constrained to its declared type.

Anti-hallucination at the API boundary. Every finding must cite evidence_refs that are real tool_call_id values from the append-only audit log. The MCP server validates these references at submission time and rejects findings that cite nonexistent tool calls. Timestamps are validated as ISO-8601 and auto-nullified when they appear fabricated. This is enforced architecturally, not by prompting.
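The check described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the server's code: `AuditLog`, `submit_finding`, and `normalize_timestamp` are hypothetical names, the requirement of at least one citation is assumed, and the timestamp check here only nullifies values that fail to parse, whereas the real heuristics for "appears fabricated" are not documented in this README.

```python
import uuid
from datetime import datetime


class AuditLog:
    """In-memory stand-in for the append-only audit log."""

    def __init__(self):
        self._entries = []

    def record(self, tool: str) -> str:
        """Append an entry for a tool call and return its tool_call_id."""
        tool_call_id = str(uuid.uuid4())
        self._entries.append({"tool_call_id": tool_call_id, "tool": tool})
        return tool_call_id

    def known_ids(self) -> set[str]:
        return {entry["tool_call_id"] for entry in self._entries}


def normalize_timestamp(value):
    """Return the value if it parses as ISO-8601, otherwise None."""
    if value is None:
        return None
    try:
        # Note: before Python 3.11, fromisoformat rejects a trailing "Z".
        datetime.fromisoformat(value)
        return value
    except (TypeError, ValueError):
        return None


def submit_finding(log: AuditLog, title: str, evidence_refs, timestamp=None) -> dict:
    """Accept a finding only if every evidence_ref names a recorded tool call."""
    if not evidence_refs:
        raise ValueError("a finding must cite at least one tool call")  # assumed rule
    unknown = sorted(set(evidence_refs) - log.known_ids())
    if unknown:
        raise ValueError(f"unknown evidence_refs: {unknown}")
    return {
        "title": title,
        "evidence_refs": list(evidence_refs),
        "timestamp": normalize_timestamp(timestamp),
    }


log = AuditLog()
call_id = log.record("prefetch_parse")
finding = submit_finding(log, "PsExec executed", [call_id], "2024-03-15T10:30:00+00:00")
print(finding["timestamp"])
```

The key property is that the model never mints identifiers: a `tool_call_id` exists only because a tool actually ran, so a citation either resolves to a logged call or the submission fails.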

Adversarial self-review. Phase 4 explicitly challenges the primary narrative before report generation. It formulates counter-hypotheses, searches for disconfirming evidence, and runs coverage audits to identify which tools were applicable but never invoked and which evidence sources were indexed but never cited.
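The two coverage audits mentioned above reduce to set differences. The sketch below assumes a simplified data shape (tool names as strings, and findings carrying a `source_ids` list); both the function name and that shape are hypothetical and chosen only to show the idea.

```python
def coverage_audit(applicable_tools, invoked_tools, indexed_sources, findings):
    """Report tools that were applicable but never run, and sources never cited."""
    cited = {source for finding in findings for source in finding["source_ids"]}
    return {
        "tools_never_invoked": sorted(set(applicable_tools) - set(invoked_tools)),
        "sources_never_cited": sorted(set(indexed_sources) - cited),
    }


# Made-up inputs for illustration.
report = coverage_audit(
    applicable_tools=["hayabusa", "regripper", "volatility"],
    invoked_tools=["hayabusa", "volatility"],
    indexed_sources=["disk.E01", "mem.raw", "net.pcap"],
    findings=[{"title": "Lateral movement", "source_ids": ["disk.E01", "mem.raw"]}],
)
print(report)
```

Each gap in the result is a concrete, checkable item (run this tool, look at this source) rather than a general instruction to "be more thorough".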

Token efficiency. The SRL-2018 investigation (11 systems, 120 GB, 1,508 tool calls across 336 minutes) consumed 698K tokens. For cost optimization, the three pipeline roles (planner, executor, analyst) can be assigned to different models - routing mechanical tool-calling to a cheaper model while preserving reasoning quality for analysis.

Quick Start

docker pull ghcr.io/calebevans/mulder:1.3
mkdir -p ~/mulder-cases

docker run -it --privileged \
  -v /path/to/evidence:/evidence:ro \
  -v ~/mulder-cases:/home/mulder/.mulder/cases \
  -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" \
  ghcr.io/calebevans/mulder:1.3

# Inside the container (evidence is mounted read-only at /evidence):
mulder investigate /evidence my-case-id

For Vertex AI, Amazon Bedrock, non-Anthropic models via LiteLLM, and full CLI options, see the Usage Guide.

Case Briefing (Optional)

Drop a MULDER.md file in your evidence directory to provide case context:

## What We Know
- The network was breached on March 15
- Suspect account: jsmith

## What We're Looking For
- How did the attacker gain initial access?
- Was data exfiltrated?

The briefing is injected into every investigation phase, guiding tool selection, analysis focus, and report framing. See the Usage Guide for details.

Forensic Tools

Mulder integrates 35+ open-source forensic tools exposed as 140+ typed MCP operations:

| Category | Tools |
| --- | --- |
| Memory | Volatility 3 (14 plugins) |
| Disk | Sleuthkit, Plaso, foremost, PhotoRec, Scalpel |
| Windows artifacts | EZ Tools (Prefetch, Amcache, ShimCache, MFT, USN Journal, Jump Lists, Shellbags, SRUM), RegRipper, Hayabusa (3,700+ Sigma rules), Chainsaw |
| Event logs | python-evtx, Zircolite |
| Network | tshark, Zeek, Suricata, tcpflow, tcpxtract |
| Malware | YARA, CAPA, FLOSS, Detect-It-Easy, ClamAV, radare2 |
| Documents | oletools, PDF tools, pst-utils |
| Mobile | ALEAPP, iLEAPP, MVT |
| Other | bulk_extractor, binwalk, ExifTool, ssdeep, hashdeep, steghide, Hindsight |

Full API reference: Tool Manifest

Output

Each investigation produces:

  • Markdown and HTML reports - executive summary, attack timeline, findings with MITRE ATT&CK mappings, IOC tables, and audit trail (example HTML reports)
  • Per-case SQLite database - FTS5 full-text search across all indexed evidence
  • Append-only audit log - JSONL recording every tool invocation with BLAKE2b output hashes
  • Optional exports - STIX 2.1 IOC bundle, CSV IOC list, and MITRE ATT&CK Navigator layer via mulder export-iocs and mulder export-navigator
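For readers unfamiliar with hashed JSONL audit logs, the sketch below shows the general pattern: each tool invocation becomes one appended JSON line that carries a BLAKE2b digest of the tool's output, so the output can later be checked against the log. The field names and helper functions are assumptions for this example, not Mulder's actual log format.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def append_audit_entry(log_path: str, tool: str, params: dict, output: bytes) -> dict:
    """Append one JSON line describing a tool call, including a hash of its output."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "params": params,
        "output_blake2b": hashlib.blake2b(output).hexdigest(),
    }
    # Mode "a" only ever appends; existing lines are never rewritten.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def verify_output(entry: dict, output: bytes) -> bool:
    """Check that an output blob still matches the digest recorded in the log."""
    return hashlib.blake2b(output).hexdigest() == entry["output_blake2b"]


# Usage with a temporary file standing in for the per-case log.
log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
entry = append_audit_entry(log_path, "tshark", {"pcap": "net.pcap"}, b"frame 1 ...")
print(verify_output(entry, b"frame 1 ..."))
```

Storing a digest instead of the full output keeps the log small while still letting a reviewer detect whether cited output was altered after the fact.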

Documentation

| Document | Description |
| --- | --- |
| Usage Guide | Installation, providers, CLI reference, Docker configuration |
| Architecture | System design, pipeline phases, quality gates, data flow |
| Tool Manifest | API reference for all MCP tools |
| Adding Tools | Contributor guide for adding new forensic tools |
| Glossary | Terminology and definitions |

License

Apache-2.0