DeepSIFT

AI-Driven Forensic Investigation for SANS SIFT Workstation

DeepSIFT is a Model Context Protocol (MCP) middleware layer that turns Claude into a zero-hallucination digital forensics analyst. Instead of letting an LLM guess at raw CLI output, DeepSIFT parses every SIFT tool response into structured JSON, injects per-tool forensic discipline (caveats, advisories, corroboration hints), enriches findings with MITRE ATT&CK tags and RAG-backed threat intelligence, and enforces chain-of-custody audit logging before the LLM ever sees a single byte of evidence.

148 typed forensic MCP tools (+ environment preflight self-check) · 23 tool modules · 15 parser modules · Per-tool RAG enrichment · Post-hoc grounding verification · 4-axis quantified confidence scoring · 3,700+ Sigma rules via Hayabusa · 6-type contradiction detection · case-agnostic benchmark runner · zero-dependency Examiner Portal

Status: Production-ready. Every tool executes a real forensic binary or parser — no simulated, demo-only, or placeholder analysis paths. All evidence paths are supplied per invocation (nothing is hard-coded to a specific image), each EZ Tools run clears its own output directory to prevent cross-case contamination, dirty registry hives are parsed as-acquired, and the RAG knowledge base ships case-agnostic (case IOCs are opt-in, never auto-loaded). Originally built for the Find Evil! — SANS DFIR challenge.

🧑‍⚖️ For judges (and judging agents)

Start here: AGENTS.md (agent orientation, entry points, 60-second run) and docs/JUDGING.md (every Stage-2 criterion → exact code + how to verify).
Measured, not asserted: ROCBA 4/4 and FOR500 "Abducted Zebrafish" 4/4 vs Protocol SIFT, 0 hallucinations, 100 % claim grounding — scored by benchmark/scorer.py against published ground truth.
See real output without running anything: docs/sample/ holds committed example outputs for both cases — grounded findings.json + a rendered Examiner report (verdict, hypothesis ledger, evidence grounding, full chain of custody).
Don't trust the score — verify the evidence: python3 verify_findings.py independently re-checks every claim against the cited raw tool output and recomputes the audit hash chain. The ground-truth files are derived from the organizer case scenario (see each file's _provenance), so trust rests on reproducible grounding, not our number.
Verify in minutes (no API key): python3 preflight.py · pytest -q (74 pass) · python3 examiner_portal.py (review UI + live audit-chain integrity).
Drive it as an agent: connect Claude Code to the MCP server (.mcp.json) and ask it to investigate /mnt/evidence — disk-only is a first-class autonomous case.

Why DeepSIFT

Protocol SIFT (the prompt-only baseline) passes raw CLI output directly into LLM context, relies on natural-language safety rules, and has no structured parsing. This creates the failure modes that DeepSIFT eliminates architecturally:

Problem	Protocol SIFT	DeepSIFT
Raw CLI output → hallucination	Volatility/log2timeline text enters context unparsed	Python parsers produce typed JSON — raw text never reaches the LLM
Safety via prompt → bypassable	"Do not write to /cases/" is a suggestion	`guard_output_path()` raises `PermissionError` at OS level
No context → generic analysis	LLM has no threat intel during tool execution	ChromaDB RAG + MITRE ATT&CK injected into every tool response
Unverifiable LLM claims	No grounding check — analyst must manually verify	`verify_findings` checks every claim token against raw export bytes
Qualitative confidence	"high/low" with no definition	4-axis 0-100 score: Tool Reliability + Corroboration + IOC Specificity + MITRE Accuracy
No Sigma rule coverage	Raw event log text to LLM	Hayabusa 3,700+ Sigma rules → structured MITRE-tagged alerts
Contradictions ignored	No cross-artifact consistency check	`detect_contradictions` finds 6 contradiction types (DKOM, ghost PIDs, log wipes, etc.)

Architecture

flowchart TD
    A["Claude Code\n(LLM Agent)"] -->|"Typed MCP calls only\nno generic shell"| B

    B["DeepSIFT MCP Server\nmcp_server/server.py"]
    B -->|"Structured JSON only\nnever raw text"| A

    B --> C["Tool Modules\n23 modules · 148 typed functions"]
    C --> D["SIFT Tools\nVolatility · log2timeline · Sleuthkit\nEZ Tools · YARA · Hayabusa\nbulk_extractor · capa · FLOSS · exiftool"]
    D -->|"raw output"| E["Middleware Parsers\npslist · netscan · malfind · timeline\nbrowser · cloud · document · network_log\nlinux · mitre_auto_map · rag_enrichment"]
    E -->|"structured dict"| F["Forensic Knowledge Envelope\ncaveats · advisories · corroboration"]
    F -->|"enriched JSON"| B

    B --> G["RAG Pipeline\nChromaDB + sentence-transformers"]
    G --> H["Knowledge Sources (case-agnostic)\nMITRE ATT&CK · LOLBAS · Hunt Evil baseline\n+ opt-in per-case IOCs / threat intel"]
    H --> G

    B --> I["Audit Logger\naudit_id · SHA-256 · forensic_audit.log"]
    I --> J["exports/\nRaw tool output SHA-256 indexed\nanalysis/forensic_audit.log"]

Tool Inventory (155 MCP tools)

DeepSIFT exposes 155 MCP tools: 148 typed forensic tools across 18 categories, plus 7 control/utility tools (preflight self-check, the hypothesis-ledger trio, and the evidence-index trio). No run_shell, no execute_command — every tool has a typed signature, a middleware parser, and returns RAG-enriched structured JSON. Run python3 preflight.py first to see which tool groups are operational in your environment; a tool whose backing binary is missing returns a clear "unavailable" status with an install hint instead of crashing the investigation.

Memory Forensics — Core (Volatility 3)

Tool	Purpose	Key Output Fields
`get_process_list`	EPROCESS walk; SANS Hunt Evil baseline comparison	`suspicious`, `anomaly_details`, `mitre_techniques`
`scan_hidden_processes`	pslist vs psscan diff → DKOM detection (T1014)	`hidden_processes`, `dkom_suspected`
`find_injected_code`	malfind with injection type classification	`risk_level`, `injection_type`, `mitre_techniques`
`get_running_services`	svcscan with suspicious binary path detection (T1543.003)	`suspicious_services`
`get_network_connections`	netscan with external IP flagging + MITRE tags	`external_connections`, `mitre_techniques`
`get_command_history`	cmdline with suspicious pattern detection	`suspicious_cmdlines`, `mitre_techniques`
`get_loaded_dlls`	DLL listing for a specific PID	`dlls`, `unsigned_count`
`get_registry_hives`	List hives in memory image	`hives`
`get_registry_key`	Read a specific registry key from memory	`key`, `values`

Memory Forensics — Extended (Volatility 3)

Tool	Purpose	Key Forensic Value
`get_privileges`	Token privilege enumeration per PID	SeDebugPrivilege on non-system process = T1134
`get_mutexes`	Mutex object scan (mutantscan)	Malware-family mutex fingerprinting
`get_env_vars`	Process environment block variables	PATH hijacking, unusual TEMP locations
`get_vad_info`	Virtual Address Descriptor tree	Private RWX non-file-backed regions = injection staging
`get_ldrmodules`	Compare InLoad / InMem / InInit PEB lists	DLLs absent from all three = reflective injection (T1055.001)
`get_ssdt`	System Service Descriptor Table hooks	Non-ntoskrnl hooks = rootkit (T1014)
`get_callbacks`	Kernel callback registrations	Unknown driver callbacks = rootkit
`get_filescan`	FILE_OBJECT pool scan	Open handles to files not visible in process DLL list
`get_timeliner`	Memory-resident timestamp timeline	Process / DLL / registry chronology
`get_devicetree`	Kernel device tree	Hidden filter drivers, rootkit stack position

Timeline Analysis (log2timeline / Plaso)

Tool	Purpose
`create_super_timeline`	Build a Plaso super-timeline from a disk image (long-running)
`filter_timeline`	Extract events for a specific time window; highlights suspicious keywords
`get_browser_history`	Extract WEBHIST events (URLs, downloads, searches) from timeline

Disk Forensics (Sleuth Kit)

Tool	Purpose
`get_partition_table`	Read partition layout; returns sector offsets for follow-up calls
`get_file_listing`	Recursive file listing with deleted-file flags
`extract_file`	Extract file by inode number to `exports/`
`search_deleted_files`	List only deleted/unallocated entries

Windows Artifact Analysis (EZ Tools)

Tool	Source Artifact	Key Evidence
`parse_event_logs`	.evtx via EvtxECmd	Logon, service install, task create, PS script blocks, WMI, RDP
`parse_shimcache`	SYSTEM hive via AppCompatCacheParser	Executable existence (proves file was on disk)
`parse_amcache`	Amcache.hve via AmcacheParser	Execution evidence + SHA1 hash per executable
`parse_prefetch`	C:\Windows\Prefetch via PECmd	Execution history with last 8 run times
`parse_mft`	$MFT via MFTECmd	Full file-system timeline; detects timestamp anomalies
`parse_lnk_files`	Recent Items via LECmd	Recently accessed file paths with timestamps
`parse_jump_lists`	AutomaticDestinations via JLECmd	Application-specific recent file access
`parse_registry_hive`	Any hive via RECmd	Raw key/value search with pattern matching
`parse_recycle_bin`	$Recycle.Bin via RBCmd	Deleted file recovery with original paths
`parse_srum`	SRUDB.dat via SrumECmd	Network bytes sent/received per application (exfil quantification)
`parse_usn_journal`	$UsnJrnl:$J via MFTECmd	File system change journal; burst deletion detection
`lookup_ip_reputation`	AbuseIPDB + VirusTotal APIs	Confidence score, country, ISP, VT malicious count

Windows Event Log — Hayabusa / Sigma

Tool	Purpose	Key Output Fields
`parse_hayabusa`	Apply 3,700+ community Sigma rules to .evtx directory	`alerts`, `critical_count`, `mitre_techniques`
`list_hayabusa_rules`	Show available Hayabusa rule profiles	`profiles`, `rule_count`

Static File Analysis

Tool	Purpose	Key Output Fields
`get_pe_metadata`	PE header, sections, imports, compile timestamp, entropy	`high_entropy_sections`, `suspicious_imports`, `timestamp_anomaly`
`extract_strings`	String extraction + IOC pattern scan (IPs, URLs, base64, registry)	`iocs_found`, `ioc_summary`
`detect_packer`	Entropy analysis + UPX/MPRESS/Themida signature detection	`verdict`, `overall_entropy`, `packer_signatures_found`

Network Traffic Analysis

Tool	Purpose	Key Output Fields
`parse_pcap_summary`	TShark PCAP summary — top talkers, exfil signals	`large_transfers`, `external_conversations`
`extract_dns_queries`	DNS extraction — DGA detection, beaconing, DNS tunneling	`suspicious_domains`, `beaconing_candidates`
`parse_arp_cache`	Volatility netstat as host enumeration proxy	`unique_hosts_seen`, `hosts`

Cross-Artifact Correlation

Tool	Purpose
`correlate_artifacts`	Join findings across memory/disk/network/registry by PID, path, IP, user
`adversarial_review`	Challenge current hypothesis with counter-arguments before `finish_analysis`
`detect_contradictions`	Find UNRESOLVED_CONTRADICTION findings: DKOM, ghost PIDs, log wipes, hidden services

Investigation Control & Autonomous Reasoning

Tool	Purpose
`record_hypothesis`	Record an explicit, falsifiable hypothesis before testing it (returns `H1`, `H2`, …)
`update_hypothesis`	Confirm / disprove / mark-inconclusive a hypothesis with confidence + evidence `audit_ids` (captures self-correction)
`get_investigation_state`	Review the live hypothesis ledger + summary (confirmed/disproved/self-corrections)
`verify_findings`	Verbatim token grounding check — every claim vs raw export bytes (run before `finish_analysis`)
`finish_analysis`	Structured report with grounding score, 4-axis confidence score, hypothesis ledger, `audit_ids` citation

Scale, Health & Self-Verification

Tool	Purpose
`index_evidence`	Ingest the full artifact rows (EZ tools' `exports/*.csv`) into a stdlib SQLite store
`query_evidence`	Return only the matching subset from the indexed store — reach a 100k-row MFT without dumping it
`evidence_store_stats`	Row counts per indexed artifact source
`check_tool_availability`	Preflight: which external tool groups are operational in this environment, with install hints

YARA Hunting

Tool	Purpose
`list_yara_rule_sets`	Enumerate available rule sets
`scan_memory_with_yara`	Yarascan via Volatility 3 (finds memory-resident payloads)
`scan_file_with_yara`	Static file scan against named rule set

Built-in YARA rule sets: suspicious_strings · webshells · ransomware · rats · packers

Memory Forensics — Advanced (Volatility 3)

Tool	Purpose	Key Output Fields
`get_modules`	Kernel module list; flags unsigned/suspicious drivers	`suspicious_modules`, `mitre_techniques`, `threat_intel`
`get_driverirp`	IRP dispatch table hook detection (rootkit)	`hooked_handlers`, `threat_intel`
`get_getsids`	Security identifiers per process (privilege enumeration)	`sids`, `admin_processes`
`get_hashdump`	NTLM password hash extraction from SAM in memory	`accounts`, `non_empty_hashes`, `threat_intel`
`get_lsadump`	LSA secrets from memory (service account passwords)	`secrets`, `threat_intel`
`get_cachedump`	Domain cached credential hashes (DCC2)	`cached_accounts`
`get_clipboard`	Clipboard contents at time of acquisition	`clipboard_text`
`get_atoms`	Windows atom table (GUI attack staging)	`atoms`
`get_sessions`	Terminal Services / RDP session list	`sessions`, `rdp_sessions`
`get_mft_memory`	In-memory MFT record extraction	`mft_records`
`get_ads_memory`	Alternate Data Stream detection from memory image	`ads_entries`
`dump_process`	Dump a suspicious process to disk for static analysis	`output_path`, `sha256`

Browser Artifacts

Tool	Purpose	Key Output Fields
`parse_chrome_history`	SQLite history + downloads; cloud exfil domain classification	`suspicious_visits`, `suspicious_downloads`, `parser_summary`, `threat_intel`
`parse_firefox_history`	places.sqlite history + downloads; threat flags	`suspicious_visits`, `parser_summary`, `threat_intel`
`parse_chrome_extensions`	Installed extensions; flags risky permissions	`suspicious_extensions`, `high_risk_count`
`parse_browser_cookies`	Cookie store extraction; session token discovery	`cookies`, `suspicious_domains`
`run_hindsight`	Full Chrome/Chromium browser artifact extraction	`output_dir`, `summary`
`parse_browser_passwords`	Saved password store; credential theft evidence	`credentials`, `domain_count`
`parse_ie_edge_legacy_history`	IE/Edge Legacy WebCacheV01.dat history	`visits`, `downloads`
`parse_chromium_cache`	Chromium disk cache; cached malware delivery pages	`cache_entries`, `suspicious_urls`

Email Artifacts

Tool	Purpose	Key Output Fields
`parse_pst_ost`	Outlook PST/OST via readpst; exfiltration email search	`email_count`, `suspicious_emails`, `attachments`
`parse_thunderbird`	Thunderbird mbox profile extraction	`emails`, `suspicious_emails`
`parse_eml_file`	Single .eml file; header analysis + attachment extraction	`headers`, `attachments`, `iocs`
`extract_email_attachments`	Bulk attachment extraction for malware analysis	`extracted_count`, `suspicious_attachments`
`analyze_email_headers`	RFC 5322 header forensics; spoofing + routing analysis	`spf_result`, `dkim_result`, `hop_analysis`, `mitre_techniques`

Cloud Storage Artifacts

Tool	Purpose	Key Output Fields
`parse_dropbox_logs`	Dropbox sync logs; exfiltration risk classification	`sync_events`, `parser_summary`, `threat_intel`
`parse_onedrive_logs`	OneDrive sync/activity logs	`sync_events`, `parser_summary`, `threat_intel`
`parse_google_drive_logs`	Google Drive desktop sync logs	`sync_events`, `parser_summary`
`parse_slack_artifacts`	Slack desktop app data; workspace + channel forensics	`workspaces`, `suspicious_events`
`parse_teams_artifacts`	Microsoft Teams SQLite databases; chat + call forensics	`accounts`, `messages`, `suspicious_events`
`parse_icloud_logs`	iCloud for Windows sync logs	`sync_events`, `parser_summary`

Document Analysis

Tool	Purpose	Key Output Fields
`analyze_pdf_doc`	pdfid keyword scan; JavaScript/OpenAction/launch classification	`risk_score`, `suspicious_keywords`, `mitre_techniques`, `threat_intel`
`analyze_ole_doc`	oletools VBA macro extraction + malicious pattern detection	`macros`, `classified_risks`, `mitre_techniques`
`analyze_rtf_doc`	rtfobj embedded object extraction; malicious CLSID detection	`objects`, `clsid_risks`
`analyze_zip_archive`	ZIP entry inspection; password-protected + double-ext detection	`entries`, `suspicious_entries`
`detect_dde_payload`	DDE/DDEAUTO command injection in Office documents	`dde_found`, `commands`, `threat_intel`

Linux / macOS Forensics

Tool	Purpose	Key Output Fields
`get_linux_processes`	Volatility linux.pslist; attack command + LD_PRELOAD detection	`suspicious`, `threat_flags`, `threat_intel`
`get_linux_bash_history`	Bash command history with attack pattern classification	`commands`, `classified_suspicious`, `threat_intel`
`get_linux_network`	linux.netstat via Volatility	`connections`, `external`
`get_linux_modules`	Kernel module list; rootkit LKM detection	`modules`, `suspicious`
`get_linux_syscall`	System call table hook detection	`hooks`
`get_linux_malfind`	malfind equivalent for Linux memory images	`injected`
`get_linux_envars`	Process environment variables	`envars`, `suspicious`
`get_linux_mounts`	Mount table; network share + hidden mount detection	`mounts`, `suspicious`
`parse_syslog`	Syslog/auth.log parsing; auth failure + sudo classification	`classified_events`, `classified_summary`, `threat_intel`
`parse_linux_crontab`	Crontab persistence detection across all users	`cron_entries`, `suspicious_schedules`

Network Forensics — Extended

Tool	Purpose	Key Output Fields
`parse_zeek_logs`	Zeek conn/dns/http/ssl/files log parsing; DNS tunneling detection	`suspicious_dns`, `external_conns`, `threat_intel`
`parse_iis_logs`	IIS W3C access logs; web shell + SQLi + scanner detection	`suspicious_requests`, `web_shells`, `threat_intel`
`parse_apache_logs`	Apache access/error logs; same threat classification	`suspicious_requests`, `port_scans`
`extract_pcap_files`	Extract files from PCAP via NetworkMiner/tshark	`extracted_files`
`parse_firewall_logs`	Firewall deny/allow logs; lateral movement flagging	`suspicious_flows`, `internal_scanning`
`decode_rdp_bitmap_cache`	RDP bitmap cache → screenshot reconstruction	`output_dir`, `image_count`
`parse_netflow`	NetFlow/IPFIX analysis; top talkers + exfil signals	`top_talkers`, `large_flows`, `exfil_candidates`

Anti-Forensics Detection

Tool	Purpose	Key Output Fields
`detect_timestomping`	SI vs FN MACB delta comparison; round-number timestamps	`si_fn_delta_anomalies`, `mitre_techniques`, `threat_intel`
`detect_log_wiping`	Event ID 1102/104/4719; zero-byte EVTX detection	`log_clear_events`, `threat_intel`
`detect_secure_deletion`	SDelete/Eraser/CCleaner traces in prefetch + shimcache	`secure_deletion_indicators`, `threat_intel`
`detect_ads_streams`	NTFS Alternate Data Stream discovery	`suspicious_streams`, `threat_intel`
`analyze_vss_shadows`	Volume Shadow Copy inventory; deletion evidence	`shadow_copy_count`, `rag_context`
`detect_prefetch_anomalies`	Temp path execution + anti-forensics tool execution	`suspicious_entries`, `anti_forensics_tools`
`detect_event_log_tampering`	Event ID 1102/4719/7040 audit policy changes	`findings`, `threat_intel`

File Carving and Static Analysis

Tool	Purpose	Key Output Fields
`run_bulk_extractor`	Bulk feature extraction: emails, URLs, IPs, CCNs, Base64	`top_iocs`, `enriched_email_iocs`, `enriched_url_iocs`
`carve_files_foremost`	Header/footer file carving from unallocated space	`recovered_files_by_type`, `total_recovered`
`carve_files_scalpel`	Configurable signature-based file carving	`recovered_files_by_type`
`analyze_with_exiftool`	Metadata extraction (GPS, author, software, revision)	`interesting_fields`, `full_metadata`
`calculate_file_hashes`	MD5/SHA1/SHA256/SHA512 + ssdeep fuzzy hash	`hashes`, `ssdeep`
`detect_capabilities_capa`	capa: capability detection mapped to MITRE ATT&CK	`capabilities`, `mitre_techniques`, `threat_intel`
`extract_floss_strings`	FLOSS: XOR/stack/tight decoded string extraction	`decoded_strings`, `ioc_ips_in_decoded`, `threat_intel`
`get_file_type`	Magic byte vs extension mismatch (masquerade detection)	`extension_mismatch`, `mitre_techniques`

Extended Registry Forensics

Tool	Purpose	Key Output Fields
`parse_shellbags`	Folder navigation history; deleted dir + USB + share access	`suspicious_path_accesses`, `threat_intel`
`parse_windows_timeline`	ActivitiesCache.db: app launches + file opens	`file_opens`, `app_launches`
`parse_bam_dam`	BAM/DAM last-execution timestamps per user SID	`suspicious_executions`, `threat_intel`
`parse_typed_paths`	Explorer address bar history; network share + admin share paths	`network_share_paths`, `removable_media_paths`
`parse_run_mru`	Run dialog (Win+R) execution history	`suspicious_run_commands`, `threat_intel`
`parse_open_save_mru`	Open/Save dialog recent file access	`entries`
`parse_wordwheelquery`	Windows Search query history; sensitive file discovery	`suspicious_searches`, `threat_intel`
`parse_installed_software`	Installed programs; RAT/hacking tool detection	`suspicious_software`, `threat_intel`
`parse_sam_hive`	Local user accounts and last logon info	`entries`
`parse_logon_history`	Cached domain credentials in SECURITY hive	`entries`, `forensic_note`

Extended Disk Forensics

Tool	Purpose	Key Output Fields
`get_fs_statistics`	fsstat: block size, volume name, creation/mount timestamps	`fs_type`, `block_size`, `creation_time`
`get_image_info`	ewfinfo/mmls: image format, acquisition hash, partition table	`ewf_metadata`, `partition_table`
`create_mac_timeline`	mactime: body-file MAC(B) timeline generation	`total_timeline_entries`, `output_path`
`read_raw_block`	blkcat: hexdump specific sectors; magic byte detection	`hexdump`, `detected_structure`
`analyze_slack_space`	blkls: file slack space extraction + IOC scanning	`ips_in_slack`, `urls_in_slack`, `threat_intel`
`verify_image_integrity`	MD5/SHA256 + ewfverify chain-of-custody verification	`integrity_verified`, `chain_of_custody`

Threat Intelligence

Tool	Purpose	Key Output Fields
`lookup_hash_reputation`	VirusTotal file hash lookup (MD5/SHA1/SHA256)	`detection_ratio`, `verdict`, `mitre_techniques`, `threat_intel`
`lookup_domain_reputation`	VirusTotal + WHOIS domain reputation check	`verdict`, `mitre_techniques`, `threat_intel`
`search_mitre_technique`	RAG query for MITRE ATT&CK technique details	`rag_results`, `static_knowledge`
`search_ioc_database`	Search all IOCs in the RAG knowledge base	`matches`, `match_count`
`calculate_fuzzy_hash_similarity`	ssdeep similarity between two files/hashes (malware variants)	`similarity_score`, `interpretation`

How DeepSIFT Eliminates Hallucination

flowchart LR
    A["Raw Tool Output\nVolatility / EZ Tools / etc."] --> B["Python Parser\nStructured dict"]
    B --> C["MITRE Auto-Map\nmap_process_anomalies\nmap_injection\nmap_network_connection"]
    C --> D["RAG Enrichment\nChromaDB query\nMITRE · threat intel · case IOCs"]
    D --> E["Forensic Knowledge Envelope\ncaveats · advisories · corroboration"]
    E --> F["audit_id\nSHA-256 of raw output\nTimestamp + export file path"]
    F --> G["Structured JSON\nto LLM Context"]

Every tool call generates a unique audit_id (e.g. dsift-2026-06-11-a3f9b2c1). finish_analysis requires an audit_ids list — fabricated findings without a traced audit_id are structurally impossible to submit.

Investigation Workflow

Memory Image

flowchart TD
    A["get_process_list\nHunt Evil baseline + MITRE tags"] --> B["scan_hidden_processes\nDKOM rootkit detection"]
    B --> C["find_injected_code\nmalfind injection classification"]
    C --> D["get_running_services\nsuspicious binary paths"]
    D --> E["get_network_connections\nexternal IP flagging"]
    E --> F["get_command_history\nsuspicious pattern detection"]
    F --> G["lookup_ip_reputation\nAbuseIPDB + VirusTotal"]
    G --> H["correlate_artifacts\ncross-source PID/path/IP joins"]
    H --> I["adversarial_review\nchallenge hypothesis"]
    I --> J["finish_analysis\nobservation + interpretation\naudit_ids required"]

Windows Artifact Analysis

flowchart TD
    A["parse_event_logs\nlogon · service · task · PS · WMI · RDP"] --> B["parse_shimcache\nexecutable existence"]
    B --> C["parse_amcache\nSHA1 hash per executable"]
    C --> D["parse_prefetch\nexecution history x8 runs"]
    D --> E["parse_mft\nfull FS timeline + timestamp anomalies"]
    E --> F["parse_srum\nbytes sent per application\nexfil quantification"]
    F --> G["parse_usn_journal\nburst deletion detection"]
    G --> H["correlate_artifacts"]
    H --> I["adversarial_review"]
    I --> J["finish_analysis"]

What Sets DeepSIFT Apart

The challenge is to take Protocol SIFT — Claude Code wired directly to the SIFT Workstation — and make it production-grade. DeepSIFT does exactly that. Against the prompt-only baseline, every dimension a DFIR agent is judged on is upgraded from a prompt-level suggestion to an architecturally enforced guarantee:

Judging dimension	Protocol SIFT (prompt-only baseline)	DeepSIFT
Tool output → LLM	10k+ lines of raw CLI text in-context	Typed JSON from 15 middleware parsers — the model never sees raw text
Hallucination control	Natural-language "be careful" rules	Per-claim grounding verification against raw export bytes (`verify_findings.py`)
Confidence	Qualitative "high/low"	4-axis quantified score (0–100)
Safety boundaries	Prompt instructions	`guard_command` + `guard_output_path` raise at the OS layer — evidence is read-only by construction
Audit trail	None	SHA-256 hash chain + optional HMAC signing (forgery-resistant), one entry per tool call
Threat intel	Training-time memory	RAG (MITRE ATT&CK + LOLBAS + Hunt Evil) injected into every tool call
Autonomy evidence	Lives in the chat, then lost	Server-side hypothesis ledger with confirm/disprove/self-correction + confidence
Detection breadth	~30 event IDs	3,700+ Sigma rules (Hayabusa) + 6-type contradiction detection
Scale	Dump artifacts into context	Indexed SQLite evidence store — query the full set, page only the matches
Human review	None	Interactive Examiner Portal with HMAC sign-off, drill-down, multi-case
Accuracy (must-identify)	25% on ROCBA, missed disk-only FOR500	100% on both, 0 hallucinations, 100% grounding

Capabilities unique to DeepSIFT's design:

Grounding at the tool layer — every claim token is matched against the raw evidence its audit_id cites; findings are reproducible from first principles, not taken on trust.
Quantified, multi-axis confidence — tool reliability + corroboration + IOC specificity + MITRE accuracy, not an adjective.
Forgery-resistant chain of custody — hash-chained and HMAC-signable; tampering (modify / insert / delete) is provably detectable, and signatures cannot be forged without the key.
Captured autonomy with no API key — Claude Code drives the typed tools and records its reasoning server-side, so the senior-analyst loop is auditable, not anecdotal.
Per-tool forensic knowledge envelope — caveats, advisories, and corroboration hints wrap every response, so the model reasons with forensic discipline at every step.
Client-agnostic — the same tool surface serves over stdio or HTTP (SSE/streamable-http) to any MCP client or remote agent.

How to Run It (three ways)

DeepSIFT runs three ways:

Claude Code + MCP server (how a judge can drive it directly, no extra API key): point Claude Code at the DeepSIFT MCP server via .mcp.json and ask it to investigate /mnt/evidence. Claude Code is the agent; every action goes through the typed, parsed, audited, guard-railed tools — it cannot run a raw shell command. The session records its reasoning via record_hypothesis/update_hypothesis/finish_analysis, producing an auditable autonomy trail with no API key. Client-agnostic: set DEEPSIFT_MCP_TRANSPORT=sse to serve the same tools over HTTP to any MCP client (Claude Desktop, Cherry Studio, LibreChat, a remote agent, or a gateway).
investigate.py — agentic reasoning (the senior-analyst mode): an LLM forms explicit hypotheses, chooses which typed MCP tool to run next, reads the parsed/audited JSON, marks each hypothesis confirmed / disproved / inconclusive with a confidence, self-corrects when a tool fails or a result contradicts a hypothesis, and reconstructs the attack chain. Works on any evidence shape and adapts its first triage step accordingly:
```
export ANTHROPIC_API_KEY=sk-ant-...
# disk-only (no memory image) — a first-class autonomous run
python3 investigate.py --evidence-mount /mnt/evidence
# memory + disk
python3 investigate.py --image /cases/<case>/memory.raw --evidence-mount /mnt/evidence
# memory-only
python3 investigate.py --image /cases/<case>/memory.raw
```
demo.py — deterministic pipeline: fixed multi-agent sequence (no LLM/key) for reproducible, scriptable benchmark runs.

Examiner Portal — Human Review

A reviewer or judge can inspect a completed investigation in one command — no pip installs (Python standard library only):

python3 examiner_portal.py                       # interactive live UI → http://127.0.0.1:8420
python3 examiner_portal.py --cases-root /cases   # adds a multi-case picker across investigations
python3 examiner_portal.py --html reports/examiner_review.html   # static read-only file (no server)

The portal shows the verdict + confidence, the autonomous-reasoning hypothesis ledger (confirmed/disproved/self-corrections), every finding (suspicious processes, exfil IOCs, named MITRE ATT&CK badges, timeline, files accessed), the evidence-grounding result (verified vs unverified claims), and the full chain of custody — every audited tool call with the SHA-256 of its raw output plus a recomputed hash-chain integrity verdict that detects tampering. It is interactive: click any audit row to drill into the raw evidence (with a live SHA-256 match check), browse multiple cases, and perform an examiner sign-off — approve/reject each finding and produce an HMAC-signed, tamper-evident manifest binding the findings hash + audit-chain head. This directly answers the "usability" and "audit trails" judging criteria.

Architectural Guardrails

Enforced in code, not prompts — these raise exceptions; the model cannot talk its way past them:

Every tool hard-codes its own forensic binary and builds an argv list (never shell=True, never a shell string) — the model cannot choose the binary or smuggle a second command, so an arbitrary destructive command is unreachable by construction.
mcp_server.audit.guard_command adds defense-in-depth on the parameter-rich binary launchers (Volatility, EZ Tools / Windows-artifact, and registry exec paths): it blocks destructive/ exfiltration binaries (rm, dd, shred, mkfs, wget, curl, scp, ssh, nc, shells…) and shell redirection/chaining tokens, and rejects shell-string commands outright.
guard_output_path blocks writes under evidence roots (/cases/, /mnt/, /media/).
Tool output is parsed to JSON before reaching the LLM; every call is logged with a SHA-256 of the raw output (analysis/forensic_audit.log).
Tamper-evident and tamper-resistant audit chain. Entries form a SHA-256 hash chain (any modify/insert/delete breaks it). Set DEEPSIFT_AUDIT_KEY (held off the evidence host) to also HMAC-sign the chain — an attacker who rewrites the entire log still cannot forge valid signatures without the key. verify_audit_chain() reports both; the Examiner Portal shows it live.
Token-scale by design + a queryable store. The LLM only ever sees each tool's parsed, capped summary JSON; the full raw evidence (up to MBs) goes to the on-disk audit record, never into the prompt (AGENT_TOOL_RESULT_CHARS). For full-disk scale, index_evidence ingests the complete artifact rows (the EZ tools' exports/*.csv) into a stdlib SQLite store and query_evidence returns only the matching subset — reach a 100k-row MFT or full shellbag set without dumping it. A dependency-light alternative to standing up OpenSearch.

Validated Results

DeepSIFT has been validated end-to-end on two organizer-provided SANS cases — one memory+disk, one disk-only — each scored by benchmark/scorer.py against published ground truth and independently reproducible by a judge via python3 verify_findings.py.

Case	Evidence	Protocol SIFT baseline	DeepSIFT	Hallucinations	Claim grounding
ROCBA (FOR508)	memory + disk	0 / 4 (0 %)	4 / 4 (100 %)	0	100 %
Abducted Zebrafish / Vanko (FOR500)	disk-only	3 / 4 (75 %)	4 / 4 (100 %)	0	100 %

ROCBA — FOR508 (memory + disk)

End-to-end benchmark on the SANS FOR508 ROCBA case (Rocba-Memory.raw 18 GB + rocba-cdrive.e01 81 GiB C: volume).

The memory image was captured 3 days after the 2020-11-13 incident, so the break-in evidence exists only on disk. DeepSIFT's disk + browser analysis reconstructs it with zero hallucinations:

Unauthorized access (2020-11-13) — wave of Event 4625 Failed Logon (RDP brute force).
IP theft / exfiltration — LNK artifacts show SRL project files (Megaforce Specs & Research.docx, Blue Thunder blueprint, Files from SRL system) copied to an external F:\ USB drive on 2020-11-13.
Cloud usage + incident-window browsing — Google Drive + SharePoint (starkresearchlabs-my.sharepoint.com) access on Nov 14 UTC (= Nov 13 evening EST), and a Google search for sdelete download (anti-forensics).

Reproduce (deterministic, no LLM/API key required):

python3 demo.py \
  --image /cases/ROCBA/Rocba-Memory.raw \
  --evidence-mount /mnt/evidence \
  --baseline benchmark/baselines/protocol_sift_rocba_findings.json \
  --ground-truth benchmark/ground_truth/rocba_ground_truth.json

Abducted Zebrafish / Vanko — FOR500 (disk-only)

A disk-only case (physical Microsoft Surface 3 image, no memory capture) — the scenario the prompt-only baseline handles least well, and a first-class autonomous run for DeepSIFT. DeepSIFT scored 4/4 must-identify, 0 hallucinations, 100 % claim grounding, and was the only configuration to recover the classified research subject matter the baseline missed. Every claim below traces to an audited tool call:

Access to classified StarkResearch directories (Level 5–8) — shellbags (SbECmd, 252 entries) record Explorer browsing of the \\192.168.1.5\StarkResearch\Level 5–8 Classified SMB share and the locally-staged Downloads\vacation photos\ cover-named copies.
Classified subject matter (the criterion the baseline missed) — jump lists / LNK recover zebrafish.pdf, ZF DNA splice test notes.docx, and Rapid cell regeneration research.docx.
Tooling & staging — decoded UserAssist shows just-in-time 7-Zip (2016-06-29 16:01) and VeraCrypt 1.17 (6 runs, last 2016-06-30 01:56), plus StarkCollector.exe and sdelete.exe.
Exfiltration channels — parse_usb_history enumerates 9 USB mass-storage devices (WD My Passport, SanDisk Cruzer, Verbatim Store-N-Go, PNY, Innostor) and Chrome history shows an icloud.com cloud-storage visit.

This case is what motivated DeepSIFT's Production Hardening below — the disk-artifact/registry path was hardened so analysis is correct and case-isolated on any acquired image.

Production Hardening

The EZ Tools / registry path was hardened so disk-artifact analysis is correct and case-isolated on any acquired image (no behaviour is specific to a particular case):

Cross-case isolation — every EZ Tools run clears its own CSV output directory first, so a prior case's output (e.g. another user profile's LNK history) can never be re-read as the current case's evidence.
Dirty-hive parsing — RECmd/SbECmd are invoked with --nl so live-acquired registry hives (which ship TxR .blf logs, not the .LOG1/.LOG2 files those tools replay) are parsed as-acquired instead of aborting and silently returning zero rows.
Offline-hive keys — registry lookups resolve ControlSet001 (acquired hives have no CurrentControlSet symlink), and single-key (--kn) dumps that write to stdout rather than CSV are still parsed into structured entries.
Correct artifact decoding — UserAssist entries are read from the decoded program-path / run-count / last-executed columns (not the raw ROT13 value name); SbECmd output (named per hive) is read by scanning all CSVs it produces.
Case-agnostic knowledge base — the offline RAG corpus contains only general forensic knowledge (MITRE catalog, LOLBAS, Hunt Evil baseline). Per-case IOCs are opt-in (--case-ioc-json / --load-rocba) so one case never biases another.

Running on SIFT Workstation (Linux) — notes

EZ Tools are invoked as .NET assemblies (dotnet /opt/zimmermantools/<Tool>.dll, subdir-aware), not Windows .exe — works on stock SIFT with the dotnet runtime.
Evidence mounting is read-only; NTFS volume images with a truncated backup-boot sector mount via the kernel ntfs3 driver (mount -t ntfs3 -o ro <loop> /mnt/evidence).
Offline / air-gapped RAG — if a GPU build of torch/sentence-transformers or the embedding model is unavailable, the knowledge base falls back to an offline hashing embedder and seeds from the bundled Hunt Evil process baseline + case IOCs (no network needed).
Event-log scope — disk_agent parses live security/system/RDP/PowerShell channels plus the most recent rotated Security archives (bounded), and retains events date-stratified so the incident window is never truncated away.
Browser coverage — all profiles of all installed browsers (Chrome/Edge/Brave + Firefox) are analysed, auto-discovered from the evidence mount.

Setup & Installation

Prerequisites

SANS SIFT Workstation (Ubuntu 20.04+)
Python 3.10+
Volatility 3, log2timeline, Sleuth Kit (pre-installed on SIFT)
EZ Tools at /opt/zimmermantools/ (install with SIFT EZ Tools script) — run via the dotnet runtime

Installation

git clone https://github.com/ahammadshawki8/DeepSIFT
cd DeepSIFT

# Install Python dependencies
pip3 install -r requirements.txt

# Copy environment config
cp .env.example .env
nano .env   # Add ABUSEIPDB_API_KEY and VIRUSTOTAL_API_KEY (optional but recommended)

# Initialize RAG knowledge base (first run only, ~3-5 minutes)
python3 rag/ingest/run_all.py

# Run tests
pytest tests/
# Expected: 75 passed, 1 skipped

Connect to Claude Code

Add to ~/.claude.json (or .claude/settings.json in your project):

{
  "mcpServers": {
    "deepsift": {
      "command": "python3",
      "args": ["/path/to/deepsift/mcp_server/server.py"]
    }
  }
}

Start the server in a separate terminal:

python3 mcp_server/server.py

Verify It Yourself (no API key)

Everything below runs offline with no API key — a judge can confirm DeepSIFT from first principles in a few minutes.

# 1. Works on a fresh clone immediately:
python3 preflight.py        # which forensic tool groups are operational here (honest, per-host)
pytest -q                   # full test suite → 75 passed, 1 skipped

# 2. See a completed result instantly (committed sample — no run needed):
#    open docs/sample/vanko_examiner_report.html  (or rocba_examiner_report.html) in a browser,
#    or render the portal view of a sample:
python3 examiner_portal.py --findings docs/sample/vanko_findings.json --html /tmp/review.html

After you run your own investigation (next section) the live results land in analysis/, and these confirm them independently:

python3 verify_findings.py     # re-checks every claim vs raw evidence + recomputes the hash chain
python3 examiner_portal.py     # live review UI → http://127.0.0.1:8420

Reproduce the head-to-head benchmark (deterministic; no LLM/API key required):

python3 demo.py \
    --image /cases/ROCBA/Rocba-Memory.raw \
    --evidence-mount /mnt/evidence \
    --baseline benchmark/baselines/protocol_sift_rocba_findings.json \
    --ground-truth benchmark/ground_truth/rocba_ground_truth.json

Drive a live investigation as the agent — connect Claude Code to the MCP server (.mcp.json) and ask, e.g.:

Investigate /mnt/evidence for unauthorized access and data exfiltration. Use DeepSIFT
tools only: record hypotheses, confirm or disprove each with the right tool, then call
finish_analysis citing every audit_id.

Claude Code follows the workflow, records its hypotheses and self-corrections, cross-correlates artifacts, challenges its own conclusions with adversarial_review, and calls finish_analysis with a grounded, structured report citing every audit_id — no external API key required.

Evidence Integrity & Chain of Custody

Every tool call generates an immutable, hash-chained audit record:

{
  "audit_id": "dsift-2026-06-11-a3f9b2c1",
  "timestamp": "2026-06-11T14:23:07.412Z",
  "tool": "get_process_list",
  "command": "python3 -m volatility3 -f /cases/ROCBA/Rocba-Memory.raw windows.pslist.PsList",
  "raw_output_sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "raw_output_file": "exports/get_process_list_2026-06-11T14-23-07-412Z.txt",
  "prev_hash": "…",
  "entry_hash": "…"
}

Provenance-gated reporting — finish_analysis requires an audit_ids list. Any finding not traceable to a prior tool call is structurally blocked — the tool errors and no report is written.
Tamper-evident chain — each entry binds the previous entry's hash, so any modify/insert/delete breaks the chain (verify_audit_chain()).
Tamper-resistant (optional) — set DEEPSIFT_AUDIT_KEY (held off the evidence host) to additionally HMAC-sign the chain; an attacker who rewrites the whole log cannot forge valid signatures without the key.
Independently checkable — python3 verify_findings.py recomputes both the grounding and the chain integrity from the on-disk artifacts.

RAG Knowledge Base

The RAG pipeline (ChromaDB + sentence-transformers, with an offline hashing-embedder fallback) ships a case-agnostic corpus — only general forensic knowledge. One case's indicators are never baked in by default, so an investigation is never biased by an unrelated case.

Source (default corpus)	Coverage
MITRE ATT&CK technique catalog	Technique IDs + names mapped by the parsers (kept in sync with `mitre_auto_map`)
LOLBAS reference	Commonly abused signed Windows binaries and how attackers misuse them
SANS Hunt Evil baseline	Known-normal Windows process baseline for anomaly detection

Per-case IOCs are opt-in and loaded only for that investigation (python3 rag/ingest/run_all.py --case-ioc-json <findings.json>; a bundled ROCBA example pack is available via --load-rocba). RAG context is injected into tool responses at call time — the model sees threat intelligence alongside the parsed artifact data, not as a separate lookup step.

Benchmarking

Protocol SIFT vs DeepSIFT (ROCBA case)

python3 demo.py \
    --image /cases/ROCBA/Rocba-Memory.raw \
    --baseline benchmark/baselines/protocol_sift_rocba_findings.json \
    --ground-truth benchmark/ground_truth/rocba_ground_truth.json \
    --report-output docs/accuracy_report.html

The HTML report shows:

Side-by-side finding comparison (DeepSIFT vs Protocol SIFT)
Color-coded MITRE ATT&CK badges
Precision, recall, and F1 scores vs ground truth
Chain-of-custody audit trail summary

vigia-cases Standardized Benchmark

DeepSIFT supports the annatchijova/vigia-cases standardized benchmark dataset used across multiple hackathon submissions for objective cross-system comparison:

# Clone vigia-cases dataset
git clone https://github.com/annatchijova/vigia-cases

# Run DeepSIFT against all cases
python3 benchmark/vigia_runner.py \
    --vigia-root ./vigia-cases \
    --results-root ./benchmark/deepsift_results \
    --output-json benchmark/reports/vigia_report.json \
    --output-md benchmark/reports/vigia_report.md

Scored dimensions: MITRE Recall · IOC Recall · Narrative Recall · Hallucination Rate · Grounding Score · Confidence Score · Contradictions Found

Project Structure

DeepSIFT/
├── mcp_server/
│   ├── server.py                    ← MCP server entry point (148 tools, 23 modules)
│   ├── config.py                    ← Tool paths, environment config
│   ├── audit.py                     ← audit_id generation, tool counter, chain-of-custody log
│   ├── tools/
│   │   ├── volatility.py            ← 12 core Volatility tools + verify_findings + finish_analysis
│   │   ├── volatility_extended.py   ← 10 advanced Volatility tools (privileges, VAD, SSDT, callbacks)
│   │   ├── volatility_advanced.py   ← 12 Volatility tools (modules, IRP hooks, hashdump, dump_process)
│   │   ├── windows_artifacts.py     ← 16 EZ Tools wrappers (event logs, registry, execution artifacts)
│   │   ├── registry_extended.py     ← 10 registry tools (shellbags, BAM/DAM, MRU, SAM, timeline)
│   │   ├── browser_artifacts.py     ← 8 browser tools (Chrome, Firefox, Edge, Hindsight, cache)
│   │   ├── email_artifacts.py       ← 5 email tools (PST/OST, Thunderbird, EML, header forensics)
│   │   ├── cloud_artifacts.py       ← 6 cloud tools (Dropbox, OneDrive, Google Drive, Slack, Teams)
│   │   ├── document_analysis.py     ← 5 document tools (PDF, OLE/VBA, RTF, ZIP, DDE)
│   │   ├── linux_forensics.py       ← 10 Linux tools (processes, bash history, syslog, crontab)
│   │   ├── network_analysis.py      ← 3 network tools (PCAP, DNS, ARP)
│   │   ├── network_extended.py      ← 7 network tools (Zeek, IIS, Apache, firewall, netflow, RDP)
│   │   ├── anti_forensics.py        ← 7 anti-forensics detection tools (timestomp, log wipe, ADS, VSS)
│   │   ├── file_carving.py          ← 8 tools (bulk_extractor, foremost, scalpel, capa, FLOSS, exiftool)
│   │   ├── file_analysis.py         ← 3 static analysis tools (PE metadata, strings, packer detection)
│   │   ├── disk_extended.py         ← 6 disk tools (fsstat, ewfinfo, mactime, blkcat, slack, integrity)
│   │   ├── threat_intel_extended.py ← 5 threat intel tools (VT hash/domain, MITRE search, IOC DB, ssdeep)
│   │   ├── log2timeline.py          ← 3 Plaso tools
│   │   ├── sleuthkit.py             ← 4 Sleuth Kit tools
│   │   ├── yara_tools.py            ← 3 YARA tools
│   │   ├── hayabusa.py              ← 2 Hayabusa tools (3,700+ Sigma rules)
│   │   └── correlation.py           ← 3 tools: correlate_artifacts, adversarial_review, detect_contradictions
│   └── parsers/
│       ├── pslist_parser.py         ← SANS Hunt Evil baseline (31 processes), masquerade detection
│       ├── netscan_parser.py        ← External IP extraction and flagging
│       ├── malfind_parser.py        ← Injection type classification (PE/shellcode/reflective)
│       ├── timeline_parser.py       ← Suspicious keyword detection in Plaso timeline
│       ├── mitre_auto_map.py        ← Rule-based MITRE ATT&CK mapping (80+ rules, 19 categories)
│       ├── rag_enrichment.py        ← Shared RAG enrichment helpers (enrich_findings, build_rag_summary)
│       ├── browser_parser.py        ← Browser URL/download threat classification
│       ├── cloud_parser.py          ← Cloud sync exfiltration risk classification
│       ├── document_parser.py       ← PDF/OLE/RTF/DDE/ZIP malicious document classification
│       ├── network_log_parser.py    ← Web/firewall/DNS log threat classification
│       ├── linux_parser.py          ← Linux process/command/syslog threat classification
│       ├── grounding_verifier.py    ← Post-hoc verbatim token grounding check
│       ├── confidence_scorer.py     ← 4-axis quantified confidence scoring (0-100)
│       └── forensic_knowledge.py    ← Per-tool forensic caveats/advisories/corroboration (148 entries)
├── rag/
│   ├── knowledge_base.py            ← ChromaDB vector store
│   ├── query.py                     ← Semantic search interface
│   └── ingest/
│       ├── knowledge_corpus.py      ← Case-agnostic offline corpus (MITRE catalog + LOLBAS + Hunt Evil)
│       ├── mitre_attack.py          ← MITRE ATT&CK Enterprise ingestion
│       ├── case_history.py          ← Per-case findings ingestion (opt-in, per investigation)
│       ├── rocba_iocs.py            ← Example case-IOC pack (opt-in via --load-rocba; not auto-loaded)
│       └── run_all.py               ← One-command RAG initialization
├── agents/
│   ├── orchestrator.py              ← LangGraph multi-agent coordination (deterministic pipeline)
│   └── reasoning_agent.py           ← Agentic LLM reasoning loop over the typed tools
├── benchmark/
│   ├── scorer.py                    ← must-identify / hallucination scoring vs ground truth
│   ├── compare.py                   ← Case-agnostic side-by-side comparison + HTML report
│   ├── vigia_runner.py              ← vigia-cases standardized multi-case benchmark
│   ├── ground_truth/                ← Per-case ground-truth scoring files
│   ├── baselines/                   ← Protocol SIFT reference findings
│   └── reports/html_report.py       ← Visual HTML comparison report
├── tests/                           ← pytest unit tests (74 passing, 1 skipped)
├── yara_rules/
│   ├── suspicious_strings.yar       ← T1059.001, T1003, T1218, T1547.001
│   ├── webshells.yar                ← T1505.003
│   ├── ransomware.yar               ← T1486, T1490
│   ├── rats.yar                     ← T1219, T1071
│   └── packers.yar                  ← T1027.002
├── analysis/                        ← findings.json + forensic_audit.log (runtime)
├── exports/                         ← raw tool outputs SHA-256 indexed (runtime)
├── docs/                            ← architecture.md, dataset.md, devpost_submission.md, JUDGING.md
├── investigate.py                   ← Autonomous agentic investigation (memory / disk-only / both)
├── demo.py                          ← Deterministic multi-agent pipeline (no LLM/key)
├── examiner_portal.py               ← Interactive human-review UI (sign-off, drill-down, multi-case)
├── verify_findings.py               ← Independent re-verification of claims + audit chain
├── preflight.py                     ← Environment self-check (operational tool groups)
├── AGENTS.md                        ← Orientation for coding/judging agents
├── .env.example                     ← Environment template
└── requirements.txt

Additional first-class modules: mcp_server/preflight.py (dependency map), mcp_server/evidence_store.py (stdlib SQLite evidence index), and the MCP tool modules tools/system_health.py, tools/investigation_state.py (hypothesis ledger), tools/evidence_index.py (index_evidence / query_evidence).

MITRE ATT&CK Coverage

DeepSIFT's mitre_auto_map.py (80+ rules, 19 categories) tags findings at the tool layer:

Finding	Technique
Process injection (PE header in RWX region)	T1055 — Process Injection
PowerShell encoding (`-enc`, `-e` flags)	T1059.001 — PowerShell
Registry run key modification	T1547.001 — Registry Run Keys
Active external network connection from suspicious process	T1071 — Application Layer Protocol
LSASS memory access	T1003.001 — LSASS Memory
DKOM-hidden process (pslist vs psscan gap)	T1014 — Rootkit
Service install (event 7045 / 4697)	T1543.003 — Windows Service
Scheduled task (event 4698 / 106)	T1053.005 — Scheduled Task
WMI event subscription (event 5860 / 5861)	T1546.003 — WMI Persistence
Lateral movement (RDP / SMB)	T1021.001 / T1021.002
Executable in temp dir (shimcache)	T1036.005 — Match Legitimate Name
PowerShell script block (event 4104)	T1059.001 — PowerShell
Cloud storage upload (SRUM high bytes_sent)	T1567.002 — Exfiltration to Cloud Storage
Burst file deletion (USN Journal)	T1070 — Indicator Removal
Timestamp anomaly (MFT 0x10 vs 0x30)	T1070.006 — Timestomping
Browser visit to cloud exfil domain	T1567.002 — Exfiltration to Cloud Storage
DNS query subdomain length > 40 chars	T1048.003 — DNS Tunneling
Web shell URL pattern (cmd.php, shell.aspx)	T1505.003 — Web Shell
VBA AutoOpen / Shell / PowerShell call	T1566.001 — Spearphishing Attachment
DDE/DDEAUTO in Office document	T1559.002 — Dynamic Data Exchange
LD_PRELOAD in process environment	T1574.006 — LD_PRELOAD
Linux crontab persistence entry	T1053.003 — Cron
History file wiped (`.bash_history` → `/dev/null`)	T1070.003 — Clear Command History
Port scan (10+ unique ports from one host)	T1046 — Network Service Discovery
IRP hook in driver dispatch table	T1014 — Rootkit
Secure deletion tool in prefetch	T1070.004 — File Deletion
VSS shadow count = 0	T1490 — Inhibit System Recovery
NTFS Alternate Data Stream	T1564.004 — Hide Artifacts: NTFS ADS
File extension / magic byte mismatch	T1036.007 — Masquerading
Remote access tool installed (AnyDesk, TeamViewer)	T1219 — Remote Access Software

Hard Rules (Architectural Enforcement)

These are not prompts — they are code:

Read-only evidence — guard_output_path() raises PermissionError for any write attempt under /cases/, /mnt/, or /media/. No prompt override possible.
No shell escape — There is no run_command or execute_shell tool on the MCP surface. The server exposes only the typed tools listed above; each runs a fixed binary via an argv list (no shell), and guard_command additionally blocks destructive/exfiltration binaries and shell-string commands on the Volatility / EZ Tools / registry exec paths.
Bounded, evidence-driven budget — audit.py counts every tool call; the agent runs until the evidence is sufficient (configurable via MAX_ITERATIONS) and then calls finish_analysis. The count is recorded in the report, so depth of analysis is transparent.
Provenance-gated reporting — finish_analysis requires a non-empty audit_ids list. An empty list returns an error — fabricated findings structurally cannot be submitted.
Observation/interpretation split — finish_analysis takes separate observation (factual, what tools showed) and interpretation (analytical, what it means) parameters. This separation reduces hallucination by preventing blending of artifact data with inference.

Environment Variables

Copy .env.example to .env and configure:

# SIFT tool commands (usually pre-configured on SIFT VM)
VOLATILITY_CMD=python3 -m volatility3
LOG2TIMELINE_CMD=log2timeline.py
PSORT_CMD=psort.py
FLS_CMD=fls
MMLS_CMD=mmls
ICAT_CMD=icat
YARA_CMD=yara

# EZ Tools directory (SIFT default)
EZ_TOOLS_DIR=/opt/zimmermantools

# Hayabusa event log analyzer (3,700+ Sigma rules)
HAYABUSA_CMD=hayabusa

# Optional — enables IP reputation lookups
ABUSEIPDB_API_KEY=your_key_here
VIRUSTOTAL_API_KEY=your_key_here

# Investigation constraints
MAX_TOOL_TIMEOUT=300
MAX_ITERATIONS=40

# MCP transport (stdio default; sse / streamable-http expose an HTTP endpoint for any client)
DEEPSIFT_MCP_TRANSPORT=stdio
# Optional: HMAC-sign the chain of custody (held off the evidence host) for forgery resistance
DEEPSIFT_AUDIT_KEY=

Development

# Run tests (74 passing, 1 skipped)
pytest tests/ -v

# Syntax check
python -m py_compile mcp_server/tools/*.py mcp_server/parsers/*.py

# Seed the case-agnostic RAG knowledge base (MITRE + LOLBAS + Hunt Evil baseline)
python3 rag/ingest/run_all.py

# Optionally load a case's own IOCs for that investigation (per case, opt-in)
python3 rag/ingest/run_all.py --case-ioc-json analysis/findings.json
# (the bundled ROCBA example pack: --load-rocba)

License

MIT License — see LICENSE file.

DeepSIFT was built for the Find Evil! hackathon hosted by SANS DFIR.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
agents		agents
benchmark		benchmark
docs		docs
mcp_server		mcp_server
rag		rag
tests		tests
yara_rules		yara_rules
.env.example		.env.example
.gitignore		.gitignore
.mcp.json		.mcp.json
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
demo.py		demo.py
examiner_portal.py		examiner_portal.py
examiner_review.py		examiner_review.py
investigate.py		investigate.py
preflight.py		preflight.py
requirements.txt		requirements.txt
verify_findings.py		verify_findings.py

Folders and files

Latest commit

History

Repository files navigation

DeepSIFT

🧑‍⚖️ For judges (and judging agents)

Table of Contents

Why DeepSIFT

Architecture

Tool Inventory (155 MCP tools)

Memory Forensics — Core (Volatility 3)

Memory Forensics — Extended (Volatility 3)

Timeline Analysis (log2timeline / Plaso)

Disk Forensics (Sleuth Kit)

Windows Artifact Analysis (EZ Tools)

Windows Event Log — Hayabusa / Sigma

Static File Analysis

Network Traffic Analysis

Cross-Artifact Correlation

Investigation Control & Autonomous Reasoning

Scale, Health & Self-Verification

YARA Hunting

Memory Forensics — Advanced (Volatility 3)

Browser Artifacts

Email Artifacts

Cloud Storage Artifacts

Document Analysis

Linux / macOS Forensics

Network Forensics — Extended

Anti-Forensics Detection

File Carving and Static Analysis

Extended Registry Forensics

Extended Disk Forensics

Threat Intelligence

How DeepSIFT Eliminates Hallucination

Investigation Workflow

Memory Image

Windows Artifact Analysis

What Sets DeepSIFT Apart

How to Run It (three ways)

Examiner Portal — Human Review

Architectural Guardrails

Validated Results

ROCBA — FOR508 (memory + disk)

Abducted Zebrafish / Vanko — FOR500 (disk-only)

Production Hardening

Running on SIFT Workstation (Linux) — notes

Setup & Installation

Prerequisites

Installation

Connect to Claude Code

Verify It Yourself (no API key)

Evidence Integrity & Chain of Custody

RAG Knowledge Base

Benchmarking

Protocol SIFT vs DeepSIFT (ROCBA case)

vigia-cases Standardized Benchmark

Project Structure

MITRE ATT&CK Coverage

Hard Rules (Architectural Enforcement)

Environment Variables

Development

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages