Graph-Linked Adversarial Investigation & Verification Engine
Protocol SIFT lets Claude Code run forensic tools and asks it nicely not to hallucinate. GLAIVE makes hallucination architecturally impossible by forcing every finding to correspond to a path in a typed evidence graph.
GLAIVE is a submission to the FIND EVIL! hackathon (SANS Institute, Apr–Jun 2026). It extends Protocol SIFT — the SANS AI-orchestration POC that pairs Claude Code with the SIFT Workstation — with an architectural hallucination-prevention layer built on a typed evidence graph.
| Layer | Status | Tests |
|---|---|---|
| Typed evidence graph | Done | 187 passing |
| Content-addressed evidence store | Done | 24 passing |
| Ingestion (Defender + Volatility) | Done | 35 passing |
| EVTX binary adapter | Done | 15 passing |
| Orchestrator | Done | 11 passing |
| Finding report + gate | Done | 13 passing |
| MCP server (5 tools) | Done | 42 passing |
| Agent-loop integration test | Done | 2 passing |
| Volatility binary execution | Week 2 | — |
| graph-verification skill (Protocol SIFT integration) | Done | — (markdown asset) |
| Hunter agent + Claude Code config | Week 2 | — |
| Accuracy harness + ground-truth cases | Week 3 | — |
| Bypass test suite (5 attacks) | Done | 21 passing |
| Demo video | Week 3 | — |
Total: 327 tests passing, 18 integration tests opt-in (real malware data, ~7 min).
[ DEMO VIDEO LINK — added before submission ]
What the demo shows, against a real 16 MB Windows Defender event log (15,911 records, 10 detection events, 2 actual Trojan signatures):
- Ingestion. GLAIVE's MCP server receives ingest_artifact("Defender.evtx", "defender_evtx"). The file is SHA-256 hashed into a content-addressed store; the 15,901 records of unsupported event types are filtered out, and the 10 supported detection events become typed AntivirusDetection nodes in the graph.
- Hunt. Claude Code calls query_graph(node_type="AntivirusDetection", filters=[{"field": "threat_name", "op": "contains", "value": "Trojan"}]). The graph returns real findings: Trojan:Win32/Cloxer detected at 08:21:44, quarantined at 08:21:49.
- Audit. Claude Code calls get_node_provenance(canonical_key=...). The node traces back through the graph to the evidence hash, and from there to the source file. Every byte is recoverable.
- The gate. Claude Code calls commit_finding(claim, supporting_node_keys=[real_key], confidence_hint="confirmed"). The gate checks the graph evidence and downgrades the finding to "inferred": there is no corroborating edge yet, so "confirmed" isn't earned. The finding is committed, and the downgrade is reported transparently.
- The gate refuses bypass. Claude Code attempts commit_finding with a fabricated supporting_node_key referencing a process that was never observed. The gate rejects it with decision: rejected_missing_node. The rejection comes from how the system is built, not from prompting.
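The two gate behaviors in the walkthrough (downgrade and rejection) can be sketched in a few lines. This is a minimal illustration, not GLAIVE's implementation: the EvidenceGraph class, the "committed" and "rejected_no_evidence" decision names, the "downgraded" field, and the node keys are all assumptions made for the example. Only commit_finding, its parameters, the "confirmed" and "inferred" labels, and rejected_missing_node come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceGraph:
    """Hypothetical stand-in for the typed evidence graph."""
    nodes: dict = field(default_factory=dict)  # canonical_key -> attributes
    edges: set = field(default_factory=set)    # (source_key, target_key) pairs

    def has_corroboration(self, key: str) -> bool:
        # A node counts as corroborated when at least one edge touches it.
        return any(key in edge for edge in self.edges)


def commit_finding(graph, claim, supporting_node_keys, confidence_hint):
    """Decide whether a claim may be committed; the hint is never trusted."""
    if not supporting_node_keys:
        # Assumed decision name: a claim with no evidence is refused outright.
        return {"decision": "rejected_no_evidence"}
    missing = [k for k in supporting_node_keys if k not in graph.nodes]
    if missing:
        # Fabricated keys cannot resolve, so the claim never reaches the report.
        return {"decision": "rejected_missing_node", "missing": missing}
    confidence = confidence_hint
    if confidence_hint == "confirmed" and not all(
        graph.has_corroboration(k) for k in supporting_node_keys
    ):
        # "confirmed" must be earned by a corroborating edge.
        confidence = "inferred"
    return {
        "decision": "committed",  # assumed name for the accepting decision
        "claim": claim,
        "confidence": confidence,
        "downgraded": confidence != confidence_hint,
    }


graph = EvidenceGraph()
graph.nodes["av:cloxer"] = {"threat_name": "Trojan:Win32/Cloxer"}  # hypothetical key
print(commit_finding(graph, "Cloxer was detected", ["av:cloxer"], "confirmed"))
print(commit_finding(graph, "evil.exe ran", ["process:never-seen"], "confirmed"))
```

The point of the sketch is that the caller's confidence_hint is only a request. The returned confidence is derived from what the graph contains, so a model cannot talk its way into a stronger label.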
| Protocol SIFT's stated rule | How GLAIVE enforces it |
|---|---|
| "No hallucinations" | Findings reference graph nodes; nodes are only created from validated tool output |
| "Deterministic execution" | Tool outputs flow through Pydantic-validated MCP handlers, not raw stdout |
| "Evidence integrity" | Content-addressed evidence store (SHA-256), read-only path enforcement |
| "Verification" | commit_finding refuses any claim whose evidence_hash is not resolvable |
Protocol SIFT writes these as prompt instructions. GLAIVE writes them as code.
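To make the "Evidence integrity" and "Verification" rows concrete, here is a minimal sketch of a content-addressed store, assuming a flat directory in which each file is named by its SHA-256 digest. The EvidenceStore class and its put and resolve methods are hypothetical names chosen for illustration; GLAIVE's actual store (which also keeps a manifest) may be organized differently.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class EvidenceStore:
    """Hypothetical minimal store: files are keyed by their SHA-256 digest."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, source: Path) -> str:
        # Hash in chunks so large event logs are not loaded into memory at once.
        digest = hashlib.sha256()
        with open(source, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        evidence_hash = digest.hexdigest()
        target = self.root / evidence_hash
        if not target.exists():
            shutil.copyfile(source, target)
            target.chmod(0o444)  # stored copy is marked read-only
        return evidence_hash

    def resolve(self, evidence_hash: str) -> Path:
        # Accept only a 64-character lowercase hex digest, which also
        # keeps path separators and ".." out of the lookup.
        if len(evidence_hash) != 64 or any(
            c not in "0123456789abcdef" for c in evidence_hash
        ):
            raise ValueError("not a SHA-256 hex digest")
        target = self.root / evidence_hash
        if not target.is_file():
            raise KeyError(evidence_hash)
        return target


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.evtx"
    sample.write_bytes(b"placeholder bytes, not a real event log")
    store = EvidenceStore(Path(tmp) / "store")
    evidence_hash = store.put(sample)
    print(evidence_hash, store.resolve(evidence_hash).name == evidence_hash)
```

Because the key is derived from the bytes, a finding that cites an evidence_hash either resolves to exactly the content that was ingested or fails to resolve at all.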
GLAIVE adds four things to Protocol SIFT (see Status for what's shipped today):
- A typed evidence graph (Pydantic + NetworkX). Every forensic observation becomes a typed node or edge with provenance. Reasoning happens over the graph, not over LLM-summarized text. (Shipped.)
- A graph-verification MCP layer. A small server (5 tools, not 50) that sits between Claude Code and the graph. The only way findings can be committed is through commit_finding, which rejects any claim that doesn't trace to a graph path. (Shipped.)
- A graph-verification skill for Protocol SIFT. A SKILL.md that tells Claude Code how to use the graph layer; it drops in alongside the existing memory-analysis, plaso-timeline, and other skills. (Shipped.)
- A bypass test suite. Adversarial tests covering five attack classes against GLAIVE's own constraints (hallucinated keys, confidence inflation, prompt injection, path traversal, resource exhaustion), with the architectural reason each attack fails. See BYPASS_TESTS.md. (Shipped.)
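As an illustration of "typed node with provenance" and of querying the graph rather than summarized text, the sketch below uses a frozen dataclass as a stand-in for a Pydantic model, so that it runs with the standard library alone. The detected_at field, the "eq" operator, the node keys, and the second sample threat name are assumptions; threat_name, canonical_key, evidence_hash, and the "contains" operator are taken from the descriptions in this README.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AntivirusDetection:
    """Stand-in for a typed node; the real schema is Pydantic-based."""
    canonical_key: str
    threat_name: str
    detected_at: str    # hypothetical field name
    evidence_hash: str  # provenance: which stored file produced this node

    def __post_init__(self):
        # Reject nodes with missing or non-string fields at construction time.
        for name, value in asdict(self).items():
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")


# Supported filter operators; "eq" is an assumed addition for illustration.
OPS = {
    "eq": lambda actual, expected: actual == expected,
    "contains": lambda actual, expected: expected in actual,
}


def query_graph(nodes, node_type, filters):
    """Return nodes of the given type that satisfy every filter."""
    matches = []
    for node in nodes:
        if type(node).__name__ != node_type:
            continue
        if all(OPS[f["op"]](getattr(node, f["field"]), f["value"]) for f in filters):
            matches.append(node)
    return matches


nodes = [
    AntivirusDetection("av:1", "Trojan:Win32/Cloxer", "08:21:44", "ab" * 32),
    AntivirusDetection("av:2", "PUA:Win32/Example", "09:00:00", "cd" * 32),
]
hits = query_graph(
    nodes,
    node_type="AntivirusDetection",
    filters=[{"field": "threat_name", "op": "contains", "value": "Trojan"}],
)
print([n.canonical_key for n in hits])
```

Validation at construction is what keeps malformed tool output from becoming a node in the first place; the query side then only ever sees well-formed, provenance-carrying records.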
GLAIVE does not replace Protocol SIFT. The base CLAUDE.md, the 5 existing skills, the case template, and the bash-driven SIFT tool invocations are all unchanged. GLAIVE plugs in.
Tested on: SANS SIFT (WSL2 Ubuntu 22.04), Python 3.11
Status: Week 1 complete (ingestion + graph + MCP server). Agent-driver CLI and demo recording in Weeks 2-3.
git clone https://github.com/aliyaalias19/glaive.git
cd glaive
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
# Expected: 327 passed, 18 deselected
The 18 deselected tests are integration tests that run against a real binary EVTX file. To execute them, drop a real Windows Defender event log at test_evidence/Defender.evtx (instructions in docs/EVIDENCE.md), then:
pytest -m integration
# Expected: 18 passed in ~7 minutes (binary EVTX parsing is heavy)
The single test that proves the architectural promise end-to-end:
pytest tests/mcp_server/test_agent_loop.py -m integration -v
This test simulates Claude Code calling all 5 MCP tools in sequence against real malware data, including a deliberate bypass attempt that the gate must reject. If this passes, every layer of GLAIVE (schema, graph, ingestion, MCP boundary, gate) works.
Wire the server into Claude Code by adding to ~/.claude/mcp.json:
{
"mcpServers": {
"glaive": {
"command": "python",
"args": ["-m", "glaive.mcp_server"]
}
}
}
Then install the graph-verification skill that teaches Claude Code how to use the MCP tools alongside Protocol SIFT's existing skills:
ln -s "$(pwd)/docs/skills/graph-verification" ~/.claude/skills/graph-verification
(The actual python -m glaive.mcp_server entry point is added in Week 2.)
| Path | What's in it |
|---|---|
| glaive/graph/ | Pydantic schema: 10 node types, 12 edge types, NetworkX wrapper |
| glaive/evidence/ | Content-addressed evidence store (SHA-256 + manifest) |
| glaive/ingestion/ | Parsers (Defender EVTX, Volatility) + EVTX binary adapter + orchestrator |
| glaive/reporting/ | FindingReport: the gate (confidence-downgrade enforcement) |
| glaive/mcp_server/ | MCP server (5 tools: ingest, query, provenance, commit, list) |
| tests/ | 327 tests; 18 marked integration (run against real binary EVTX) |
| docs/EVIDENCE_GRAPH_SCHEMA.md | The full schema spec: 10 nodes, 12 edges, 5 principles |
| docs/DECISIONS.md | 29 strategic and design decisions with rationale |
| ARCHITECTURE.md | System design and Trust Model |
| LIMITATIONS.md | What GLAIVE does not do |
| evidence_samples/ | Manifest pointing at public evidence datasets |
| verification/bypass_tests/ | 21 adversarial tests covering 5 attack classes (see BYPASS_TESTS.md) |
| BYPASS_TESTS.md | Judge-facing narrative: 5 attacks, defenses, honest limitations |
| Path | Status |
|---|---|
| ACCURACY_REPORT.md | Filled by verification/harness.py against ground-truth cases (Week 3) |
| glaive/cli.py | The glaive investigate command-line driver |
| Volatility integration | vol.py shell-out for memory dump ingestion (requires SRL evidence pack) |
| Demo video | 5-minute screencast against real evidence |
Built for the FIND EVIL! hackathon (SANS Institute, Apr–Jun 2026). This project is substantially new work created during the hackathon period. Pre-existing dependencies (Protocol SIFT, Volatility 3, Plaso, python-evtx, NetworkX, Pydantic) are unmodified open-source libraries. The graph schema, MCP verification layer, graph-verification skill, and bypass test suite are original contributions.
MIT — see LICENSE.