Theory->Engineering: consent in TCB (A3), semantic gate (A4/A5), Mahdavi compass (A7) + red-team by Aliipou · Pull Request #4 · Aliipou/authgate-kernel

Aliipou · 2026-06-11T06:29:13Z

What & why

Closes the three open gaps in Theory_to_Engineering_Plan.md, mapping axioms
A3/A4/A5/A7 of the Theory of Freedom (نظریه آزادی) from prose into Rust. Each
gap previously existed only as bypassable Python; this lands the parts that
belong in code, at the correct trust level — and nothing more.

Changes

Axiom	Module	Trust level
A3 — consent recorded, not assumed	`src/tcb/consent.rs` (+ `engine.rs` Layer-4 gate, `types.rs` fields)	TCB
A4/A5 — no coercion / deception	`src/semantic_gate.rs`	NOT TCB — advisory heuristic
A7 — Mahdavi compass	`src/compass/`	NOT TCB — advisory post-hoc scorer

A3: ConsentRecord (ed25519, identity-bound, expiry, revocable). When the
adapter sets requires_consent, no Permit issues without a valid, unexpired,
unrevoked consent covering actor+resource+rights, folded into the binding hash.
A4/A5: typed SemanticGate trait + CoercionAnalyzer (exit-blocking, HHI,
deception markers). Returns a verdict; never structurally denies.
A7: C(a) = w1*RVD + w2*VOI + w3*CD as a post-hoc scorer that annotates,
never denies. The deny threshold is operator policy, not theory.

Honest scope (no overclaim)

The TCB does not verify a consent grantor is the resource's rightful owner —
that's the L2 trust-root boundary, out of scope by design (documented + tested).
Semantic gate is heuristic; a known unicode-homoglyph evasion is tested, not hidden.
Compass cannot deny — asserted by test, not a formal proof.
TCB surface growth is minimal: engine.rs +28, types.rs +14, call_gate.rs +2.

Tests

cargo test --lib → 293 passed, 0 failed, including 47 red-team attack tests:
tcb/consent_redteam.rs (18), semantic_gate_redteam.rs (15), compass/redteam.rs (14).

Not in this PR (follow-ups from the 10-week plan)

Lean4 theorems (ConsentRequired, VerifierMonotonicity), Kani harnesses for the
consent path, TLC/Java run, the Constitutional-AI benchmark, and wiring the
semantic gate into CallGate.

🤖 Generated with Claude Code

Engineering unchanged. Adds a code-free reading layer mapping each Theory-of-Freedom component (axioms, ownership hierarchy, consent, divine justice, guidance, Mahdavi compass, conflict-by-clarification) to its file:line in src/, with an honest coverage matrix marking enforced vs. extension-only vs. documented gaps.

…rs (0% -> ~100%)

…ted coverage exclusions

…fensive branches

…n edges

…ardened verifier

…icy DSL, multi-agent; pragma unreachable branches

…, goal violations

… federation; pragma defensive branches

…Error fallback)

… regression guard)

Companion to COVERAGE_MATRIX.md, adding the formal-artifact column it omits: the actual Lean theorem / Kani harness behind each axiom AND its true strength. Honest findings (no spin): - A4/A6 are Kani-proven (prop_ownerless_machine_blocked, prop_machine_governs_human_blocked); their Lean theorems are `True := trivial` stubs. - A5/A7 attenuation + epoch revocation are real Lean proofs. - forbidden-flag theorems are real but SHALLOW: they prove "flag set => Blocked" (enforcement of a declared flag), NOT detection of coercion/deception. - verify_deterministic is `:= rfl` (vacuous). - Consent is Python-only/partial and absent from the Rust TCB; Justice, Guidance, and the Mahdavi compass are extensions-only with no proofs. Conclusion stated plainly: AuthGate is a proven (narrow) Rights *Verification* Kernel; the book's Rights-based *Decision* Kernel (detect coercion; choose the most legitimate among permissible actions) is not built and has no formal content. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

47 attack tests across the three Theory-to-Engineering gaps: - consent_redteam.rs (18): forged grantor keys, pubkey substitution, resource/rights swaps, consent-id forgery, revocation bypass, post-seal injection/reorder, documented L2 trust-root limit, permit<->valid invariant. - semantic_gate_redteam.rs (15): exit-block disguise, HHI boundary, case/ substring/unicode-homoglyph evasion (documented), NaN/negative shares, 50+ case totality sweep, advisory-never-denies. - compass/redteam.rs (14): score-gaming, NaN/Inf, zero/negative/huge weights, registry poisoning, and the safety proof that the Compass never denies. Full suite: 293 passed, 0 failed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…verage - Replace the misleading 'Python layer enforces the same logical invariants as the Rust TCB' claim with an honest 'compatibility runtime, not a security boundary; bypassable; only the Rust TCB carries guarantees'. - Add a Theory->Engineering coverage table for the three now-implemented axioms (A3 consent = TCB; A4/A5 semantic gate + A7 compass = advisory, NOT TCB), each with explicit trust level and no overclaim (notes the L2 owner-binding limit; calls the compass 'never denies' an asserted test, not a proof). - Update the verified Rust lib test count to 293 (incl. 47 red-team tests). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# Conflicts: # README.md

The zero-panic clippy gate runs -D warnings over --all-targets; the 8-arg consent_signed_by test helper tripped clippy::too_many_arguments. Scoped allow on the test-only helper. Verified clean locally with the exact CI flags. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…, E702) These test files came in via nazariye-azadi and had never run through main's ruff gate. Auto-fixed import sorting + unused import; split semicolon multi-statements (E702) onto separate lines. ruff check src tests now clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ayer above) Explains why AuthGate (authority/enforcement, Rust TCB, crypto) and the Freedom Decision Kernel (legitimacy/decision, pure Python, no crypto) are separate sibling repos: legitimacy != authority. FDK decides whether an action is legitimate and hands the chosen action to AuthGate, which enforces whether the actor is authorized. Legitimacy first, then authority. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…act) Lock the responsibility split between the two products and connect them through ONE serialisable contract, not shared code: - spec/policy_decision.schema.json: the boundary contract FDK emits (verdict ALLOW/DENY/DEFER, action_id, reasons, axiom_trace, fail_closed). No 'confidence' field — FDK is a deterministic categorical gate; DEFER already expresses 'unsure, ask a human'. - authgate.integrations.fdk.enforce_legitimacy(): the ~50-line seam. Runs the CallGate (capability + scope + signature + TCB) ONLY on an explicit ALLOW bound to the same action_id. Fail-closed on DENY, DEFER, fail_closed, malformed payload, or action_id mismatch — no path from non-ALLOW to a tool call. Imports NO FDK code (the contract is the only coupling). - tests/test_fdk_bridge.py: golden end-to-end flow + JSON round-trip + every non-ALLOW/malformed path, against a real registry/verifier/CallGate. 15 tests, module 100% covered. - examples/fdk_authgate_flow.py: decoupled runnable demo (3 outcomes). - DECISIONS.md: records the contract-not-code boundary decision. ruff + mypy clean; AuthGate stays the sole source of truth for authority, FDK only interprets legitimacy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…hema Contract-drift guard for the AuthGate half of the hand-synced FDK<->AuthGate boundary: asserts spec/policy_decision.schema.json's required fields are exactly what parse_policy_decision enforces, the verdict enum matches, and a fully populated schema payload parses. A schema change now breaks CI here instead of silently in production. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ased] Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… discipline as FDK) Director's directive: continue AuthGate only if treated like FDK — hunt its death, not its proof. Question: does AuthGate solve anything OPA/Cedar/Zanzibar INHERENTLY cannot? Run the three best-case scenarios and try to kill each. - Scenario 1 (purpose violation): decision-time purpose = policy-expressible in Rego/Cedar (collapses). But purpose-as-FLOW (read for A, used for B) is structurally beyond point-in-time policy engines = a real gap = information-flow control. AuthGate ships an IFC extension (NonInterferenceChecker/SecurityLattice). SURVIVES narrowly — but IFC is a known 1970s technique AuthGate bundles, not invents. - Scenario 2 (revoked consent): KILLED — Zanzibar revocation is native; consent-as- owner-controlled-relationship is modelable in all three. - Scenario 3 (delegation provenance / flawed origin): KILLED as a differentiator — real gap but SHARED; no system (incl. AuthGate) audits real-world root legitimacy (the provenance/ownership-genesis problem). Verdict: AuthGate survives on exactly ONE narrow front (capability authority + IFC / purpose-over-time), better than FDK (which had none), but it's a recombination of known techniques (capabilities 1966 + IFC 1976) made usable — engineering value, not new science. The honest answer to 'why AuthGate vs OPA/Cedar' is 'usage/flow/purpose control on capabilities', NOT 'a better authz engine' (Cedar is verified, Zanzibar scales). Implied sprint (2-4 wks, no new code): find 3 real agent purpose-violation-as-flow cases only capability+IFC catches; if they collapse to DLP/logging, close AuthGate like FDK and use the whole body as a backend/cloud/product-engineering portfolio. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…nce stack WHY_NOT_OPA left ONE survivor (purpose-as-flow via capability+IFC). But IFC/data-gov is itself crowded, so re-kill it against DLP / classification / lineage / PBAC / confidential computing / IFC. Discipline: 'could agent data-flow matter?' (yes) is NOT the question; 'what does AuthGate solve that DLP+lineage+PBAC+IFC do not?' is. - Each incumbent mapped + its structural limit for AGENTS: DLP = content-at-egress, not capability/purpose-aware; classification = labels not enforcement; lineage = observability/batch not runtime-blocking; PBAC = query-time read, not cross-step flow; confidential computing = isolation, orthogonal; IFC = right mechanism but language-level + not capability-aware. All built for humans/ETL, not an agent moving data through an LLM context across tool calls. - Candidate gap (narrow): capability-bound, runtime, per-tool-call IFC enforcement INSIDE the agent loop, label tied to the capability/purpose the data was read under, BLOCKED at the CallGate. AuthGate already has the 3 pieces in one runtime (capability DAG + IFC extension + CallGate); incumbents have them in different products at different layers. - Held the discipline — 3 ways it still dies: (1) 'DLP with extra steps' (output-DLP catches the same); (2) PBAC + logging/detection is enough (runtime blocking = costly over-engineering); (3) DEEPEST: label propagation through an LLM is undecidable in general -> degrades to heuristic tainting ~ DLP. Risk #3 may be fatal alone. - Probabilities (user's): new paradigm low; Agent Governance product notable; complementary capability on agentic AI most likely. More promising than FDK because the question is practical-2026, not philosophical. - Sprint: 3 real agent scenarios that survive risks #1-#3; if they collapse to output-DLP+audit, close AuthGate; if one holds with SOUND label propagation, it has its single real reason to exist (agent data governance). Status: most promising of the three projects, still unproven. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Apply the kill-discipline to the single surviving question: can a purpose/capability label be carried usefully+reliably through an agent's execution (incl. LLM transforms) to egress? Split it: - A. SOUND fine-grained propagation (track meaning through the LLM): DEAD. LLM is not a transparent function; IFC needs a known program; no general answer to 'how much label remains' (SSN->summary->embedding->text). This is the FDK mistake in new costume. - B. COARSE conservative tool-boundary propagation (label at capability/tool granularity; any P-tainted input -> output tainted P unless a declassifier ran; CallGate blocks off-purpose egress): ALIVE + buildable on existing pieces, reliable in the conservative sense. The real crux is NOT soundness but LABEL-CREEP: conservative taint over-approximates -> after a few steps everything is tainted -> blocks all egress -> 'sound but useless'. So the deciding question is empirical+measurable: does coarse capability-taint catch real purpose-violations WITHOUT over-blocking legitimate work (given a few declassifiers)? High TP + tolerable FP -> real product; label-creep dominates -> DLP-renamed -> close. Honest competition: capability+flow-control for LLM agents is exactly where frontier work is moving (DeepMind CaMeL, dual-LLM, agent taint-tracking) -> validates the problem as real (unlike FDK) BUT means AuthGate isn't first; must answer 'what over CaMeL?' (likely deployable runtime gate + DAG + audit = engineering, not new idea). Probabilities (user): paradigm <10%; fundamental research 10-20%; Agent-Governance product 40-60%; distributed-systems+security showcase 80%+. Sprint fully specified: Gate 1 (no code) = 3 scenarios survive risks #1-#2; Gate 2 (the decider, needs a MINIMAL prototype) = coarse capability-taint at the CallGate, run on real agent traces, MEASURE true-positive vs label-creep. That one measurement decides product-vs-DLP. Status: most promising of the 3, still unproven, reduced to one buildable+measurable experiment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rol) The honest top-level marker (parallel to FDK's STATUS.md). Collapsed: authorization (OPA/Cedar/Zanzibar own it), revocation, relationship graphs, purpose-at-request-time. Surviving thesis: Authorization != Purpose Control -> information-flow/purpose governance for autonomous agents (a real, new 2026 problem; validated by CaMeL et al.). Reduced to ONE empirical experiment: coarse capability-taint at the CallGate, measured for label-creep on real agent traces (sound fine-grained propagation is dead; coarse conservative is alive iff label-creep is tolerable). Disposition: 2-4 wk sprint (Gate 1 scenarios -> Gate 2 prototype+measure), then product-vs-DLP-vs-close. The pattern across the whole program: ideas collapsed toward philosophy, survived toward runtime/systems/security/capabilities -> the advantage is systems engineering. Scorecard: FDK theory closed; Lock-in frozen; AuthGate authz absorbed; AuthGate+purpose-control = the one live thread, one experiment from a verdict. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…purpose-flow UNDECIDED) (a) AuthGate authorization gap -> CONFIRMED: no authz scenario it alone handles; the only angle with teeth (sequence reasoning) belongs to IFC, not authorization. (b) Capability-bound runtime IFC vs DLP/lineage/PBAC/IFC -> UNDECIDED (the 'DLP-renamed' framing is premature): the PII-laundering case ('summarise then email') defeats content-DLP yet the purpose violation survives on a capability-taint = a structural {capability-binding x CallGate x in-loop-blocking} gap no incumbent occupies. But soundness is dead and usefulness hinges on label-creep -> only the Gate-2 measurement on real agent traces decides. Premature closure averted; premature victory avoided. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ise falsified, value reframed Per director ('test it in real ai agents' = directly use the subagents). Built + RAN, verified by lead re-run: - examples/capability_taint_experiment.py: synthetic label-creep sim. Taint TP 100% vs content-DLP 61.5% (DLP misses all 5 laundering cases); creep FP 100%->0% with declassifiers. - examples/agent_taint_harness.py: provider-agnostic REAL agent harness (inject call_llm -> real OpenAI/Anthropic drops in). Runs end-to-end on a deterministic stub; conservative taint fires REGARDLESS of LLM content-transform (no sound label propagation needed). NOT run on a live LLM (no API key) — that is the one open measurement. - REAL_AGENT_TEST.md: tested DIRECTLY on 8 real frontier-LLM subagents (synthetic PII records). RESULT CONTRADICTS the synthetic premise: SSN never leaked (0/8); under honest 'summarize for marketing' framing all 5 refused/self-scrubbed (model alignment enforced purpose-limitation). BUT disguised as an ETL/CSV-formatting task, 2/3 passed email+phone (dropped only the salient SSN) — real, framing-dependent purpose-creep. Reframing: the naive 'taint catches laundering DLP misses' pitch is WEAK for capable aligned models; the real value is PROVENANCE-based, framing/salience-INVARIANT, MODEL-INDEPENDENT defense-in-depth (catches the disguised leak; protects weak/jailbroken/tool-mediated cases). Cost = label-creep over-blocks legit scrubbed flows (also shown on real agents). Complementary to alignment, not competing. Limits: n=8 one model family; model SAW data as text (tool-mediated untested). Verdict: reframed+sharpened, not confirmed/dead. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e next kill-test is MARKET Per director, fold in the real-agent finding + the next gate: - Reframed positioning: 'Runtime purpose-bound data governance for AI agents' (industrial, legible) — enforcement independent of whether the model can be trusted, NOT 'a new authz model'. - REAL_AGENT_TEST result recorded: original pitch (agents leak PII -> AuthGate stops it) FALSIFIED on real agents (self-defend, 0/8 SSN, 5/5 refuse honest framing); but 2/3 leaked email+phone under a formatting DISGUISE = policy-robustness failure (framing/prompt/model-dependent). So the legitimate question shifts from 'can the model be trusted?' to 'can enforcement be made INDEPENDENT of model behavior?' = the provenance/framing-blind gate's job = the surviving thesis. - NEXT KILL-TEST IS MARKET, not technical: 'do companies feel this pain enough to pay, or is Microsoft Purview + DLP + audit logs good enough?' Kill/confirm by CUSTOMER evidence — find teams running agents on sensitive data who hit a framing-robustness leak and don't consider their stack sufficient. None / Purview-suffices -> close like FDK. They exist -> a product. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

alexanderthenth and others added 23 commits June 4, 2026 19:12

tests: cover settings, redteam scenarios, mcp_gate & langgraph adapte…

6192902

…rs (0% -> ~100%)

tests+config: cover errors/tracing/wire_validator (100%); add documen…

0583bd6

…ted coverage exclusions

tests: cover call_gate/registry/verifier paths; pragma unreachable de…

5ed617e

…fensive branches

tests: cover persuasion boundary model + sovereignty metrics scoring

084c7e9

tests: cover coercion analyzer patterns and risk levels

c801ea6

tests: cover cli subcommands, key rotation, api error paths, detectio…

dd93ced

…n edges

tests: cover authority sources, override detector, exit guarantees, h…

0e7e64b

…ardened verifier

tests: cover consent algebra, consent registry, policy delegate branch

dc773fc

tests: cover economy, sandbox, schema version, extensions facade, pol…

60b01d7

…icy DSL, multi-agent; pragma unreachable branches

tests: cover langchain/anthropic adapters, audit loader, claim covers…

a0fedd3

…, goal violations

tests: cover anti-capture, recursive governance, distributed kernel &…

da9fbfe

… federation; pragma defensive branches

tests: close final coverage gaps (policy DSL indent, langchain Import…

dd3f99d

…Error fallback)

tests: relax overfit 1000-claim perf bound (1s machine-specific -> 5s…

a7f590b

… regression guard)

Gap 2: SemanticGate trait + CoercionAnalyzer (A4/A5)

e3f5a79

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Gap 3: Mahdavi Compass scorer (A7)

7420cbe

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Gap 1: ConsentRecord in Rust TCB (A3)

709d991

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge branch 'worktree-agent-a9ad27e8acc4787eb' into feat/theory-gaps

efd48bc

Merge branch 'worktree-agent-a2bad7a240b0cab83' into feat/theory-gaps

eaa4ab5

Merge branch 'worktree-agent-a66e125d45856cacf' into feat/theory-gaps

aafc65d

Aliipou changed the base branch from nazariye-azadi to main June 11, 2026 06:31

Aliipou closed this Jun 11, 2026

Aliipou reopened this Jun 11, 2026

alexanderthenth and others added 4 commits June 11, 2026 09:32

ci: trigger full CI on main-targeted PR (no code change)

3edfd4f

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into feat/theory-gaps

19781d3

# Conflicts: # README.md

Aliipou changed the base branch from main to nazariye-azadi June 11, 2026 06:50

Aliipou changed the base branch from nazariye-azadi to main June 11, 2026 06:51

alexanderthenth and others added 11 commits June 12, 2026 00:12

docs(changelog): record the FDK->AuthGate boundary seam under [Unrele…

cb0d5eb

…ased] Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Theory->Engineering: consent in TCB (A3), semantic gate (A4/A5), Mahdavi compass (A7) + red-team#4

Theory->Engineering: consent in TCB (A3), semantic gate (A4/A5), Mahdavi compass (A7) + red-team#4
Aliipou wants to merge 38 commits into
mainfrom
feat/theory-gaps

Aliipou commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Aliipou commented Jun 11, 2026

What & why

Changes

Honest scope (no overclaim)

Tests

Not in this PR (follow-ups from the 10-week plan)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants