diff --git a/docs/hooks.md b/docs/hooks.md
index 9fdd905..c6f96b7 100644
--- a/docs/hooks.md
+++ b/docs/hooks.md
@@ -86,6 +86,21 @@ hooks.Register(engine.AfterToolExec, func(ctx context.Context, hctx *engine.Hook
 })
 ```
 
+## Skill Guardrail Hooks
+
+When skills declare guardrails in their `SKILL.md` frontmatter, the runner registers four hooks that enforce skill-specific security policies across the entire agent loop:
+
+| Hook Point | Guardrail Type | Behavior |
+|------------|---------------|----------|
+| `BeforeLLMCall` | `deny_prompts` | Blocks user messages that probe agent capabilities (e.g., "what tools can you run") |
+| `AfterLLMCall` | `deny_responses` | Replaces LLM responses that enumerate internal binary names |
+| `BeforeToolExec` | `deny_commands` | Blocks `cli_execute` commands matching deny patterns (e.g., `kubectl get secrets`) |
+| `AfterToolExec` | `deny_output` | Blocks or redacts `cli_execute` output matching deny patterns (e.g., Secret manifests) |
+
+These hooks complement the global guardrail hooks (secrets/PII scanning) and fire in addition to them. Skill guardrails are loaded from build artifacts or parsed at runtime from `SKILL.md` — no `forge build` step is required.
+
+For pattern syntax and configuration, see [Skill Guardrails](security/guardrails.md#skill-guardrails).
+
 ## Audit Logging
 
 The runner registers `AfterLLMCall` hooks that emit structured audit events for each LLM interaction. Audit fields include:
diff --git a/docs/runtime.md b/docs/runtime.md
index 88b9990..bd493d3 100644
--- a/docs/runtime.md
+++ b/docs/runtime.md
@@ -203,7 +203,7 @@ For details on session persistence, context window management, compaction, and l
 
 The engine fires hooks at key points in the loop. See [Hooks](hooks.md) for details.
 
-The runner registers four hook groups: logging, audit, progress, and guardrail hooks. The guardrail `AfterToolExec` hook scans tool output for secrets and PII, redacting or blocking before results enter the LLM context. See [Tool Output Scanning](security/guardrails.md#tool-output-scanning).
+The runner registers five hook groups: logging, audit, progress, global guardrail hooks, and skill guardrail hooks. The global guardrail `AfterToolExec` hook scans tool output for secrets and PII, redacting or blocking before results enter the LLM context. Skill guardrail hooks enforce domain-specific rules declared in `SKILL.md` — blocking commands, redacting output, intercepting capability enumeration probes, and replacing binary-enumerating responses. Skill guardrails are loaded from build artifacts or parsed directly from `SKILL.md` at runtime (no `forge build` required). See [Tool Output Scanning](security/guardrails.md#tool-output-scanning) and [Skill Guardrails](security/guardrails.md#skill-guardrails).
 
 ## Streaming
 
diff --git a/docs/security/guardrails.md b/docs/security/guardrails.md
index e67a960..97d0c81 100644
--- a/docs/security/guardrails.md
+++ b/docs/security/guardrails.md
@@ -110,6 +110,81 @@ Additionally, `cmd.Dir` is set to `workDir` so relative paths in subprocess exec
 | `jq '.' /tmp/data.json` | Allowed — system path outside `$HOME` |
 | `ls ./data/` | Allowed — within workDir |
 
+## Skill Guardrails
+
+Skills can declare domain-specific guardrails in their `SKILL.md` frontmatter under `metadata.forge.guardrails`. These complement the global guardrails with rules authored by skill developers to enforce least-privilege and prevent capability enumeration.
+
+### Guardrail Types
+
+| Type | Hook Point | Direction | Behavior |
+|------|-----------|-----------|----------|
+| `deny_commands` | `BeforeToolExec` | Inbound | Blocks `cli_execute` commands matching a regex pattern |
+| `deny_output` | `AfterToolExec` | Outbound | Blocks or redacts `cli_execute` output matching a regex pattern |
+| `deny_prompts` | `BeforeLLMCall` | Inbound | Blocks user messages matching a regex (capability enumeration probes) |
+| `deny_responses` | `AfterLLMCall` | Outbound | Replaces LLM responses matching a regex (binary name leaks) |
+
+### SKILL.md Configuration
+
+```yaml
+metadata:
+  forge:
+    guardrails:
+      deny_commands:
+        - pattern: '\bget\s+secrets?\b'
+          message: "Listing Kubernetes secrets is not permitted"
+        - pattern: '\bauth\s+can-i\b'
+          message: "Permission enumeration is not permitted"
+      deny_output:
+        - pattern: 'kind:\s*Secret'
+          action: block
+        - pattern: 'token:\s*[A-Za-z0-9+/=]{40,}'
+          action: redact
+      deny_prompts:
+        - pattern: '\b(approved|allowed|available)\b.{0,40}\b(tools?|binaries|commands?)\b'
+          message: "I help with Kubernetes cost analysis. Ask about cluster costs."
+      deny_responses:
+        - pattern: '\b(kubectl|jq|awk|bc|curl)\b.*\b(kubectl|jq|awk|bc|curl)\b.*\b(kubectl|jq|awk|bc|curl)\b'
+          message: "I can analyze cluster costs. What would you like to know?"
+```
+
+### Pattern Details
+
+**`deny_commands`** — Patterns match against the reconstructed command line (`binary arg1 arg2 ...`). Only fires for `cli_execute` tool calls.
+
+**`deny_output`** — Patterns match against tool output text. The `action` field controls behavior:
+
+| Action | Behavior |
+|--------|----------|
+| `block` | Returns an error, preventing the output from entering the LLM context |
+| `redact` | Replaces matched text with `[BLOCKED BY POLICY]` and logs a warning |
+
+**`deny_prompts`** — Patterns are compiled with case-insensitive matching (`(?i)`). Designed to catch capability enumeration probes like "what are the approved tools" or "list available binaries". The `message` field provides a redirect response.
+
+**`deny_responses`** — Patterns are compiled with case-insensitive and dot-matches-newline flags (`(?is)`). Designed to catch LLM responses that enumerate internal binary names. When matched, the entire response is replaced with the `message` text.
+
+### Aggregation
+
+When multiple skills declare guardrails, patterns are aggregated and deduplicated across all active skills. The `SkillGuardrailEngine` runs all patterns from all skills as a single enforcement layer.
+
+### Runtime Fallback
+
+Skill guardrails fire both with and without `forge build`:
+
+- **With build** — Guardrails are serialized into `policy-scaffold.json` during `forge build` and loaded at runtime
+- **Without build** — The runner parses `SKILL.md` files at startup and loads guardrails directly, falling back to runtime-parsed rules when no build artifact exists
+
+This ensures guardrails are always active during development (`forge run`) without requiring a full build cycle.
+
+## File Protocol Blocking
+
+The `cli_execute` tool blocks arguments containing `file://` URLs (case-insensitive). This prevents filesystem traversal attacks via tools like `curl file:///etc/passwd` that bypass path validation since `file://` URLs are not detected as filesystem paths by `looksLikePath()`.
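
As a rough illustration of the case-insensitive `file://` check described above, the following Go sketch lowercases each argument and looks for the scheme as a substring. The helper name `containsFileURL` is hypothetical and is not taken from the Forge source; the real check in `cli_execute` may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// containsFileURL reports whether any argument contains a file:// URL,
// ignoring case. Hypothetical helper for illustration only; it is an
// assumption, not Forge's actual implementation.
func containsFileURL(args []string) bool {
	for _, arg := range args {
		if strings.Contains(strings.ToLower(arg), "file://") {
			return true
		}
	}
	return false
}

func main() {
	// Sample argument lists mirroring the kinds of inputs the docs discuss.
	samples := [][]string{
		{"file:///etc/passwd"},
		{"FILE:///etc/shadow"},
		{"http://example.com"},
	}
	for _, args := range samples {
		fmt.Printf("curl %s -> blocked=%v\n", strings.Join(args, " "), containsFileURL(args))
	}
}
```

A substring match is used here rather than URL parsing so that arguments embedding the scheme (for example `--url=file:///etc/passwd`) are also caught; whether Forge makes the same choice is an assumption.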
+
+| Input | Result |
+|-------|--------|
+| `curl file:///etc/passwd` | Blocked — `file://` protocol detected |
+| `curl FILE:///etc/shadow` | Blocked — case-insensitive check |
+| `curl http://example.com` | Allowed — only `file://` is blocked |
+
 ## Audit Events
 
 Guardrail evaluations are logged as structured audit events:
diff --git a/docs/security/overview.md b/docs/security/overview.md
index 3279a10..dc1fc44 100644
--- a/docs/security/overview.md
+++ b/docs/security/overview.md
@@ -8,14 +8,18 @@ Forge's security is organized in layers, each addressing a different threat surf
 
 ```
 ┌──────────────────────────────────────────────────────────────┐
-│                          Guardrails                          │
+│                       Skill Guardrails                       │
+│      (deny commands/output/prompts/responses per skill)      │
+├──────────────────────────────────────────────────────────────┤
+│                      Global Guardrails                       │
 │             (content filtering, PII, jailbreak)              │
 ├──────────────────────────────────────────────────────────────┤
 │                      Egress Enforcement                      │
 │        (EgressEnforcer + EgressProxy + NetworkPolicy)        │
 ├──────────────────────────────────────────────────────────────┤
 │                     Execution Sandboxing                     │
-│      (env isolation, binary allowlists, arg validation)      │
+│      (env isolation, binary allowlists, arg validation,      │
+│       file:// blocking, shell denylist)                      │
 ├──────────────────────────────────────────────────────────────┤
 │                      Secrets Management                      │
 │         (AES-256-GCM, Argon2id, per-agent isolation)         │
@@ -112,17 +116,22 @@ Skill scripts run via `SkillCommandExecutor` (`forge-cli/tools/exec.go`):
 
 ### CLIExecuteTool
 
-The `cli_execute` tool (`forge-cli/tools/cli_execute.go`) provides 7 security layers:
+The `cli_execute` tool (`forge-cli/tools/cli_execute.go`) provides 12 security layers:
 
 | # | Layer | Detail |
 |---|-------|--------|
-| 1 | **Binary allowlist** | Only pre-approved binaries can execute |
-| 2 | **Binary resolution** | Binaries are resolved to absolute paths via `exec.LookPath` at startup |
-| 3 | **Argument validation** | Rejects arguments containing `$(`, backticks, or newlines |
-| 4 | **Timeout** | Configurable per-command timeout (default: 120s) |
-| 5 | **No shell** | Uses `exec.CommandContext` directly — no shell expansion |
-| 6 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, and proxy vars |
-| 7 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion |
+| 1 | **Shell denylist** | Shell interpreters (`bash`, `sh`, `zsh`, etc.) filtered at construction and blocked at execution |
+| 2 | **Binary allowlist** | Only pre-approved binaries can execute |
+| 3 | **Binary resolution** | Binaries are resolved to absolute paths via `exec.LookPath` at startup |
+| 4 | **Argument validation** | Rejects arguments containing `$(`, backticks, newlines, or `file://` URLs |
+| 5 | **File protocol blocking** | Blocks `file://` URLs (case-insensitive) to prevent filesystem traversal |
+| 6 | **Path confinement** | Path arguments inside `$HOME` but outside `workDir` are blocked |
+| 7 | **Timeout** | Configurable per-command timeout (default: 120s) |
+| 8 | **No shell** | Uses `exec.CommandContext` directly — no shell expansion |
+| 9 | **Working directory** | `cmd.Dir` set to `workDir` for relative path resolution |
+| 10 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, and proxy vars |
+| 11 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion |
+| 12 | **Skill guardrails** | Skill-declared `deny_commands` and `deny_output` patterns via hooks |
 
 ### Configuration
 
@@ -158,6 +167,10 @@ For full details, see **[Build Signing & Verification](signing.md)**.
 
 The guardrail engine checks inbound and outbound messages against policy rules including content filtering, PII detection, and jailbreak protection. Guardrails run in `enforce` (blocking) or `warn` (logging) mode.
 
+### Skill Guardrails
+
+Skills can declare domain-specific guardrails in their `SKILL.md` frontmatter. These guardrails operate at four hook points — blocking unauthorized commands (`deny_commands`), redacting sensitive output (`deny_output`), intercepting capability enumeration probes (`deny_prompts`), and replacing binary-enumerating LLM responses (`deny_responses`). Skill guardrails fire at runtime without requiring `forge build`.
+
 For full details, see **[Content Guardrails](guardrails.md)**.
 
 ---
diff --git a/docs/skills.md b/docs/skills.md
index d74eaaf..cb8f048 100644
--- a/docs/skills.md
+++ b/docs/skills.md
@@ -356,6 +356,54 @@ This registers three tools:
 
 Requires: `jq`. Egress: `cdn.tailwindcss.com`, `esm.sh`.
 
+## Skill Guardrails
+
+Skills can declare domain-specific guardrails in their `SKILL.md` frontmatter to enforce security policies at runtime. These guardrails operate at four interception points in the agent loop, preventing unauthorized commands, data exfiltration, capability enumeration, and binary name disclosure.
+
+### Configuration
+
+Add a `guardrails` block under `metadata.forge` in `SKILL.md`:
+
+```yaml
+metadata:
+  forge:
+    guardrails:
+      deny_commands:
+        - pattern: '\bget\s+secrets?\b'
+          message: "Listing Kubernetes secrets is not permitted"
+      deny_output:
+        - pattern: 'kind:\s*Secret'
+          action: block
+        - pattern: 'token:\s*[A-Za-z0-9+/=]{40,}'
+          action: redact
+      deny_prompts:
+        - pattern: '\b(approved|allowed|available)\b.{0,40}\b(tools?|binaries)\b'
+          message: "I help with K8s cost analysis. Ask about cluster costs."
+      deny_responses:
+        - pattern: '\b(kubectl|jq|awk|bc|curl)\b.*\b(kubectl|jq|awk|bc|curl)\b.*\b(kubectl|jq|awk|bc|curl)\b'
+          message: "I can analyze cluster costs. What would you like to know?"
+```
+
+### Guardrail Types
+
+| Type | Direction | Purpose |
+|------|-----------|---------|
+| `deny_commands` | Input | Block `cli_execute` commands matching patterns (e.g., `kubectl get secrets`) |
+| `deny_output` | Output | Block or redact tool output matching patterns (e.g., Secret manifests, tokens) |
+| `deny_prompts` | Input | Block user messages probing agent capabilities (e.g., "what tools can you run") |
+| `deny_responses` | Output | Replace LLM responses that enumerate internal binary names |
+
+### Capability Enumeration Prevention
+
+The `deny_prompts` and `deny_responses` guardrails form a layered defense against capability enumeration attacks:
+
+1. **Input-side** (`deny_prompts`) — Intercepts user messages that probe for available tools, binaries, or commands and redirects to the skill's functional description
+2. **Output-side** (`deny_responses`) — Catches LLM responses that list 3+ binary names and replaces the entire response with a functional capability description
+
+Additionally, skill `Description()` methods and system prompt catalog entries use generic descriptions instead of listing binary names.
+
+For full details on guardrail types, pattern syntax, and runtime behavior, see [Content Guardrails — Skill Guardrails](security/guardrails.md#skill-guardrails).
+
 ## Skill Instructions in System Prompt
 
 Forge injects the **full body** of each skill's SKILL.md into the LLM system prompt. This means all detailed operational instructions — triage steps, detection heuristics, output structure, safety constraints — are directly available in the LLM's context without requiring an extra `read_skill` tool call.
diff --git a/docs/tools.md b/docs/tools.md
index 4856b87..523b8e8 100644
--- a/docs/tools.md
+++ b/docs/tools.md
@@ -59,7 +59,7 @@ Provider selection: `WEB_SEARCH_PROVIDER` env var, or auto-detect from available
 
 ## CLI Execute
 
-The `cli_execute` tool provides security-hardened command execution with 10 security layers:
+The `cli_execute` tool provides security-hardened command execution with 12 security layers:
 
 ```yaml
 tools:
@@ -73,16 +73,18 @@ tools:
 
 | # | Layer | Detail |
 |---|-------|--------|
-| 1 | **Shell denylist** | Shell interpreters (`bash`, `sh`, `zsh`, `dash`, `ksh`, `csh`, `tcsh`, `fish`) are unconditionally blocked — they defeat the no-shell design |
+| 1 | **Shell denylist** | Shell interpreters (`bash`, `sh`, `zsh`, `dash`, `ksh`, `csh`, `tcsh`, `fish`) are filtered out at construction time and unconditionally blocked at execution — they defeat the no-shell design |
 | 2 | **Binary allowlist** | Only pre-approved binaries can execute |
 | 3 | **Binary resolution** | Binaries are resolved to absolute paths via `exec.LookPath` at startup |
-| 4 | **Argument validation** | Rejects arguments containing `$(`, backticks, or newlines |
-| 5 | **Path confinement** | Path arguments inside `$HOME` but outside `workDir` are blocked (see [Path Containment](security/guardrails.md#path-containment)) |
-| 6 | **Timeout** | Configurable per-command timeout (default: 120s) |
-| 7 | **No shell** | Uses `exec.CommandContext` directly — no shell expansion |
-| 8 | **Working directory** | `cmd.Dir` set to `workDir` so relative paths resolve within the agent directory |
-| 9 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, proxy vars, and `OPENAI_ORG_ID` (when set). `HOME` is overridden to `workDir` to prevent `~` expansion from reaching the real home directory |
-| 10 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion |
+| 4 | **Argument validation** | Rejects arguments containing `$(`, backticks, newlines, or `file://` URLs |
+| 5 | **File protocol blocking** | Arguments containing `file://` (case-insensitive) are blocked to prevent filesystem traversal via `curl file:///etc/passwd` (see [File Protocol Blocking](security/guardrails.md#file-protocol-blocking)) |
+| 6 | **Path confinement** | Path arguments inside `$HOME` but outside `workDir` are blocked (see [Path Containment](security/guardrails.md#path-containment)) |
+| 7 | **Timeout** | Configurable per-command timeout (default: 120s) |
+| 8 | **No shell** | Uses `exec.CommandContext` directly — no shell expansion |
+| 9 | **Working directory** | `cmd.Dir` set to `workDir` so relative paths resolve within the agent directory |
+| 10 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, proxy vars, and `OPENAI_ORG_ID` (when set). `HOME` is overridden to `workDir` to prevent `~` expansion from reaching the real home directory |
+| 11 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion |
+| 12 | **Skill guardrails** | Skill-declared `deny_commands` and `deny_output` patterns block/redact command inputs and outputs (see [Skill Guardrails](security/guardrails.md#skill-guardrails)) |
 
 ## File Create
 