From 75ffc108d2e13790fcbb3dcec46dac55943094cc Mon Sep 17 00:00:00 2001
From: Vincent Koc <vincentkoc@ieee.org>
Date: Fri, 22 May 2026 22:22:16 +0800
Subject: [PATCH] chore: add maintainer setup baseline

---
 .agents/skills/autoreview/SKILL.md           | 153 ++++
 .agents/skills/autoreview/scripts/autoreview | 892 +++++++++++++++++++
 .agents/skills/crabbox/SKILL.md              | 711 +++++++++++++++
 .crabbox.yaml                                |  47 +
 .github/CODEOWNERS                           |  19 +
 .github/dependabot.yml                       |  32 +
 .github/workflows/codeql.yml                 |  40 +
 .github/workflows/crabbox-hydrate.yml        | 125 +++
 .github/workflows/stale.yml                  |  86 ++
 AGENTS.md                                    |  21 +
 SECURITY.md                                  |  30 +
 11 files changed, 2156 insertions(+)
 create mode 100644 .agents/skills/autoreview/SKILL.md
 create mode 100755 .agents/skills/autoreview/scripts/autoreview
 create mode 100644 .agents/skills/crabbox/SKILL.md
 create mode 100644 .crabbox.yaml
 create mode 100644 .github/CODEOWNERS
 create mode 100644 .github/dependabot.yml
 create mode 100644 .github/workflows/codeql.yml
 create mode 100644 .github/workflows/crabbox-hydrate.yml
 create mode 100644 .github/workflows/stale.yml
 create mode 100644 AGENTS.md
 create mode 100644 SECURITY.md

diff --git a/.agents/skills/autoreview/SKILL.md b/.agents/skills/autoreview/SKILL.md
new file mode 100644
index 0000000..d5c9dc8
--- /dev/null
+++ b/.agents/skills/autoreview/SKILL.md
@@ -0,0 +1,153 @@
+---
+name: autoreview
+description: "Auto Review closeout. Codex review is the default when no engine is set and is the recommended reviewer."
+---
+
+# Auto Review
+
+Run the bundled structured review helper as a closeout check. This is code review, not Guardian `auto_review` approval routing.
+
+Codex review is the default when no engine is set. It usually delivers the best review results and should remain the normal final closeout engine.
+
+Use when:
+
+- user asks for Codex review / Claude review / autoreview / second-model review
+- after non-trivial code edits, before final/commit/ship
+- reviewing a local branch or PR branch after fixes
+
+## Contract
+
+- Treat review output as advisory. Never blindly apply it.
+- Verify every finding by reading the real code path and adjacent files.
+- Read dependency docs/source/types when the finding depends on external behavior.
+- Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase.
+- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
+- Keep going until structured review returns no accepted/actionable findings.
+- If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
+- For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
+- Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
+- Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
+- Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
+- Do not invoke built-in `codex review`, nested reviewers, or reviewer panels from inside the review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops.
+- Stop as soon as the helper exits 0 with no accepted/actionable findings. Do not run an extra review just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
+- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
+- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
+- If `gh`/Gitcrawl reports `database disk image is malformed`, run `gitcrawl doctor --json` once to let the portable cache repair before retrying review; do not bypass the shim unless repair fails and freshness requires live GitHub.
+- If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run `gitcrawl doctor --json` and inspect `source_db_health`, `runtime_db_health`, and `portable_store_status` before falling back to live GitHub.
+- Do not push just to review. Push only when the user requested push/ship/PR update.
+
+## Pick Target
+
+Dirty local work:
+
+```bash
+<autoreview-helper> --mode local
+```
+
+Use this only when the patch is actually unstaged/staged/untracked in the
+current checkout. For committed, pushed, or PR work, point the helper at the commit
+or branch diff instead; do not force `--mode local` / `--uncommitted` just
+because the helper docs mention dirty work first. A clean local review
+only proves there is no local patch.
+
+Branch/PR work:
+
+```bash
+<autoreview-helper> --mode branch --base origin/main
+```
+
+Optional review context is first-class:
+
+```bash
+<autoreview-helper> --mode branch --base origin/main --prompt-file /tmp/review-notes.md --dataset /tmp/evidence.json
+```
+
+If an open PR exists, use its actual base:
+
+```bash
+base=$(gh pr view --json baseRefName --jq .baseRefName)
+<autoreview-helper> --mode branch --base "origin/$base"
+```
+
+Committed single change:
+
+```bash
+<autoreview-helper> --mode commit --commit HEAD
+```
+
+or, with the helper's absolute path when installed from `agent-scripts`:
+
+```bash
+/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
+```
+
+Use commit review for already-landed or already-pushed work on `main`. Reviewing
+clean `main` against `origin/main` is usually an empty diff after push. For a
+small stack, review each commit explicitly or review the branch before merging
+with `--base`.
+
+## Parallel Closeout
+
+Format first if formatting can change line locations. Then it is OK to run tests and review in parallel:
+
+```bash
+<autoreview-helper> --parallel-tests "<focused test command>"
+```
+
+Tradeoff: tests may force code changes that make the review stale. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation.
+
+## Context Efficiency
+
+Run the helper directly so target selection, engine choice, structured validation, and exit status all stay in one path. If output is noisy, summarize the completed helper output after it returns; do not ask another agent or reviewer to rerun the review.
+
+## Helper
+
+OpenClaw repo-local helper:
+
+```bash
+.agents/skills/autoreview/scripts/autoreview --help
+```
+
+`agent-scripts` checkout helper:
+
+```bash
+skills/autoreview/scripts/autoreview --help
+```
+
+Global helper from `agent-scripts`:
+
+```bash
+~/.codex/skills/agent-scripts/autoreview/scripts/autoreview --help
+```
+
+If installed from `agent-scripts`, the path is:
+
+```bash
+/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --help
+```
+
+The helper:
+
+- chooses dirty local changes first
+- otherwise uses current PR base if `gh pr view` works
+- otherwise uses `origin/main` for non-main branches
+- supports `--engine codex`, `claude`, `droid`, `copilot`, `pi`, and `opencode`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
+- requires an explicit `--model` with `--engine pi` because the helper isolates Pi's config directory during review
+- expects `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
+- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
+- writes only to stdout unless `--output` or `--json-output` is set
+- supports `--dry-run`, `--parallel-tests`, `--prompt`, `--prompt-file`, `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
+- allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output
+- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0
+- exits nonzero when accepted/actionable findings are present
+
+## Final Report
+
+Include:
+
+- review command used
+- tests/proof run
+- findings accepted/rejected, briefly why
+- the clean review result from the final helper/review run, or why a remaining finding was consciously rejected
+
+Do not run another review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.
diff --git a/.agents/skills/autoreview/scripts/autoreview b/.agents/skills/autoreview/scripts/autoreview
new file mode 100755
index 0000000..b0d06c6
--- /dev/null
+++ b/.agents/skills/autoreview/scripts/autoreview
@@ -0,0 +1,892 @@
+#!/usr/bin/env python3
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import subprocess
+import sys
+import tempfile
+import textwrap
+import time
+from pathlib import Path
+from typing import Any
+
+
+SCHEMA: dict[str, Any] = {
+    "type": "object",
+    "additionalProperties": False,
+    "required": [
+        "findings",
+        "overall_correctness",
+        "overall_explanation",
+        "overall_confidence",
+    ],
+    "properties": {
+        "findings": {
+            "type": "array",
+            "items": {
+                "type": "object",
+                "additionalProperties": False,
+                "required": [
+                    "title",
+                    "body",
+                    "priority",
+                    "confidence",
+                    "category",
+                    "code_location",
+                ],
+                "properties": {
+                    "title": {"type": "string", "minLength": 1, "maxLength": 140},
+                    "body": {"type": "string", "minLength": 1, "maxLength": 2000},
+                    "priority": {"type": "string", "enum": ["P0", "P1", "P2", "P3"]},
+                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
+                    "category": {
+                        "type": "string",
+                        "enum": ["bug", "security", "regression", "test_gap", "maintainability"],
+                    },
+                    "code_location": {
+                        "type": "object",
+                        "additionalProperties": False,
+                        "required": ["file_path", "line"],
+                        "properties": {
+                            "file_path": {"type": "string", "minLength": 1},
+                            "line": {"type": "integer", "minimum": 1},
+                        },
+                    },
+                },
+            },
+        },
+        "overall_correctness": {
+            "type": "string",
+            "enum": ["patch is correct", "patch is incorrect"],
+        },
+        "overall_explanation": {"type": "string", "minLength": 1, "maxLength": 3000},
+        "overall_confidence": {"type": "number", "minimum": 0, "maximum": 1},
+    },
+}
+
+
+def run(
+    args: list[str],
+    cwd: Path,
+    *,
+    input_text: str | None = None,
+    env: dict[str, str] | None = None,
+    check: bool = True,
+) -> subprocess.CompletedProcess[str]:
+    result = subprocess.run(
+        args,
+        cwd=cwd,
+        input=input_text,
+        env=env,
+        text=True,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+    )
+    if check and result.returncode != 0:
+        cmd = " ".join(args)
+        raise SystemExit(f"command failed ({result.returncode}): {cmd}\n{result.stderr or result.stdout}")
+    return result
+
+
+def git(repo: Path, *args: str, check: bool = True) -> str:
+    return run(["git", *args], repo, check=check).stdout
+
+
+def repo_root() -> Path:
+    result = subprocess.run(
+        ["git", "rev-parse", "--show-toplevel"],
+        text=True,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.PIPE,
+    )
+    if result.returncode != 0:
+        raise SystemExit("autoreview must run inside a git repository")
+    return Path(result.stdout.strip()).resolve()
+
+
+def current_branch(repo: Path) -> str:
+    return git(repo, "branch", "--show-current", check=False).strip() or "detached"
+
+
+def is_dirty(repo: Path) -> bool:
+    return bool(git(repo, "status", "--porcelain").strip())
+
+
+def choose_target(repo: Path, mode: str, base_ref: str | None) -> tuple[str, str | None]:
+    branch = current_branch(repo)
+    if mode == "local" or (mode == "auto" and is_dirty(repo)):
+        return "local", None
+    if mode == "commit":
+        return "commit", None
+    if mode == "branch" or (mode == "auto" and branch != "main"):
+        return "branch", base_ref or detect_pr_base(repo) or "origin/main"
+    raise SystemExit("no review target: clean main checkout and no forced mode")
+
+
+def detect_pr_base(repo: Path) -> str | None:
+    if not shutil_which("gh"):
+        return None
+    result = run(["gh", "pr", "view", "--json", "baseRefName", "--jq", ".baseRefName"], repo, check=False)
+    base = result.stdout.strip()
+    return f"origin/{base}" if result.returncode == 0 and base else None
+
+
+def shutil_which(name: str) -> str | None:
+    for part in os.environ.get("PATH", "").split(os.pathsep):
+        candidate = Path(part) / name
+        if candidate.exists() and os.access(candidate, os.X_OK):
+            return str(candidate)
+    return None
+
+
+def bounded(text: str, limit: int = 180_000) -> str:
+    if len(text) <= limit:
+        return text
+    return text[:limit] + f"\n\n[truncated at {limit} characters]\n"
+
+
+def read_text(path: Path, limit: int = 40_000) -> str:
+    try:
+        data = path.read_bytes()
+    except OSError as exc:
+        return f"[unreadable: {exc}]"
+    if b"\0" in data:
+        return "[binary file omitted]"
+    text = data.decode("utf-8", errors="replace")
+    return bounded(text, limit)
+
+
+def local_bundle(repo: Path) -> str:
+    parts = [
+        "# Git Status",
+        git(repo, "status", "--short"),
+        "# Staged Diff",
+        git(repo, "diff", "--cached", "--stat"),
+        bounded(git(repo, "diff", "--cached", "--patch", "--find-renames")),
+        "# Unstaged Diff",
+        git(repo, "diff", "--stat"),
+        bounded(git(repo, "diff", "--patch", "--find-renames")),
+    ]
+    untracked = [line for line in git(repo, "ls-files", "--others", "--exclude-standard").splitlines() if line]
+    if untracked:
+        parts.append("# Untracked Files")
+        for rel in untracked:
+            path = repo / rel
+            parts.append(f"## {rel}\n{read_text(path)}")
+    return "\n\n".join(parts)
+
+
+def branch_bundle(repo: Path, base_ref: str) -> str:
+    git(repo, "fetch", "origin", "--quiet", check=False)
+    return "\n\n".join(
+        [
+            "# Branch Diff",
+            f"base: {base_ref}",
+            git(repo, "diff", "--stat", f"{base_ref}...HEAD"),
+            bounded(git(repo, "diff", "--patch", "--find-renames", f"{base_ref}...HEAD")),
+        ]
+    )
+
+
+def commit_bundle(repo: Path, commit_ref: str) -> str:
+    return "\n\n".join(
+        [
+            "# Commit Diff",
+            f"commit: {commit_ref}",
+            git(repo, "show", "--stat", "--format=fuller", commit_ref),
+            bounded(git(repo, "show", "--patch", "--find-renames", "--format=fuller", commit_ref)),
+        ]
+    )
+
+
+def review_paths(repo: Path, target: str, target_ref: str | None, commit_ref: str) -> set[str]:
+    names: set[str] = set()
+    if target == "local":
+        sources = [
+            git(repo, "diff", "--name-only", "--cached"),
+            git(repo, "diff", "--name-only"),
+            git(repo, "ls-files", "--others", "--exclude-standard"),
+        ]
+    elif target == "branch":
+        assert target_ref
+        sources = [git(repo, "diff", "--name-only", f"{target_ref}...HEAD")]
+    else:
+        sources = [git(repo, "show", "--name-only", "--format=", commit_ref)]
+    for source in sources:
+        for line in source.splitlines():
+            path = line.strip()
+            if path:
+                names.add(path)
+    return names
+
+
+def load_extra_prompt(args: argparse.Namespace) -> str:
+    chunks: list[str] = []
+    for value in args.prompt or []:
+        chunks.append(value)
+    for path in args.prompt_file or []:
+        chunks.append(Path(path).read_text())
+    return "\n\n".join(chunks)
+
+
+def load_datasets(args: argparse.Namespace) -> str:
+    chunks: list[str] = []
+    for spec in args.dataset or []:
+        path = Path(spec)
+        if path.is_dir():
+            raise SystemExit(f"--dataset must be a file, got directory: {path}")
+        chunks.append(f"# Dataset: {path}\n{read_text(path)}")
+    return "\n\n".join(chunks)
+
+
+def build_prompt(repo: Path, target: str, target_ref: str | None, bundle: str, extra_prompt: str, datasets: str) -> str:
+    target_line = f"{target} {target_ref}" if target_ref else target
+    return textwrap.dedent(
+        f"""
+        You are a senior code reviewer. Review the provided git change bundle only.
+
+        Hard rules:
+        - Return exactly one JSON object and nothing else. Do not wrap it in Markdown.
+        - The JSON object must match this schema exactly:
+        {json.dumps(SCHEMA, indent=2)}
+        - Do not modify files.
+        - Do not invoke nested reviewers or review tools.
+        - Forbidden nested review commands include: codex review, autoreview, claude review, oracle review.
+        - You may use read-only tools and web search to inspect files, dependency contracts, upstream docs, current behavior, and security implications.
+        - Shell commands, if available, must be read-only inspection commands. Do not run tests, formatters, package installs, generators, network mutation commands, git mutation commands, or commands that write files.
+        - Report only actionable defects introduced or exposed by this change.
+        - Prefer high-signal findings over style feedback.
+        - Include security findings: injection, secret leaks, authz/authn bypass, path traversal, unsafe deserialization, unsafe filesystem or shell use, privacy leaks, and credential handling.
+        - Do not reject legitimate functionality merely because it touches shell, filesystem, network, auth, or sensitive data. Report a security finding only when the patch creates a concrete exploitable risk, removes an important safety check, or lacks validation at a trust boundary.
+        - For each finding, use the smallest file/line location that demonstrates the issue.
+        - If there are no actionable findings, return an empty findings array and mark the patch correct.
+
+        Review target: {target_line}
+        Repository: {repo}
+
+        {extra_prompt}
+
+        {datasets}
+
+        # Change Bundle
+        {bundle}
+        """
+    ).strip()
+
+
+def write_json_temp(data: dict[str, Any]) -> Path:
+    handle = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
+    with handle:
+        json.dump(data, handle)
+    return Path(handle.name)
+
+
+def run_codex(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    if not args.tools:
+        raise SystemExit("--no-tools is not supported by the Codex engine; use --engine claude --no-tools for a no-tools run")
+    schema_path = write_json_temp(SCHEMA)
+    output_path = Path(tempfile.NamedTemporaryFile("w", suffix=".json", delete=False).name)
+    cmd = [args.codex_bin, "--ask-for-approval", "never"]
+    if args.web_search:
+        cmd.append("--search")
+    if args.model:
+        cmd.extend(["--model", args.model])
+    cmd.extend(
+        [
+            "exec",
+            "--ephemeral",
+            "-C",
+            str(repo),
+            "-s",
+            "read-only",
+            "--output-schema",
+            str(schema_path),
+            "--output-last-message",
+            str(output_path),
+            "-",
+        ]
+    )
+    result = run(cmd, repo, input_text=prompt, check=False)
+    try:
+        output = output_path.read_text()
+    finally:
+        schema_path.unlink(missing_ok=True)
+        output_path.unlink(missing_ok=True)
+    if result.returncode != 0:
+        raise SystemExit(f"codex engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+    return output or result.stdout
+
+
+def run_claude(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    cmd = [
+        args.claude_bin,
+        "--print",
+        "--no-session-persistence",
+        "--output-format",
+        "json",
+        "--json-schema",
+        json.dumps(SCHEMA),
+    ]
+    if args.tools:
+        cmd.extend(["--allowedTools", claude_allowed_tools(args)])
+    else:
+        cmd.extend(["--tools", ""])
+    if args.model:
+        cmd.extend(["--model", args.model])
+    result = run(cmd, repo, input_text=prompt, check=False)
+    if result.returncode != 0:
+        raise SystemExit(f"claude engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+    return result.stdout
+
+
+def run_droid(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    prompt_path = Path(tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False).name)
+    prompt_path.write_text(prompt)
+    cmd = [
+        args.droid_bin,
+        "exec",
+        "--cwd",
+        str(repo),
+        "--output-format",
+        "json",
+        "-f",
+        str(prompt_path),
+    ]
+    if args.model:
+        cmd.extend(["--model", args.model])
+    if not args.tools:
+        cmd.extend(["--disabled-tools", "*"])
+    result = run(cmd, repo, check=False)
+    prompt_path.unlink(missing_ok=True)
+    if result.returncode != 0:
+        raise SystemExit(f"droid engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+    return result.stdout
+
+
+def run_copilot(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    if not args.tools:
+        raise SystemExit("--no-tools is not supported by the copilot engine; copilot requires a read-only file view tool to load the review bundle without exposing it in argv")
+    with tempfile.TemporaryDirectory(prefix="autoreview-copilot.") as tempdir:
+        prompt_path = Path(tempdir) / "prompt.txt"
+        prompt_path.write_text(prompt)
+        os.chmod(prompt_path, 0o600)
+        cmd = [
+            args.copilot_bin,
+            "-C",
+            tempdir,
+            "-p",
+            "Read ./prompt.txt and follow it exactly. Return only the requested JSON object.",
+            "--output-format",
+            "json",
+            "--stream",
+            "off",
+            "--no-ask-user",
+            "--disable-builtin-mcps",
+        ]
+        if args.model:
+            cmd.extend(["--model", args.model])
+        cmd.extend(
+            [
+                "--available-tools=read_agent,rg,view,web_fetch",
+                "--allow-tool=read_agent",
+                "--allow-tool=rg",
+                "--allow-tool=view",
+                "--allow-tool=web_fetch",
+            ]
+        )
+        if args.web_search:
+            cmd.append("--allow-all-urls")
+        result = run(cmd, Path(tempdir), check=False)
+    if result.returncode != 0:
+        raise SystemExit(f"copilot engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+    return result.stdout
+
+
+def run_pi(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    if not args.tools:
+        raise SystemExit("--no-tools is not supported by the pi engine; the helper always runs pi with a read-only tool allowlist")
+    if not args.model:
+        raise SystemExit("--engine pi requires --model because autoreview isolates PI_CODING_AGENT_DIR from user settings")
+    with tempfile.TemporaryDirectory(prefix="autoreview-pi.") as tempdir:
+        temp = Path(tempdir)
+        prompt_path = temp / "prompt.txt"
+        prompt_path.write_text(prompt)
+        os.chmod(prompt_path, 0o600)
+        env = os.environ.copy()
+        agent_dir = temp / "agent"
+        agent_dir.mkdir()
+        env["PI_CODING_AGENT_DIR"] = str(agent_dir)
+        env["PI_CODING_AGENT_SESSION_DIR"] = str(temp / "sessions")
+        env["PI_TELEMETRY"] = "0"
+        cmd = [
+            args.pi_bin,
+            "--no-session",
+            "--no-context-files",
+            "--no-extensions",
+            "--no-skills",
+            "--no-prompt-templates",
+            "--no-themes",
+            "--tools",
+            pi_readonly_tools(args),
+            "--mode",
+            "json",
+        ]
+        if args.model:
+            cmd.extend(["--model", args.model])
+        cmd.extend(["-p", f"@{prompt_path}", "Read the attached review prompt and follow it exactly."])
+        result = run(cmd, repo, env=env, check=False)
+    if result.returncode != 0:
+        raise SystemExit(f"pi engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+    return result.stdout
+
+
+def run_opencode(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    if not args.tools:
+        raise SystemExit("--no-tools is not supported by the opencode engine; opencode requires read-only tools to load the review bundle")
+    with tempfile.TemporaryDirectory(prefix="autoreview-opencode.") as tempdir:
+        temp = Path(tempdir)
+        config_dir = temp / "config"
+        config_dir.mkdir()
+        prompt_path = temp / "prompt.txt"
+        prompt_path.write_text(prompt)
+        os.chmod(prompt_path, 0o600)
+        env = os.environ.copy()
+        env.update(
+            {
+                "OPENCODE_CONFIG_DIR": str(config_dir),
+                "OPENCODE_CONFIG_CONTENT": json.dumps(opencode_review_config(args)),
+                "OPENCODE_DISABLE_PROJECT_CONFIG": "1",
+                "OPENCODE_PURE": "1",
+                "OPENCODE_DISABLE_AUTOUPDATE": "1",
+                "OPENCODE_DISABLE_AUTOCOMPACT": "1",
+                "OPENCODE_DISABLE_MODELS_FETCH": "1",
+            }
+        )
+        cmd = [
+            args.opencode_bin,
+            "run",
+            "--pure",
+            "--format",
+            "json",
+            "--agent",
+            "autoreview",
+            "--dir",
+            str(repo),
+            "-f",
+            str(prompt_path),
+        ]
+        if args.model:
+            cmd.extend(["--model", args.model])
+        cmd.append("Read the attached review prompt and follow it exactly. Return only the requested JSON object.")
+        result = run(cmd, repo, env=env, check=False)
+    if result.returncode != 0:
+        raise SystemExit(f"opencode engine failed ({result.returncode})\n{result.stderr or result.stdout}")
+    return result.stdout
+
+
+def pi_readonly_tools(args: argparse.Namespace) -> str:
+    return "read,grep,find,ls"
+
+
+def opencode_review_config(args: argparse.Namespace) -> dict[str, Any]:
+    permission = {
+        "*": "deny",
+        "read": "allow",
+        "grep": "allow",
+        "glob": "allow",
+        "list": "allow",
+        "edit": "deny",
+        "bash": "deny",
+        "task": "deny",
+        "todowrite": "deny",
+        "question": "deny",
+        "repo_clone": "deny",
+        "repo_overview": "deny",
+        "skill": "deny",
+    }
+    if args.web_search:
+        permission.update(
+            {
+                "webfetch": "allow",
+                "websearch": "allow",
+            }
+        )
+    return {
+        "agent": {
+            "autoreview": {
+                "description": "Read-only structured code review agent",
+                "mode": "primary",
+                "steps": 8,
+                "permission": permission,
+            }
+        }
+    }
+
+
+def claude_allowed_tools(args: argparse.Namespace) -> str:
+    tools = [tool.strip() for tool in args.claude_allowed_tools.split(",") if tool.strip()]
+    if not args.web_search:
+        tools = [tool for tool in tools if tool not in {"WebSearch", "WebFetch"}]
+    return ",".join(tools)
+
+
+def extract_json(text: str) -> dict[str, Any]:
+    stripped = text.strip()
+    if not stripped:
+        raise SystemExit("review engine returned empty output")
+    try:
+        parsed = json.loads(stripped)
+    except json.JSONDecodeError as exc:
+        fenced_report = parse_json_candidate(stripped)
+        if isinstance(fenced_report, dict) and "findings" in fenced_report:
+            return fenced_report
+        jsonl_report = extract_json_from_jsonl(stripped)
+        if jsonl_report:
+            return jsonl_report
+        raise SystemExit(f"review engine returned non-JSON output: {exc}\n{stripped[:2000]}")
+    if isinstance(parsed, dict) and "findings" in parsed:
+        return parsed
+    if isinstance(parsed, dict) and isinstance(parsed.get("structured_output"), dict):
+        return parsed["structured_output"]
+    if isinstance(parsed, dict) and isinstance(parsed.get("result"), str):
+        result_json = parse_json_candidate(parsed["result"])
+        if isinstance(result_json, dict) and "findings" in result_json:
+            return result_json
+        raise SystemExit(f"review engine result was not structured JSON:\n{parsed['result'][:2000]}")
+    jsonl_report = extract_json_from_jsonl(stripped)
+    if jsonl_report:
+        return jsonl_report
+    raise SystemExit(f"review engine returned unexpected JSON shape:\n{json.dumps(parsed)[:2000]}")
+
+
+def extract_json_from_jsonl(text: str) -> dict[str, Any] | None:
+    candidates: list[str] = []
+    assistant_stream: list[str] = []
+    for line in text.splitlines():
+        line = line.strip()
+        if not line:
+            continue
+        try:
+            event = json.loads(line)
+        except json.JSONDecodeError:
+            continue
+        if not isinstance(event, dict):
+            continue
+        if isinstance(event.get("text"), str):
+            candidates.append(event["text"])
+            assistant_stream.append(event["text"])
+        if isinstance(event.get("delta"), str):
+            assistant_stream.append(event["delta"])
+        part = event.get("part")
+        if isinstance(part, dict) and isinstance(part.get("text"), str):
+            candidates.append(part["text"])
+            assistant_stream.append(part["text"])
+        assistant_event = event.get("assistantMessageEvent")
+        if isinstance(assistant_event, dict):
+            if isinstance(assistant_event.get("content"), str):
+                candidates.append(assistant_event["content"])
+            if isinstance(assistant_event.get("delta"), str):
+                assistant_stream.append(assistant_event["delta"])
+            partial = assistant_event.get("partial")
+            if isinstance(partial, dict):
+                candidates.extend(extract_text_blocks(partial.get("content")))
+        data = event.get("data")
+        if isinstance(data, dict) and isinstance(data.get("content"), str):
+            candidates.append(data["content"])
+        if isinstance(event.get("result"), str):
+            candidates.append(event["result"])
+        message = event.get("message")
+        if isinstance(message, dict):
+            texts = extract_text_blocks(message.get("content"))
+            candidates.extend(texts)
+            if message.get("role") == "assistant":
+                assistant_stream.extend(texts)
+        messages = event.get("messages")
+        if isinstance(messages, list):
+            for item in messages:
+                if not isinstance(item, dict):
+                    continue
+                texts = extract_text_blocks(item.get("content"))
+                candidates.extend(texts)
+                if item.get("role") == "assistant":
+                    assistant_stream.extend(texts)
+    if assistant_stream:
+        candidates.append("".join(assistant_stream))
+    for candidate in reversed(candidates):
+        parsed = parse_json_candidate(candidate)
+        if isinstance(parsed, dict) and "findings" in parsed:
+            return parsed
+    return None
+
+
+def extract_text_blocks(value: Any) -> list[str]:
+    if isinstance(value, str):
+        return [value]
+    if not isinstance(value, list):
+        return []
+    result: list[str] = []
+    for item in value:
+        if isinstance(item, dict) and isinstance(item.get("text"), str):
+            result.append(item["text"])
+    return result
+
+
+def parse_json_candidate(text: str) -> Any | None:
+    stripped = text.strip()
+    if stripped.startswith("```"):
+        lines = stripped.splitlines()
+        if lines and lines[0].startswith("```") and lines[-1].strip() == "```":
+            stripped = "\n".join(lines[1:-1]).strip()
+    try:
+        parsed = json.loads(stripped)
+    except json.JSONDecodeError:
+        repaired = repair_invalid_json_escapes(stripped)
+        if repaired == stripped:
+            return None
+        try:
+            parsed = json.loads(repaired)
+        except json.JSONDecodeError:
+            return None
+    if isinstance(parsed, str) and parsed != text:
+        nested = parse_json_candidate(parsed)
+        return nested if nested is not None else parsed
+    return parsed
+
+
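+def repair_invalid_json_escapes(text: str) -> str:
+    # Consume backslash pairs left to right so a valid `\\` is never split in
+    # half; drop only backslashes that do not start a legal JSON escape.
+    def fix(match: re.Match[str]) -> str:
+        escaped = match.group(1)
+        return match.group(0) if escaped and escaped in '"\\/bfnrtu' else escaped
+
+    return re.sub(r"\\(.?)", fix, text)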
+
+
+def validate_report(
+    report: dict[str, Any],
+    repo: Path,
+    changed_paths: set[str],
+    required: list[str],
+    required_any: list[str],
+) -> None:
+    allowed_top = {"findings", "overall_correctness", "overall_explanation", "overall_confidence"}
+    extra_top = set(report) - allowed_top
+    if extra_top:
+        raise SystemExit(f"review JSON has unexpected top-level keys: {sorted(extra_top)}")
+    for key in SCHEMA["required"]:
+        if key not in report:
+            raise SystemExit(f"review JSON missing required key: {key}")
+    if not isinstance(report["findings"], list):
+        raise SystemExit("review JSON findings must be an array")
+    if report.get("overall_correctness") not in {"patch is correct", "patch is incorrect"}:
+        raise SystemExit(f"review JSON has invalid overall_correctness: {report.get('overall_correctness')}")
+    if not isinstance(report.get("overall_explanation"), str) or not report["overall_explanation"]:
+        raise SystemExit("review JSON overall_explanation must be a non-empty string")
+    if len(report["overall_explanation"]) > 3000:
+        raise SystemExit("review JSON overall_explanation is too long")
+    if not number_in_range(report.get("overall_confidence")):
+        raise SystemExit("review JSON overall_confidence must be a number between 0 and 1")
+    finding_text = ""
+    for index, finding in enumerate(report["findings"]):
+        if not isinstance(finding, dict):
+            raise SystemExit(f"finding {index} must be an object")
+        allowed_finding = {"title", "body", "priority", "confidence", "category", "code_location"}
+        extra_finding = set(finding) - allowed_finding
+        if extra_finding:
+            raise SystemExit(f"finding {index} has unexpected keys: {sorted(extra_finding)}")
+        for key in sorted(allowed_finding):  # sorted so the first reported missing key is deterministic
+            if key not in finding:
+                raise SystemExit(f"finding {index} missing required key: {key}")
+        title = finding.get("title")
+        if not isinstance(title, str) or not title or len(title) > 140:
+            raise SystemExit(f"finding {index} has invalid title")
+        body = finding.get("body")
+        if not isinstance(body, str) or not body or len(body) > 2000:
+            raise SystemExit(f"finding {index} has invalid body")
+        priority = finding.get("priority")
+        if priority not in {"P0", "P1", "P2", "P3"}:
+            raise SystemExit(f"finding {index} has invalid priority: {priority}")
+        if not number_in_range(finding.get("confidence")):
+            raise SystemExit(f"finding {index} has invalid confidence")
+        category = finding.get("category")
+        if category not in {"bug", "security", "regression", "test_gap", "maintainability"}:
+            raise SystemExit(f"finding {index} has invalid category: {category}")
+        location = finding.get("code_location")
+        if not isinstance(location, dict):
+            raise SystemExit(f"finding {index} missing code_location")
+        rel = str(location.get("file_path", "")).strip()
+        line = location.get("line")
+        if not rel or not isinstance(line, int) or line < 1:
+            raise SystemExit(f"finding {index} has invalid location: {location}")
+        if Path(rel).is_absolute() or ".." in Path(rel).parts:
+            raise SystemExit(f"finding {index} uses invalid file path: {rel}")
+        if rel not in changed_paths:
+            raise SystemExit(f"finding {index} points to a file outside the reviewed change: {rel}")
+        finding_text += "\n" + json.dumps(finding, sort_keys=True)
+    haystack = finding_text.lower()
+    for needle in required:
+        if needle.lower() not in haystack:
+            raise SystemExit(f"required finding text not found: {needle}")
+    for group in required_any:
+        needles = [needle.strip().lower() for needle in group.split(",") if needle.strip()]
+        if needles and not any(needle in haystack for needle in needles):
+            raise SystemExit(f"required finding text not found; need one of: {', '.join(needles)}")
+
+
+def number_in_range(value: Any) -> bool:
+    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0 <= value <= 1
+
+
+def print_report(report: dict[str, Any]) -> None:
+    findings = report["findings"]
+    if findings:
+        print(f"autoreview findings: {len(findings)}")
+    elif report["overall_correctness"] == "patch is incorrect":
+        print("autoreview verdict: patch is incorrect without discrete findings")
+    else:
+        print("autoreview clean: no accepted/actionable findings reported")
+    for finding in findings:
+        loc = finding["code_location"]
+        print(f"[{finding['priority']}] {finding['title']}")
+        print(f"{loc['file_path']}:{loc['line']}")
+        print(f"{finding['body']}")
+        print()
+    print(f"overall: {report['overall_correctness']} ({report['overall_confidence']})")
+    print(report["overall_explanation"])
+
+
+def start_parallel_tests(command: str, repo: Path) -> tuple[subprocess.Popen, float]:
+    print(f"tests: {command}")
+    return subprocess.Popen(command, cwd=repo, shell=True), time.time()
+
+
+def finish_parallel_tests(proc: subprocess.Popen, started: float) -> int:
+    proc.wait()
+    print(f"tests exit: {proc.returncode} after {int(time.time() - started)}s")
+    return int(proc.returncode or 0)
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(description="Bundle-driven AI code review.")
+    parser.add_argument("--mode", choices=["auto", "local", "branch", "commit"], default="auto")
+    parser.add_argument("--base")
+    parser.add_argument("--commit", default="HEAD")
+    parser.add_argument("--engine", choices=["codex", "claude", "droid", "copilot", "pi", "opencode"], default=os.environ.get("AUTOREVIEW_ENGINE", "codex"))
+    parser.add_argument("--model")
+    parser.add_argument("--codex-bin", default=os.environ.get("CODEX_BIN", "codex"))
+    parser.add_argument("--claude-bin", default=os.environ.get("CLAUDE_BIN", "claude"))
+    parser.add_argument("--droid-bin", default=os.environ.get("DROID_BIN", "droid"))
+    parser.add_argument("--copilot-bin", default=os.environ.get("COPILOT_BIN", "copilot"))
+    parser.add_argument("--pi-bin", default=os.environ.get("PI_BIN", "pi"))
+    parser.add_argument("--opencode-bin", default=os.environ.get("OPENCODE_BIN", "opencode"))
+    parser.add_argument("--no-tools", dest="tools", action="store_false", default=True, help="Disable tools for engines that support it. Codex, copilot, pi, and opencode reject no-tools review.")
+    parser.add_argument("--no-web-search", dest="web_search", action="store_false", default=True)
+    parser.add_argument(
+        "--claude-allowed-tools",
+        default=os.environ.get(
+            "AUTOREVIEW_CLAUDE_TOOLS",
+            "Read,Grep,Glob,WebSearch,WebFetch",
+        ),
+    )
+    parser.add_argument("--prompt", action="append", help="Additional review instruction text.")
+    parser.add_argument("--prompt-file", action="append", help="Additional review instruction file.")
+    parser.add_argument("--dataset", action="append", help="Extra evidence file to include in the review bundle.")
+    parser.add_argument("--output", help="Write human output to a file as well as stdout.")
+    parser.add_argument("--json-output", help="Write validated structured review JSON.")
+    parser.add_argument("--parallel-tests", help="Run a test command concurrently with review; failure fails the helper.")
+    parser.add_argument("--require-finding", action="append", default=[], help="Require finding text to contain this substring.")
+    parser.add_argument("--require-any-finding", action="append", default=[], help="Require finding text to contain at least one comma-separated substring.")
+    parser.add_argument("--expect-findings", action="store_true", help="Treat findings as success; for harness acceptance tests.")
+    parser.add_argument("--dry-run", action="store_true")
+    args = parser.parse_args()
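+    # argparse does not check defaults against `choices`, so an invalid
+    # AUTOREVIEW_ENGINE value would otherwise slip through unvalidated.
+    if args.engine not in {"codex", "claude", "droid", "copilot", "pi", "opencode"}: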
+        raise SystemExit(f"invalid --engine/AUTOREVIEW_ENGINE: {args.engine}")
+    return args
+
+
+def run_engine(args: argparse.Namespace, repo: Path, prompt: str) -> str:
+    if args.engine == "codex":
+        return run_codex(args, repo, prompt)
+    if args.engine == "claude":
+        return run_claude(args, repo, prompt)
+    if args.engine == "droid":
+        return run_droid(args, repo, prompt)
+    if args.engine == "copilot":
+        return run_copilot(args, repo, prompt)
+    if args.engine == "pi":
+        return run_pi(args, repo, prompt)
+    if args.engine == "opencode":
+        return run_opencode(args, repo, prompt)
+    raise SystemExit(f"unsupported engine: {args.engine}")
+
+
+def main() -> int:
+    args = parse_args()
+    repo = repo_root()
+    target, target_ref = choose_target(repo, args.mode, args.base)
+    print(f"autoreview target: {target}")
+    print(f"branch: {current_branch(repo)}")
+    print(f"engine: {args.engine}")
+    print(f"tools: {'on' if args.tools else 'off'}")
+    print(f"web_search: {'on' if args.web_search else 'off'}")
+    display_ref = args.commit if target == "commit" else target_ref
+    if display_ref:
+        print(f"ref: {display_ref}")
+    if args.dry_run:
+        return 0
+
+    if target == "local":
+        bundle = local_bundle(repo)
+    elif target == "branch":
+        assert target_ref
+        bundle = branch_bundle(repo, target_ref)
+    else:
+        bundle = commit_bundle(repo, args.commit)
+        target_ref = args.commit
+    prompt = build_prompt(repo, target, target_ref, bundle, load_extra_prompt(args), load_datasets(args))
+    changed_paths = review_paths(repo, target, target_ref, args.commit)
+    print(f"bundle: {len(prompt)} chars")
+
+    tests_proc: tuple[subprocess.Popen, float] | None = None
+    if args.parallel_tests:
+        tests_proc = start_parallel_tests(args.parallel_tests, repo)
+    try:
+        raw = run_engine(args, repo, prompt)
+        report = extract_json(raw)
+        validate_report(report, repo, changed_paths, args.require_finding, args.require_any_finding)
+        if args.json_output:
+            Path(args.json_output).write_text(json.dumps(report, indent=2) + "\n")
+
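+        if args.output:
+            original_stdout = sys.stdout
+            with Path(args.output).open("w") as handle:
+                sys.stdout = Tee(original_stdout, handle)
+                try:
+                    print_report(report)
+                finally:
+                    # Restore stdout before the handle closes so later prints
+                    # (including the tests summary) never hit a closed file.
+                    sys.stdout = original_stdout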
+        else:
+            print_report(report)
+    finally:
+        tests_status = finish_parallel_tests(*tests_proc) if tests_proc else 0
+
+    has_findings = bool(report["findings"])
+    overall_incorrect = report["overall_correctness"] == "patch is incorrect"
+    if tests_status != 0:
+        return 1
+    if args.expect_findings:
+        return 0 if has_findings else 1
+    return 1 if has_findings or overall_incorrect else 0
+
+
+class Tee:
+    def __init__(self, *streams: Any) -> None:
+        self.streams = streams
+
+    def write(self, data: str) -> None:
+        for stream in self.streams:
+            stream.write(data)
+
+    def flush(self) -> None:
+        for stream in self.streams:
+            stream.flush()
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/.agents/skills/crabbox/SKILL.md b/.agents/skills/crabbox/SKILL.md
new file mode 100644
index 0000000..483d831
--- /dev/null
+++ b/.agents/skills/crabbox/SKILL.md
@@ -0,0 +1,711 @@
+---
+name: crabbox
+description: Use the Crabbox wrapper for OpenClaw remote validation across Linux, macOS, Windows, and WSL2, including delegated Blacksmith Testbox proof. Report the actual provider and id.
+---
+
+# Crabbox
+
+Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests,
+CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed
+reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup.
+
+Crabbox is the transport/orchestration surface. The actual backend can be:
+
+- brokered AWS Crabbox: direct provider, `provider=aws`, lease ids like
+  `cbx_...`, `syncDelegated=false`
+- Blacksmith Testbox through Crabbox: delegated provider,
+  `provider=blacksmith-testbox`, ids like `tbx_...`, `syncDelegated=true`
+
+For OpenClaw maintainer broad `pnpm` gates, Blacksmith Testbox through the
+Crabbox wrapper is acceptable and often preferred when the standing Testbox
+rules apply. Do not describe those runs as "AWS Crabbox"; report them as
+Testbox-through-Crabbox with the `tbx_...` id and Actions run.
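+
+The reporting rule above depends only on the id prefix, so it can be sketched
+as a small lookup. This is a hypothetical helper for illustration, not part of
+Crabbox; the name `describe_lease` and the example id are assumptions.
+
+```python
+def describe_lease(lease_id: str) -> str:
+    """Map a lease id prefix to the provider label used in proof notes."""
+    if lease_id.startswith("cbx_"):
+        return f"AWS Crabbox (provider=aws, lease {lease_id})"
+    if lease_id.startswith("tbx_"):
+        return f"Testbox-through-Crabbox (provider=blacksmith-testbox, lease {lease_id})"
+    # Unknown prefixes are reported as-is rather than guessed.
+    return f"unknown provider (lease {lease_id})"
+
+
+# Example id is made up for illustration.
+print(describe_lease("tbx_example"))
+```
+
+A Testbox report would still need the Actions run alongside this label.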
+
+Use the repo `.crabbox.yaml` brokered AWS path when the task specifically needs
+direct AWS Crabbox behavior, persistent direct-provider leases, `--fresh-pr`,
+`--full-resync`, environment forwarding, capture/download support, or provider
+comparison. Use `--provider blacksmith-testbox` when the task needs OpenClaw
+maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or
+the user asks for Testbox/Blacksmith.
+
+## First Checks
+
+- Run from the repo root. Crabbox sync mirrors the current checkout.
+- Check the wrapper and providers before remote work:
+
+```sh
+command -v crabbox
+../crabbox/bin/crabbox --version
+pnpm crabbox:run -- --help | sed -n '1,120p'
+../crabbox/bin/crabbox desktop launch --help
+../crabbox/bin/crabbox webvnc --help
+```
+
+- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH
+  shim can be stale.
+- Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider`
+  means brokered AWS today.
+- The brokered AWS default is a Linux developer image in `eu-west-1`; the repo
+  config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply.
+  If warmup drifts well past the minute-scale path, verify image promotion,
+  region/AZ placement, and FSR state before blaming OpenClaw.
+- For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with
+  `--provider blacksmith-testbox` or the repo Testbox helpers when the standing
+  Testbox policy applies.
+- Always report the actual provider and id. `cbx_...` means AWS Crabbox;
+  `tbx_...` means Blacksmith Testbox through Crabbox. If you have only checked
+  plain `blacksmith testbox list`, run `blacksmith testbox list --all` before
+  concluding no box exists.
+- If a warm direct-provider lease smells stale, retry with `--full-resync`
+  (alias `--fresh-sync`) before replacing the lease. This resets the remote
+  workdir, skips the fingerprint fast path, reseeds Git when possible, and
+  uploads the checkout from scratch.
+- For live/provider bugs, use the configured secret workflow before downgrading
+  to mocks. Copy only the exact needed key into the remote process environment
+  for that one command. Do not print it, do not sync it as a repo file, and do
+  not leave it in remote shell history or logs. If no secret-safe injection path
+  is available, say true live provider auth is blocked instead of silently using
+  a fake key.
+- Prefer local targeted tests for tight edit loops. Broad gates belong remote.
+- Do not treat inherited shell env as operator intent. In particular,
+  `OPENCLAW_LOCAL_CHECK_MODE=throttled` from the local shell is not permission
+  to move broad `pnpm check:changed`, `pnpm test:changed`, full `pnpm test`, or
+  lint/typecheck fan-out onto the laptop.
+- Only use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` when the user explicitly
+  asks for local proof in the current task. If Testbox is queued or capacity is
+  constrained, report the blocker and keep only targeted local edit-loop checks
+  running.
+
+## macOS And Windows Targets
+
+Use these only when the task needs an existing non-Linux host. OpenClaw broad
+Linux validation uses the repo Crabbox config unless a provider is explicitly
+requested.
+
+Native brokered Windows is available for Windows-specific proof. Use the AWS
+developer image in `us-west-2` on demand; it has the expected OpenClaw developer
+toolchain and Docker image cache. Keep broad Linux gates on Linux/Testbox unless
+the bug is Windows-specific:
+
+```sh
+../crabbox/bin/crabbox warmup \
+  --provider aws \
+  --target windows \
+  --windows-mode normal \
+  --region us-west-2 \
+  --market on-demand \
+  --timing-json
+```
+
+The hydrate workflow assumes Docker should already be baked into Linux images
+and only installs it as a fallback. Do not add per-run Docker installs to proof
+commands unless the image probe shows Docker is actually missing.
+
+When the user explicitly asks for brokered macOS runners, use Crabbox AWS
+macOS only after confirming the deployed coordinator supports EC2 Mac host
+lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota
+and IAM. Prefer `CRABBOX_HOST_ID` for a known Crabbox-managed Dedicated Host,
+or run the no-spend preflight first:
+
+```sh
+crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json
+crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json
+CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh
+```
+
+Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report
+paid-host blockers as quota, IAM, coordinator deployment, or host availability
+instead of falling back to local macOS.
+
+Crabbox supports static SSH targets:
+
+```sh
+../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test
+../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test"
+../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test
+```
+
+- `target=macos` and `target=windows --windows-mode wsl2` use the POSIX SSH,
+  bash, Git, rsync, and tar contract.
+- Native Windows uses OpenSSH, PowerShell, Git, and tar; sync is manifest tar
+  archive transfer into `static.workRoot`. Direct native Windows runs support
+  `--script*`, `--env-from-profile`, `--preflight`, and PowerShell `--shell`.
+- `crabbox actions hydrate/register` are Linux-only today; use plain
+  `crabbox run` loops for static macOS and Windows hosts.
+- Live proof needs a reachable, operator-managed SSH host. Without one, verify
+  with `../crabbox/bin/crabbox run --help`, config/flag tests, and the Crabbox
+  Go test suite.
+
+## Direct Brokered AWS Backend
+
+Use this when the task needs direct AWS Crabbox semantics rather than the
+prepared Blacksmith Testbox CI environment.
+
+Changed gate:
+
+```sh
+pnpm crabbox:run -- \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  --shell -- \
+  "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
+```
+
+Full suite:
+
+```sh
+pnpm crabbox:run -- \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  --shell -- \
+  "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
+```
+
+Focused rerun:
+
+```sh
+pnpm crabbox:run -- \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  --shell -- \
+  "env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test <path-or-filter>"
+```
+
+Read the JSON summary. Useful fields:
+
+- `provider`: `aws`
+- `leaseId`: `cbx_...`
+- `syncDelegated`: `false`
+- `commandPhases`: populated when the command prints `CRABBOX_PHASE:<name>`
+- `commandMs` / `totalMs`
+- `exitCode`
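+
+A minimal sketch of pulling those fields into a one-line proof note. It
+assumes the `--timing-json` summary is a single JSON object whose top-level
+keys match the names listed above; the real output may be shaped differently,
+and `summarize_timing` plus the sample values are hypothetical.
+
+```python
+import json
+
+# Field names taken from the list above; their nesting is an assumption.
+FIELDS = ("provider", "leaseId", "syncDelegated", "exitCode", "commandMs", "totalMs")
+
+
+def summarize_timing(raw: str) -> str:
+    """Render the listed summary fields as space-separated key=value pairs."""
+    summary = json.loads(raw)
+    return " ".join(f"{key}={summary.get(key)}" for key in FIELDS)
+
+
+# Sample values are made up for illustration.
+example = json.dumps(
+    {
+        "provider": "aws",
+        "leaseId": "cbx_example",
+        "syncDelegated": False,
+        "exitCode": 0,
+        "commandMs": 1200,
+        "totalMs": 4500,
+    }
+)
+print(summarize_timing(example))
+```
+
+Missing keys render as `None`, which makes an incomplete summary easy to spot.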
+
+Crabbox should stop one-shot AWS leases automatically after the run. Verify
+cleanup when a run fails, is interrupted, or the command output is unclear:
+
+```sh
+../crabbox/bin/crabbox list --provider aws
+```
+
+## Blacksmith Testbox Through Crabbox
+
+Use this for OpenClaw maintainer broad/heavy `pnpm` gates when the prepared CI
+environment is the right proof surface:
+
+```sh
+node scripts/crabbox-wrapper.mjs run \
+  --provider blacksmith-testbox \
+  --blacksmith-org openclaw \
+  --blacksmith-workflow .github/workflows/ci-check-testbox.yml \
+  --blacksmith-job check \
+  --blacksmith-ref main \
+  --idle-timeout 90m \
+  --ttl 240m \
+  --timing-json \
+  -- \
+  CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 pnpm check:changed
+```
+
+Read the JSON summary and the Testbox line. Useful fields:
+
+- `provider`: `blacksmith-testbox`
+- `leaseId`: `tbx_...`
+- `syncDelegated`: `true`
+- `syncPhases`: delegated/skipped because Blacksmith owns checkout/sync
+- Actions run URL/id from the Testbox output
+- `exitCode`
+
+`blacksmith testbox list` may hide hydrating or ready boxes. Use:
+
+```sh
+blacksmith testbox list --all
+blacksmith testbox status <tbx_id>
+```
+
+## Observability Flags
+
+Use these on debugging runs before inventing ad hoc logging:
+
+- `--preflight`: prints run context, workspace mode, SSH target, remote user/cwd,
+  and target-specific tool probes. Defaults cover `git`, `tar`, `node`, `npm`,
+  `corepack`, `pnpm`, `yarn`, `bun`, `docker`, plus POSIX
+  `sudo`/`apt`/`bubblewrap` and native Windows
+  `powershell`/`execution_policy`/`longpaths`/`temp`/`pwsh`. Add
+  `--preflight-tools node,bun,docker`, `CRABBOX_PREFLIGHT_TOOLS`, or repo
+  `run.preflightTools` to replace the list. `default` expands built-ins; `none`
+  prints only the workspace summary. Preflight is diagnostic only; install
+  toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or
+  the run script. On `blacksmith-testbox`, this prints a delegated-unsupported
+  note because the workflow owns setup.
+- `CRABBOX_ENV_ALLOW=NAME,...`: forwards only listed local env vars for direct
+  providers and prints `set len=N secret=true` style summaries. On
+  `blacksmith-testbox`, env forwarding is unsupported; put secrets in the
+  Testbox workflow instead.
+- `--env-from-profile <file>` plus `--allow-env NAME`: loads simple
+  `export NAME=value` / `NAME=value` lines from a local profile without
+  executing it, then forwards only allowlisted names. `--allow-env` is
+  repeatable and comma-separated. Profile values override ambient allowlisted
+  env values for that run. Direct POSIX, WSL2, and native Windows runs are
+  supported; delegated providers are not. Crabbox probes the uploaded profile
+  remotely and prints redacted presence/length metadata before the command.
+- `--env-helper <name>`: with `--env-from-profile` on POSIX SSH targets,
+  persists `.crabbox/env/<name>` and `.crabbox/env/<name>.env` so follow-up
+  commands on the same lease can run through `./.crabbox/env/<name> <command>`.
+  Use only on leases you control; the profile stays until cleanup, lease reset,
+  or `--full-resync`.
+- `--script <file>` / `--script-stdin`: upload a local script into
+  `.crabbox/scripts/` and execute it on the remote box. Shebang scripts execute
+  directly on POSIX; scripts without a shebang run through `bash`. Native
+  Windows uploads run through Windows PowerShell, and Crabbox appends `.ps1`
+  when needed. Arguments after `--` become script args.
+- `--fresh-pr owner/repo#123|URL|number`: skip dirty local sync and create a
+  fresh remote checkout of the GitHub PR. Bare numbers use the current repo's
+  GitHub origin. Add `--apply-local-patch` only when the current local
+  `git diff --binary HEAD` should be applied on top of that PR checkout.
+- `--full-resync` / `--fresh-sync`: reset a stale direct-provider workdir
+  before syncing. Use after sync fingerprints look wrong, SSH times out before
+  sync, or rsync watchdog output suggests it. It is redundant with
+  `--fresh-pr`, incompatible with `--no-sync`, and unsupported by delegated
+  providers.
+- `--capture-stdout <path>` / `--capture-stderr <path>`: write remote streams to
+  local files and keep binary/noisy output out of retained logs. Parent
+  directories must already exist. These are direct-provider only.
+- `--capture-on-fail`: on non-zero direct-provider exits, downloads
+  `.crabbox/captures/*.tar.gz` with `test-results`, `playwright-report`,
+  `coverage`, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed.
+- `--keep-on-failure`: leave a failed one-shot lease alive for live debugging
+  until idle/TTL expiry. Useful on direct providers and delegated one-shots.
+- `--timing-json`: final machine-readable timing. Add
+  `echo CRABBOX_PHASE:install`, `CRABBOX_PHASE:test`, etc. in long shell
+  commands; direct providers and Blacksmith Testbox both report them as
+  `commandPhases`.
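+
+The `--env-from-profile` plus `--allow-env` behavior in the list above (read
+simple assignment lines without executing the file, forward only allowlisted
+names, print redacted metadata) can be pictured roughly as follows. This is an
+illustrative sketch, not Crabbox's parser: the quote stripping, the comment
+handling, and the summary format are assumptions.
+
+```python
+import re
+
+# Matches `export NAME=value` and `NAME=value`; nothing is ever executed.
+ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")
+
+
+def load_profile(text: str, allow: set[str]) -> dict[str, str]:
+    """Return only allowlisted NAME=value pairs from profile text."""
+    values: dict[str, str] = {}
+    for raw in text.splitlines():
+        line = raw.strip()
+        if not line or line.startswith("#"):
+            continue
+        match = ASSIGNMENT.match(line)
+        if not match:
+            continue  # anything that is not a plain assignment is ignored
+        name, value = match.groups()
+        if name not in allow:
+            continue
+        # Assumption: strip one layer of matching quotes around the value.
+        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
+            value = value[1:-1]
+        values[name] = value
+    return values
+
+
+def redacted_summary(values: dict[str, str]) -> list[str]:
+    """Report presence and length only, never the value itself."""
+    return [f"{name}: set len={len(value)}" for name, value in sorted(values.items())]
+
+
+profile = "export OPENAI_API_KEY=sk-test\nOTHER_SECRET=1\n# comment\n"
+print(redacted_summary(load_profile(profile, {"OPENAI_API_KEY"})))
+```
+
+Filtering by name before any value is stored keeps unlisted secrets out of the
+forwarded environment entirely.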
+
+Live-provider debug template for direct AWS/Hetzner leases:
+
+```sh
+mkdir -p .crabbox/logs
+pnpm crabbox:run -- --provider aws \
+  --preflight \
+  --allow-env OPENAI_API_KEY,OPENAI_BASE_URL \
+  --timing-json \
+  --capture-stdout .crabbox/logs/live-provider.stdout.log \
+  --capture-stderr .crabbox/logs/live-provider.stderr.log \
+  --capture-on-fail \
+  --shell -- \
+  "echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live"
+```
+
+Do not pass `--capture-*`, `--download`, `--checksum`, `--force-sync-large`, or
+`--sync-only` to delegated providers. Also do not pass `--script*`,
+`--fresh-pr`, `--full-resync`, or `--env-helper` there. Crabbox rejects these
+because the provider owns sync or command transport. `--keep-on-failure` is OK
+for delegated one-shots when you need to inspect a failed lease.
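+
+A local pre-check for the delegated-provider restrictions above might look
+like the sketch below. Crabbox already rejects these flags itself, so this is
+only a hypothetical convenience for wrapper scripts; the function name and the
+prefix matching for `--capture-*` and `--script*` are assumptions.
+
+```python
+# Flags the text above says delegated providers reject.
+UNSUPPORTED_EXACT = {
+    "--download",
+    "--checksum",
+    "--force-sync-large",
+    "--sync-only",
+    "--fresh-pr",
+    "--full-resync",
+    "--fresh-sync",  # documented alias of --full-resync
+    "--env-helper",
+}
+UNSUPPORTED_PREFIXES = ("--capture-", "--script")
+
+
+def rejected_flags(argv: list[str]) -> list[str]:
+    """Return flags in argv that a delegated provider would reject."""
+    rejected: list[str] = []
+    for arg in argv:
+        if arg == "--":
+            break  # everything after `--` belongs to the remote command
+        flag = arg.split("=", 1)[0]
+        if flag in UNSUPPORTED_EXACT or flag.startswith(UNSUPPORTED_PREFIXES):
+            rejected.append(flag)
+    return rejected
+
+
+print(rejected_flags(["run", "--provider", "blacksmith-testbox", "--capture-on-fail", "--keep-on-failure"]))
+```
+
+`--keep-on-failure` is deliberately absent from both sets because it remains
+valid for delegated one-shots.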
+
+## Efficient Bug E2E Verification
+
+Use the smallest Crabbox lane that proves the reported user path, not just the
+touched code. Aim for one after-fix E2E proof before commenting, closing, or
+opening a PR for a user-visible bug.
+
+When the user says "test in Crabbox", do not simply copy tests to the remote
+box and run them there. Crabbox is for remote real-scenario proof: copy or
+install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API
+call that failed, and capture behavior from that entrypoint. For regressions or
+bug reports, prove the broken state first when feasible, then run the same
+scenario after the fix.
+
+Pick the lane by symptom:
+
+- Docker/setup/install bug: build a package tarball and run the matching
+  `scripts/e2e/*-docker.sh` or package script. This proves npm packaging,
+  install paths, runtime deps, config writes, and container behavior.
+- Provider/model/auth bug: prefer true live E2E. Use the configured secret
+  workflow, then inject the single needed key into Crabbox if needed. Scrub
+  unrelated provider env vars in the child command so interactive defaults do
+  not drift to another provider. If only a dummy key is used, label the proof
+  narrowly, e.g. "UI/install path only; live provider auth not exercised."
+- Channel delivery bug: use the channel Docker/live lane when available; include
+  setup, config, gateway start, send/receive or agent-turn proof, and redacted
+  logs.
+- Gateway/session/tool bug: prefer an end-to-end CLI or Gateway RPC command that
+  creates real state and inspects the resulting files/API output.
+- Pure parser/config bug: targeted tests may be enough, but still run a
+  Crabbox command when OS, package, Docker, secrets, or service lifecycle could
+  change behavior.
+
+Efficient flow:
+
+1. Reproduce or prove the pre-fix symptom from the real user-facing entrypoint
+   when feasible. If the issue cannot be reproduced, capture the exact command
+   and observed behavior instead.
+2. Patch locally and run narrow local tests for edit speed.
+3. Run one Crabbox E2E command that starts from the user-facing entrypoint:
+   package install, Docker setup, onboarding, channel add, gateway start, or
+   agent turn as appropriate.
+4. Record proof as: Testbox id, command, environment shape, redacted secret
+   source, and copied success/failure output.
+5. If the outcome is still "cannot reproduce", ask the reporter for the missing
+   config/log fields that would distinguish the tested path from their path.
+
+Keep it efficient:
+
+- Reuse existing E2E scripts and helper assertions before writing ad hoc shell.
+- Use `--script <file>` or `--script-stdin` for multi-line E2E commands instead
+  of quote-heavy `--shell` strings on direct SSH providers.
+- Use `--fresh-pr <pr>` when validating an upstream PR in isolation from the
+  local dirty tree. Add `--apply-local-patch` only when testing a local fixup on
+  top of that PR.
+- Use `--full-resync` before replacing a warmed direct-provider lease when the
+  remote workdir or sync fingerprint appears stale.
+- Use one-shot Crabbox for a single proof; use a reusable Testbox only when
+  several commands must share built images, installed packages, or live state.
+- Prefer `OPENCLAW_CURRENT_PACKAGE_TGZ` with Docker/package lanes when testing a
+  candidate tarball; prefer the repo's package helper instead of direct source
+  execution when the bug might be packaging/install related.
+- Keep secrets redacted. It is fine to report key presence, source, and length;
+  never print secret values.
+- Include `--timing-json` on broad or flaky runs when command duration or sync
+  behavior matters.
+
+Before/after PR proof on delegated Testbox:
+
+- For PRs that should prove "broken before, fixed after", compare base and PR
+  on the same Testbox when practical. Fetch both refs, create detached temp
+  worktrees under `/tmp`, install in each, then run the same harness twice.
+- Do not checkout base/PR refs in the synced repo root. Delegated Testbox sync
+  may leave the root dirty with local files; `git checkout` can abort or mix
+  proof state.
+- Temp harness files under `/tmp` do not resolve repo packages by default. Put
+  the harness inside the worktree, or in ESM use
+  `createRequire(path.join(process.cwd(), "package.json"))` before requiring
+  workspace deps such as `@lydell/node-pty`.
+- For full-screen TUI/CLI bugs, a PTY harness is stronger than helper-only
+  assertions. Use a real PTY, wait for visible lifecycle markers, send input,
+  then send control keys and assert process exit/stuck behavior.
+- When validating a rebased local branch before push, remember delegated sync
+  usually validates synced file content on a detached dirty checkout, not a
+  remote commit object. Record the local head SHA, changed files, Testbox id,
+  and final success markers; after pushing, ensure the pushed SHA has the same
+  file content.
+- If GitHub CI is still queued but the exact changed content passed Testbox
+  `pnpm check:changed`, `pnpm check:test-types`, and the real E2E proof, it is
+  reasonable to merge once required checks allow it. Note any still-running
+  unrelated shards in the proof comment instead of waiting forever.
+
+Interactive CLI/onboarding:
+
+- For full-screen or prompt-heavy CLI flows, run the target command inside tmux
+  on the Crabbox and drive it with `tmux send-keys`; capture proof with
+  `tmux capture-pane`, redacted through `sed`.
+- Prefer deterministic arrow navigation over search typing for Clack-style
+  searchable selects. Raw `send-keys -l openai` may not trigger filtering in a
+  tmux pane; inspect option order locally or on-box and send exact Down/Enter
+  sequences.
+- Isolate mutable state with `OPENCLAW_STATE_DIR=$(mktemp -d)`. Plugin npm
+  installs live under that state dir (`npm/node_modules/...`), not under
+  `OPENCLAW_CONFIG_DIR`. Verify downloads by checking the state dir, package
+  lock, and installed package metadata.
+- To test automatic setup installs against local package artifacts, use
+  `OPENCLAW_ALLOW_PLUGIN_INSTALL_OVERRIDES=1` plus
+  `OPENCLAW_PLUGIN_INSTALL_OVERRIDES='{"plugin-id":"npm-pack:/tmp/plugin.tgz"}'`.
+  Pack with `npm pack`, set an isolated `OPENCLAW_STATE_DIR`, and verify the
+  package under `npm/node_modules`. Overrides are test-only and must not be
+  treated as official/trusted-source installs.
+- For OpenAI/Codex onboarding proof, the useful markers are the UI line
+  `Installed Codex plugin`, `npm/node_modules/@openclaw/codex`, and the
+  package-lock entry showing the bundled `@openai/codex` dependency. A dummy
+  OpenAI-shaped key can prove only UI/install behavior; it is not live auth.
+
+## Reuse And Keepalive
+
+For most Crabbox calls, one-shot is enough. Reuse a box only when you need to
+run multiple manual commands on the same hydrated box.
+
+If Crabbox returns a reusable id or you intentionally keep a lease:
+
+```sh
+pnpm crabbox:run -- --id <cbx_id-or-slug> --no-sync --timing-json --shell -- "pnpm test <path>"
+```
+
+Before handoff, stop any boxes you created:
+
+```sh
+pnpm crabbox:stop -- <id-or-slug>
+blacksmith testbox stop --id <tbx_id>
+```
+
+## Interactive Desktop And WebVNC
+
+Prefer WebVNC for human inspection because the browser portal can preload the
+lease VNC password and avoids a native VNC client's copy/paste/password dance.
+Use native `crabbox vnc` only when WebVNC is unavailable, the browser portal is
+broken, or the user explicitly wants a local VNC client.
+
+Common desktop flow:
+
+```sh
+../crabbox/bin/crabbox warmup --provider hetzner --desktop --browser --class standard --idle-timeout 60m --ttl 240m
+../crabbox/bin/crabbox desktop launch --provider hetzner --id <cbx_id-or-slug> --browser --url https://example.com --webvnc --open --take-control
+```
+
+Useful WebVNC commands:
+
+```sh
+../crabbox/bin/crabbox webvnc --provider hetzner --id <cbx_id-or-slug> --open --take-control
+../crabbox/bin/crabbox webvnc daemon start --provider hetzner --id <cbx_id-or-slug> --open --take-control
+../crabbox/bin/crabbox webvnc daemon status --provider hetzner --id <cbx_id-or-slug>
+../crabbox/bin/crabbox webvnc daemon stop --provider hetzner --id <cbx_id-or-slug>
+../crabbox/bin/crabbox webvnc status --provider hetzner --id <cbx_id-or-slug>
+../crabbox/bin/crabbox webvnc reset --provider hetzner --id <cbx_id-or-slug> --open --take-control
+../crabbox/bin/crabbox desktop doctor --provider hetzner --id <cbx_id-or-slug>
+../crabbox/bin/crabbox desktop click --provider hetzner --id <cbx_id-or-slug> --x 640 --y 420
+../crabbox/bin/crabbox desktop paste --provider hetzner --id <cbx_id-or-slug> --text "user@example.com"
+../crabbox/bin/crabbox desktop key --provider hetzner --id <cbx_id-or-slug> ctrl+l
+../crabbox/bin/crabbox artifacts collect --id <cbx_id-or-slug> --all --output artifacts/<slug>
+../crabbox/bin/crabbox artifacts publish --dir artifacts/<slug> --pr <number>
+```
+
+`desktop launch --webvnc --open` is usually the nicest one-shot: it starts the
+browser/app inside the visible session, bridges the lease into the authenticated
+WebVNC portal, and opens the portal. Keep browsers windowed for human QA; use
+`--fullscreen` only for capture/video workflows.
+For human handoff, include `--take-control` so the opened portal viewer gets
+keyboard/mouse control automatically instead of landing as an observer.
+
+Human handoff preflight:
+
+- Do not assume a visible desktop or launched browser means the repo CLI/app is
+  installed, built, or on the interactive terminal's `PATH`.
+- Before handing WebVNC to a human tester, prove the expected command from the
+  same kept lease and from a neutral directory such as `~`.
+- If the handoff needs repo-local code, sync/build/link it explicitly on that
+  lease. Source-tree CLIs often need build output before a symlink works.
+- Prefer a real `command -v <expected-command> && <expected-command> --version`
+  check over a repo-root-only `pnpm ...` command.
+
+Generic handoff repair pattern:
+
+```sh
+../crabbox/bin/crabbox run --id <cbx_id-or-slug> --full-resync --shell -- \
+  "set -euo pipefail
+   pnpm install --frozen-lockfile
+   pnpm build
+   sudo ln -sf \"\$PWD/<cli-entry>\" /usr/local/bin/<expected-command>
+   cd ~
+   command -v <expected-command>
+   <expected-command> --version"
+```
+
+## If Crabbox Fails
+
+Keep the fallback narrow. First decide whether the failure is Crabbox itself,
+the brokered AWS lease, Blacksmith/Testbox, repo hydration, sync, or the test
+command.
+
+Fast checks:
+
+```sh
+command -v crabbox
+../crabbox/bin/crabbox --version
+pnpm crabbox:run -- --help | sed -n '1,140p'
+../crabbox/bin/crabbox doctor
+command -v blacksmith
+blacksmith --version
+blacksmith testbox list
+```
+
+Common Crabbox-only failures:
+
+- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling
+  repo, or update/install Crabbox before retrying.
+- Bad local config: inspect `.crabbox.yaml`, `crabbox config show`, and
+  `crabbox whoami`; normal OpenClaw proof should use brokered AWS without
+  asking for cloud keys.
+- Slug/claim confusion: use the raw `cbx_...` / `tbx_...` id, or run one-shot
+  without `--id`.
+- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the
+  printed Actions URL. Large sync warnings now include top source directories
+  by file count and a hint to update `.crabboxignore` / `sync.exclude`; inspect
+  those before reaching for `--force-sync-large`. Quiet rsync watchdogs and SSH
+  timeouts now print `next_action=` hints; follow them, usually `--full-resync`
+  first and a fresh lease second.
+- Cleanup uncertainty: run `crabbox list --provider aws`; for explicit
+  Blacksmith runs, use `blacksmith testbox list` and stop only boxes you
+  created.
+- Testbox queued/capacity pressure: do not retry Blacksmith repeatedly. Rerun
+  once without `--provider` so `.crabbox.yaml` routes to brokered AWS, or report
+  the Blacksmith blocker if Testbox itself is the requested proof.
+
+If brokered AWS cannot dispatch, sync, attach, or stop, retry once with
+`--debug` and `--timing-json`:
+
+```sh
+pnpm crabbox:run -- --debug --timing-json -- \
+  CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed
+```
+
+Full suite:
+
+```sh
+pnpm crabbox:run -- --debug --timing-json -- \
+  CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test
+```
+
+Auth fallback, only when `blacksmith` says auth is missing:
+
+```sh
+blacksmith auth login --non-interactive --organization openclaw
+```
+
+Raw Blacksmith footguns:
+
+- Run from repo root. The CLI syncs the current directory.
+- Save the returned `tbx_...` id in the session.
+- Reuse that id for focused reruns; stop it before handoff.
+- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
+- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable
+  queue.
+
+Use Blacksmith only when the task is specifically about Testbox, brokered AWS
+is unavailable, or an explicit comparison is needed. If Blacksmith is down or
+quota-limited, do not keep probing it; stay on brokered AWS and note the
+delegated-provider outage.
+
+## Blacksmith Backend Notes
+
+The Crabbox Blacksmith backend delegates setup to:
+
+- org: `openclaw`
+- workflow: `.github/workflows/ci-check-testbox.yml`
+- job: `check`
+- ref: `main` unless testing a branch/tag intentionally
+
+The hydration workflow owns checkout, Node/pnpm setup, dependency install,
+secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command
+execution, timing, logs/results, and cleanup.
+
+Minimal Blacksmith-backed Crabbox run, from repo root:
+
+```sh
+pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- \
+  CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed
+```
+
+Use direct Blacksmith only when Crabbox is the broken layer and you are
+isolating a Crabbox bug. Use direct `blacksmith testbox list` for cleanup
+diagnostics only, not as a reusable work queue.
+
+Important Blacksmith footguns:
+
+- Always run from repo root. The CLI syncs the current directory.
+- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
+- If auth is missing and browser auth is acceptable:
+
+```sh
+blacksmith auth login --non-interactive --organization openclaw
+```
+
+## Brokered AWS
+
+Use AWS for normal OpenClaw remote proof. The repo `.crabbox.yaml` already
+selects brokered AWS, so omit `--provider` unless you are testing a different
+provider deliberately.
+
+```sh
+pnpm crabbox:warmup -- --class beast --market on-demand --idle-timeout 90m
+pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
+pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
+pnpm crabbox:stop -- <cbx_id-or-slug>
+```
+
+Install/auth for owned Crabbox if needed:
+
+```sh
+brew install openclaw/tap/crabbox
+crabbox login --url https://crabbox.openclaw.ai --provider aws
+```
+
+New users should self-resolve broker auth before anyone asks for AWS keys:
+
+```sh
+crabbox config show
+crabbox doctor
+crabbox whoami
+```
+
+- If broker auth is missing, run `crabbox login --url https://crabbox.openclaw.ai --provider aws`.
+- If the CLI asks for `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or AWS
+  profile setup during normal OpenClaw validation, assume the agent selected
+  the wrong path. Use brokered `crabbox login` or an existing brokered lease
+  before asking the user for cloud credentials.
+- Ask for AWS keys only for explicit direct-provider/account administration,
+  not for normal brokered OpenClaw proof.
+- Trusted automation may still use
+  `printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`.
+
+macOS config lives at:
+
+```text
+~/Library/Application Support/crabbox/config.yaml
+```
+
+It should include `broker.url`, `broker.token`, and usually `provider: aws`
+for OpenClaw lanes. Let that config drive normal validation.
+
+### Interactive Desktop / WebVNC
+
+For human desktop demos, prefer `webvnc` over native `vnc` and keep the remote
+desktop visible/windowed. Do not fullscreen the remote browser or hide the XFCE
+panel/window chrome unless the explicit goal is video/capture output. After
+launch, verify a screenshot shows the desktop panel plus browser title bar. If
+Chrome is fullscreen, toggle it back with:
+
+```sh
+crabbox run --id <lease> --shell -- 'DISPLAY=:99 xdotool search --onlyvisible --class google-chrome windowactivate key F11'
+```
+
+## Diagnostics
+
+```sh
+crabbox status --id <id-or-slug> --wait
+crabbox inspect --id <id-or-slug> --json
+crabbox sync-plan
+crabbox history --limit 20
+crabbox history --lease <id-or-slug>
+crabbox attach <run_id>
+crabbox events <run_id> --json
+crabbox logs <run_id>
+crabbox results <run_id>
+crabbox cache stats --id <id-or-slug>
+crabbox ssh --id <id-or-slug>
+blacksmith testbox list
+```
+
+Use `--debug` on `run` when measuring sync timing.
+Use `--timing-json` on warmup, hydrate, and run when comparing backends.
+Use `--market spot|on-demand` only on AWS warmup/one-shot runs.
+
+## Failure Triage
+
+- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists
+  the provider selected by `.crabbox.yaml`; update Crabbox before falling back.
+- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect
+  the hydration step.
+- Sync failed: rerun with `--debug`; check changed-file count and whether the
+  checkout is dirty.
+- Command failed: rerun only the failing shard/file first. Do not rerun a full
+  suite until the focused failure is understood.
+- Cleanup uncertain: run `crabbox list --provider aws`; for explicit Blacksmith
+  runs, use `blacksmith testbox list` and stop only the `tbx_...` leases you
+  created.
+- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above,
+  then file/fix the Crabbox issue.
+
+## Boundary
+
+Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the
+hydration workflow and keep Crabbox generic around lease, sync, command
+execution, logs/results, timing, and cleanup.
diff --git a/.crabbox.yaml b/.crabbox.yaml
new file mode 100644
index 0000000..468d358
--- /dev/null
+++ b/.crabbox.yaml
@@ -0,0 +1,47 @@
+profile: proxyline-check
+provider: aws
+class: standard
+capacity:
+  market: spot
+  strategy: most-available
+  fallback: on-demand-after-120s
+  hints: true
+  regions:
+    - eu-west-1
+    - eu-west-2
+    - eu-central-1
+    - us-east-1
+    - us-west-2
+actions:
+  workflow: .github/workflows/crabbox-hydrate.yml
+  job: hydrate
+  ref: main
+  runnerLabels:
+    - crabbox
+    - openclaw
+    - proxyline
+  runnerVersion: latest
+  ephemeral: true
+aws:
+  region: eu-west-1
+  rootGB: 120
+sync:
+  delete: true
+  checksum: false
+  gitSeed: true
+  fingerprint: true
+  baseRef: main
+  exclude:
+    - .artifacts
+    - .codex
+    - .DS_Store
+    - node_modules
+    - data/*.db-wal
+    - data/*.db-shm
+env:
+  allow:
+    - CI
+    - NODE_OPTIONS
+ssh:
+  user: crabbox
+  port: "2222"
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
new file mode 100644
index 0000000..35f1f43
--- /dev/null
+++ b/.github/CODEOWNERS
@@ -0,0 +1,19 @@
+# Protect ownership and automation rules.
+/.github/CODEOWNERS @openclaw/openclaw-secops
+/.github/dependabot.yml @openclaw/openclaw-secops
+/.github/workflows/ @openclaw/openclaw-secops
+/.agents/skills/ @openclaw/openclaw-secops
+/.crabbox.yaml @openclaw/openclaw-secops
+/SECURITY.md @openclaw/openclaw-secops
+/AGENTS.md @openclaw/openclaw-secops
+
+# Package, release, and security-sensitive surfaces.
+/package.json @openclaw/openclaw-secops
+/pnpm-lock.yaml @openclaw/openclaw-secops
+/package-lock.json @openclaw/openclaw-secops
+/src/ @openclaw/openclaw-secops
+/Sources/ @openclaw/openclaw-secops
+/cmd/ @openclaw/openclaw-secops
+/internal/ @openclaw/openclaw-secops
+/scripts/ @openclaw/openclaw-secops
+/docs/ @openclaw/openclaw-secops
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
new file mode 100644
index 0000000..4843006
--- /dev/null
+++ b/.github/dependabot.yml
@@ -0,0 +1,32 @@
+version: 2
+
+updates:
+  - package-ecosystem: npm
+    directory: /
+    schedule:
+      interval: daily
+    cooldown:
+      default-days: 2
+    groups:
+      npm:
+        patterns:
+          - "*"
+        update-types:
+          - minor
+          - patch
+    open-pull-requests-limit: 5
+
+  - package-ecosystem: github-actions
+    directory: /
+    schedule:
+      interval: daily
+    cooldown:
+      default-days: 2
+    groups:
+      actions:
+        patterns:
+          - "*"
+        update-types:
+          - minor
+          - patch
+    open-pull-requests-limit: 5
diff --git a/.github/workflows/codeql.yml b/.github/workflows/codeql.yml
new file mode 100644
index 0000000..9d5b44e
--- /dev/null
+++ b/.github/workflows/codeql.yml
@@ -0,0 +1,40 @@
+name: CodeQL
+
+on:
+  pull_request:
+  push:
+    branches:
+      - main
+  schedule:
+    - cron: "49 4 * * 1"
+  workflow_dispatch:
+
+permissions:
+  actions: read
+  contents: read
+  security-events: write
+
+jobs:
+  analyze:
+    name: analyze (${{ matrix.language }})
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        language:
+          - actions
+          - javascript-typescript
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+
+      - name: Initialize CodeQL
+        uses: github/codeql-action/init@v4
+        with:
+          languages: ${{ matrix.language }}
+          build-mode: none
+
+      - name: Perform CodeQL Analysis
+        uses: github/codeql-action/analyze@v4
+        with:
+          category: "/language:${{ matrix.language }}"
diff --git a/.github/workflows/crabbox-hydrate.yml b/.github/workflows/crabbox-hydrate.yml
new file mode 100644
index 0000000..f69aa34
--- /dev/null
+++ b/.github/workflows/crabbox-hydrate.yml
@@ -0,0 +1,125 @@
+name: Crabbox Hydrate
+
+on:
+  workflow_dispatch:
+    inputs:
+      crabbox_id:
+        description: "Crabbox lease ID"
+        required: true
+        type: string
+      ref:
+        description: "Git ref to hydrate"
+        required: false
+        type: string
+      crabbox_runner_label:
+        description: "Dynamic Crabbox runner label"
+        required: true
+        type: string
+      crabbox_job:
+        description: "Hydration job identifier expected by Crabbox"
+        required: false
+        default: "hydrate"
+        type: string
+      crabbox_keep_alive_minutes:
+        description: "Minutes to keep the hydrated job alive"
+        required: false
+        default: "90"
+        type: string
+
+permissions:
+  contents: read
+
+env:
+  NODE_VERSION: "22"
+  PNPM_VERSION: "11.1.2"
+
+jobs:
+  hydrate:
+    name: hydrate
+    runs-on: [self-hosted, "${{ inputs.crabbox_runner_label }}"]
+    timeout-minutes: 120
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          ref: ${{ inputs.ref || github.ref }}
+
+      - uses: pnpm/action-setup@v4
+        with:
+          version: ${{ env.PNPM_VERSION }}
+
+      - uses: actions/setup-node@v6
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+          cache: pnpm
+
+      - name: Prepare pnpm workspace
+        shell: bash
+        run: |
+          set -euo pipefail
+          git fetch --no-tags --depth=50 origin "+refs/heads/main:refs/remotes/origin/main"
+          pnpm install --frozen-lockfile
+          node --version
+          pnpm --version
+
+      - name: Mark Crabbox ready
+        shell: bash
+        env:
+          CRABBOX_ID: ${{ inputs.crabbox_id }}
+          CRABBOX_JOB: ${{ inputs.crabbox_job }}
+        run: |
+          set -euo pipefail
+          job="${CRABBOX_JOB}"
+          if [ -z "$job" ]; then job=hydrate; fi
+          case "$CRABBOX_ID" in
+            ''|*[!A-Za-z0-9._-]*)
+              echo "Invalid crabbox_id" >&2
+              exit 2
+              ;;
+          esac
+          mkdir -p "$HOME/.crabbox/actions"
+          state="$HOME/.crabbox/actions/${CRABBOX_ID}.env"
+          env_file="$HOME/.crabbox/actions/${CRABBOX_ID}.env.sh"
+          {
+            for key in CI GITHUB_ACTIONS GITHUB_WORKSPACE GITHUB_REPOSITORY GITHUB_RUN_ID GITHUB_RUN_NUMBER GITHUB_RUN_ATTEMPT GITHUB_REF GITHUB_REF_NAME GITHUB_SHA GITHUB_EVENT_NAME GITHUB_ACTOR RUNNER_OS RUNNER_ARCH RUNNER_TEMP RUNNER_TOOL_CACHE PATH; do
+              value="${!key-}"
+              if [ -n "$value" ]; then
+                printf 'export %s=%q\n' "$key" "$value"
+              fi
+            done
+          } > "${env_file}.tmp"
+          mv "${env_file}.tmp" "$env_file"
+          tmp="${state}.tmp"
+          {
+            echo "WORKSPACE=${GITHUB_WORKSPACE}"
+            echo "RUN_ID=${GITHUB_RUN_ID}"
+            echo "JOB=${job}"
+            echo "ENV_FILE=${env_file}"
+            echo "READY_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+          } > "$tmp"
+          mv "$tmp" "$state"
+
+      - name: Keep Crabbox job alive
+        shell: bash
+        env:
+          CRABBOX_ID: ${{ inputs.crabbox_id }}
+          CRABBOX_KEEP_ALIVE_MINUTES: ${{ inputs.crabbox_keep_alive_minutes }}
+        run: |
+          set -euo pipefail
+          case "$CRABBOX_ID" in
+            ''|*[!A-Za-z0-9._-]*)
+              echo "Invalid crabbox_id" >&2
+              exit 2
+              ;;
+          esac
+          minutes="${CRABBOX_KEEP_ALIVE_MINUTES}"
+          case "$minutes" in
+            ''|*[!0-9]*) minutes=90 ;;
+          esac
+          stop="$HOME/.crabbox/actions/${CRABBOX_ID}.stop"
+          deadline=$(( $(date +%s) + 10#$minutes * 60 ))
+          while [ "$(date +%s)" -lt "$deadline" ]; do
+            if [ -f "$stop" ]; then
+              exit 0
+            fi
+            sleep 15
+          done
diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml
new file mode 100644
index 0000000..e87d74b
--- /dev/null
+++ b/.github/workflows/stale.yml
@@ -0,0 +1,86 @@
+name: Stale
+
+on:
+  schedule:
+    - cron: "17 4 * * *"
+  workflow_dispatch:
+
+permissions: {}
+
+jobs:
+  stale:
+    permissions:
+      issues: write
+      pull-requests: write
+    runs-on: ubuntu-latest
+    steps:
+      - name: Mark stale unassigned issues and pull requests
+        uses: actions/stale@v10
+        with:
+          days-before-issue-stale: 14
+          days-before-issue-close: 7
+          days-before-pr-stale: 14
+          days-before-pr-close: 7
+          stale-issue-label: stale
+          stale-pr-label: stale
+          exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale
+          exempt-pr-labels: maintainer,no-stale
+          operations-per-run: 1000
+          ascending: true
+          exempt-all-assignees: true
+          remove-stale-when-updated: true
+          stale-issue-message: |
+            This issue has been automatically marked as stale due to inactivity.
+            Please add updated proxyline details or it will be closed.
+          stale-pr-message: |
+            This pull request has been automatically marked as stale due to inactivity.
+            Please update it or it will be closed.
+          close-issue-message: |
+            Closing due to inactivity.
+            If this still affects proxyline, open a new issue with current reproduction details.
+          close-issue-reason: not_planned
+          close-pr-message: |
+            Closing due to inactivity.
+            If this PR should be revived, reopen it with current context and validation.
+
+      - name: Mark stale assigned issues
+        uses: actions/stale@v10
+        with:
+          days-before-issue-stale: 30
+          days-before-issue-close: 10
+          days-before-pr-stale: -1
+          days-before-pr-close: -1
+          stale-issue-label: stale
+          exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale
+          operations-per-run: 1000
+          ascending: true
+          include-only-assigned: true
+          remove-stale-when-updated: true
+          stale-issue-message: |
+            This assigned issue has been automatically marked as stale after 30 days of inactivity.
+            Please add an update or it will be closed.
+          close-issue-message: |
+            Closing due to inactivity.
+            If this still affects proxyline, reopen or file a new issue with current evidence.
+          close-issue-reason: not_planned
+
+      - name: Mark stale assigned pull requests
+        uses: actions/stale@v10
+        with:
+          days-before-issue-stale: -1
+          days-before-issue-close: -1
+          days-before-pr-stale: 27
+          days-before-pr-close: 7
+          stale-pr-label: stale
+          exempt-pr-labels: maintainer,no-stale
+          operations-per-run: 1000
+          ascending: true
+          include-only-assigned: true
+          ignore-pr-updates: true
+          remove-stale-when-updated: true
+          stale-pr-message: |
+            This assigned pull request has been automatically marked as stale after being open for 27 days.
+            Please add an update or it will be closed.
+          close-pr-message: |
+            Closing due to inactivity.
+            If this PR should be revived, reopen it with current context and validation.
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..3a401ca
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,21 @@
+# AGENTS.md
+
+`proxyline` changes should stay focused, data-safe, and aligned with the existing
+repo workflows.
+
+## Rules
+
+- Do not commit credentials, live config, generated build output, private app
+  data, or local cache/state files.
+- Keep package and release workflow changes narrow and reviewable.
+- Update docs when command flags, package surfaces, or setup behavior change.
+- Prefer existing tooling and local patterns before adding dependencies.
+
+## Checks
+
+Run the smallest relevant gate first, then the repo's full check before handoff
+when runtime behavior changed. For setup-only changes, use:
+
+```bash
+git diff --check
+```
diff --git a/SECURITY.md b/SECURITY.md
new file mode 100644
index 0000000..45b9259
--- /dev/null
+++ b/SECURITY.md
@@ -0,0 +1,30 @@
+# Security Policy
+
+## Reporting
+
+Report suspected vulnerabilities privately through GitHub Security Advisories for
+this repository. If advisories are unavailable to you, email security@openclaw.ai.
+
+Do not open public issues for vulnerabilities or include secrets, private local
+data, credentials, tokens, app data, or exploit details in public reports.
+
+## Scope
+
+In scope:
+
+- Proxyline TypeScript package, proxy runtime, and package release
+- config, credential, local filesystem, package, and workflow integrity surfaces
+- command output, logs, artifacts, or generated data that could disclose private data
+- dependency or runtime behavior that materially affects safe execution
+
+Out of scope:
+
+- upstream service outages, API changes, quotas, or account enforcement decisions
+- compromise of a trusted local account, shell, filesystem, or maintainer device
+- scanner-only findings without a reachable exploit path in supported usage
+
+## Expectations
+
+We prioritize reachable issues that affect credentials, private data, package
+integrity, privileged automation, or safe execution. Include the affected commit,
+platform, minimal reproduction steps, and sanitized impact details.