realactivity · pswider · Jun 4, 2026
diff --git a/docs/evals.md b/docs/evals.md
@@ -0,0 +1,60 @@
+# Skill Evaluation Status
+
+Continuous evaluation status for Tula skills. This page is regenerated
+automatically by `scripts/generate-eval-status.sh` on every CI run that
+touches `skills/` or `evals/`. Static analysis (compliance, spec
+checks, token budgets) is fresh on every run; live eval results come
+from manually published runs in `results/`.
+
+Powered by [Microsoft Waza](https://github.com/microsoft/waza).
+
+| Skill | Compliance | Spec | Tokens | Last live run |
+|---|---|---|---|---|
+| `epic-note` | Medium-High | 9/9 ✓ | 705 / 500 ⚠ | - |
+| `health-records` | Medium-High | 9/9 ✓ | 1318 / 500 ⚠ | - |
+| `lookout` | Medium-High | 9/9 ✓ | 1577 / 500 ⚠ | - |
+| `med-pdf` | Medium-High | 9/9 ✓ | 842 / 500 ⚠ | - |
+| `memory-diff` | Medium-High | 9/9 ✓ | 1183 / 500 ⚠ | - |
+| `myhealth-pulse` | Medium-High | 9/9 ✓ | 1176 / 500 ⚠ | - |
+| `prep-my-visit` | Medium-High | 9/9 ✓ | 457 / 500 ✓ | - |
+| `request-amendment` | Medium-High | 9/9 ✓ | 990 / 500 ⚠ | - |
+
+---
+
+## What this measures
+
+- **Compliance** - Waza's agentskills.io readiness score
+  (`High` / `Medium-High` / `Medium` / `Low`). `Medium-High` or better
+  is the house target.
+- **Spec** - count of agentskills.io spec checks the skill passes
+  (`spec-frontmatter`, `spec-name`, `spec-allowed-fields`, and so on).
+  9/9 is full pass.
+- **Tokens** - total tokens in `SKILL.md` against Waza's 500-token soft
+  limit. Tula's house style accepts a higher count when openclaw
+  fidelity would suffer (per `skills/AGENTS.md`'s "Token Discipline"
+  section). `⚠` marks "exceeds the soft cap but intentional"; `✓` marks
+  "within budget."
+- **Last live run** - most recent `waza run` output published in
+  `results/`. Cells show pass rate, run date, and model used (e.g.,
+  `5/5 ✓ (2026-05-17, sonnet-4.6)`); `-` means nothing is published
+  yet. Live eval execution requires `executor: copilot-sdk` plus model
+  auth, so results are published by hand rather than by per-PR CI. Raw
+  run outputs stay private; only the pass-rate summary surfaces here.
+
+## What this does NOT measure
+
+- The model's actual answer quality. Evals check task-completion
+  signals (output shape, presence/absence of keywords, routing
+  behavior, schema validity), not clinical correctness.
+- Production behavior under PHI. All evals run against synthetic
+  personas. See `evals/*/fixtures/` for the test data.
+- Anything inside Aria's closed governance layer - multi-tenant
+  isolation, audit emission, cross-actor coordination - which is
+  evaluated separately under hospital-scale fixtures.
+
+## See also
+
+- [Eval suites](../evals/) - task definitions and fixtures
+- [Skill authoring conventions](../skills/AGENTS.md)
+- [Tula deployment guide](deployment-guide.md)
+- [Microsoft Waza](https://github.com/microsoft/waza) - the eval framework
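Review note on `docs/evals.md`: a minimal sketch of how a `Tokens` cell such as `705 / 500 ⚠` could be derived from a token count and the 500-token soft limit. The `token_cell` helper is hypothetical and is not taken from `scripts/generate-eval-status.sh`; treating a count equal to the limit as within budget is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from generate-eval-status.sh): format one
# "Tokens" table cell as "<count> / <limit> <mark>".
# Assumption: a count equal to the limit still counts as within budget.
token_cell() {
    local count="$1"
    local limit="${2:-500}"   # soft limit described in docs/evals.md
    local mark="✓"
    if [ "$count" -gt "$limit" ]; then
        mark="⚠"              # over the soft cap
    fi
    printf '%s / %s %s\n' "$count" "$limit" "$mark"
}
```

For example, `token_cell 457` prints `457 / 500 ✓` and `token_cell 705` prints `705 / 500 ⚠`, matching the `prep-my-visit` and `epic-note` rows.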
diff --git a/scripts/agent-backup.sh b/scripts/agent-backup.sh
@@ -50,8 +50,9 @@
 # ## Exit codes
 #   0  Success (whether or not there were changes)
 #   1  Generic error
-#   2  Secret-pattern scan failed - see stderr for offending file(s)
+#   2  Secret-pattern scan or large-file guard failed - see stderr
 #   3  Push failed (commit was made; resolve auth and retry `git push`)
+#   4  Privacy guard failed (remote repo is not PRIVATE - refused to push)
 #
 # ## Exclusions (mirrors the repo's `.gitignore` - keep both in sync)
 #   credentials/                          telegram pairing secrets
@@ -153,6 +154,11 @@ PURGE=(
     'logs'
     'update-check.json'
     'plugin-runtime-deps'
+    'npm'                                  # ~700MB of plugin npm projects;
+                                           # contains coding-agent binaries
+                                           # 200MB+ each (> GitHub's 100MB
+                                           # file cap). Regenerable via
+                                           # `openclaw plugins install ...`.
 )
 
 # Nested-.git protection. Any `.git` directory under the source - at any
@@ -172,6 +178,12 @@ PROTECT=(
     'docs'
 )
 
+# Hard cap on individual file size in the backup tree. GitHub rejects any
+# file >100MB without LFS. We set a tighter 50MB cap to catch problems
+# before they hit the remote, and to keep the repo cloneable on slow links.
+# Anything over this should be added to PURGE.
+MAX_FILE_BYTES=$((50 * 1024 * 1024))
+
 # Regex patterns that look like real credentials. Tuned to be high-signal;
 # if a pattern fires, the run aborts unless the file is in ALLOWLIST_GLOBS.
 SECRET_PATTERNS=(
@@ -340,6 +352,63 @@ else
     log "secret scan: clean"
 fi
 
+# ---------- step 3b: large-file guard --------------------------------------
+#
+# Refuse to stage anything over MAX_FILE_BYTES. GitHub rejects >100MB hard,
+# but we want to catch the problem early (cheaper than a failed push) and
+# under a tighter budget so clone-from-backup stays fast.
+
+if [[ $DRY_RUN -eq 0 ]]; then
+    big_files=$(find "$AGENT_REPO_DIR" -type f -size +"${MAX_FILE_BYTES}c" \
+                    -not -path "$AGENT_REPO_DIR/.git/*" 2>/dev/null || true)
+    if [[ -n "$big_files" ]]; then
+        echo "" >&2
+        echo "Large-file guard FAILED. Files over $((MAX_FILE_BYTES/1024/1024))MB:" >&2
+        echo "------------------------------------------------------------" >&2
+        while IFS= read -r f; do
+            sz=$(du -h "$f" | cut -f1)
+            printf '  %s\t%s\n' "$sz" "${f#"$AGENT_REPO_DIR"/}" >&2
+        done <<< "$big_files"
+        echo "------------------------------------------------------------" >&2
+        echo "Add the offending path (or its parent dir) to the PURGE array." >&2
+        exit 2
+    fi
+    log "large-file guard: clean (no files > $((MAX_FILE_BYTES/1024/1024))MB)"
+fi
+
+# ---------- step 3c: remote-private guard ----------------------------------
+#
+# Defense in depth: refuse to push if the GitHub repo is somehow public.
+# Catches a hand-toggle in the GitHub UI that would otherwise expose every
+# subsequent backup commit. Only runs for github.com remotes when `gh` is
+# available and authenticated; otherwise it is a soft warning.
+
+verify_repo_private() {
+    local remote_url="$1"
+    if ! command -v gh >/dev/null 2>&1; then
+        log "privacy guard: gh CLI not installed - SKIPPED (soft warning)"
+        return 0
+    fi
+    if ! [[ "$remote_url" =~ github\.com[:/]([^/]+)/([^/]+)$ ]]; then
+        log "privacy guard: non-github remote - SKIPPED"
+        return 0
+    fi
+    local owner="${BASH_REMATCH[1]}"
+    local name="${BASH_REMATCH[2]%.git}"   # repo names may contain dots; strip only a trailing .git
+    local visibility
+    visibility=$(gh repo view "$owner/$name" --json visibility -q .visibility 2>/dev/null || echo "")
+    if [[ -z "$visibility" ]]; then
+        log "privacy guard: could not query gh - SKIPPED (soft warning)"
+        return 0
+    fi
+    if [[ "$visibility" != "PRIVATE" ]]; then
+        log "privacy guard: REFUSING to push - $owner/$name is $visibility (expected PRIVATE)"
+        return 1
+    fi
+    log "privacy guard: $owner/$name confirmed PRIVATE"
+    return 0
+}
+
 # ---------- step 4 & 5: commit ---------------------------------------------
 
 cd "$AGENT_REPO_DIR"
@@ -380,6 +449,8 @@ fi
 REMOTE_URL=$(git remote get-url "$AGENT_REMOTE" 2>/dev/null || true)
 [[ -z "$REMOTE_URL" ]] && { log "remote '$AGENT_REMOTE' not configured"; exit 3; }
 
+verify_repo_private "$REMOTE_URL" || exit 4
+
 log "push: $AGENT_REMOTE $AGENT_BRANCH ($REMOTE_URL)"
 
 if [[ -n "${GITHUB_TOKEN:-}" && "$REMOTE_URL" =~ ^https://github\.com/ ]]; then
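
Review note on the remote-private guard: the guard depends on extracting `owner/name` from the remote URL, so it may be worth exercising that step on its own. The sketch below is a hypothetical standalone helper (`parse_github_remote` is not part of `agent-backup.sh`) that uses parameter expansion instead of a regex and accepts repository names containing dots. The `acme/...` URLs in the usage note are made-up examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of agent-backup.sh): print "owner/name"
# for a github.com remote URL, or return 1 if the URL does not look like one.
parse_github_remote() {
    local url="${1%/}"                 # tolerate one trailing slash
    case "$url" in
        *github.com[:/]*/*) ;;         # ssh (":") or https ("/") form
        *) return 1 ;;
    esac
    local path="${url##*github.com[:/]}"   # -> owner/name or owner/name.git
    local owner="${path%%/*}"
    local name="${path#*/}"
    name="${name%.git}"                # strip only a trailing .git
    case "$name" in
        ''|*/*) return 1 ;;            # empty name or extra path segments
    esac
    [ -n "$owner" ] || return 1
    printf '%s/%s\n' "$owner" "$name"
}
```

With this helper, `parse_github_remote 'git@github.com:acme/backup.git'` prints `acme/backup`, `parse_github_remote 'https://github.com/acme/my.repo'` prints `acme/my.repo`, and a `gitlab.com` URL returns a non-zero status.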

diff --git a/scripts/tenant-template/README.md b/scripts/tenant-template/README.md
@@ -0,0 +1,80 @@
+# Tula tenant-template build pipeline
+
+This directory holds the three artifacts that turn a Tula development VM
+into a per-tenant golden image and provision new tenants from it.
+
+| File | Purpose | Runs on |
+|---|---|---|
+| `deprovision.sh` | Scrubs a source VM for image capture | The source VM (the one being baked) |
+| `tula-provision.sh` | Spawns a new tenant from a captured image | The operator's laptop / control-plane VM |
+| `cloud-init-template.yaml` | First-boot configuration for each new tenant | Auto-injected; never run manually |
+
+Full specification: [`~/.openclaw/workspace/docs/TENANT_TEMPLATE_BUILD.md`](../../../.openclaw/workspace/docs/TENANT_TEMPLATE_BUILD.md)
+
+## Quick start (operator)
+
+```bash
+# One-time: prepare ops home
+mkdir -p ~/tula-ops/{tenants,secrets}
+chmod 700 ~/tula-ops ~/tula-ops/secrets
+echo -n 'sk-ant-xxxx' > ~/tula-ops/secrets/anthropic-api-key && chmod 600 ~/tula-ops/secrets/anthropic-api-key
+echo -n 'ghp_xxxx'    > ~/tula-ops/secrets/github-pat-tenant-write && chmod 600 ~/tula-ops/secrets/github-pat-tenant-write
+
+# Add a few Telegram bot tokens to the pool (one per row)
+cat <<EOF >> ~/tula-ops/bot-token-pool.txt
+# pool_name      bot_token              bot_username      status
+tula_aux_001    1234567890:AAH...       TulaAux001Bot     available
+tula_aux_002    0987654321:AAH...       TulaAux002Bot     available
+EOF
+chmod 600 ~/tula-ops/bot-token-pool.txt
+
+# Bake the image (one-time, ~30 min)
+ssh azureuser@ra-bake-vm 'sudo ~/tula/scripts/tenant-template/deprovision.sh --version 0.1.0 --confirm'
+ssh azureuser@ra-bake-vm 'sudo waagent -deprovision+user -force'
+az vm deallocate -g ra-healthcareagents-rg -n ra-bake-vm
+az vm generalize -g ra-healthcareagents-rg -n ra-bake-vm
+az image create  -g ra-healthcareagents-rg -n tula-tenant-template-0-1-0 --source ra-bake-vm
+
+# Provision a tenant (per tenant, ~5 min)
+~/tula/scripts/tenant-template/tula-provision.sh new-tenant "Jane Doe" "jane@example.com"
+```
+
+## Subcommands
+
+- `tula-provision.sh new-tenant <name> <email>` - full provision
+- `tula-provision.sh list` - list tenants
+- `tula-provision.sh show <tenant-id>` - show one tenant's record
+- `tula-provision.sh health <tenant-id>` - health check
+- `tula-provision.sh rollback <tenant-id>` - clean teardown (idempotent)
+- `tula-provision.sh decommission <tenant-id>` - 30-day-grace offboarding
+
+## Safety
+
+- `deprovision.sh` refuses to run on hosts named `tula-tenant-*` (prevents
+  nuking a live tenant)
+- `deprovision.sh` requires `--confirm`; supports `--dry-run`
+- `tula-provision.sh` rolls back automatically on any failure during
+  provisioning (deletes Azure RG, deletes GitHub repo, returns bot
+  token to pool)
+- Operator secrets live in `~/tula-ops/secrets/` (directory 0700, files 0600)
+- Tenant secrets live in `/etc/tula-tenant-secrets.env` on the tenant
+  VM with 0600 perms, owned by `azureuser`
+- Tenant content does not cross the operator boundary in normal use; an
+  operator can break-glass via SSH, and that access is logged
+
+## v0.1 known gaps (to harden in v0.2)
+
+- The GitHub PAT is currently shared across tenants via
+  `~/tula-ops/secrets/github-pat-tenant-write`. It should become a
+  per-tenant fine-grained PAT or a GitHub App installation. Tracked in
+  [`TENANT_TEMPLATE_BUILD.md`](../../../.openclaw/workspace/docs/TENANT_TEMPLATE_BUILD.md) § 6.5.
+- Data disk is currently combined with OS disk. v0.2 separates them so
+  image updates don't require workspace data migration.
+- No control plane yet; tenant heartbeat to a central observability
+  endpoint is wired but disabled. Enable when control plane lands.
+- No automated image-update workflow for existing tenants; updates are
+  manual per tenant in v0.1.
+
+## License
+
+Apache-2.0 (inherited from the Tula repository).
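
Review note on the Safety section: the hostname refusal described for `deprovision.sh` can be illustrated with a small sketch. The actual check in `deprovision.sh` is not shown in this diff, so the function names and the error message below are hypothetical; only the `tula-tenant-*` pattern comes from the README.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "refuse to run on tula-tenant-*" guard.
# The real check in deprovision.sh is not shown in this diff.
is_tenant_host() {
    case "$1" in
        tula-tenant-*) return 0 ;;   # looks like a live tenant VM
        *) return 1 ;;
    esac
}

guard_not_tenant() {
    # Default to the current hostname; accept an override for testing.
    local host="${1:-$(hostname)}"
    if is_tenant_host "$host"; then
        echo "refusing to deprovision live tenant host: $host" >&2
        return 1
    fi
}
```

A script could call `guard_not_tenant || exit 1` before any destructive step; passing the hostname as an argument keeps the check testable without renaming a machine.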