43 changes: 43 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,49 @@ Tags older than v0.2.0 ship release notes inside the GitHub release
page; this CHANGELOG starts at v0.2.0 (the Lemonade migration cut).
For ADR-level architecture context see `docs/internal/adr/`.

## Unreleased — Phase 0 OpenRouter prereq + v0.3 MCP completion

### Added

- **Per-persona spending-cap primitive (Phase 0 OpenRouter prereq)**.
Lands the `[persona.budget]` TOML sub-table + a pure-Python budget
enforcement layer BEFORE the V1 OpenRouter upstream provider and V2
`hal0-fusion` MCP server. DA review of the OpenRouter integration
plan flagged this as P0 must-fix #3 — without a spending-cap
envelope, fusion (4.4x cost vs single-model) plus a recursing
Hermes loop could drain a $200/credit pool overnight.
- New `src/hal0/agents/budget.py` — `Budget` dataclass, append-only
`BudgetLedger`, pure `check_budget` / `record_charge` functions,
daily / monthly / lifetime aggregation windows + per-call max.
- New REST surface (mounted under `/api/agents/{id}/personas/{pid}`):
- `GET .../budget` — current caps + running spend stats +
per-window remaining headroom.
- `PUT .../budget` — replace the budget block; preserves every
other persona field on round-trip.
- `POST .../budget/check` — dry-run pre-call gate; takes
`{"estimated_cost_usd": float}`, returns `allowed` + `reason` +
`remaining_usd`. V1 OpenRouter provider calls this BEFORE
issuing the upstream request.
- `POST .../budget/charge` — post-response recorder; appends
`{ts, persona_id, surface, model, cost_usd, request_id}` to the
ledger.
- Ledger location:
`/var/lib/hal0/agents/{agent_id}/personas/{persona_id}/spend.jsonl`.
Append-only JSON-lines (no SQLite dependency); fsync after every
write; operator-inspectable via `tail -f` + `jq`.
- New dashboard editor — `PersonaBudgetPanel` mounts under the
Personas tab beneath the persona cards. Empty-state CTA reads
"no budget set — set caps to enable cloud providers".
- Persona seed (hermes + coder) keeps an empty budget block by
default; operators opt in by editing the TOML or PUT-ing through
the API. `hal0 agent reprovision hermes` preserves operator-set
budgets (idempotent persona seed with `overwrite=False`).
- **Scope decision (PLANNING.md §5 Q2):** per-persona only for v0.3.
Per-agent + platform-wide containing scopes are deferred to v0.4.
- **PREREQ — no provider charges to this primitive yet.** V1
(OpenRouter as a Hermes upstream) wires it in as a pre-call gate
+ post-response record.
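
A minimal sketch of the append-only JSON-lines ledger described above, assuming one JSON object per line and an fsync after every write. The helper names `append_charge` and `read_charges`, the ISO 8601 `ts` format, and the sample values are illustrative assumptions, not the actual contents of `src/hal0/agents/budget.py`.

```python
import json
import os
import tempfile
from pathlib import Path


def append_charge(ledger_path: Path, entry: dict) -> None:
    """Append one charge as a single JSON line, then fsync it to disk."""
    ledger_path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(entry, separators=(",", ":")) + "\n"
    with open(ledger_path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()             # push Python's buffer to the OS
        os.fsync(fh.fileno())  # ask the OS to persist it


def read_charges(ledger_path: Path) -> list:
    """Return every recorded charge; a missing ledger means no spend yet."""
    if not ledger_path.exists():
        return []
    with open(ledger_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Hypothetical ids and values; a temp dir stands in for /var/lib/hal0.
    with tempfile.TemporaryDirectory() as root:
        path = Path(root) / "agents" / "hermes" / "personas" / "coder" / "spend.jsonl"
        append_charge(path, {
            "ts": "2025-03-15T08:00:00+00:00",
            "persona_id": "coder",
            "surface": "openrouter",
            "model": "example/model",
            "cost_usd": 0.02,
            "request_id": "req-1",
        })
        print(sum(row["cost_usd"] for row in read_charges(path)))
```

Because each record is one line, the file stays readable with `tail -f` and `jq` while it is being written.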

## Unreleased — v0.3 MCP completion + memory-map redesign

End-to-end completion of the `hal0-admin` + `hal0-memory` bundled MCP
50 changes: 50 additions & 0 deletions docs/agents/hermes/CONFIG.md
@@ -74,8 +74,58 @@ require_approval = ["files.*", "shell.*"] # glob list
[persona.model]
preferred_upstream = "hal0"
preferred_model = "" # empty = first available

[persona.budget]
# Per-persona spending caps (Phase 0 OpenRouter prereq). Each USD cap is
# optional; omitting one leaves its window uncapped. An explicit
# ``0.0`` blocks every paid request. ``hard_cap = true`` (default) enforces;
# set it to ``false`` for warn-only mode (allowed=true, reason logged).
daily_usd = 5.0 # rolls over at 00:00 UTC
monthly_usd = 50.0 # rolls over on the 1st UTC
lifetime_usd = 500.0 # never resets
per_call_max_usd = 0.10 # rejects any single request over this
hard_cap = true # block (true) vs warn-only (false)
```

**Budget block (Phase 0 OpenRouter prereq):**

The `[persona.budget]` sub-table arms the per-persona spending-cap
primitive. Every paid surface (V1 OpenRouter as a Hermes upstream, V2
the `hal0-fusion` MCP) consults this block via two endpoints:

| Endpoint | Direction | Effect |
|---|---|---|
| `POST /api/agents/{id}/personas/{pid}/budget/check` | Caller → hal0 | Dry-run pre-call gate; returns `allowed=false` with a `reason` if the estimated cost would breach a cap. |
| `POST /api/agents/{id}/personas/{pid}/budget/charge` | Caller → hal0 | Records a real charge into the append-only ledger after the upstream response lands. |
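
The two endpoints are meant to bracket a paid request. The sketch below shows one way a caller could sequence them. The transport is injected as a `post(path, body)` callable so nothing here assumes a particular HTTP client. The `guarded_call` helper, the shape of the upstream result, and the `/charge` request body are assumptions; only the `estimated_cost_usd` request field and the `allowed` / `reason` response fields come from the changelog entry.

```python
from typing import Callable, Optional

# post(path, json_body) -> decoded JSON response; supplied by the caller.
Post = Callable[[str, dict], dict]


def guarded_call(
    post: Post,
    agent_id: str,
    persona_id: str,
    estimated_cost_usd: float,
    run_upstream: Callable[[], dict],
    surface: str = "openrouter",
) -> Optional[dict]:
    """Check the budget, run the paid request, then record the real cost."""
    base = f"/api/agents/{agent_id}/personas/{persona_id}/budget"

    verdict = post(f"{base}/check", {"estimated_cost_usd": estimated_cost_usd})
    if not verdict["allowed"]:
        return None  # short-circuit: the paid request is never issued

    # Hypothetical upstream result: {"model": ..., "cost_usd": ..., "request_id": ...}
    result = run_upstream()

    # Assumed body; the server is described as adding ts and persona_id itself.
    post(f"{base}/charge", {
        "surface": surface,
        "model": result["model"],
        "cost_usd": result["cost_usd"],
        "request_id": result["request_id"],
    })
    return result
```

Recording the actual `cost_usd` after the response, rather than the estimate, keeps the ledger aligned with what the upstream billed.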

The ledger lives at
`/var/lib/hal0/agents/{agent_id}/personas/{persona_id}/spend.jsonl`
(one JSON object per line, append-only, fsync after every write).
Operator-inspectable with `tail -f` + `jq`. Hard-cap semantics:

- `hard_cap = true` (default) — `check` returns `allowed=false` when
the estimate would push spend past any configured cap; the caller is
expected to short-circuit the request.
- `hard_cap = false` — `check` always returns `allowed=true`, but
`reason` is populated so the caller can log a warning. Useful for
audit-only deployments where the operator wants visibility without
enforcement.
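
The hard-cap and warn-only behaviours can be sketched as a pure function over the ledger entries. This is an illustration under stated assumptions: caps are passed as a plain dict, `ts` is an ISO 8601 string with a UTC offset, and the `reason` strings are invented for the example. It is not the project's `check_budget`.

```python
from datetime import datetime, timezone

WINDOWS = ("daily_usd", "monthly_usd", "lifetime_usd")


def window_spend(entries, now):
    """Sum recorded cost per window; day and month boundaries are in UTC."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    month_start = day_start.replace(day=1)
    spent = dict.fromkeys(WINDOWS, 0.0)
    for entry in entries:
        ts = datetime.fromisoformat(entry["ts"])
        spent["lifetime_usd"] += entry["cost_usd"]
        if ts >= month_start:
            spent["monthly_usd"] += entry["cost_usd"]
        if ts >= day_start:
            spent["daily_usd"] += entry["cost_usd"]
    return spent


def check(caps, entries, estimate, now):
    """Dry-run gate: find the first cap the estimate would breach, if any."""
    reason = None
    per_call = caps.get("per_call_max_usd")
    if per_call is not None and estimate > per_call:
        reason = "per_call_max_usd"
    else:
        spent = window_spend(entries, now)
        for key in WINDOWS:
            cap = caps.get(key)  # None (omitted) means uncapped
            if cap is not None and spent[key] + estimate > cap:
                reason = key
                break
    # Warn-only mode still reports the reason but never blocks.
    allowed = reason is None or not caps.get("hard_cap", True)
    return {"allowed": allowed, "reason": reason}


if __name__ == "__main__":
    now = datetime(2025, 3, 15, 12, 0, tzinfo=timezone.utc)
    today = [{"ts": "2025-03-15T08:00:00+00:00", "cost_usd": 4.95}]
    print(check({"daily_usd": 5.0}, today, 0.10, now))
    print(check({"daily_usd": 5.0, "hard_cap": False}, today, 0.10, now))
```

With the same ledger and estimate, the first call blocks and the second allows the request while still naming the breached window in `reason`.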

**Race tolerance:** the check-then-record pattern is NOT serialised.
Two concurrent paid requests from the same persona can both pass
`check` (they read the same ledger state) before either records a
charge, so occasional overspend within a single window is tolerated. A
real lock + daemon-side enforcer is v0.4+ work; the JSONL layout
migrates cleanly.
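
The interleaving can be replayed deterministically without threads. The sketch below is a toy model, not the real ledger: amounts are integer cents to avoid float noise, and `simulate_race` is a hypothetical helper.

```python
def simulate_race(cap_cents=100, already_spent_cents=95, estimate_cents=4):
    """Replay two requests that both check before either one records."""
    ledger = [already_spent_cents]

    def check(estimate):
        return sum(ledger) + estimate <= cap_cents

    # Both requests read the same ledger state (95 spent), so both pass.
    a_allowed = check(estimate_cents)
    b_allowed = check(estimate_cents)

    # Only now do the charges land, one after the other.
    if a_allowed:
        ledger.append(estimate_cents)
    if b_allowed:
        ledger.append(estimate_cents)

    return {
        "a_allowed": a_allowed,
        "b_allowed": b_allowed,
        "overspend_cents": max(0, sum(ledger) - cap_cents),
    }


if __name__ == "__main__":
    print(simulate_race())
```

In this model each request individually fits under the cap, so the overshoot is bounded by the estimates of the other in-flight requests. If actual costs match their estimates and `per_call_max_usd` is set, N concurrent requests overshoot a window by at most (N - 1) times that per-call maximum.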

**Idempotency:** running `hal0 agent reprovision hermes` after the
operator PUTs a budget preserves the caps. `_phase_persona_seed`
calls `seed_default_personas(overwrite=False)`, which skips existing
files; only `--repair` rewrites the seeds back to the canonical empty state.
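
The skip-if-present behaviour can be sketched as follows. `seed_file` is a hypothetical helper, not the project's `seed_default_personas`, and mapping `--repair` to `overwrite=True` is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


def seed_file(path: Path, canonical: str, overwrite: bool = False) -> bool:
    """Write the canonical seed unless the file exists; return True if written."""
    if path.exists() and not overwrite:
        return False  # operator edits survive a reprovision
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(canonical, encoding="utf-8")
    return True


if __name__ == "__main__":
    canonical = "[persona.budget]\n"  # empty budget block
    with tempfile.TemporaryDirectory() as root:
        persona = Path(root) / "personas" / "coder.toml"
        seed_file(persona, canonical)                      # first provision
        persona.write_text("[persona.budget]\ndaily_usd = 5.0\n", encoding="utf-8")
        seed_file(persona, canonical)                      # reprovision: skipped
        print("daily_usd" in persona.read_text(encoding="utf-8"))
        seed_file(persona, canonical, overwrite=True)      # assumed --repair path
        print("daily_usd" in persona.read_text(encoding="utf-8"))
```

Running the seed twice with the default `overwrite=False` leaves the operator-set cap untouched, which is the property the reprovision path relies on.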

**Scope:** per-persona only for v0.3. Per-agent + platform-wide
containing scopes are deferred to v0.4 (PLANNING.md §5 Q2 default).

**Change effect:** The next bootstrap render (or `hal0 agent
reprovision hermes`) picks up the new prompt. `hal0 agent personas
activate <id>` switches the active persona AND sends a best-effort