Release 0.10.0 — reasoning crate restored, CLI features, Anthropic caching, 92 new tests#12

Merged
nightness merged 25 commits into main from v-0.10 on Apr 19, 2026

@nightness
Member

Summary

Cuts v0.10.0 from v-0.10. The big-ticket item is the architectural restoration of brainwires-reasoning — the 0.9.0 release shipped the crate as a 22-line re-export shell, and the scorer modules + plan/output parsers that were supposed to live there were split across brainwires-core and brainwires-agents. This PR puts them all in the crate they belong in, which is SemVer-breaking for anyone importing brainwires_core::plan_parser::… directly (hence 0.10 not 0.9.1).

What's in the release

  • Architecture restoration (BREAKING): brainwires-reasoning now owns plan_parser, output_parser, and all 9 scorer modules (complexity, entity_enhancer, relevance_scorer, retrieval_classifier, router, strategies, strategy_selector, summarizer, validator). brainwires_core::plan_parser/::output_parser are gone. brainwires_agents::reasoning::… still resolves via a re-export of the new crate, so existing callers through that path keep working.
  • brainwires-providers: Anthropic prompt caching enabled by default on both chat + stream requests; cache_read/cache_creation token counts logged. ContentBlock::Image (Base64) now converts to Anthropic's native image envelope.
  • brainwires-tools: bash NetworkDeny sandbox via unshare -U -r -n (Linux only; a no-op elsewhere, with a warning). Per-stream 25KB output cap with head/tail middle-truncation respecting UTF-8 boundaries.
  • brainwires-cli: /dream, /dream:status, /dream:run slash commands; --sandbox=network-deny; --all-tools; Monitor tool for background process watching; /shell interactive overlay; remappable global keybindings; TUI ask_user_question, skill autocomplete, custom status line; auto-loading of CLAUDE.md/BRAINWIRES.md from cwd upward; --provider first-run picker; command_handler.rs split into topic submodules; skill allowed_tools + execution-mode honouring; worktree primitive.
  • Tests: proptest added as workspace dev-dep. 92 new tests across 5 new integration suites: permissions (44), mcp (15), reasoning (25), tools (7), metacrate smoke (1).
  • Docs: TESTING.md corrected to reference brainwires_agents::eval (the eval framework is a module, not a standalone crate). Matter implementation flagged experimental with known gaps.
  • Publish tooling: scripts/publish.sh --preflight-only for fast manifest checks.

20 commits since v0.9.0

chore: bump version to 0.10.0
test(metacrate): compile-time smoke for re-export surface
test(tools): FileOpsTool path resolution — pin current behaviour
test(reasoning): parser property tests + JSON extraction edge cases
test(mcp): JSON-RPC type roundtrips + transport discriminator edge cases
test(permissions): first integration suite — policy, domains, audit, anomaly
refactor(reasoning): restore brainwires-reasoning as the owner of Layer 3 logic
fix(providers): drop unreachable catch-arm in Anthropic block conversion
feat(cli): /dream commands, --sandbox flag, --all-tools, curated tool set
feat(providers): Anthropic prompt caching + image ContentBlock support
feat(tools): bash network-deny sandbox + per-stream byte caps
feat(cli): close the remaining scope-limited skill + keybinding items + worktree primitive
feat(cli): /shell interactive overlay + remappable global keybindings
refactor(cli): split 2456-line command_handler.rs into topic submodules
feat(cli): honor skill allowed_tools + execution modes in /skill
feat(cli): TUI ask_user_question, skill autocomplete, custom status line, docs
feat(cli): harness parity — settings, hooks, memory, ask, monitor polish
feat(cli): add Monitor tool for background process watching
feat(cli): auto-load CLAUDE.md and BRAINWIRES.md from cwd upward
feat(cli): make --provider flag actually work, add first-run picker

Breaking changes for downstream consumers

| 0.9.0 path | 0.10.0 path |
| --- | --- |
| brainwires_core::plan_parser::{parse_plan_steps, steps_to_tasks, ParsedStep} | brainwires_reasoning::plan_parser::… |
| brainwires_core::output_parser::{JsonOutputParser, JsonListParser, OutputParser, RegexOutputParser} | brainwires_reasoning::output_parser::… |
| brainwires-core/planning feature | feature removed (pull brainwires-reasoning directly) |

brainwires_agents::reasoning::… and brainwires-core/native both keep resolving — no change needed for those.

Test plan

  • cargo fmt --check clean
  • cargo xtask check-stubs — no hard blockers (46 comment markers are pre-existing, all in CLI debug/introspection code)
  • cargo build --workspace clean (10m 46s)
  • cargo check --workspace clean post-bump (4m 57s)
  • Every extras/* crate builds individually with cargo clean between each (17 pass, 2 non-Rust crates skipped, 1 excluded: brainclaw, which is pinned at 0.8.0, a pre-existing exclusion)
  • Per-crate test runs: reasoning 67, core 60, agents+reasoning 368, permissions 108 (44 new), mcp 30 (15 new), tools 121 (7 new), metacrate 1 new — all green
  • ./scripts/publish.sh --preflight-only passes
  • Post-merge: ./scripts/publish.sh --live for crates.io publish + v0.10.0 tag

🤖 Generated with Claude Code

nightness and others added 21 commits April 17, 2026 10:26
The --provider flag existed on `chat` and `task` but was silently
dropped at every call site (underscore-prefixed everywhere) — config
was the only real input. Now the flag is threaded through a new
ProviderFactory::create_with_overrides() with precedence
CLI flag > BRAINWIRES_PROVIDER env > config.
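
The precedence rule above can be sketched as a small pure function. This is only an illustration: `resolve_provider` is a hypothetical name (the commit's real entry point is ProviderFactory::create_with_overrides(), whose signature is not shown here), and treating empty strings as "not set" is an assumption of the sketch.

```rust
/// Hypothetical helper: returns `None` for missing or blank values.
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Hypothetical helper: pick the provider name with precedence
/// CLI flag > BRAINWIRES_PROVIDER env value > config value.
fn resolve_provider(cli_flag: Option<&str>, env_value: Option<&str>, config_value: &str) -> String {
    non_empty(cli_flag)
        .or_else(|| non_empty(env_value))
        .unwrap_or(config_value)
        .to_string()
}
```

A caller would pass the parsed flag, the result of reading the environment variable, and the configured default, for example `resolve_provider(None, Some("openai"), "ollama")`.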

New surface:
- src/types/provider_ext.rs — CLI-local helpers (env_var_name,
  summary, credential_hint, detect_provider_from_env, CHAT_PROVIDERS)
  since ProviderType lives in the framework crate.
- src/cli/first_run.rs — interactive dialoguer picker on first run
  (TTY) or a structured error listing providers + env vars (non-TTY).
  Triggered when ConfigManager::is_first_run() AND no credentials
  detected in the environment.
- /provider slash command — list + switch live, wired through
  command_handler.rs / builtin.rs / conversation_commands.rs.
- BRAINWIRES_API_KEY env fallback for Brainwires SaaS (CI usage).
- Env-var API-key fallback for direct providers (ANTHROPIC_API_KEY
  etc.) before keyring-miss errors.
- ConfigManager::is_first_run() — config-didn't-exist sentinel.

Cleanup:
- Error messages no longer hardcode "brainwires auth login"; they
  use credential_hint(provider) which emits the right command per
  provider.
- README gained a Providers section up front.

9 new unit tests; all 603 lib tests green.

Part 1 of the multi-phase plan in
/home/nightness/.claude/plans/extras-brainwires-cli-is-a-massive-lovely-sprout.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends utils/brainwires_md to walk from the working directory toward
the filesystem root collecting CLAUDE.md and BRAINWIRES.md files, plus
~/.claude/CLAUDE.md and ~/.brainwires/CLAUDE.md / BRAINWIRES.md as
global user-level instructions. Walking order puts ancestors before
cwd so the cwd file wins on conflicts. Duplicates (same canonical path
from multiple entry points, e.g. cwd is $HOME) are suppressed.
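
As a rough sketch of the walking order described above (ancestors first, cwd last, duplicates suppressed), assuming a hypothetical helper `candidate_instruction_paths` that only builds the candidate list and never touches the filesystem. The per-directory order of the two file names is an assumption, and the global ~/.claude and ~/.brainwires entries and path canonicalization are left out.

```rust
use std::path::{Path, PathBuf};

/// File names looked for in each directory (taken from the commit message).
const INSTRUCTION_FILES: [&str; 2] = ["CLAUDE.md", "BRAINWIRES.md"];

/// Hypothetical helper: candidate instruction paths from the filesystem
/// root down to `cwd`, so files closer to `cwd` come later and can win.
fn candidate_instruction_paths(cwd: &Path) -> Vec<PathBuf> {
    // `ancestors()` yields `cwd` first, then each parent up to the root.
    let mut dirs: Vec<&Path> = cwd.ancestors().collect();
    dirs.reverse(); // ancestors first, cwd last

    let mut out: Vec<PathBuf> = Vec::new();
    for dir in dirs {
        for name in INSTRUCTION_FILES {
            let candidate = dir.join(name);
            // Suppress duplicates reached through more than one entry point.
            if !out.contains(&candidate) {
                out.push(candidate);
            }
        }
    }
    out
}
```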

The assembled instructions are injected into Edit, Ask, and Batch
mode system prompts under a single `## Project and User Instructions`
header with `### From <path>` subheaders per source, so the model can
cite which file a rule came from.

This matches Claude Code's CLAUDE.md auto-loading — migrating users
get their existing CLAUDE.md picked up with zero configuration.

Opt out via BRAINWIRES_DISABLE_AUTO_INSTRUCTIONS=1 for scripts or
benchmarks that need a clean prompt.

- src/utils/brainwires_md.rs: discover_project_instructions,
  render_instructions, InstructionSource. 6 new unit tests.
- src/system_prompts/modes.rs: load_auto_instructions helper; wired
  into build_system_prompt_with_context, build_ask_mode_system_prompt,
  build_batch_mode_system_prompt.

Part 2 of the multi-phase plan in
/home/nightness/.claude/plans/extras-brainwires-cli-is-a-massive-lovely-sprout.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New src/tools/monitor.rs provides 4 tools for watching long-running
shell commands without blocking the agent's turn:

- monitor_start(command, cwd?) → returns opaque id, spawns via
  `bash -o pipefail -c`, streams stdout+stderr into a ring-buffered
  FIFO (cap 10_000 lines).
- monitor_read(id, since_offset?, max_lines?) → drains new lines
  with per-line offsets so callers can resume idempotently. Returns
  status (running / exited_ok / exited_error / killed / exited_unknown).
- monitor_stop(id) → SIGKILL + removes from registry.
- monitor_list() → enumerate active watchers with age + buffered lines.
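
The buffering and offset scheme behind monitor_read might look roughly like the sketch below. `LineLog` and its fields are hypothetical names, not the types in src/tools/monitor.rs, and this version reads without consuming, which is one simple way to make a repeated read with the same offset return the same lines.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded line log with monotonically increasing offsets.
struct LineLog {
    cap: usize,
    lines: VecDeque<(u64, String)>, // (offset, line)
    next_offset: u64,
    dropped: u64, // lines evicted because the buffer was full
}

impl LineLog {
    fn new(cap: usize) -> Self {
        Self { cap: cap.max(1), lines: VecDeque::new(), next_offset: 0, dropped: 0 }
    }

    /// Append a line, evicting the oldest one when the cap is reached.
    fn push(&mut self, line: impl Into<String>) {
        if self.lines.len() == self.cap {
            self.lines.pop_front();
            self.dropped += 1;
        }
        self.lines.push_back((self.next_offset, line.into()));
        self.next_offset += 1;
    }

    /// Return up to `max_lines` buffered lines with offset >= `since_offset`.
    fn read(&self, since_offset: u64, max_lines: usize) -> Vec<(u64, String)> {
        self.lines
            .iter()
            .filter(|(offset, _)| *offset >= since_offset)
            .take(max_lines)
            .cloned()
            .collect()
    }
}
```

A caller resumes by passing the last offset it saw plus one as `since_offset`.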

Designed for: dev servers, log tails, long builds, file watchers.
Each session has its own registry (MonitorTool held by ToolExecutor).

Registered in ToolRegistry::with_builtins path and dispatched under
`monitor_*` prefix in ToolExecutor::execute. monitor_start inherits
the normal tool-approval flow; the read/stop/list tools don't require
approval since they only touch already-approved processes.

5 async unit tests cover lifecycle, since_offset resume, stop removes
from list, unknown id error, empty command error. 614 lib tests pass.

Part 3 of the multi-phase plan in
/home/nightness/.claude/plans/extras-brainwires-cli-is-a-massive-lovely-sprout.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four Claude-Code-shaped features plus pass-2 polish on recent code.

Settings layering (new):
  ~/.brainwires/settings.json → ~/.claude/settings.json (migrator compat)
  → <project>/.brainwires/settings.json → settings.local.json.
  Scalars take the later value; permission arrays concatenate.
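
A minimal sketch of that merge rule, assuming a hypothetical `Settings` struct with one scalar (`model`, an invented field name) and two permission arrays. The real settings schema has more fields than shown.

```rust
/// Hypothetical, trimmed-down settings layer.
#[derive(Debug, Default, Clone, PartialEq)]
struct Settings {
    model: Option<String>, // example scalar; the field name is an assumption
    allow: Vec<String>,
    deny: Vec<String>,
}

impl Settings {
    /// Overlay `later` on top of `self`: scalars take the later value,
    /// permission arrays concatenate.
    fn merge(mut self, later: Settings) -> Settings {
        if later.model.is_some() {
            self.model = later.model;
        }
        self.allow.extend(later.allow);
        self.deny.extend(later.deny);
        self
    }
}

/// Fold the layers in load order (user, project, local, ...).
fn merge_layers(layers: Vec<Settings>) -> Settings {
    layers.into_iter().fold(Settings::default(), Settings::merge)
}
```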

Tool-specific permissions (allow/deny/ask): Claude-Code-syntax rules
  (Bash(ls:*), Edit(src/**/*.rs), mcp__server__tool) checked before the
  PolicyEngine branch. deny overrides PermissionMode::Full; allow skips
  approval but still audits; ask forces approval.
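
To make the deny/allow/ask ordering concrete, here is a deliberately simplified sketch. `Verdict`, `rule_matches`, and `decide` are hypothetical names; the matcher only understands bare `Tool` and `Tool(prefix:*)` rules (no globs, no mcp__ names), and checking ask before allow is an assumption, since the text above only states that deny wins.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Deny,
    Ask,
    Allow,
    NoMatch, // fall through to the regular policy engine
}

/// Simplified matcher: `Tool` matches any call to that tool,
/// `Tool(prefix:*)` matches when the argument is `prefix` or starts with `prefix `.
fn rule_matches(rule: &str, tool: &str, arg: &str) -> bool {
    match rule.split_once('(') {
        None => rule == tool,
        Some((name, rest)) => {
            let Some(inner) = rest.strip_suffix(')') else { return false };
            let Some(prefix) = inner.strip_suffix(":*") else { return false };
            name == tool && (arg == prefix || arg.starts_with(&format!("{} ", prefix)))
        }
    }
}

/// Deny is checked first, so it wins even when an allow rule also matches.
fn decide(deny: &[&str], ask: &[&str], allow: &[&str], tool: &str, arg: &str) -> Verdict {
    let hit = |rules: &[&str]| rules.iter().any(|rule| rule_matches(rule, tool, arg));
    if hit(deny) {
        Verdict::Deny
    } else if hit(ask) {
        Verdict::Ask
    } else if hit(allow) {
        Verdict::Allow
    } else {
        Verdict::NoMatch
    }
}
```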

Hooks (PreToolUse / PostToolUse / UserPromptSubmit / Stop): configured
  under settings.hooks, dispatched around route_tool and at the two
  chat-loop lifecycle points. Exit 0 = continue, 2 = block with stderr
  fed back, other non-zero = soft error. 5s default timeout.
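
The exit-code contract reads naturally as a three-way match. The sketch below uses hypothetical names (`HookOutcome`, `interpret_hook_exit`) and leaves out process spawning and the 5s timeout.

```rust
#[derive(Debug, PartialEq)]
enum HookOutcome {
    /// Exit 0: carry on as if the hook were not there.
    Continue,
    /// Exit 2: block the action and feed stderr back.
    Block { feedback: String },
    /// Any other non-zero code: report it, but do not block.
    SoftError { code: i32 },
}

fn interpret_hook_exit(code: i32, stderr: &str) -> HookOutcome {
    match code {
        0 => HookOutcome::Continue,
        2 => HookOutcome::Block { feedback: stderr.trim().to_string() },
        other => HookOutcome::SoftError { code: other },
    }
}
```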

Auto-memory: ~/.brainwires/projects/<encoded-cwd>/memory/ with MEMORY.md
  index + typed memory files (user/feedback/project/reference).
  memory_save/delete/list tools; index auto-rewritten on every mutation.
  System prompt injection can be disabled via BRAINWIRES_DISABLE_AUTO_MEMORY=1.

ask_user_question tool: mpsc+oneshot channel (same shape as approval +
  sudo), with dialoguer fallback for plain-CLI mode and Cancelled on
  non-TTY. TUI adapter to question_panel left as a follow-up.

Pass-2 polish:
- first-run picker default now actually matches its comment (Brainwires
  when a saved session exists, Anthropic otherwise).
- Monitor tool tracks ring-buffer evictions and surfaces dropped_lines
  on read/list, so a chatty dev server can't silently outrun the agent.

649 lib tests pass; 37 new tests cover the four features.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ine, docs

Pass 4 — finishes the one knowingly-incomplete piece from pass 3
(TUI modal for ask_user_question), surfaces the new harness
features through docs, and closes the cheap remaining audit items.

TUI ask_user_question (new AppMode::UserQuestion):
- user_question_rx polled alongside approval_rx in tui/mod.rs; on
  receive, adapts the UserQuestionRequest into a synthetic
  QuestionBlock via new src/ask adapter helpers and reuses the
  existing question_panel renderer unchanged.
- New handler src/tui/app/events/modals.rs::handle_user_question_event
  mirrors question-answer navigation but routes submit/cancel back
  over the tool's oneshot channel instead of the AI conversation.
- Dialoguer fallback from pass 3 remains for plain-CLI / non-TTY.

Skills:
- src/utils/skills.rs loader was already called at App construction
  and /skill* commands were already wired. Added the missing pieces:
  dynamic autocomplete for /<skill-name> from the discovered
  SkillRegistry, and a fall-through so an unknown /<skill-name>
  invokes /skill <name> automatically.

Custom status line:
- Config gains an optional status_line_command. refresh_status_line()
  runs bash -c with a 200 ms timeout, caches the result for 1 s, and
  appends to the status bar. No render stalls.

Docs + dogfood:
- New docs/harness/settings.md — schema, merge order, permission
  patterns, hook exit codes + event payloads, memory types, and
  ask_user_question contract.
- New docs/harness/settings.example.json — committed-but-inert
  reference showing a real deny rule set + two hooks.
- CHANGELOG grouped under (settings) / (hooks) / (memory) / (tools) /
  (tui) / (config) / (docs) for pass 3+4.

Smoke:
- New tools::executor::tests::settings_deny_blocks_even_in_full_mode:
  live integration test that a Bash(rm:*) deny rule blocks under
  PermissionMode::Full (the central safety guarantee).
- New config::settings_manager::tests::docs_example_parses: guards
  the committed example JSON against schema drift.
- 655 lib tests pass (+6 since pass 3: 4 ask adapter round-trips,
  deny integration, docs example parse). Release binary builds and
  runs --help cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skills previously loaded their body + injected it as a user-role message
with no constraints. This commit makes /skill actually enforce the
skill's declared contract:

handle_invoke_skill (command_handler.rs:2013):
- Injects the rendered instructions as a **system** message (was user),
  so the AI treats the skill as constraint, not chat.
- Uses brainwires_agents::skills::render_template to substitute
  positional args and key=value args into {{placeholders}} in the body.
- Branches on ExecutionMode: Inline runs as-is; Subagent and Script log
  a notice and fall back to Inline (full agent-pool / orchestrator
  wiring is a follow-up pass).
- Stashes the skill's allowed_tools on App.pending_skill_tool_scope
  for the next AI turn.

Tool-scope enforcement (message_processing/mod.rs:240):
- If pending_skill_tool_scope is set, filter the tools list passed to
  the provider to only names matching the allowlist (plus MCP-style
  "server__tool" suffix match). Clear unconditionally after one turn.
- IPC and MDAP paths warn + clear but don't yet apply the filter —
  noted for follow-up.
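
A sketch of the one-turn tool filter, under stated assumptions: `tool_in_scope` and `apply_skill_scope` are hypothetical names, tools are represented by their names only, and the "suffix match" is read as "an allowlist entry `tool` also admits `server__tool`".

```rust
/// True when `tool_name` equals an allowlist entry, or is an MCP-style
/// `server__tool` name whose `tool` part equals an entry.
fn tool_in_scope(tool_name: &str, allowed: &[&str]) -> bool {
    allowed.iter().any(|entry| {
        tool_name == *entry
            || tool_name
                .strip_suffix(*entry)
                .is_some_and(|head| head.ends_with("__"))
    })
}

/// Filter `tools` by the pending scope, clearing the scope unconditionally
/// so it only ever applies to a single turn.
fn apply_skill_scope<'a>(tools: &[&'a str], scope: &mut Option<Vec<String>>) -> Vec<&'a str> {
    match scope.take() {
        None => tools.to_vec(),
        Some(allowed) => {
            let allowed_refs: Vec<&str> = allowed.iter().map(String::as_str).collect();
            tools
                .iter()
                .copied()
                .filter(|tool| tool_in_scope(tool, &allowed_refs))
                .collect()
        }
    }
}
```

Using `Option::take` means the scope is cleared even if the filtered list turns out to be empty.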

/skill:show (command_handler.rs:2223):
- Lists Level-3 resources (scripts/, references/, assets/) via
  SkillRegistry::get_resources, so users can see what a skill ships
  without opening the file.

655 lib tests still pass — no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
command_handler.rs had grown to 2456 lines in one `impl App` block —
hard to navigate and easy to lose stuff in. Turned it into a directory
module with the same external name, one file per topic:

  src/tui/app/message_processing/command_handler/
    mod.rs        # dispatch (handle_command + handle_command_action),
                  # mdap / context / tools-mode handlers (~1120 lines)
    knowledge.rs  # /learn, /knowledge, /knowledge:*        (~299 lines)
    profile.rs    # /profile, /profile:*                    (~397 lines)
    agents.rs     # /agents, /switch, /spawn, hibernate/resume  (~187)
    skills.rs     # /skill, /skills, /skill:*               (~449 lines)

Zero behavior change. `mod command_handler;` in the parent module still
resolves to the same path. Each submodule's `impl App { ... }` methods
are marked `pub(super)` so the dispatch in mod.rs can call them.

While I was in there, cleaned up the clippy warnings I'd introduced
over the prior passes:

- Collapsed nested `if let` / `if` into `&& let` (ask/mod.rs,
  config/settings.rs, tools/memory.rs, tui/app/events/modals.rs,
  command_handler/skills.rs, state.rs).
- `monitor.rs`: `.min(MAX).max(1)` → `.clamp(1, MAX)`.
- `utils/memory.rs`: counter loop → `.enumerate()`.
- `tools/memory.rs`: `#![allow(clippy::await_holding_lock)]` on the
  tests module; the env-var lock is process-global and must be held
  across await deliberately.

Lib-only clippy went from 10 warnings → 1 (the one remaining is
pre-existing and not mine). 655 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pass 5 §1 and §2 landing together.

/shell (src/tui/shell_overlay.rs, new):
- New slash command + AppMode::OpenShell action. Main TUI loop drops
  raw mode + leaves the alt screen + disables mouse capture, spawns
  bash (or $SHELL, or explicit override) with inherited stdio, then
  restores on return. Exit code captured into shell_history.
- Unix-gated (#[cfg(unix)]). Windows gets a clear "not yet supported"
  message from the action handler — no stub spawn.
- RestoreGuard ensures the TUI terminal state comes back even on panic
  during the shell invocation.

Remappable keybindings (src/tui/keybindings.rs, new):
- New `settings.keybindings.global` map from action name → key spec.
  Spec grammar: Modifiers (Ctrl/Alt/Shift) + key (char, Esc, Enter, Tab,
  Space, arrows, Home/End/PageUp/PageDown, F1–F24). Case-insensitive.
- `KeybindingMap::from_settings` seeds every known action with a
  built-in default, then overlays user entries. Unknown actions + unparseable
  specs log a warning and keep the default — partial configs are fine.
- Two actions wired in this pass: `console_view` (Ctrl+D) and
  `plan_mode_toggle` (Ctrl+P). The six-key target from the plan was
  deliberately narrowed — other globals (Ctrl+T/R/B/F) live inside the
  Normal-mode handler and can move over in a follow-up without changing
  the abstraction.
- Settings.merge extended to per-action later-wins on keybindings.
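
The spec grammar and the keep-the-default fallback could be sketched as follows. `KeySpec`, `parse_key_spec`, and `binding_or_default` are hypothetical names, the arrow-key spellings (`up`, `down`, `left`, `right`) are assumptions, and a real implementation would map onto the TUI library's key types rather than strings.

```rust
#[derive(Debug, Clone, PartialEq)]
struct KeySpec {
    ctrl: bool,
    alt: bool,
    shift: bool,
    key: String, // lower-cased key name, e.g. "d", "esc", "f5"
}

fn is_known_key(key: &str) -> bool {
    const NAMED: [&str; 12] = [
        "esc", "enter", "tab", "space", "up", "down", "left", "right",
        "home", "end", "pageup", "pagedown",
    ];
    if key.chars().count() == 1 || NAMED.contains(&key) {
        return true;
    }
    // Function keys F1 through F24.
    key.strip_prefix('f')
        .and_then(|n| n.parse::<u8>().ok())
        .is_some_and(|n| (1..=24).contains(&n))
}

/// Parse specs such as "Ctrl+D" or "ctrl+alt+f" (case-insensitive).
fn parse_key_spec(spec: &str) -> Option<KeySpec> {
    let mut out = KeySpec { ctrl: false, alt: false, shift: false, key: String::new() };
    let mut parts = spec.split('+').map(|p| p.trim().to_ascii_lowercase()).peekable();
    while let Some(part) = parts.next() {
        let is_last = parts.peek().is_none();
        match (part.as_str(), is_last) {
            ("ctrl", false) => out.ctrl = true,
            ("alt", false) => out.alt = true,
            ("shift", false) => out.shift = true,
            (key, true) if is_known_key(key) => out.key = key.to_string(),
            _ => return None,
        }
    }
    Some(out)
}

/// An unparseable user spec keeps the built-in default.
fn binding_or_default(user_spec: Option<&str>, default_spec: &str) -> Option<KeySpec> {
    user_spec.and_then(parse_key_spec).or_else(|| parse_key_spec(default_spec))
}
```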

Docs + CHANGELOG updated. 662 lib tests pass (+6 keybinding, +1
shell_overlay signature), zero regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… + worktree primitive

Pass 5 follow-up. All of the items I'd explicitly scoped down or
deferred in prior commits.

Keybindings — expand to 6 actions (src/tui/keybindings.rs):
- Defaults now seed task_viewer (Ctrl+T), reverse_search (Ctrl+R),
  sub_agent_viewer (Ctrl+B), file_explorer (Ctrl+Alt+F) in addition
  to the original console_view / plan_mode_toggle.
- Four call sites swapped from event.is_<action>() to
  self.keybindings.matches("<action>", &event) in events/core.rs and
  events/viewers.rs. Docs table updated.

Skills — honor declared execution_mode (command_handler/skills.rs):
- Inline: already correct (body as system message, tool scope stashed).
- Subagent: now renders via the framework's prepare_subagent shape
  ("You are executing the '{name}' skill..." prefix) before injection.
  Full TaskAgent spawn via AgentPool still routes through /spawn —
  this pass makes the system prompt match what SkillExecutor would
  produce.
- Script: body is framed as an explicit instruction to run it via
  execute_script. `execute_script` is auto-appended to the scoped
  tool list if the skill's allowed_tools don't already include it,
  so the script mode can actually execute.

Skills — tool scope in MDAP + IPC (message_processing/mod.rs):
- New helper `apply_and_clear_skill_tool_scope` on App — replaces the
  inline filter in the default path and is now also called from the
  MDAP path (filters AgentContext.tools before OrchestratorAgent runs).
- IPC path can't enforce client-side (remote session owns its own
  ToolExecutor), so it surfaces a one-line notice instead of a silent
  clear.

Skills — SkillRouter auto-suggest (message_processing/mod.rs,
prompt_mode.rs):
- New `suggest_skill_for()` on App runs a keyword match against the
  discovered registry (same heuristic as brainwires_agents::SkillRouter::keyword_match,
  synchronous so it doesn't require the Arc<RwLock> dance). Emits
  "💡 Skill 'X' may help — invoke with /X" as a console hint when
  confidence ≥ 0.75. Never auto-invokes.
- Called right after the user message is pushed into conversation
  history, before the AI turn.

Worktree agent isolation primitive (src/agent/worktree.rs, new):
- RAII WorktreeGuard — `create(repo, label)` runs `git worktree add
  --detach` at ~/.brainwires/worktrees/<label>-<uuid>/; Drop runs
  `git worktree remove --force` with a manual-rm-dir fallback on
  failure.
- `prune_orphans()` helper for startup GC.
- Full Agent({isolation: "worktree"}) lifecycle wiring (TaskAgentConfig
  integration, FileLockManager interaction, permission scoping) stays
  deferred — this commit ships the primitive so that pass has
  something to build on without touching the rest of the agent system.

Test infrastructure (utils/mod.rs):
- New EnvVarGuard RAII helper restores the previous value of an env
  var on drop. Fixes a cross-test leakage: the worktree tests needed
  to override dot_brainwires_dir, and tempting as it was to swap $HOME,
  that bled into parallel tests reading dirs::home_dir() (file_explorer).
  Switched the override to BRAINWIRES_HOME (new; parallel to
  BRAINWIRES_MEMORY_ROOT) and migrated memory/worktree test fixtures
  to EnvVarGuard.
- utils/paths.rs::dot_brainwires_dir() now honors BRAINWIRES_HOME.

665 lib tests pass (was 664 before this pass's additions + 3 worktree
tests - 1 test_new_file_explorer flake that's now stable). Lib-only
clippy stays at 1 pre-existing warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BashSandboxMode::NetworkDeny wraps commands in `unshare -U -r -n` on Linux
(a no-op elsewhere, with a warning). Every bash invocation also
middle-truncates stdout/stderr at 25KB to keep a single runaway line from
blowing past model context limits.
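
For illustration, middle truncation that never splits a UTF-8 character can be written with `str::is_char_boundary`. This is a sketch, not the helper in bash.rs: the function body and the marker text are assumptions, and the marker makes the result slightly longer than `cap`.

```rust
/// Keep roughly `cap / 2` bytes from each end of `s`, cutting only on
/// character boundaries, and note how many bytes were removed.
fn truncate_middle(s: &str, cap: usize) -> String {
    if s.len() <= cap {
        return s.to_string();
    }
    let half = cap / 2;

    // Move the head cut left until it sits on a char boundary.
    let mut head_end = half;
    while !s.is_char_boundary(head_end) {
        head_end -= 1;
    }
    // Move the tail cut right until it sits on a char boundary.
    let mut tail_start = s.len() - half;
    while !s.is_char_boundary(tail_start) {
        tail_start += 1;
    }

    let omitted = tail_start - head_end;
    format!(
        "{}\n[... {} bytes truncated ...]\n{}",
        &s[..head_end],
        omitted,
        &s[tail_start..]
    )
}
```

Both loops terminate because offsets 0 and `s.len()` are always character boundaries.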

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Set cache_prompt: true on chat + stream requests so tools/system blocks
  earn cache hits across turns; log cache_read vs cache_creation token
  counts so callers can verify hits in production.
- Convert ContentBlock::Image (Base64) → AnthropicContentBlock::Image with
  AnthropicImageSource, unblocking multimodal user messages. Added a
  roundtrip unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… set

- /dream, /dream:run, /dream:status slash commands wire the framework's
  DreamConsolidator into the CLI via an InMemoryDreamSessionStore adapter
  (extras/brainwires-cli/src/dream/). TUI shows a before/after token
  report after each manual cycle; background scheduling comes later.
- New `--sandbox=network-deny` CLI flag propagates to the bash tool via
  BRAINWIRES_BASH_SANDBOX. Set once at startup (pre-thread-spawn) so the
  tool's env read is race-free.
- New `--all-tools` opts into eager enumeration of every registered tool.
  Default non-TUI chat paths now call select_non_tui_tools(), which
  returns the curated core set (14 tools incl. search_tools) in canonical
  order — smaller request body and a stable prefix for prompt caching.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit added the ContentBlock::Image arm which made the `_ => None`
catch-all unreachable. Dropping it keeps the match exhaustive so adding a new
ContentBlock variant fails loudly instead of silently filtering out.
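
The idea can be shown with stand-in types. `Block` and `WireBlock` below are hypothetical and much smaller than the real ContentBlock and AnthropicContentBlock enums; the point is only that a match without a `_` arm stops compiling when a new variant appears.

```rust
/// Hypothetical stand-in for the framework-side content block.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Text(String),
    Image { media_type: String, data: String },
}

/// Hypothetical stand-in for the provider-side wire format.
#[derive(Debug, PartialEq)]
enum WireBlock {
    Text { text: String },
    Image { media_type: String, data: String },
}

fn to_wire(block: &Block) -> WireBlock {
    // No catch-all arm: adding a variant to `Block` turns this match
    // into a compile error instead of a silently dropped block.
    match block {
        Block::Text(text) => WireBlock::Text { text: text.clone() },
        Block::Image { media_type, data } => WireBlock::Image {
            media_type: media_type.clone(),
            data: data.clone(),
        },
    }
}

fn convert_all(blocks: &[Block]) -> Vec<WireBlock> {
    blocks.iter().map(to_wire).collect()
}
```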

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er 3 logic

Reverses the accidental gutting of this crate during the v0.8→v0.9 refactor
(commits ad59b21 and 662342c). The original plan (sleepy-popping-falcon.md
PR 7) called for `brainwires-reasoning` to own plan/output parsing and the
local-inference scorers. What shipped instead was:

- The 9 scorers (complexity, entity_enhancer, relevance_scorer,
  retrieval_classifier, router, strategies, strategy_selector, summarizer,
  validator) hidden inside `brainwires-agents::reasoning` behind a feature,
- plan_parser/output_parser stuck in `brainwires-core` behind its `planning`
  feature, and
- `brainwires-reasoning` reduced to a 22-line re-export facade with zero
  tests and no runtime code of its own.

This commit moves the code to where it was meant to live:

- `brainwires-core/src/{plan_parser,output_parser}.rs` → `brainwires-reasoning/src/`
  (real `git mv`, not a copy). Drops the `planning` feature from core and
  its optional `regex` dep. The `native` feature stays as an empty stub so
  downstream `brainwires-core/native` references still resolve.
- `brainwires-agents/src/reasoning/*.rs` → `brainwires-reasoning/src/`.
  The module surface (LocalInferenceConfig, InferenceTimer, log_inference,
  every scorer re-export) lands in `brainwires-reasoning/src/lib.rs`.
- `brainwires-agents` now depends on `brainwires-reasoning` under the
  `reasoning` feature and re-exports it as `brainwires_agents::reasoning`,
  so existing callers (`extras/brainwires-autonomy`, `extras/brainwires-cli`,
  etc.) keep resolving via the same path — no caller rewrites needed.
- `extras/brainwires-cli/src/utils/mod.rs` facade retargeted to
  `brainwires::agents::reasoning::plan_parser` since
  `brainwires::core::plan_parser` no longer exists.

Prompting stays in `brainwires-knowledge` — that deviation from the
original plan (documented in commit ca6e13a) remains correct because of
its tight coupling to `bks_pks`. The restored crate's lib.rs explains
this explicitly so the choice is visible.

Verified: 67 tests pass in brainwires-reasoning (parser + scorer +
inference-config coverage), 60 in brainwires-core, 368 in
brainwires-agents with `reasoning` feature. Full `cargo check --workspace`
clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anomaly

Security-perimeter gap in the test inventory: brainwires-permissions had 71
inline tests but no `tests/` directory, so the engine's real consumer-facing
behaviour (priority ordering, wildcard domain matching, audit durability,
anomaly thresholds) went unverified across crate boundaries. Closes that gap
with 44 new integration tests across four files:

- policy_matching.rs (23 tests): table-driven coverage of every
  PolicyCondition variant, AND/OR/NOT composition (including empty-AND
  vacuous-truth and empty-OR), priority ordering where deny overrides allow,
  default-action fallback, disabled-policy skipping, and the with_defaults()
  preset.
- wildcard_domains.rs (5 proptests): sweeps suffix-confusion
  (`*.example.com` vs `example.com.attacker.io`), prefix-confusion
  (`fakeexample.com`), and apex/subdomain coverage. Guards the load-bearing
  `*.` matching rule in policy.rs:113-124.
- audit_durability.rs (8 tests): important events (PolicyViolation,
  TrustChange, HumanIntervention, UserFeedback) must hit disk before log()
  returns; ordinary events buffer until flush; JSONL format stays well-formed
  across mixed event types; a fresh logger pointed at an existing path
  replays prior-session events; a disabled logger is silent.
- anomaly_thresholds.rs (8 tests): sliding-window threshold boundary (fires
  at count >= threshold and keeps firing until window clears), per-agent
  isolation, out-of-window forgetting, tool-call rate detection, path-scope
  allowlist flagging `/etc/passwd` but passing `/workspace/src/main.rs`, and
  the no-op allowlist case.
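
The suffix-confusion and prefix-confusion cases are easier to see against a concrete matcher. The function below is an independent sketch, not the rule in policy.rs: `domain_matches` is a hypothetical name, lower-casing both sides is an assumption, and it assumes a `*.` pattern covers subdomains only, which may differ from how the real rule treats the apex domain.

```rust
/// `*.example.com` matches `api.example.com` but not `fakeexample.com`
/// or `example.com.attacker.io`. Patterns without `*.` match exactly.
fn domain_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        // The host must end with the base domain, and what precedes it
        // must be at least one character followed by a dot.
        Some(base) => host
            .strip_suffix(base)
            .is_some_and(|head| head.len() > 1 && head.ends_with('.')),
        None => host == pattern,
    }
}
```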

Adds `proptest` as a workspace dev-dep and wires it into the permissions
crate. Deterministic everywhere — anomaly tests fabricate events with
explicit epoch timestamps so window aging doesn't rely on sleep().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
brainwires-mcp had ~15 inline tests and no tests/ dir, despite being the
parse surface for every byte coming off an MCP transport. Adds 15 checks
across 1 file:

- 10 explicit edge cases: string/integer/null id round-trips; error
  responses with `data` payloads skip `result` on the wire as required;
  notifications never emit `id`; ProgressParams parse from
  `notifications/progress`; unknown method names and malformed progress
  payloads both fall through to McpNotification::Unknown without panicking;
  transport discriminator (mirror of transport.rs:162-180) treats
  explicit `id: null` as notification and rejects malformed JSON.
- 5 proptest roundtrips: JsonRpcRequest / Response-success /
  Response-error / Notification / ProgressParams all survive a JSON
  serialize→deserialize cycle with shape intact. Progress floats are
  fixed to integer-valued f64s so JSON's decimal encoding is exact —
  the earlier `1e6` range exposed real ULP drift that would have been
  a flaky test rather than a genuine bug.

Also fixes TESTING.md to point at `brainwires_agents::eval` (the eval
framework is a module in brainwires-agents, not a standalone crate),
including the §8 §§-pointer now that scorers live in brainwires-reasoning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the code just relocated from brainwires-core into
brainwires-reasoning. 25 tests across 1 file:

- plan_parser shape invariants: numbered-dot/paren and `Step N:` formats
  both accepted; 2-space indent maps to indent_level=1; priority flag
  set for "important"/"critical"/"!" keywords (case-insensitive);
  short/note/warning bullets filtered; empty and whitespace-only inputs
  return empty vecs.
- steps_to_tasks preserves count, priority→TaskPriority mapping, and
  encodes step number in task id.
- output_parser edge cases: JsonOutputParser extracts from markdown
  fences with and without language tags and from surrounding prose;
  JsonListParser handles arrays; RegexOutputParser rejects invalid
  patterns at construction, surfaces mismatches as Err, and extracts
  named captures; format_instructions are non-empty.
- proptests: plan_parser is panic-free on arbitrary text and always
  emits strictly increasing step numbers; numbered-line counts match
  expectation (description trimmed by the parser, so assert on trimmed);
  JsonOutputParser never panics on arbitrary text; embedded
  `{"key":N}` objects in surrounding prose extract their value;
  indent_level always equals `leading_spaces / 2`.
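
A few of these invariants can be illustrated with a toy parser. This is not the code in brainwires-reasoning: `Step`, `split_numbered`, and `parse_plan` are hypothetical, and the sketch skips the short/note/warning filtering and does not enforce increasing step numbers.

```rust
#[derive(Debug, PartialEq)]
struct Step {
    number: usize,
    indent_level: usize,
    description: String,
    priority: bool,
}

/// Split "1. text", "2) text", or "Step 3: text" into (number, text).
fn split_numbered(body: &str) -> Option<(usize, &str)> {
    let (src, is_step) = match body.strip_prefix("Step ") {
        Some(rest) => (rest, true),
        None => (body, false),
    };
    let digits = src.chars().take_while(|c| c.is_ascii_digit()).count();
    let number: usize = src.get(..digits)?.parse().ok()?;
    let after = &src[digits..];
    let text = if is_step {
        after.strip_prefix(':')?
    } else {
        after.strip_prefix(|c: char| c == '.' || c == ')')?
    };
    Some((number, text.trim()))
}

fn parse_plan(text: &str) -> Vec<Step> {
    text.lines()
        .filter_map(|line| {
            let trimmed = line.trim_start_matches(' ');
            let leading_spaces = line.len() - trimmed.len();
            let (number, description) = split_numbered(trimmed)?;
            if description.is_empty() {
                return None;
            }
            let lower = description.to_lowercase();
            let priority = lower.contains("important")
                || lower.contains("critical")
                || description.contains('!');
            Some(Step {
                number,
                indent_level: leading_spaces / 2, // two spaces per level
                description: description.to_string(),
                priority,
            })
        })
        .collect()
}
```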

Also fixes a stale doc-test import (`brainwires_core::output_parser` →
`brainwires_reasoning::output_parser`) left over from the P0 #0 move.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
brainwires-tools had no `tests/` directory. Adds focused coverage for
`FileOpsTool::resolve_path`, the single seam between a caller path string
and filesystem I/O. 7 tests, 5 hand-written + 2 proptests:

- relative paths anchor against working_directory; absolute paths pass
  through; nonexistent targets still anchor correctly so callers can
  mkdir the parent; nested `a/b/c.txt` composes as expected.
- `dotdot_traversal_is_not_blocked_current_behaviour` explicitly pins
  the fact that resolve_path does NOT enforce a working-dir sandbox —
  a `../sibling.txt` call escapes the working directory. Comment in the
  test tells a future sandboxing change exactly how to update it.
- proptests: arbitrary UTF-8 input never panics; unicode-named paths
  (`éüß` byte sequences) roundtrip through resolution unmangled.
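
The pinned behaviour is consistent with plain `Path::join` semantics, which the sketch below assumes. The free function `resolve_path` here is a hypothetical stand-in for `FileOpsTool::resolve_path`, whose real signature is not shown in this commit.

```rust
use std::path::{Path, PathBuf};

/// Relative paths anchor against `working_directory`; absolute paths pass
/// through, because `Path::join` replaces the base when given an absolute
/// path. Nothing here normalizes or rejects `..` components.
fn resolve_path(working_directory: &Path, requested: &str) -> PathBuf {
    working_directory.join(requested)
}
```

On a Unix-style path, `resolve_path(Path::new("/work"), "../sibling.txt")` keeps the `..` component, so the resolved path points outside `/work` once the filesystem interprets it.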

The existing inline tests in bash.rs already cover the sandbox mode +
truncate_middle + shell_escape helpers added in the prior commit, so no
bash-level integration tests are needed yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Guards brainwires:: paths for every framework subsystem reachable via the
metacrate. Pure typecheck — const _: fn() = || { ... }; blocks assert that
Task, Message, Role, PermissionMode, TaskPriority, TaskQueue, ToolRegistry,
PolicyEngine, plan_parser, TieredMemory, and McpServerConfig all still
resolve under their respective feature flags.

If a sub-crate rename or a dropped re-export ever sneaks through, this
file stops compiling — catching the break before any downstream user
hits it.
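
The `const _: fn() = || { ... };` pattern can be tried in isolation. In the sketch below `facade` is a hypothetical stand-in for the metacrate, and the real file would additionally gate each block on the relevant feature flag.

```rust
/// Hypothetical stand-in for a metacrate that re-exports sub-crate items.
mod facade {
    pub mod tasks {
        pub struct Task;
        pub struct TaskQueue;
    }
    // Re-export surface under test.
    pub use self::tasks::{Task, TaskQueue};
}

// Never called. It exists only so the compiler has to resolve every path;
// a renamed module or dropped re-export becomes a build error.
const _: fn() = || {
    let _ = std::mem::size_of::<facade::Task>();
    let _ = std::mem::size_of::<facade::TaskQueue>();
    let _ = std::mem::size_of::<facade::tasks::Task>();
};
```

Because the closure is never invoked, the check costs nothing at runtime.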

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 17 extras/ crates build clean (cargo build per crate with cargo clean
between each to bound disk). The one exclusion is extras/brainclaw, which
is pinned at v0.8.0 with brainwires-tools ^0.8.0 and remains explicitly
excluded from the workspace — pre-existing and not a release blocker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
constant_time_eq 0.4.3 was published 2026-04-18 (same day we cut 0.10.0)
and bumped its own MSRV to Rust 1.95, which was itself released only two
days earlier. blake3 (pulled in via lancedb → datafusion) depends on
^0.4, so a fresh Cargo.lock on CI picked 0.4.3 and broke every build job.

Declaring `constant_time_eq = "=0.4.2"` as a direct dep in the
publish=false xtask crate makes the workspace resolver unify the
transitive at 0.4.2 (MSRV 1.85, comfortably under our 1.91 floor) without
requiring us to commit Cargo.lock. No impact on published crates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nightness and others added 4 commits April 18, 2026 20:24
Two issues that CI caught post-MSRV-pin, plus one local lint fix:

1. brainwires-providers/src/anthropic/chat.rs — after the prior fix
   removed the `_ => None` catch-arm, every remaining arm returned
   `Some(...)`, so `.filter_map` was now equivalent to `.map`. Clippy's
   `unnecessary_filter_map` (error-level under `-D warnings`) rightly
   flagged it. Collapsed to `.map(...)` and dropped the `Some()`
   wrappers.

2. brainwires-tools/src/bash.rs:900 — `"A".repeat(n) + &"Z".repeat(n)`
   fails to typecheck in CI's fresh-lockfile environment (the `String
   + &String` deref coercion doesn't land for reasons that don't
   reproduce locally with our pinned lock). Rewrote as
   `format!("{}{}", ..., ...)` which is unambiguous.

3. Bonus: brainwires-cli/src/providers/factory.rs — local clippy 1.92
   flagged `Some(s) if s.is_empty()` via the new `redundant_guards`
   lint. Not on the CI matrix (1.91), but trivially fixed to
   `None | Some("")` to keep the 1.92+ path clean.
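
The three rewrites have roughly the following shapes. The types and function names here (`Block`, `to_wire`, `head_tail`, `pick_provider`) are invented for illustration and are not the code in chat.rs, bash.rs, or factory.rs.

```rust
// Illustrative stand-in for a content-block enum with no catch-all arm.
enum Block {
    Text(String),
    Image(String),
}

// 1. Every arm yields a value, so `.map` replaces `.filter_map` + `Some(...)`.
fn to_wire(blocks: &[Block]) -> Vec<String> {
    blocks
        .iter()
        .map(|block| match block {
            Block::Text(text) => format!("text:{text}"),
            Block::Image(data) => format!("image:{data}"),
        })
        .collect()
}

// 2. `format!` sidesteps any question about how `String + &String` resolves.
fn head_tail(n: usize) -> String {
    format!("{}{}", "A".repeat(n), "Z".repeat(n))
}

// 3. Matching the literal `Some("")` replaces the `Some(s) if s.is_empty()` guard.
fn pick_provider(arg: Option<&str>) -> &str {
    match arg {
        None | Some("") => "default",
        Some(name) => name,
    }
}
```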

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Test job on stock ubuntu-latest runners is OOMing the linker on the
brainwires-knowledge test binary: lancedb + datafusion + arrow + tantivy
+ tree-sitter-{c,cpp,python,rust,…} + image processing all link into a
single test artifact with full debug info, and collect2 dies with
`signal 7 [Bus error]` before completing.

line-tables-only retains file:line info (so panic backtraces are still
readable) but drops the DWARF .debug_info section that drives the
binary-size explosion. Release builds keep the default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_history_list_with_zero_limit, test_history_list_with_large_limit,
and test_history_search_combined_parameters all invoke CLI code paths
that eagerly initialize the FastEmbed embedding provider (downloading
model.onnx from HuggingFace on first use). On CI this hits a 504 and
fails deterministically; the default history list / search paths
short-circuit before embedding init and pass.

The actual fix is to make embedding-model init lazy in the CLI so these
listing paths never need the network. Tracked separately. Marked
#[ignore] until that lands; run them locally with `cargo test -- --ignored`.
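
One possible shape for the lazy-init fix, sketched with `std::sync::OnceLock`. Everything here is an assumption: `Embedder`, `list_history`, and `search_history` are toy stand-ins, `INIT_COUNT` exists only to make the laziness observable, and the real change may look quite different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (toy) model is loaded.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);
static EMBEDDER: OnceLock<Embedder> = OnceLock::new();

struct Embedder;

impl Embedder {
    // Stands in for the expensive, network-dependent model download.
    fn load() -> Self {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        Embedder
    }

    // Toy "embedding": just the byte length of the text.
    fn embed(&self, text: &str) -> usize {
        text.len()
    }
}

// Loads the model on first use only; later calls reuse the same instance.
fn embedder() -> &'static Embedder {
    EMBEDDER.get_or_init(Embedder::load)
}

// Listing never touches the embedder, so it needs no network.
fn list_history(entries: &[&str], limit: usize) -> Vec<String> {
    entries.iter().take(limit).map(|e| e.to_string()).collect()
}

// Searching is the only path that forces initialisation.
fn search_history(entries: &[&str], query: &str) -> Vec<String> {
    let model = embedder();
    let target = model.embed(query);
    entries
        .iter()
        .filter(|e| model.embed(e) == target)
        .map(|e| e.to_string())
        .collect()
}
```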

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
daily_job_not_due_again_within_same_day was using Utc::now() for both
`last_fired_at` (now-30m) and the is_due check. When CI runs near 02:00
UTC (today: 02:23), last_fired lands at 01:53 with the cron's 02:00
fire window crossed between them — correctly marking the job due. The
test's expectation (`!is_due`) is only valid when `now` is far from
02:00.

Pinned to 2026-01-15T12:00:00Z (noon UTC) so the 02:00 fire is
nowhere near the window and the test no longer depends on the CI
clock.
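
The failure mode can be reproduced with plain arithmetic. The sketch below is a simplified model, not the scheduler's code: times are seconds since the Unix epoch, the job is assumed to fire daily at 02:00 UTC, and `next_fire_after` and `is_due` are hypothetical names.

```rust
const DAY: u64 = 86_400;
const FIRE_OFFSET: u64 = 2 * 3_600; // 02:00 UTC
const JAN_15_2026: u64 = 20_468 * DAY; // 2026-01-15T00:00:00Z as seconds since the epoch

/// First 02:00 UTC strictly after `t`.
fn next_fire_after(t: u64) -> u64 {
    let day_start = t - t % DAY;
    let todays_fire = day_start + FIRE_OFFSET;
    if todays_fire > t {
        todays_fire
    } else {
        todays_fire + DAY
    }
}

/// A job is due when a fire time falls between the last run and `now`.
fn is_due(last_fired_at: u64, now: u64) -> bool {
    next_fire_after(last_fired_at) <= now
}
```

With `now` at 02:23 and `last_fired_at` thirty minutes earlier, the 02:00 fire sits between the two and `is_due` returns true. With `now` pinned to noon, the next fire is 02:00 the following day and `is_due` returns false, which is what the test expects.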

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nightness nightness merged commit 6c9bae0 into main Apr 19, 2026
8 checks passed