diff --git a/README.md b/README.md
index d67b6f8..8fbf7ff 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,8 @@ Other install methods: [pip install](#alternative-install-with-pip) | [uv instal
 ## 🔥🔥🔥 News (Pacific Time)
-- June 6, 2026 (latest, **v3.5.82**): **macOS install reliably puts `cheetahclaws` on PATH, and local Ollama models that emit tool calls as text now actually execute them** (two fixes from issue #131). **(1) Install/PATH on macOS:** the installer `source`s the dedicated venv it creates, which made the post-install `command -v cheetahclaws` check succeed *inside the script's own shell*, so it reported "on PATH" and **skipped the entire rc-file step**, leaving `~/.zshrc` untouched and the binary unreachable in new terminals. It now symlinks only the `cheetahclaws` entry point into `~/.local/bin` (pipx-style, so the venv's `python`/`pip` don't shadow yours), creates `~/.zshrc` / `.bash_profile` if missing, and appends `~/.local/bin` to PATH there, without trusting the venv-polluted `command -v` (`scripts/install.sh`). **(2) Ollama tool calls:** `stream_ollama` only read Ollama's structured `message.tool_calls` field, while the cloud path already recovers calls a model emits as **text**, so Qwen-coder / Gemma / Mistral over Ollama produced "tool-calling-style chat" that streamed as plain text and never ran; the model seemed to "just keep talking." `stream_ollama` now mirrors the cloud path's interceptor: it buffers from the first `<tool_call>` / `<|tool_call|>` / `[TOOL_CALLS]` marker (so raw markup never reaches the user) and parses it into real tool calls at end-of-stream (`providers.py`). Details: [docs/guides/usage.md](docs/guides/usage.md#usage-open-source-models-local) · [docs/guides/faq.md](docs/guides/faq.md) · [docs/news.md](docs/news.md).
+- June 16, 2026 (latest): **Internal modules drop the `cc_` prefix: `cc_config` / `cc_daemon` / `cc_kernel` / `cc_mcp` are now `config` / `daemon` / `kernel` / `mcp_client`.** A readability cleanup with no behavior change. The MCP client is named `mcp_client` (not bare `mcp`) to avoid shadowing the top-level `mcp` import name used by the official `modelcontextprotocol` Python SDK. **Breaking only if you import these modules directly**; update your imports (`import cc_kernel` → `import kernel`, `from cc_mcp.client import …` → `from mcp_client.client import …`). The rename used whole-word matching, so unrelated tokens are untouched; two name-collision regressions were caught and fixed; the full suite is green (**2449 passed, 3 skipped**). Details: [docs/news.md](docs/news.md).
+- June 6, 2026 (**v3.5.82**): **macOS install reliably puts `cheetahclaws` on PATH, and local Ollama models that emit tool calls as text now actually execute them** (two fixes from issue #131). **(1) Install/PATH on macOS:** the installer `source`s the dedicated venv it creates, which made the post-install `command -v cheetahclaws` check succeed *inside the script's own shell*, so it reported "on PATH" and **skipped the entire rc-file step**, leaving `~/.zshrc` untouched and the binary unreachable in new terminals. It now symlinks only the `cheetahclaws` entry point into `~/.local/bin` (pipx-style, so the venv's `python`/`pip` don't shadow yours), creates `~/.zshrc` / `.bash_profile` if missing, and appends `~/.local/bin` to PATH there, without trusting the venv-polluted `command -v` (`scripts/install.sh`). **(2) Ollama tool calls:** `stream_ollama` only read Ollama's structured `message.tool_calls` field, while the cloud path already recovers calls a model emits as **text**, so Qwen-coder / Gemma / Mistral over Ollama produced "tool-calling-style chat" that streamed as plain text and never ran; the model seemed to "just keep talking."
`stream_ollama` now mirrors the cloud path's interceptor: it buffers from the first `<tool_call>` / `<|tool_call|>` / `[TOOL_CALLS]` marker (so raw markup never reaches the user) and parses it into real tool calls at end-of-stream (`providers.py`). Details: [docs/guides/usage.md](docs/guides/usage.md#usage-open-source-models-local) · [docs/guides/faq.md](docs/guides/faq.md) · [docs/news.md](docs/news.md).
 - June 5, 2026 (**v3.5.82**): **User-controllable token/cost budgets**: `/budget $5` / `/budget 200k` / `/budget daily $20` cap spend per session or per day, enforced before each model call; on hit the session auto-saves and you're shown how to `/resume` or raise the cap and continue (warns at ≥80%/95%; `--budget` sets it at startup). Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - June 5, 2026: **Adaptive Markdown streaming: live output stays correct on every device** by auto-selecting a per-device tier (`live` in-place redraw on capable terminals including modern SSH emulators, append-only `commit` for SSH/Apple Terminal/pipes/CJK text so frames never duplicate, `plain` fallback); also ships a visual `/context` usage grid and a 1M context window for `deepseek-v4-flash`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output** hides per-tool execution and shows one summary line per turn (on by default), with a live spinner timer + token estimate and a `✻ Worked for…` footer; `/verbose` overrides, toggle with `/quiet`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
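As a rough illustration of the Ollama fix in the June 6 entry, the sketch below buffers streamed text from the first tool-call marker and parses the buffer at end-of-stream, surfacing the raw text if parsing fails. It is not the project's `stream_ollama` code: `split_stream`, `parse_calls`, and `finish` are hypothetical names, the first marker is assumed to be `<tool_call>`, and the sketch assumes one JSON object per call and that a marker never straddles two chunks.

```python
import json

# Markers named in the changelog entry; "<tool_call>" is an assumption.
MARKERS = ("<tool_call>", "<|tool_call|>", "[TOOL_CALLS]")


def split_stream(chunks):
    """Split streamed chunks into (visible_text, buffered_markup)."""
    visible, buffered = [], []
    for chunk in chunks:
        if buffered:
            # Already past a marker: hold everything back from the user.
            buffered.append(chunk)
            continue
        hits = [i for i in (chunk.find(m) for m in MARKERS) if i != -1]
        if hits:
            cut = min(hits)
            visible.append(chunk[:cut])    # text before the marker is shown
            buffered.append(chunk[cut:])   # marker and the rest are buffered
        else:
            visible.append(chunk)
    return "".join(visible), "".join(buffered)


def parse_calls(buffered):
    """Parse one JSON object out of the buffer; return [] if it does not parse."""
    start, end = buffered.find("{"), buffered.rfind("}")
    if start == -1 or end < start:
        return []
    try:
        payload = json.loads(buffered[start:end + 1])
    except json.JSONDecodeError:
        return []
    return [payload] if isinstance(payload, dict) else []


def finish(chunks):
    """Return (text_to_show, tool_calls) once the stream has ended."""
    visible, buffered = split_stream(chunks)
    calls = parse_calls(buffered)
    if buffered and not calls:
        # Parsing failed: show the buffered text so nothing is swallowed.
        visible += buffered
    return visible, calls
```

A caller would pass the collected content chunks to `finish` and hand any returned calls to the agent loop instead of printing them.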
diff --git a/docs/news.md b/docs/news.md
index a685046..7328f27 100644
--- a/docs/news.md
+++ b/docs/news.md
@@ -3,7 +3,8 @@
 ## 🔥🔥🔥 News (Pacific Time)
-- June 6, 2026 (**v3.5.82**) (latest): **macOS install reliably puts `cheetahclaws` on PATH, and local Ollama models that emit tool calls as text now actually execute them.** Two fixes reported in issue #131. **(1) Install / PATH on macOS.** On macOS the installer creates a dedicated venv (`~/.cheetahclaws-venv`) and `source`s it, so the post-install verification `if command -v cheetahclaws` succeeded *inside the script's own activated shell*; it printed "cheetahclaws is on PATH" and **short-circuited past the entire rc-file block**, including the `touch ~/.zshrc` that was supposed to create the file. Result: `~/.zshrc` was never created/updated, and in a fresh terminal (no venv active) the binary was unreachable, so users had to hunt for the install location by hand. The verification step no longer trusts the venv-polluted `command -v`: it confirms the binary at the expected `BIN_DIR`, then (for venv installs) **symlinks only the `cheetahclaws` entry point into `~/.local/bin`** (pipx-style, so the venv's `python`/`pip` never get prepended to PATH and can't shadow the user's own), creates the right rc file if missing (`~/.zshrc` for zsh, `~/.bash_profile` for bash on macOS, `config.fish` for fish), and appends the exposure dir to PATH there. The fish branch now also writes fish (`set -gx PATH …`) syntax instead of `export`, and the reload hint points bash-on-macOS at `.bash_profile` (`scripts/install.sh`). **(2) Ollama tool calls (the "model just keeps talking" bug).** The Ollama streaming path (`stream_ollama`) only read tool calls from Ollama's structured `message.tool_calls` field, whereas the OpenAI-compatible cloud path (`stream_openai_compat`) *also* recovers tool calls a model emits as **text** via `_find_native_tool_marker` + `_extract_native_tool_calls`.
Many local models (Qwen-coder, Gemma, Mistral) emit calls as `<tool_call>{…}</tool_call>` / `<|tool_call|>…` / `[TOOL_CALLS][…]` inside `content`; on the Ollama path that markup was streamed straight to the screen as chat and never executed, so the agent loop saw no tool calls and ended the turn, exactly the reported "tool-calling-style chat that never runs." `stream_ollama` now mirrors the cloud path: when a native marker appears in the streamed content it **buffers from that point** (so the user never sees raw markup), and at end-of-stream parses the buffer into real tool calls (falling back to surfacing the buffered text if parsing fails, so nothing is silently swallowed). Note: Ollama's native `/api/chat` does not accept a `tool_choice` parameter, so the fix is the text-format recovery, not a request-param change. Existing provider + cache-token suites stay green. See [docs/guides/usage.md](guides/usage.md#usage-open-source-models-local) · [docs/guides/faq.md](guides/faq.md).
+- June 16, 2026 (latest): **Internal modules lose the `cc_` prefix.** The four `cc_`-prefixed modules are renamed for readability: `cc_config.py → config.py`, `cc_daemon/ → daemon/`, `cc_kernel/ → kernel/`, and `cc_mcp/ → mcp_client/`. The MCP client is deliberately `mcp_client/` rather than bare `mcp/`: a top-level `mcp` package would shadow the `mcp` import name used by the official `modelcontextprotocol` Python SDK, an easy-to-introduce, hard-to-debug import shadow. References were updated across **every `.py` source**, `docs/` + all RFCs, `README`, `CONTRIBUTING`, `pyproject.toml` (both `py-modules` and `packages.find`), and `.github/workflows/ci.yml`, using **whole-word matching** so lookalike tokens were left untouched: `cc_bin` / `cc_script` (web server locals), `cv_acc_mean` / `acc_clean` (trading-model fields), `cc_tool_call_debug` (a `/tmp` log path), and the `cc_test_` / `cc_subs_` `tempfile` prefixes. The 18 `tests/test_cc_daemon_*.py` files become `test_daemon_*.py`.
**Two name-collision regressions** surfaced where the now-generic module name clashed with a same-scope local variable, both fixed: (1) `tests/test_setup_wizard.py` ran `import config`, which shadowed the wizard helper's `config` **dict param**, so the module object then reached `run_setup_wizard` and raised `TypeError: 'module' object does not support item assignment`; the module is now imported as `_config_mod`. (2) `examples/kernel_e2e_smoke.py` both self-shadowed via `with kernel.Kernel.open(...) as kernel` (an `UnboundLocalError` because the `as` binding made `kernel` local to the whole enclosing function) **and** collided inside `_run_demo`, whose `kernel` **instance** param sat next to module-level `kernel.ScheduleSpec` / `kernel.SandboxPolicy` (an `AttributeError` once the param won); the context var is now bound as `kern` and the two classes are imported at module level. An AST sweep over the whole tree confirmed no other module-name/local-variable shadowing remains. **This is breaking only for code that imports these modules directly** (`import cc_kernel` → `import kernel`; `from cc_mcp.client import get_mcp_manager` → `from mcp_client.client import get_mcp_manager`); the `cheetahclaws` CLI, the Web UI, and all bridges are unaffected. Full suite: **2449 passed, 3 skipped, 0 failed**.
+- June 6, 2026 (**v3.5.82**): **macOS install reliably puts `cheetahclaws` on PATH, and local Ollama models that emit tool calls as text now actually execute them.** Two fixes reported in issue #131. **(1) Install / PATH on macOS.** On macOS the installer creates a dedicated venv (`~/.cheetahclaws-venv`) and `source`s it, so the post-install verification `if command -v cheetahclaws` succeeded *inside the script's own activated shell*; it printed "cheetahclaws is on PATH" and **short-circuited past the entire rc-file block**, including the `touch ~/.zshrc` that was supposed to create the file.
Result: `~/.zshrc` was never created/updated, and in a fresh terminal (no venv active) the binary was unreachable, so users had to hunt for the install location by hand. The verification step no longer trusts the venv-polluted `command -v`: it confirms the binary at the expected `BIN_DIR`, then (for venv installs) **symlinks only the `cheetahclaws` entry point into `~/.local/bin`** (pipx-style, so the venv's `python`/`pip` never get prepended to PATH and can't shadow the user's own), creates the right rc file if missing (`~/.zshrc` for zsh, `~/.bash_profile` for bash on macOS, `config.fish` for fish), and appends the exposure dir to PATH there. The fish branch now also writes fish (`set -gx PATH …`) syntax instead of `export`, and the reload hint points bash-on-macOS at `.bash_profile` (`scripts/install.sh`). **(2) Ollama tool calls (the "model just keeps talking" bug).** The Ollama streaming path (`stream_ollama`) only read tool calls from Ollama's structured `message.tool_calls` field, whereas the OpenAI-compatible cloud path (`stream_openai_compat`) *also* recovers tool calls a model emits as **text** via `_find_native_tool_marker` + `_extract_native_tool_calls`. Many local models (Qwen-coder, Gemma, Mistral) emit calls as `<tool_call>{…}</tool_call>` / `<|tool_call|>…` / `[TOOL_CALLS][…]` inside `content`; on the Ollama path that markup was streamed straight to the screen as chat and never executed, so the agent loop saw no tool calls and ended the turn, exactly the reported "tool-calling-style chat that never runs." `stream_ollama` now mirrors the cloud path: when a native marker appears in the streamed content it **buffers from that point** (so the user never sees raw markup), and at end-of-stream parses the buffer into real tool calls (falling back to surfacing the buffered text if parsing fails, so nothing is silently swallowed).
Note: Ollama's native `/api/chat` does not accept a `tool_choice` parameter, so the fix is the text-format recovery, not a request-param change. Existing provider + cache-token suites stay green. See [docs/guides/usage.md](guides/usage.md#usage-open-source-models-local) · [docs/guides/faq.md](guides/faq.md).
 - June 5, 2026 (**v3.5.82**): **User-controllable token / cost budgets: set a spend cap; on hit the session auto-saves and you can resume or raise it.** The quota engine (`quota.py`: per-session + per-day token/cost counters, enforced before each model call) already existed but had no friendly surface: you had to know four config keys (`session_token_budget` / `session_cost_budget` / `daily_token_budget` / `daily_cost_budget`) and there was no way to see how close you were, no warning before the wall, and the hard stop printed a bare `[Quota exceeded]`. This adds the UX layer on top of the unchanged engine, a **`/budget`** command: no args shows usage vs every budget as colored bars + percentages; **`/budget $5`** sets a session **cost** cap (the `$` means USD), **`/budget 200k`** a session **token** cap (parses `200k` / `1.5m` / `200000`), **`/budget daily $20`** / **`/budget daily 2m`** the daily caps, and **`/budget clear`** removes all. A **`--budget $5`** / **`--budget 200k`** startup flag sets the session cap at launch. **Proximity warnings** fire at the end of any turn that crosses **≥80%** (yellow) / **≥95%** (red) of a cap, so the wall never arrives by surprise. **On hit** the agent now yields a `QuotaPause` event (instead of a plain text line): the REPL **auto-saves the session** (`session_latest.json` + daily backup, the same path `/resume` reads) and prints a friendly next-steps block: raise the **same** cap or remove it (`/budget clear`) then resend, or restart later and `/resume`. So a long task that runs out of budget is never lost: you analyze, adjust, and continue.
**Tight enforcement (no surprise overshoot):** the check projects the next request's *input* (`compaction.estimate_tokens`) and stops *before* the call if it would cross the cap, and clamps that call's `max_tokens` to the remaining headroom (`quota.output_room`), so a single tool-heavy turn can't blow 40k→49k past the budget the way a pure "already-spent ≥ limit" check let it. **One budget per scope:** setting a cap *replaces* the other unit for that scope (`/budget $5` after `/budget 200k` switches the session cap to cost rather than stacking), so a leftover token cap can't silently keep blocking after you switch to a `$` cap. **Unit-matched hint:** `QuotaExceeded` / `QuotaPause` carry which cap broke (`key`/`scope`/`unit`/`limit`), so the "raise it" suggestion is in the *right* unit (a token cap shows `/budget 40k`, a daily cost cap shows `/budget daily $40`) instead of a generic `$` amount that wouldn't lift a token cap. New helpers `quota.parse_budget` / `fmt_amount` / `usage_vs_limits` / `warnings` / `output_room`; command in `commands/core.py:cmd_budget`; `QuotaPause` in `agent.py`; REPL handling + `--budget` in `cheetahclaws.py`; 42-case `tests/test_budget.py` (isolated quota dir, including a regression that the hint matches the breached unit and that switching units clears the stale cap). The daemon's conservative `serve`-mode defaults (200k tok / $2 per session, 2M / $20 per day) are unchanged: interactive stays unlimited by default, the server stays guard-railed. See [docs/guides/features.md](guides/features.md) · [docs/guides/reference.md](guides/reference.md).
 - June 5, 2026 (**v3.5.82**): **Adaptive Markdown streaming: live output that stays correct on every device.** In-place Rich Live redraw is great on capable terminals but breaks elsewhere: it was disabled wholesale over SSH (so SSH users got raw tokens with no formatting), and where it *did* run it could leave **duplicate or stale frames** on macOS Terminal (which can't erase above the scroll boundary), over laggy network PTYs, or with **wide CJK / emoji text** whose display width a naive line-count gets wrong. The renderer now selects a **streaming tier per device** in `ui.render.auto_stream_mode(config)`: **`live`** is a full in-place redraw, used only on terminals known to handle cursor-up (local TTYs, and modern emulators *even over SSH*: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty, detected via `TERM_PROGRAM` / `TERM` / `WT_SESSION` / `KITTY_WINDOW_ID` / `ALACRITTY_WINDOW_ID` / `WEZTERM_PANE`); **`commit`** is **append-only progressive Markdown**, the safe default for unknown-SSH / Apple Terminal / pipes / non-TTY, where each completed block (split on blank lines, respecting open code fences so a fenced block renders atomically) is rendered and printed **permanently** and the cursor is **never moved**, making a duplicate frame structurally impossible regardless of terminal, latency, or character width; **`plain`** is raw tokens, used only when `rich` is unavailable. The append-only floor is provably duplication-free; `live` is progressive enhancement on top. Override with **`/config stream_mode=live|commit|plain`** (legacy boolean **`/config rich_live=true|false`** still works → `live`/`commit`).
Implemented in `ui/render.py` (`set_stream_mode` / `auto_stream_mode` / `_safe_commit_point` / `_commit_stream` / `_commit_flush`), wired in at REPL start in `cheetahclaws.py`, with a 26-case test suite in `tests/test_stream_modes.py` (device routing, code-fence-aware block boundaries, append-only commit, and a regression asserting commit mode emits **zero** cursor sequences even on a TTY with CJK text). Two related UX items shipped alongside: **`/context` is now a visual grid**, a Claude-Code-style 20×10 cell grid of context-window usage, colored and broken down by category (system prompt / system tools / memory files / skills / messages / free space) with per-category token counts and percentages, adapting to the model's real context window and falling back to `#`/`.` on non-UTF-8 terminals (`commands/core.py:cmd_context`); and **`deepseek-v4-flash` is registered at its 1M context window** in `providers._MODEL_CONTEXT_LIMITS` (overriding the 128K deepseek provider default, which still applies to `deepseek-chat` / `deepseek-v4-pro`), so the prompt `%`, `/context`, and the compaction trigger all reflect the true 1M window. See [docs/guides/features.md](guides/features.md) · [docs/guides/reference.md](guides/reference.md).
 - June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output: hide tool execution, show one summary line per turn.** Long analysis turns used to scroll the terminal with a `⚙ Bash(...)` line and a `✓ → N lines (… chars)` line for *every* tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line `python3 << 'PYEOF'` heredoc). A new **quiet mode (on by default)** suppresses the per-tool lines: the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (`Read 2 files, ran 3 shell commands`), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent.
In quiet mode the **permission prompt also collapses** a multi-line command to one line (`Run: python3 << 'PYEOF' … (+59 lines)`) instead of printing the whole script. `/verbose` overrides quiet (full per-tool lines + inputs + token counts); toggle with **`/quiet`**, or launch with **`--show-tools`** (alias `--no-quiet`). The startup banner gains an **`Output: quiet` / `Output: full`** line so the active mode is visible at a glance. **Live status line:** the spinner now shows elapsed time plus a running output-token estimate (`Thinking… (7s · ↓ 435 tokens)`), which is char-based since providers only report real usage at the end, and each quiet turn closes with a real-usage footer **`✻ Worked for 7.2s · ↑ 1.2k · ↓ 435`** built from the true `TurnDone` counts. Implemented in `ui/render.py` (turn-level tool accumulator + `turn_summary_line()`, spinner token meter, `print_turn_stats()`), wired through the REPL event loop in `cheetahclaws.py`, with the `/quiet` toggle in `commands/config_cmd.py`. See [docs/guides/features.md](guides/features.md).
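The two name-collision regressions in the June 16 entry come down to ordinary Python scoping, which the self-contained sketch below reproduces. The modules are stand-ins built with `types.ModuleType`, and `Kernel`, `set_model`, `broken_open`, and `fixed_open` are hypothetical names chosen for illustration, not code from `tests/test_setup_wizard.py` or `examples/kernel_e2e_smoke.py`.

```python
import types

# Stand-ins for the renamed modules (assumption: the real packages are not needed here).
config = types.ModuleType("config")
kernel = types.ModuleType("kernel")


class Kernel:
    """Minimal context manager standing in for a kernel handle."""

    @classmethod
    def open(cls):
        return cls()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False


kernel.Kernel = Kernel


def set_model(config, name):
    # The parameter `config` is expected to be a dict; passing the module
    # object of the same name fails on item assignment.
    config["model"] = name
    return config


def broken_open():
    # `as kernel` makes `kernel` local to this function, so `kernel.Kernel`
    # is looked up on a local name that has no value yet.
    with kernel.Kernel.open() as kernel:
        return kernel


def fixed_open():
    # Binding the handle to a different name leaves the module reachable.
    with kernel.Kernel.open() as kern:
        return kern
```

Calling `set_model(config, "x")` with the module object raises `TypeError`, and `broken_open()` raises `UnboundLocalError`, while `fixed_open()` returns the handle. Renaming the local binding, as the entry describes with `kern` and `_config_mod`, avoids both failures.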