10 changes: 6 additions & 4 deletions README.md
@@ -39,9 +39,10 @@ Other install methods: [pip install](#alternative-install-with-pip) | [uv instal

 ## 🔥🔥🔥 News (Pacific Time)

-- June 4, 2026 (latest, **v3.05.81**): **Claude-Code-style quiet output — hide tool execution, show one summary line per turn.** Quiet mode (on by default) suppresses the per-tool `⚙ Tool(...)` / `✓ → N lines` clutter; the spinner shows live activity and a single line (`Read 2 files, ran 3 shell commands`) is emitted just above the reply. The permission prompt also collapses multi-line commands to one line. Errors still surface. The spinner shows a live timer + running token estimate (`Thinking… (7s · ↓ 435 tokens)`) and each turn closes with a real-usage footer (`✻ Worked for 7.2s · ↑ 1.2k · ↓ 435`). `/verbose` overrides it; toggle with `/quiet` or `--show-tools`; the banner shows `Output: quiet/full`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
-- June 4, 2026: **Context-window override — the prompt % and compaction follow a settable context length.** New `/config context_window=<N>` overrides the model's context window (`0` = default), distinct from `max_tokens` (the output cap). One value drives the prompt `%`, `/context`, the compaction trigger, and the per-call output cap consistently — read live, so switching model or window updates the `%` with no restart. Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
-- June 4, 2026: **Rich Live streaming — long responses stay live via a bounded tail window.** Long responses that would overflow the terminal keep rendering live but show only the most recent screenful (a bounded tail window), committing the full output when done — fixing the duplicate/stale frames some terminals left behind. Builds on PR #133. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 5, 2026 (latest, **v3.05.82**): **Adaptive Markdown streaming — live output stays correct on every device** by auto-selecting a per-device tier (`live` in-place redraw on capable terminals incl. modern SSH emulators, append-only `commit` for SSH/Apple Terminal/pipes/CJK text so frames never duplicate, `plain` fallback); also ships a visual `/context` usage grid and a 1M context window for `deepseek-v4-flash`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output** hides per-tool execution and shows one summary line per turn (on by default), with a live spinner timer + token estimate and a `✻ Worked for…` footer; `/verbose` overrides, toggle with `/quiet`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 4, 2026: **Context-window override** — `/config context_window=<N>` sets the context length that drives the prompt `%`, `/context`, the compaction trigger, and the output cap consistently (distinct from `max_tokens`; read live, no restart). Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
+- June 4, 2026: **Rich Live streaming** keeps long responses live via a bounded tail window — redrawing only the most recent screenful and committing the full output when done, fixing duplicate/stale frames (builds on PR #133). Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - May 31, 2026: **QQ bot bridge — `/qq` connects cheetahclaws to QQ groups + C2C private chats via the official `qq-botpy` SDK (PR #121).** Details: [docs/guides/bridges.md](docs/guides/bridges.md#qq-bridge) · [docs/news.md](docs/news.md).
 - May 12, 2026: **Security hardening sweep — env-var bot tokens, web CSRF cookie, terminal session owner-binding, and plugin/MCP/filesystem sandboxing (two CRITICAL + HIGH rounds, 2347 tests green).** Details: [docs/guides/security.md](docs/guides/security.md) · [docs/news.md](docs/news.md).
 - May 12, 2026: **Daemon foundation roadmap — all nine F-1…F-9 items landed: subprocess agent runners, on-crash restart policy, daemonized Telegram/Slack/WeChat bridges, and budget guardrails.** Details: [docs/news.md](docs/news.md).
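
The context-window override described in the news entry can be illustrated with a minimal sketch. It assumes only what the entry states: `context_window` is a config key, `0` means "use the model default", and the same resolved value feeds the prompt percentage. The function names `effective_context_window` and `prompt_percent` are hypothetical and are not taken from the repository, whose own lookup (`get_context_limit`) is not shown in this diff.

```python
def effective_context_window(config: dict, model_default: int) -> int:
    """Return the configured override when positive, else the model default.

    Hypothetical helper: treats a missing key, None, or 0 as "no override".
    """
    override = int(config.get("context_window", 0) or 0)
    return override if override > 0 else model_default


def prompt_percent(used_tokens: int, config: dict, model_default: int) -> float:
    """Share of the resolved window already used, as a percentage."""
    window = effective_context_window(config, model_default)
    return used_tokens / window * 100 if window else 0.0


# max_tokens caps the output and plays no part in the window lookup.
config = {"max_tokens": 8_192}
print(prompt_percent(50_000, config, 200_000))   # 25.0

# Setting the override changes the percentage on the next read, no restart.
config["context_window"] = 100_000
print(prompt_percent(50_000, config, 200_000))   # 50.0
```

Resolving the window on every read, rather than caching it at startup, is what would let a model switch or a new override take effect immediately.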
@@ -52,7 +53,7 @@ For more news, see [here](docs/news.md).

 # CheetahClaws

-CheetahClaws: **A Lightweight** and **Easy-to-Use** Python native Agent Harness Infrastructure, **Supporting Any Model**, such as Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, and local open-source models via Ollama or any OpenAI-compatible endpoint.
+CheetahClaws: **A Fast** and **Easy-to-Use** Python native Agent Harness Infrastructure, **Supporting Any Model**, such as Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, and local open-source models via Ollama or any OpenAI-compatible endpoint.

 ---

@@ -537,6 +538,7 @@ If you find the repository useful, please cite the study
 <a href="https://github.com/tsint"><img src="https://avatars.githubusercontent.com/u/63944253?v=4&s=48" width="48" height="48" alt="tsint"/></a>
 <a href="https://github.com/albertcheng"><img src="https://avatars.githubusercontent.com/u/2686135?v=4&s=48" width="48" height="48" alt="albertcheng"/></a>
 <a href="https://github.com/LostAion"><img src="https://avatars.githubusercontent.com/u/84846068?v=4&s=48" width="48" height="48" alt="LostAion"/></a>
+<a href="https://github.com/lucaszhu-hue"><img src="https://avatars.githubusercontent.com/u/278269343?v=4&s=48" width="48" height="48" alt="lucaszhu-hue"/></a>
 <a href="https://github.com/skint007"><img src="https://avatars.githubusercontent.com/u/37035851?v=4&s=48" width="48" height="48" alt="skint007"/></a>
 <a href="https://github.com/thekbbohara"><img src="https://avatars.githubusercontent.com/u/133592644?v=4&s=48" width="48" height="48" alt="thekbbohara"/></a>

35 changes: 20 additions & 15 deletions cheetahclaws.py
@@ -203,7 +203,7 @@ def __getattr__(self, name):
     render_diff, _has_diff,
     stream_text, stream_thinking, flush_response,
     _start_tool_spinner, _stop_tool_spinner, _change_spinner_phrase,
-    set_spinner_phrase, set_rich_live, set_spinner_tips,
+    set_spinner_phrase, set_rich_live, set_stream_mode, auto_stream_mode, set_spinner_tips,
     print_tool_start, print_tool_end,
     set_quiet, reset_turn_stats, print_turn_summary,
     set_spinner_tokens, print_turn_stats,
@@ -613,7 +613,7 @@ def handle_slash(line: str, state, config) -> Union[bool, tuple]:
     "load": ("Load a saved session", []),
     "history": ("Show conversation history", []),
     "search": ("Search past sessions", []),
-    "context": ("Show token-context usage", []),
+    "context": ("Visualize context-window usage by category", []),
     "cost": ("Show cost estimate", []),
     "verbose": ("Toggle verbose output", []),
     "quiet": ("Toggle compact tool display", []),
@@ -974,6 +974,9 @@ def repl(config: dict, initial_prompt: str = None):
     except Exception:
         pass

+    # Blank line so the logo sits a little farther below the shell prompt.
+    print()
+
     # Print logo - warm cheetah-gold vertical gradient, plain if color is off.
     if C["reset"]:
         for _ln, _hex in zip(_CHEETAH_LOGO, _CHEETAH_GRADIENT):
@@ -1054,23 +1057,25 @@ def _row(colored: str, plain: str) -> str:

     query_lock = threading.RLock()

-    # Apply rich_live config: disable in-place Live streaming if terminal has issues.
-    # Auto-detect environments where ANSI cursor-up / live-rewrite doesn't work:
-    # - SSH sessions (cursor-up fails across network PTY)
-    # - Dumb terminals (no ANSI support)
-    # - macOS Terminal.app (can't erase above scroll boundary → duplicated output)
-    # - Screen/tmux over SSH
-    import os as _os, platform as _plat
-    _in_ssh = bool(_os.environ.get("SSH_CLIENT") or _os.environ.get("SSH_TTY"))
-    _is_dumb = (console is not None and getattr(console, "is_dumb_terminal", False))
-    _is_macos_terminal = (_plat.system() == "Darwin"
-                          and _os.environ.get("TERM_PROGRAM", "") in ("Apple_Terminal", ""))
-    _rich_live_default = not _in_ssh and not _is_dumb and not _is_macos_terminal
-    set_rich_live(config.get("rich_live", _rich_live_default))
+    # Pick the streaming tier for this device (see ui.render.auto_stream_mode):
+    # "live" — full in-place Rich redraw (capable terminals, incl. modern
+    # emulators over SSH like iTerm2 / WezTerm / Windows Terminal /
+    # VSCode / kitty / Alacritty / Ghostty).
+    # "commit" — append-only progressive Markdown for terminals where cursor-up
+    # redraw is unsafe (Apple Terminal, unknown SSH PTYs, pipes). Still
+    # renders rich formatting block-by-block — a big upgrade over the
+    # old raw-token fallback.
+    # "plain" — only when Rich is unavailable.
+    # An explicit `stream_mode` or legacy `rich_live` config value overrides detection.
+    set_stream_mode(auto_stream_mode(config))

     # Apply spinner_tips config: rotating Claude-Code-style tips beneath the
     # spinner. Disabled automatically where multi-line cursor moves misbehave
     # (dumb terminals, macOS Terminal.app) so the tip line never garbles output.
+    import os as _os, platform as _plat
+    _is_dumb = (console is not None and getattr(console, "is_dumb_terminal", False))
+    _is_macos_terminal = (_plat.system() == "Darwin"
+                          and _os.environ.get("TERM_PROGRAM", "") in ("Apple_Terminal", ""))
     _spinner_tips_default = not _is_dumb and not _is_macos_terminal
     set_spinner_tips(config.get("spinner_tips", _spinner_tips_default))
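
The body of `ui.render.auto_stream_mode` is not part of this diff, so the tier rules in the comment above can only be sketched. The following is a minimal illustration under stated assumptions, not the project's implementation: the function name `pick_stream_tier`, its parameters, the short `TERM_PROGRAM` allow-list, the use of `WT_SESSION` to spot Windows Terminal, and the mapping of legacy `rich_live=False` to `commit` are all guesses made for the example.

```python
# Illustrative allow-list only; the real detector may check other signals.
KNOWN_LIVE_TERM_PROGRAMS = {"iTerm.app", "WezTerm", "vscode"}
VALID_TIERS = ("live", "commit", "plain")


def pick_stream_tier(env: dict, is_tty: bool, rich_available: bool, config: dict) -> str:
    """Hypothetical tier selection following the rules stated in the diff comment."""
    # "plain" is reserved for the case where Rich cannot be imported.
    if not rich_available:
        return "plain"

    # An explicit stream_mode wins over detection.
    explicit = config.get("stream_mode")
    if explicit in VALID_TIERS:
        return explicit

    # Legacy flag (assumed mapping): True -> live, False -> commit.
    if "rich_live" in config:
        return "live" if config["rich_live"] else "commit"

    # Pipes and redirected output cannot be redrawn in place.
    if not is_tty:
        return "commit"

    term_program = env.get("TERM_PROGRAM", "")
    if term_program == "Apple_Terminal":
        return "commit"

    known_emulator = term_program in KNOWN_LIVE_TERM_PROGRAMS or "WT_SESSION" in env
    in_ssh = bool(env.get("SSH_CLIENT") or env.get("SSH_TTY"))
    if in_ssh and not known_emulator:
        return "commit"  # unknown SSH PTY: append-only is the safe choice

    return "live"


print(pick_stream_tier({"SSH_TTY": "/dev/pts/0"}, True, True, {}))
print(pick_stream_tier({"SSH_TTY": "/dev/pts/0", "TERM_PROGRAM": "iTerm.app"}, True, True, {}))
```

Taking the environment as a plain `dict` rather than reading `os.environ` inside the function keeps the decision easy to test; a caller could pass `dict(os.environ)` and `sys.stdout.isatty()`.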

129 changes: 123 additions & 6 deletions commands/core.py
@@ -83,13 +83,130 @@ def cmd_clear(_args: str, state, config) -> bool:
     return True


+def _fmt_tokens(n: int) -> str:
+    """Compact human token count: 1m / 200k / 21.2k / 540."""
+    n = int(n)
+    if n >= 1_000_000:
+        s = f"{n / 1_000_000:.1f}m"
+        return s.replace(".0m", "m")
+    if n >= 1_000:
+        s = f"{n / 1_000:.1f}k"
+        return s.replace(".0k", "k")
+    return str(n)
+
+
 def cmd_context(_args: str, state, config) -> bool:
-    msg_chars = sum(len(str(m.get("content", ""))) for m in state.messages)
-    est_tokens = msg_chars // 4
-    info(f"Messages: {len(state.messages)}")
-    info(f"Estimated tokens: ~{est_tokens:,}")
-    info(f"Model: {config['model']}")
-    info(f"Max tokens: {config['max_tokens']:,}")
+    """Visual breakdown of context-window usage by category (Claude-Code style).
+
+    Renders a 20×10 cell grid where each cell represents an equal slice of the
+    model's context window, coloured per category, followed by a legend showing
+    the estimated token cost and percentage of each component.
+    """
+    import sys as _sys
+    from compaction import estimate_tokens, get_context_limit
+    from providers import detect_provider
+
+    model = config.get("model", "unknown")
+    provider = detect_provider(model) if model else ""
+    ctx_limit = get_context_limit(model, config) or 0
+
+    def _est(text: str) -> int:
+        return estimate_tokens([{"role": "system", "content": text}]) if text else 0
+
+    # ── Measure each in-context component ───────────────────────────────────
+    # System prompt = base + env + live command index (everything
+    # build_system_prompt injects EXCEPT memory, which we break out below to
+    # mirror Claude Code's category split).
+    sys_tokens = 0
+    try:
+        import context as _ctx
+        from prompts import pick_base_prompt
+        base = pick_base_prompt(provider, model) if model else pick_base_prompt()
+        sys_tokens = (_est(base)
+                      + _est(_ctx._render_env_block(config))
+                      + _est(_ctx._render_commands_block()))
+    except Exception:
+        try:
+            import context as _ctx
+            sys_tokens = _est(_ctx.build_system_prompt(config))
+        except Exception:
+            sys_tokens = 0
+
+    mem_tokens = 0
+    try:
+        from memory import get_memory_context
+        mem_tokens = _est(get_memory_context())
+    except Exception:
+        mem_tokens = 0
+
+    tool_tokens = 0
+    try:
+        from tool_registry import get_tool_schemas
+        tool_tokens = _est(json.dumps(get_tool_schemas()))
+    except Exception:
+        tool_tokens = 0
+
+    skill_tokens = 0
+    try:
+        from skill import load_skills
+        blob = "\n".join(
+            f"{s.name}: {s.description} {' '.join(getattr(s, 'triggers', []) or [])}"
+            for s in load_skills()
+        )
+        skill_tokens = _est(blob)
+    except Exception:
+        skill_tokens = 0
+
+    msg_tokens = estimate_tokens(getattr(state, "messages", []))
+    msg_count = len(getattr(state, "messages", []))
+
+    cats = [
+        ("System prompt", sys_tokens, "cyan"),
+        ("System tools", tool_tokens, "blue"),
+        ("Memory files", mem_tokens, "magenta"),
+        ("Skills", skill_tokens, "yellow"),
+        ("Messages", msg_tokens, "green"),
+    ]
+    used = sum(t for _, t, _ in cats)
+    free = max(0, ctx_limit - used) if ctx_limit else 0
+
+    # ── Build the cell grid ─────────────────────────────────────────────────
+    utf8 = "utf" in (getattr(_sys.stdout, "encoding", "") or "").lower()
+    FULL, EMPTY = ("⛁", "⛶") if utf8 else ("#", ".")
+    COLS, ROWS = 20, 10
+    total_cells = COLS * ROWS
+    per_cell = (ctx_limit / total_cells) if ctx_limit else 0
+
+    cells: list[tuple[str, str]] = []
+    if per_cell:
+        for _name, tok, color in cats:
+            n = int(round(tok / per_cell))
+            cells.extend([(FULL, color)] * n)
+        cells = cells[:total_cells]
+    cells.extend([(EMPTY, "dim")] * (total_cells - len(cells)))
+
+    # ── Render ──────────────────────────────────────────────────────────────
+    print(clr(" Context Usage", "bold"))
+    for r in range(ROWS):
+        row = cells[r * COLS:(r + 1) * COLS]
+        print(" " + " ".join(clr(g, c) for g, c in row))
+
+    pct = (used / ctx_limit * 100) if ctx_limit else 0
+    print()
+    print(f" {clr(model, 'bold')}" + (f" · {provider}" if provider else ""))
+    if ctx_limit:
+        print(f" {_fmt_tokens(used)}/{_fmt_tokens(ctx_limit)} tokens ({pct:.1f}%)")
+    else:
+        print(f" {_fmt_tokens(used)} tokens (context limit unknown)")
+
+    print()
+    print(clr(" Estimated usage by category", "dim"))
+    for name, tok, color in cats:
+        p = (tok / ctx_limit * 100) if ctx_limit else 0
+        print(f" {clr(FULL, color)} {name + ':':<15} {_fmt_tokens(tok):>7} tokens ({p:.1f}%)"
+              + (f" [{msg_count} msgs]" if name == "Messages" else ""))
+    fp = (free / ctx_limit * 100) if ctx_limit else 0
+    print(f" {clr(EMPTY, 'dim')} {'Free space:':<15} {_fmt_tokens(free):>7} tokens ({fp:.1f}%)")
     return True
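
The grid arithmetic in `cmd_context` is easier to follow in isolation. The sketch below reuses the formatting logic of `_fmt_tokens` as shown in the diff and pulls the cell allocation into a standalone helper. The name `allocate_cells`, its signature, and the sample token counts are hypothetical, chosen only to make the example runnable without the project's `compaction`, `providers`, or `clr` helpers.

```python
def fmt_tokens(n: int) -> str:
    """Same logic as _fmt_tokens in the diff: 1m / 200k / 21.2k / 540."""
    n = int(n)
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}m".replace(".0m", "m")
    if n >= 1_000:
        return f"{n / 1_000:.1f}k".replace(".0k", "k")
    return str(n)


def allocate_cells(cats, ctx_limit, cols=20, rows=10):
    """Map (name, tokens) pairs onto a cols x rows grid; None marks a free cell."""
    total = cols * rows
    if not ctx_limit:
        return [None] * total          # unknown limit: render an empty grid
    per_cell = ctx_limit / total       # tokens represented by one cell
    cells = []
    for name, tok in cats:
        cells.extend([name] * int(round(tok / per_cell)))
    cells = cells[:total]              # clamp if usage exceeds the window
    cells.extend([None] * (total - len(cells)))
    return cells


# Made-up token counts for a 200k window, so each cell stands for 1,000 tokens.
cats = [
    ("System prompt", 3_200),
    ("System tools", 12_400),
    ("Memory files", 400),
    ("Messages", 40_600),
]
cells = allocate_cells(cats, 200_000)
for name, tok in cats:
    print(f"{name + ':':<15} {fmt_tokens(tok):>6} tokens -> {cells.count(name)} cells")
print(f"{'Free space:':<15} {cells.count(None)} cells")
```

Two behaviours follow directly from the code as written. A category smaller than half a cell (here "Memory files" at 400 tokens) gets no cell in the grid yet still appears in the legend, and Python's `round` sends exact halves to the nearest even integer. Separately, a count just under one million, such as 999,999, formats as `1000k` rather than `1m`, because the `k` branch rounds before the suffix is chosen.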


2 changes: 1 addition & 1 deletion docs/guides/features.md
@@ -47,7 +47,7 @@ and indexed in the [README Documentation section](../../README.md#documentation)
 | Shell escape | Type `!command` in the REPL to execute any shell command directly without AI involvement (`!git status`, `!ls`, `!python --version`). Output prints inline. |
 | Proactive monitoring | `/proactive [duration]` starts a background sentinel daemon; agent wakes automatically after inactivity, enabling continuous monitoring loops without user prompts |
 | Force quit | 3× Ctrl+C within 2 seconds triggers `os._exit(1)` — kills the process immediately regardless of blocking I/O |
-| Rich Live streaming | When `rich` is installed, responses render as live-updating Markdown in place. Long responses that would overflow the terminal keep rendering live but show only the most recent screenful (a bounded tail window, terminal-height / wrap aware); the complete output is committed when the response finishes, preventing the duplicate or stale frames some terminals leave behind. Plain streaming is used only as a fallback. Auto-disabled in SSH sessions; override with `/config rich_live=false`. |
+| Adaptive Markdown streaming | Responses render as live Markdown, with the rendering tier **auto-selected per device** so streaming stays correct everywhere — no duplicated/stale frames over SSH, on macOS Terminal, or with wide (CJK / emoji) text. Three tiers: **`live`** — full in-place Rich redraw, used on terminals known to handle cursor-up reliably (local TTYs and modern emulators incl. over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty); **`commit`** — append-only progressive Markdown, the safe default for SSH/Apple Terminal/pipes, where each completed block renders and prints **permanently** (zero cursor movement, so it can never leave a duplicate frame) while the in-progress block appears when it closes; **`plain`** — raw tokens, only when `rich` is unavailable. Detection lives in `ui.render.auto_stream_mode`; override with `/config stream_mode=live\|commit\|plain` (legacy `/config rich_live=true\|false` still works). |
 | Spinner tips | While the model works, the spinner shows an elapsed timer plus a rotating Claude-Code-style "Tip:" line surfacing handy commands (`/compact`, `/checkpoint`, `/research`, …). Auto-disabled on dumb / macOS Terminal where multi-line cursor moves misbehave; toggle with `/config spinner_tips=false`. |
 | Compact tool display | Claude-Code-style quiet output (on by default). Instead of printing a `⚙ Tool(...)` line and a `✓ → N lines` line for every tool call, the per-tool execution is hidden — the spinner conveys live activity and one summary line is emitted at the tool→text boundary (`Read 2 files, ran 3 shell commands`), sitting just above the reply. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (`Run: python3 << 'PYEOF' … (+59 行)`) instead of dumping the whole script. `/verbose` overrides it (full per-tool lines + inputs + token counts); toggle with `/quiet` or launch with `--show-tools` (alias `--no-quiet`). The banner shows the active mode as `Output: quiet` / `Output: full`. **Live status:** while the model works, the spinner shows elapsed time plus a running token estimate (`Thinking… (7s · ↓ 435 tokens)`, char-based since providers only report real usage at the end); each quiet turn then closes with a real-usage footer — `✻ Worked for 7.2s · ↑ 1.2k · ↓ 435` — using the true counts from `TurnDone`. |
 | Context injection | Auto-loads `CLAUDE.md`, git status, cwd, persistent memory |