SafeRL-Lab · chauncygu · Jun 5, 2026 · Jun 5, 2026
diff --git a/README.md b/README.md
@@ -39,9 +39,10 @@ Other install methods: [pip install](#alternative-install-with-pip) | [uv instal
 
 ## 🔥🔥🔥 News (Pacific Time)
 
-- June 4, 2026 (latest, **v3.05.81**): **Claude-Code-style quiet output — hide tool execution, show one summary line per turn.** Quiet mode (on by default) suppresses the per-tool `⚙ Tool(...)` / `✓ → N lines` clutter; the spinner shows live activity and a single line (`Read 2 files, ran 3 shell commands`) is emitted just above the reply. The permission prompt also collapses multi-line commands to one line. Errors still surface. The spinner shows a live timer + running token estimate (`Thinking… (7s · ↓ 435 tokens)`) and each turn closes with a real-usage footer (`✻ Worked for 7.2s · ↑ 1.2k · ↓ 435`). `/verbose` overrides it; toggle with `/quiet` or `--show-tools`; the banner shows `Output: quiet/full`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
-- June 4, 2026: **Context-window override — the prompt % and compaction follow a settable context length.** New `/config context_window=<N>` overrides the model's context window (`0` = default), distinct from `max_tokens` (the output cap). One value drives the prompt `%`, `/context`, the compaction trigger, and the per-call output cap consistently — read live, so switching model or window updates the `%` with no restart. Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
-- June 4, 2026: **Rich Live streaming — long responses stay live via a bounded tail window.** Long responses that would overflow the terminal keep rendering live but show only the most recent screenful (a bounded tail window), committing the full output when done — fixing the duplicate/stale frames some terminals left behind. Builds on PR #133. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 5, 2026 (latest, **v3.05.82**): **Adaptive Markdown streaming** keeps live output correct across terminals by auto-selecting a per-device tier: `live` (in-place redraw on capable terminals, including modern emulators over SSH), `commit` (append-only output for unknown SSH PTYs, Apple Terminal, pipes, and CJK text, so frames never duplicate), or `plain` (fallback when `rich` is unavailable). Also ships a visual `/context` usage grid and a 1M context window for `deepseek-v4-flash`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output** hides per-tool execution and shows one summary line per turn (on by default), with a live spinner timer + token estimate and a `✻ Worked for…` footer; `/verbose` overrides, toggle with `/quiet`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 4, 2026: **Context-window override** — `/config context_window=<N>` sets the context length that drives the prompt `%`, `/context`, the compaction trigger, and the output cap consistently (distinct from `max_tokens`; read live, no restart). Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
+- June 4, 2026: **Rich Live streaming** keeps long responses live via a bounded tail window — redrawing only the most recent screenful and committing the full output when done, fixing duplicate/stale frames (builds on PR #133). Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - May 31, 2026: **QQ bot bridge — `/qq` connects cheetahclaws to QQ groups + C2C private chats via the official `qq-botpy` SDK (PR #121).** Details: [docs/guides/bridges.md](docs/guides/bridges.md#qq-bridge) · [docs/news.md](docs/news.md).
 - May 12, 2026: **Security hardening sweep — env-var bot tokens, web CSRF cookie, terminal session owner-binding, and plugin/MCP/filesystem sandboxing (two CRITICAL + HIGH rounds, 2347 tests green).** Details: [docs/guides/security.md](docs/guides/security.md) · [docs/news.md](docs/news.md).
 - May 12, 2026: **Daemon foundation roadmap — all nine F-1…F-9 items landed: subprocess agent runners, on-crash restart policy, daemonized Telegram/Slack/WeChat bridges, and budget guardrails.** Details: [docs/news.md](docs/news.md).
@@ -52,7 +53,7 @@ For more news, see [here](docs/news.md).
 
 # CheetahClaws
 
-CheetahClaws: **A Lightweight** and **Easy-to-Use** Python native Agent Harness Infrastructure, **Supporting Any Model**, such as Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, and local open-source models via Ollama or any OpenAI-compatible endpoint.
+CheetahClaws: **A Fast** and **Easy-to-Use** Python-native Agent Harness Infrastructure, **Supporting Any Model**, such as Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, and local open-source models via Ollama or any OpenAI-compatible endpoint.
 
 ---
 
@@ -537,6 +538,7 @@ If you find the repository useful, please cite the study
 <a href="https://github.com/tsint"><img src="https://avatars.githubusercontent.com/u/63944253?v=4&s=48" width="48" height="48" alt="tsint"/></a>
 <a href="https://github.com/albertcheng"><img src="https://avatars.githubusercontent.com/u/2686135?v=4&s=48" width="48" height="48" alt="albertcheng"/></a>
 <a href="https://github.com/LostAion"><img src="https://avatars.githubusercontent.com/u/84846068?v=4&s=48" width="48" height="48" alt="LostAion"/></a>
+<a href="https://github.com/lucaszhu-hue"><img src="https://avatars.githubusercontent.com/u/278269343?v=4&s=48" width="48" height="48" alt="lucaszhu-hue"/></a>
 <a href="https://github.com/skint007"><img src="https://avatars.githubusercontent.com/u/37035851?v=4&s=48" width="48" height="48" alt="skint007"/></a>
 <a href="https://github.com/thekbbohara"><img src="https://avatars.githubusercontent.com/u/133592644?v=4&s=48" width="48" height="48" alt="thekbbohara"/></a>
 

diff --git a/cheetahclaws.py b/cheetahclaws.py
@@ -203,7 +203,7 @@ def __getattr__(self, name):
     render_diff, _has_diff,
     stream_text, stream_thinking, flush_response,
     _start_tool_spinner, _stop_tool_spinner, _change_spinner_phrase,
-    set_spinner_phrase, set_rich_live, set_spinner_tips,
+    set_spinner_phrase, set_rich_live, set_stream_mode, auto_stream_mode, set_spinner_tips,
     print_tool_start, print_tool_end,
     set_quiet, reset_turn_stats, print_turn_summary,
     set_spinner_tokens, print_turn_stats,
@@ -613,7 +613,7 @@ def handle_slash(line: str, state, config) -> Union[bool, tuple]:
     "load":        ("Load a saved session",               []),
     "history":     ("Show conversation history",          []),
     "search":      ("Search past sessions",               []),
-    "context":     ("Show token-context usage",           []),
+    "context":     ("Visualize context-window usage by category", []),
     "cost":        ("Show cost estimate",                 []),
     "verbose":     ("Toggle verbose output",              []),
     "quiet":       ("Toggle compact tool display",        []),
@@ -974,6 +974,9 @@ def repl(config: dict, initial_prompt: str = None):
         except Exception:
             pass
 
+        # Blank line so the logo sits a little farther below the shell prompt.
+        print()
+
         # Print logo - warm cheetah-gold vertical gradient, plain if color is off.
         if C["reset"]:
             for _ln, _hex in zip(_CHEETAH_LOGO, _CHEETAH_GRADIENT):
@@ -1054,23 +1057,25 @@ def _row(colored: str, plain: str) -> str:
 
     query_lock = threading.RLock()
 
-    # Apply rich_live config: disable in-place Live streaming if terminal has issues.
-    # Auto-detect environments where ANSI cursor-up / live-rewrite doesn't work:
-    #   - SSH sessions (cursor-up fails across network PTY)
-    #   - Dumb terminals (no ANSI support)
-    #   - macOS Terminal.app (can't erase above scroll boundary → duplicated output)
-    #   - Screen/tmux over SSH
-    import os as _os, platform as _plat
-    _in_ssh = bool(_os.environ.get("SSH_CLIENT") or _os.environ.get("SSH_TTY"))
-    _is_dumb = (console is not None and getattr(console, "is_dumb_terminal", False))
-    _is_macos_terminal = (_plat.system() == "Darwin"
-                          and _os.environ.get("TERM_PROGRAM", "") in ("Apple_Terminal", ""))
-    _rich_live_default = not _in_ssh and not _is_dumb and not _is_macos_terminal
-    set_rich_live(config.get("rich_live", _rich_live_default))
+    # Pick the streaming tier for this device (see ui.render.auto_stream_mode):
+    #   "live"   — full in-place Rich redraw (capable terminals, incl. modern
+    #              emulators over SSH like iTerm2 / WezTerm / Windows Terminal /
+    #              VSCode / kitty / Alacritty / Ghostty).
+    #   "commit" — append-only progressive Markdown for terminals where cursor-up
+    #              redraw is unsafe (Apple Terminal, unknown SSH PTYs, pipes). Still
+    #              renders rich formatting block-by-block — a big upgrade over the
+    #              old raw-token fallback.
+    #   "plain"  — only when Rich is unavailable.
+    # An explicit `stream_mode` or legacy `rich_live` config value overrides detection.
+    set_stream_mode(auto_stream_mode(config))
 
     # Apply spinner_tips config: rotating Claude-Code-style tips beneath the
     # spinner. Disabled automatically where multi-line cursor moves misbehave
     # (dumb terminals, macOS Terminal.app) so the tip line never garbles output.
+    import os as _os, platform as _plat
+    _is_dumb = (console is not None and getattr(console, "is_dumb_terminal", False))
+    _is_macos_terminal = (_plat.system() == "Darwin"
+                          and _os.environ.get("TERM_PROGRAM", "") in ("Apple_Terminal", ""))
     _spinner_tips_default = not _is_dumb and not _is_macos_terminal
     set_spinner_tips(config.get("spinner_tips", _spinner_tips_default))
 

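Reviewer note: `ui.render.auto_stream_mode` is not shown in this diff, so the following is only a minimal sketch of the tier rules described in the comment block above, not the project's implementation. The name `pick_stream_tier`, its parameters, the `KNOWN_LIVE_TERMINALS` set, and the mapping of a legacy `rich_live=false` to `commit` are all assumptions made for illustration.

```python
from typing import Mapping

VALID_TIERS = ("live", "commit", "plain")

# Illustrative TERM_PROGRAM values for emulators assumed to handle cursor-up
# redraw; the real detection list may differ.
KNOWN_LIVE_TERMINALS = {"iTerm.app", "WezTerm", "vscode"}


def pick_stream_tier(
    config: Mapping[str, object],
    env: Mapping[str, str],
    is_tty: bool,
    rich_available: bool = True,
) -> str:
    """Hypothetical tier selection: 'live', 'commit', or 'plain'."""
    # Without Rich there is nothing to render, so fall back to raw tokens.
    if not rich_available:
        return "plain"

    # An explicit stream_mode wins over detection.
    explicit = config.get("stream_mode")
    if explicit in VALID_TIERS:
        return str(explicit)

    # Legacy boolean: the true -> live, false -> commit mapping is an assumption.
    if "rich_live" in config:
        return "live" if config["rich_live"] else "commit"

    # Pipes and redirected output cannot redraw in place.
    if not is_tty:
        return "commit"

    term_program = env.get("TERM_PROGRAM", "")
    if term_program == "Apple_Terminal":
        return "commit"

    # Over SSH, only trust emulators that identify themselves as capable.
    in_ssh = bool(env.get("SSH_CLIENT") or env.get("SSH_TTY"))
    if in_ssh and term_program not in KNOWN_LIVE_TERMINALS:
        return "commit"

    return "live"
```

Passing `env` and `is_tty` in as arguments (rather than reading `os.environ` and `sys.stdout` inside the function) is a choice made here so each tier rule can be checked without a real terminal.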
diff --git a/commands/core.py b/commands/core.py
@@ -83,13 +83,130 @@ def cmd_clear(_args: str, state, config) -> bool:
     return True
 
 
+def _fmt_tokens(n: int) -> str:
+    """Compact human token count: 1m / 200k / 21.2k / 540."""
+    n = int(n)
+    if n >= 1_000_000:
+        s = f"{n / 1_000_000:.1f}m"
+        return s.replace(".0m", "m")
+    if n >= 1_000:
+        s = f"{n / 1_000:.1f}k"
+        return s.replace(".0k", "k")
+    return str(n)
+
+
 def cmd_context(_args: str, state, config) -> bool:
-    msg_chars = sum(len(str(m.get("content", ""))) for m in state.messages)
-    est_tokens = msg_chars // 4
-    info(f"Messages:         {len(state.messages)}")
-    info(f"Estimated tokens: ~{est_tokens:,}")
-    info(f"Model:            {config['model']}")
-    info(f"Max tokens:       {config['max_tokens']:,}")
+    """Visual breakdown of context-window usage by category (Claude-Code style).
+
+    Renders a grid of 20 columns × 10 rows (200 cells), where each cell represents
+    an equal slice of the model's context window, coloured per category, followed
+    by a legend showing the estimated token count and percentage of each component.
+    """
+    import sys as _sys
+    from compaction import estimate_tokens, get_context_limit
+    from providers import detect_provider
+
+    model = config.get("model", "unknown")
+    provider = detect_provider(model) if model else ""
+    ctx_limit = get_context_limit(model, config) or 0
+
+    def _est(text: str) -> int:
+        return estimate_tokens([{"role": "system", "content": text}]) if text else 0
+
+    # ── Measure each in-context component ───────────────────────────────────
+    # System prompt = base + env + live command index (everything
+    # build_system_prompt injects EXCEPT memory, which we break out below to
+    # mirror Claude Code's category split).
+    sys_tokens = 0
+    try:
+        import context as _ctx
+        from prompts import pick_base_prompt
+        base = pick_base_prompt(provider, model) if model else pick_base_prompt()
+        sys_tokens = (_est(base)
+                      + _est(_ctx._render_env_block(config))
+                      + _est(_ctx._render_commands_block()))
+    except Exception:
+        try:
+            import context as _ctx
+            sys_tokens = _est(_ctx.build_system_prompt(config))
+        except Exception:
+            sys_tokens = 0
+
+    mem_tokens = 0
+    try:
+        from memory import get_memory_context
+        mem_tokens = _est(get_memory_context())
+    except Exception:
+        mem_tokens = 0
+
+    tool_tokens = 0
+    try:
+        from tool_registry import get_tool_schemas
+        tool_tokens = _est(json.dumps(get_tool_schemas()))
+    except Exception:
+        tool_tokens = 0
+
+    skill_tokens = 0
+    try:
+        from skill import load_skills
+        blob = "\n".join(
+            f"{s.name}: {s.description} {' '.join(getattr(s, 'triggers', []) or [])}"
+            for s in load_skills()
+        )
+        skill_tokens = _est(blob)
+    except Exception:
+        skill_tokens = 0
+
+    msg_tokens = estimate_tokens(getattr(state, "messages", []))
+    msg_count = len(getattr(state, "messages", []))
+
+    cats = [
+        ("System prompt", sys_tokens,   "cyan"),
+        ("System tools",  tool_tokens,  "blue"),
+        ("Memory files",  mem_tokens,   "magenta"),
+        ("Skills",        skill_tokens, "yellow"),
+        ("Messages",      msg_tokens,   "green"),
+    ]
+    used = sum(t for _, t, _ in cats)
+    free = max(0, ctx_limit - used) if ctx_limit else 0
+
+    # ── Build the cell grid ─────────────────────────────────────────────────
+    utf8 = "utf" in (getattr(_sys.stdout, "encoding", "") or "").lower()
+    FULL, EMPTY = ("⛁", "⛶") if utf8 else ("#", ".")
+    COLS, ROWS = 20, 10
+    total_cells = COLS * ROWS
+    per_cell = (ctx_limit / total_cells) if ctx_limit else 0
+
+    cells: list[tuple[str, str]] = []
+    if per_cell:
+        for _name, tok, color in cats:
+            n = int(round(tok / per_cell))
+            cells.extend([(FULL, color)] * n)
+    cells = cells[:total_cells]
+    cells.extend([(EMPTY, "dim")] * (total_cells - len(cells)))
+
+    # ── Render ──────────────────────────────────────────────────────────────
+    print(clr("  Context Usage", "bold"))
+    for r in range(ROWS):
+        row = cells[r * COLS:(r + 1) * COLS]
+        print("  " + " ".join(clr(g, c) for g, c in row))
+
+    pct = (used / ctx_limit * 100) if ctx_limit else 0
+    print()
+    print(f"  {clr(model, 'bold')}" + (f"  ·  {provider}" if provider else ""))
+    if ctx_limit:
+        print(f"  {_fmt_tokens(used)}/{_fmt_tokens(ctx_limit)} tokens ({pct:.1f}%)")
+    else:
+        print(f"  {_fmt_tokens(used)} tokens (context limit unknown)")
+
+    print()
+    print(clr("  Estimated usage by category", "dim"))
+    for name, tok, color in cats:
+        p = (tok / ctx_limit * 100) if ctx_limit else 0
+        print(f"  {clr(FULL, color)} {name + ':':<15} {_fmt_tokens(tok):>7} tokens ({p:.1f}%)"
+              + (f"  [{msg_count} msgs]" if name == "Messages" else ""))
+    fp = (free / ctx_limit * 100) if ctx_limit else 0
+    print(f"  {clr(EMPTY, 'dim')} {'Free space:':<15} {_fmt_tokens(free):>7} tokens ({fp:.1f}%)")
     return True
 
 

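Reviewer note: the cell arithmetic in `cmd_context` can be checked in isolation. The sketch below copies the `_fmt_tokens` logic from the hunk above (renamed `fmt_tokens`) and adds a hypothetical helper, `allocate_cells`, that mirrors the rounding and truncation of the grid loop. The helper's name and signature are assumptions, and the sample token counts are made up for illustration.

```python
def fmt_tokens(n: int) -> str:
    """Same logic as _fmt_tokens in the hunk above."""
    n = int(n)
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}m".replace(".0m", "m")
    if n >= 1_000:
        return f"{n / 1_000:.1f}k".replace(".0k", "k")
    return str(n)


def allocate_cells(cats, ctx_limit, total_cells=200):
    """Hypothetical helper: map (name, tokens) pairs to grid cell counts.

    Returns the per-category counts and the number of free cells. Capping
    each category at the remaining cells is equivalent to truncating the
    concatenated cell list to total_cells, as the hunk does.
    """
    if ctx_limit <= 0:
        # Unknown context limit: the grid is drawn entirely empty.
        return [(name, 0) for name, _ in cats], total_cells

    per_cell = ctx_limit / total_cells
    counts = []
    remaining = total_cells
    for name, tok in cats:
        n = min(remaining, int(round(tok / per_cell)))
        counts.append((name, n))
        remaining -= n
    return counts, remaining


if __name__ == "__main__":
    sample = [("System prompt", 3_200), ("System tools", 12_400), ("Messages", 400)]
    counts, free = allocate_cells(sample, 200_000)
    print(counts)  # [('System prompt', 3), ('System tools', 12), ('Messages', 0)]
    print(free)    # 185
    print([fmt_tokens(n) for n in (540, 21_200, 200_000, 1_000_000, 999_999)])
    # ['540', '21.2k', '200k', '1m', '1000k']
```

Two behaviours follow from the arithmetic: a category smaller than half a cell (here, 400 tokens against 1,000 tokens per cell) gets no cell in the grid even though it still appears in the legend, and a count just under one million formats as `1000k` rather than `1m`.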
diff --git a/docs/guides/features.md b/docs/guides/features.md
@@ -47,7 +47,7 @@ and indexed in the [README Documentation section](../../README.md#documentation)
 | Shell escape | Type `!command` in the REPL to execute any shell command directly without AI involvement (`!git status`, `!ls`, `!python --version`). Output prints inline. |
 | Proactive monitoring | `/proactive [duration]` starts a background sentinel daemon; agent wakes automatically after inactivity, enabling continuous monitoring loops without user prompts |
 | Force quit | 3× Ctrl+C within 2 seconds triggers `os._exit(1)` — kills the process immediately regardless of blocking I/O |
-| Rich Live streaming | When `rich` is installed, responses render as live-updating Markdown in place. Long responses that would overflow the terminal keep rendering live but show only the most recent screenful (a bounded tail window, terminal-height / wrap aware); the complete output is committed when the response finishes, preventing the duplicate or stale frames some terminals leave behind. Plain streaming is used only as a fallback. Auto-disabled in SSH sessions; override with `/config rich_live=false`. |
+| Adaptive Markdown streaming | Responses render as live Markdown, with the rendering tier **auto-selected per device** so streaming stays correct everywhere — no duplicated/stale frames over SSH, on macOS Terminal, or with wide (CJK / emoji) text. Three tiers: **`live`** — full in-place Rich redraw, used on terminals known to handle cursor-up reliably (local TTYs and modern emulators incl. over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty); **`commit`** — append-only progressive Markdown, the safe default for SSH/Apple Terminal/pipes, where each completed block renders and prints **permanently** (zero cursor movement, so it can never leave a duplicate frame) while the in-progress block appears when it closes; **`plain`** — raw tokens, only when `rich` is unavailable. Detection lives in `ui.render.auto_stream_mode`; override with `/config stream_mode=live\|commit\|plain` (legacy `/config rich_live=true\|false` still works). |
 | Spinner tips | While the model works, the spinner shows an elapsed timer plus a rotating Claude-Code-style "Tip:" line surfacing handy commands (`/compact`, `/checkpoint`, `/research`, …). Auto-disabled on dumb / macOS Terminal where multi-line cursor moves misbehave; toggle with `/config spinner_tips=false`. |
 | Compact tool display | Claude-Code-style quiet output (on by default). Instead of printing a `⚙ Tool(...)` line and a `✓ → N lines` line for every tool call, the per-tool execution is hidden — the spinner conveys live activity and one summary line is emitted at the tool→text boundary (`Read 2 files, ran 3 shell commands`), sitting just above the reply. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (`Run: python3 << 'PYEOF'  … (+59 行)`) instead of dumping the whole script. `/verbose` overrides it (full per-tool lines + inputs + token counts); toggle with `/quiet` or launch with `--show-tools` (alias `--no-quiet`). The banner shows the active mode as `Output: quiet` / `Output: full`. **Live status:** while the model works, the spinner shows elapsed time plus a running token estimate (`Thinking… (7s · ↓ 435 tokens)`, char-based since providers only report real usage at the end); each quiet turn then closes with a real-usage footer — `✻ Worked for 7.2s · ↑ 1.2k · ↓ 435` — using the true counts from `TurnDone`. |
 | Context injection | Auto-loads `CLAUDE.md`, git status, cwd, persistent memory |