From fd88b28f3dc782cd6702364fba4e731776911fa5 Mon Sep 17 00:00:00 2001
From: chauncygu <gshangd@163.com>
Date: Fri, 5 Jun 2026 00:30:19 -0700
Subject: [PATCH] feat(v3.05.82): adaptive Markdown streaming, visual /context,
 deepseek-v4-flash 1M
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adaptive per-device Markdown streaming (ui/render.py):
- auto_stream_mode() selects one of three streaming tiers per device:
  "live" (in-place Rich redraw on capable terminals, incl. modern terminal
  emulators over SSH), "commit" (append-only progressive Markdown: completed
  blocks render and print permanently with zero cursor movement, so frames
  cannot duplicate over SSH / Apple Terminal / pipes / with wide CJK-emoji
  text), and "plain" (raw tokens, only when Rich is unavailable).
- Override via /config stream_mode=live|commit|plain (legacy rich_live still works).
- Replaces the old crude "disable Live whenever SSH" gate, which dropped SSH
  users to raw tokens. Adds tests/test_stream_modes.py (26 cases, incl. a
  regression test asserting that commit mode emits zero cursor sequences even
  on a TTY with CJK text).
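
The append-only idea behind "commit" mode can be sketched as follows. This is
a minimal illustration, not the code in ui/render.py: safe_commit_point and
stream_commit are hypothetical names, and the sketch assumes that blocks are
separated by blank lines and that fences are lines starting with three
backticks.

```python
FENCE = "`" * 3  # Markdown code-fence marker


def safe_commit_point(buf: str) -> int:
    """Return how many leading characters of `buf` are safe to print for good.

    A block counts as complete once a blank line follows it and no code fence
    is open, so a fenced block is only ever committed as a whole.
    """
    commit = 0        # end of the last complete block
    in_fence = False  # True while inside an unclosed fence
    pos = 0
    while True:
        nl = buf.find("\n", pos)
        if nl == -1:  # trailing partial line: still streaming
            break
        line = buf[pos:nl].strip()
        pos = nl + 1
        if line.startswith(FENCE):
            in_fence = not in_fence
        elif line == "" and not in_fence:
            commit = pos
    return commit


def stream_commit(chunks):
    """Yield completed Markdown blocks in order; never revisit earlier output."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        cut = safe_commit_point(buf)
        if cut:
            yield buf[:cut]
            buf = buf[cut:]
    if buf:  # flush whatever is left when the stream ends
        yield buf
```

Each yielded block would be rendered and printed once. Nothing already
printed is rewritten, so no cursor-movement sequences are needed.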

Visual /context grid (commands/core.py):
- 20x10 colored cell grid of context-window usage by category (system prompt /
  tools / memory / skills / messages / free space) with per-category tokens + %,
  adapting to the model's real window; ASCII (#/.) fallback on non-UTF-8 terminals.
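
The cell allocation can be sketched as below. It mirrors the rounding and
clamping in cmd_context, but build_cells is a hypothetical helper written for
illustration, and it uses the ASCII fallback glyphs so the output is easy to
inspect.

```python
COLS, ROWS = 20, 10  # grid size stated in this commit message


def build_cells(cats, ctx_limit, full="#", empty="."):
    """Map (name, tokens) pairs onto a COLS*ROWS grid, one string per row."""
    total = COLS * ROWS
    cells = []
    if ctx_limit > 0:
        per_cell = ctx_limit / total  # tokens represented by one cell
        for _name, tokens in cats:
            cells.extend([full] * round(tokens / per_cell))
    cells = cells[:total]  # clamp if usage exceeds the window
    cells.extend([empty] * (total - len(cells)))  # remainder is free space
    return ["".join(cells[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]


rows = build_cells([("System prompt", 10_000), ("Messages", 50_000)], 200_000)
print("\n".join(rows))
```

With a 200K window each cell stands for 1,000 tokens, so a category smaller
than about half a cell rounds down to no cell at all. Its tokens and
percentage still appear in the legend.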

deepseek-v4-flash 1M context window (providers.py):
- Registered in _MODEL_CONTEXT_LIMITS so the prompt %, /context, and compaction
  trigger reflect the true 1M window (deepseek-chat / v4-pro stay at 128K).
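
A rough sketch of why the per-model entry matters. The table and function
names here are hypothetical stand-ins for the lookup in providers.py, and the
exact value used for the 128K provider default is an assumption.

```python
# Hypothetical tables. The deepseek-v4-flash value matches the entry added in
# providers.py; 128_000 for the provider default is an assumed exact figure.
MODEL_CONTEXT_LIMITS = {"deepseek-v4-flash": 1_000_000}
PROVIDER_DEFAULT_LIMITS = {"deepseek": 128_000}


def lookup_context_limit(model: str, provider: str) -> int:
    """Prefer a per-model entry, then the provider default, then 0 (unknown)."""
    if model in MODEL_CONTEXT_LIMITS:
        return MODEL_CONTEXT_LIMITS[model]
    return PROVIDER_DEFAULT_LIMITS.get(provider, 0)


def usage_percent(used_tokens: int, limit: int) -> float:
    """Share of the window in use; 0.0 when the limit is unknown."""
    return used_tokens / limit * 100 if limit else 0.0


for model in ("deepseek-v4-flash", "deepseek-chat"):
    limit = lookup_context_limit(model, "deepseek")
    print(model, limit, f"{usage_percent(100_000, limit):.1f}%")
```

The same 100K tokens of conversation read as a small share of the window on
deepseek-v4-flash and as most of it on the 128K models, which is why the
prompt % and the compaction trigger need the per-model value.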

UI: add a blank line above the startup logo so it sits farther below the prompt.

Docs: README contributors update; one-line README news + detailed docs/news.md
entry; features.md / reference.md updated; version bump to 3.05.82.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 README.md                                |  10 +-
 cheetahclaws.py                          |  35 +++--
 commands/core.py                         | 129 +++++++++++++++-
 docs/guides/features.md                  |   2 +-
 docs/guides/reference.md                 |   4 +-
 docs/news.md                             |   3 +-
 providers.py                             |   4 +
 pyproject.toml                           |   2 +-
 tests/fixtures/golden_default_prompt.txt |   3 +-
 tests/test_compaction.py                 |   6 +-
 tests/test_stream_modes.py               | 182 +++++++++++++++++++++++
 ui/__init__.py                           |   2 +-
 ui/render.py                             | 170 ++++++++++++++++++++-
 13 files changed, 514 insertions(+), 38 deletions(-)
 create mode 100644 tests/test_stream_modes.py
diff --git a/README.md b/README.md
index a46ab22..d775769 100644
--- a/README.md
+++ b/README.md
@@ -39,9 +39,10 @@ Other install methods: [pip install](#alternative-install-with-pip) | [uv instal
 
 ## 🔥🔥🔥 News (Pacific Time)
 
-- June 4, 2026 (latest, **v3.05.81**): **Claude-Code-style quiet output — hide tool execution, show one summary line per turn.** Quiet mode (on by default) suppresses the per-tool `⚙ Tool(...)` / `✓ → N lines` clutter; the spinner shows live activity and a single line (`Read 2 files, ran 3 shell commands`) is emitted just above the reply. The permission prompt also collapses multi-line commands to one line. Errors still surface. The spinner shows a live timer + running token estimate (`Thinking… (7s · ↓ 435 tokens)`) and each turn closes with a real-usage footer (`✻ Worked for 7.2s · ↑ 1.2k · ↓ 435`). `/verbose` overrides it; toggle with `/quiet` or `--show-tools`; the banner shows `Output: quiet/full`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
-- June 4, 2026: **Context-window override — the prompt % and compaction follow a settable context length.** New `/config context_window=<N>` overrides the model's context window (`0` = default), distinct from `max_tokens` (the output cap). One value drives the prompt `%`, `/context`, the compaction trigger, and the per-call output cap consistently — read live, so switching model or window updates the `%` with no restart. Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
-- June 4, 2026: **Rich Live streaming — long responses stay live via a bounded tail window.** Long responses that would overflow the terminal keep rendering live but show only the most recent screenful (a bounded tail window), committing the full output when done — fixing the duplicate/stale frames some terminals left behind. Builds on PR #133. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 5, 2026 (latest, **v3.05.82**): **Adaptive Markdown streaming — live output stays correct on every device** by auto-selecting a per-device tier (`live` in-place redraw on capable terminals incl. modern SSH emulators, append-only `commit` for SSH/Apple Terminal/pipes/CJK text so frames never duplicate, `plain` fallback); also ships a visual `/context` usage grid and a 1M context window for `deepseek-v4-flash`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output** hides per-tool execution and shows one summary line per turn (on by default), with a live spinner timer + token estimate and a `✻ Worked for…` footer; `/verbose` overrides, toggle with `/quiet`. Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
+- June 4, 2026: **Context-window override** — `/config context_window=<N>` sets the context length that drives the prompt `%`, `/context`, the compaction trigger, and the output cap consistently (distinct from `max_tokens`; read live, no restart). Details: [docs/guides/reference.md](docs/guides/reference.md) · [docs/news.md](docs/news.md).
+- June 4, 2026: **Rich Live streaming** keeps long responses live via a bounded tail window — redrawing only the most recent screenful and committing the full output when done, fixing duplicate/stale frames (builds on PR #133). Details: [docs/guides/features.md](docs/guides/features.md) · [docs/news.md](docs/news.md).
 - May 31, 2026: **QQ bot bridge — `/qq` connects cheetahclaws to QQ groups + C2C private chats via the official `qq-botpy` SDK (PR #121).** Details: [docs/guides/bridges.md](docs/guides/bridges.md#qq-bridge) · [docs/news.md](docs/news.md).
 - May 12, 2026: **Security hardening sweep — env-var bot tokens, web CSRF cookie, terminal session owner-binding, and plugin/MCP/filesystem sandboxing (two CRITICAL + HIGH rounds, 2347 tests green).** Details: [docs/guides/security.md](docs/guides/security.md) · [docs/news.md](docs/news.md).
 - May 12, 2026: **Daemon foundation roadmap — all nine F-1…F-9 items landed: subprocess agent runners, on-crash restart policy, daemonized Telegram/Slack/WeChat bridges, and budget guardrails.** Details: [docs/news.md](docs/news.md).
@@ -52,7 +53,7 @@ For more news, see [here](docs/news.md).
 
 # CheetahClaws
 
-CheetahClaws: **A Lightweight** and **Easy-to-Use** Python native Agent Harness Infrastructure, **Supporting Any Model**, such as Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, and local open-source models via Ollama or any OpenAI-compatible endpoint.
+CheetahClaws: **A Fast** and **Easy-to-Use** Python native Agent Harness Infrastructure, **Supporting Any Model**, such as Claude, GPT, Gemini, Kimi, Qwen, Zhipu, DeepSeek, MiniMax, and local open-source models via Ollama or any OpenAI-compatible endpoint.
 
 ---
 
@@ -537,6 +538,7 @@ If you find the repository useful, please cite the study
 <a href="https://github.com/tsint"><img src="https://avatars.githubusercontent.com/u/63944253?v=4&s=48" width="48" height="48" alt="tsint"/></a>
 <a href="https://github.com/albertcheng"><img src="https://avatars.githubusercontent.com/u/2686135?v=4&s=48" width="48" height="48" alt="albertcheng"/></a>
 <a href="https://github.com/LostAion"><img src="https://avatars.githubusercontent.com/u/84846068?v=4&s=48" width="48" height="48" alt="LostAion"/></a>
+<a href="https://github.com/lucaszhu-hue"><img src="https://avatars.githubusercontent.com/u/278269343?v=4&s=48" width="48" height="48" alt="lucaszhu-hue"/></a>
 <a href="https://github.com/skint007"><img src="https://avatars.githubusercontent.com/u/37035851?v=4&s=48" width="48" height="48" alt="skint007"/></a>
 <a href="https://github.com/thekbbohara"><img src="https://avatars.githubusercontent.com/u/133592644?v=4&s=48" width="48" height="48" alt="thekbbohara"/></a>
 
diff --git a/cheetahclaws.py b/cheetahclaws.py
index 382c69f..5fdd5da 100755
--- a/cheetahclaws.py
+++ b/cheetahclaws.py
@@ -203,7 +203,7 @@ def __getattr__(self, name):
     render_diff, _has_diff,
     stream_text, stream_thinking, flush_response,
     _start_tool_spinner, _stop_tool_spinner, _change_spinner_phrase,
-    set_spinner_phrase, set_rich_live, set_spinner_tips,
+    set_spinner_phrase, set_rich_live, set_stream_mode, auto_stream_mode, set_spinner_tips,
     print_tool_start, print_tool_end,
     set_quiet, reset_turn_stats, print_turn_summary,
     set_spinner_tokens, print_turn_stats,
@@ -613,7 +613,7 @@ def handle_slash(line: str, state, config) -> Union[bool, tuple]:
     "load":        ("Load a saved session",               []),
     "history":     ("Show conversation history",          []),
     "search":      ("Search past sessions",               []),
-    "context":     ("Show token-context usage",           []),
+    "context":     ("Visualize context-window usage by category", []),
     "cost":        ("Show cost estimate",                 []),
     "verbose":     ("Toggle verbose output",              []),
     "quiet":       ("Toggle compact tool display",        []),
@@ -974,6 +974,9 @@ def repl(config: dict, initial_prompt: str = None):
         except Exception:
             pass
 
+        # Blank line so the logo sits a little farther below the shell prompt.
+        print()
+
         # Print logo - warm cheetah-gold vertical gradient, plain if color is off.
         if C["reset"]:
             for _ln, _hex in zip(_CHEETAH_LOGO, _CHEETAH_GRADIENT):
@@ -1054,23 +1057,25 @@ def _row(colored: str, plain: str) -> str:
 
     query_lock = threading.RLock()
 
-    # Apply rich_live config: disable in-place Live streaming if terminal has issues.
-    # Auto-detect environments where ANSI cursor-up / live-rewrite doesn't work:
-    #   - SSH sessions (cursor-up fails across network PTY)
-    #   - Dumb terminals (no ANSI support)
-    #   - macOS Terminal.app (can't erase above scroll boundary → duplicated output)
-    #   - Screen/tmux over SSH
-    import os as _os, platform as _plat
-    _in_ssh = bool(_os.environ.get("SSH_CLIENT") or _os.environ.get("SSH_TTY"))
-    _is_dumb = (console is not None and getattr(console, "is_dumb_terminal", False))
-    _is_macos_terminal = (_plat.system() == "Darwin"
-                          and _os.environ.get("TERM_PROGRAM", "") in ("Apple_Terminal", ""))
-    _rich_live_default = not _in_ssh and not _is_dumb and not _is_macos_terminal
-    set_rich_live(config.get("rich_live", _rich_live_default))
+    # Pick the streaming tier for this device (see ui.render.auto_stream_mode):
+    #   "live"   — full in-place Rich redraw (capable terminals, incl. modern
+    #              emulators over SSH like iTerm2 / WezTerm / Windows Terminal /
+    #              VSCode / kitty / Alacritty / Ghostty).
+    #   "commit" — append-only progressive Markdown for terminals where cursor-up
+    #              redraw is unsafe (Apple Terminal, unknown SSH PTYs, pipes). Still
+    #              renders rich formatting block-by-block — a big upgrade over the
+    #              old raw-token fallback.
+    #   "plain"  — only when Rich is unavailable.
+    # An explicit `stream_mode` or legacy `rich_live` config value overrides detection.
+    set_stream_mode(auto_stream_mode(config))
 
     # Apply spinner_tips config: rotating Claude-Code-style tips beneath the
     # spinner. Disabled automatically where multi-line cursor moves misbehave
     # (dumb terminals, macOS Terminal.app) so the tip line never garbles output.
+    import os as _os, platform as _plat
+    _is_dumb = (console is not None and getattr(console, "is_dumb_terminal", False))
+    _is_macos_terminal = (_plat.system() == "Darwin"
+                          and _os.environ.get("TERM_PROGRAM", "") in ("Apple_Terminal", ""))
     _spinner_tips_default = not _is_dumb and not _is_macos_terminal
     set_spinner_tips(config.get("spinner_tips", _spinner_tips_default))
 
diff --git a/commands/core.py b/commands/core.py
index d411a6c..fd57e52 100644
--- a/commands/core.py
+++ b/commands/core.py
@@ -83,13 +83,130 @@ def cmd_clear(_args: str, state, config) -> bool:
     return True
 
 
+def _fmt_tokens(n: int) -> str:
+    """Compact human token count: 1m / 200k / 21.2k / 540."""
+    n = int(n)
+    if n >= 1_000_000:
+        s = f"{n / 1_000_000:.1f}m"
+        return s.replace(".0m", "m")
+    if n >= 1_000:
+        s = f"{n / 1_000:.1f}k"
+        return s.replace(".0k", "k")
+    return str(n)
+
+
 def cmd_context(_args: str, state, config) -> bool:
-    msg_chars = sum(len(str(m.get("content", ""))) for m in state.messages)
-    est_tokens = msg_chars // 4
-    info(f"Messages:         {len(state.messages)}")
-    info(f"Estimated tokens: ~{est_tokens:,}")
-    info(f"Model:            {config['model']}")
-    info(f"Max tokens:       {config['max_tokens']:,}")
+    """Visual breakdown of context-window usage by category (Claude-Code style).
+
+    Renders a 20×10 cell grid where each cell represents an equal slice of the
+    model's context window, coloured per category, followed by a legend showing
+    the estimated token cost and percentage of each component.
+    """
+    import sys as _sys
+    from compaction import estimate_tokens, get_context_limit
+    from providers import detect_provider
+
+    model = config.get("model", "unknown")
+    provider = detect_provider(model) if model else ""
+    ctx_limit = get_context_limit(model, config) or 0
+
+    def _est(text: str) -> int:
+        return estimate_tokens([{"role": "system", "content": text}]) if text else 0
+
+    # ── Measure each in-context component ───────────────────────────────────
+    # System prompt = base + env + live command index (everything
+    # build_system_prompt injects EXCEPT memory, which we break out below to
+    # mirror Claude Code's category split).
+    sys_tokens = 0
+    try:
+        import context as _ctx
+        from prompts import pick_base_prompt
+        base = pick_base_prompt(provider, model) if model else pick_base_prompt()
+        sys_tokens = (_est(base)
+                      + _est(_ctx._render_env_block(config))
+                      + _est(_ctx._render_commands_block()))
+    except Exception:
+        try:
+            import context as _ctx
+            sys_tokens = _est(_ctx.build_system_prompt(config))
+        except Exception:
+            sys_tokens = 0
+
+    mem_tokens = 0
+    try:
+        from memory import get_memory_context
+        mem_tokens = _est(get_memory_context())
+    except Exception:
+        mem_tokens = 0
+
+    tool_tokens = 0
+    try:
+        from tool_registry import get_tool_schemas
+        tool_tokens = _est(json.dumps(get_tool_schemas()))
+    except Exception:
+        tool_tokens = 0
+
+    skill_tokens = 0
+    try:
+        from skill import load_skills
+        blob = "\n".join(
+            f"{s.name}: {s.description} {' '.join(getattr(s, 'triggers', []) or [])}"
+            for s in load_skills()
+        )
+        skill_tokens = _est(blob)
+    except Exception:
+        skill_tokens = 0
+
+    msg_tokens = estimate_tokens(getattr(state, "messages", []))
+    msg_count = len(getattr(state, "messages", []))
+
+    cats = [
+        ("System prompt", sys_tokens,   "cyan"),
+        ("System tools",  tool_tokens,  "blue"),
+        ("Memory files",  mem_tokens,   "magenta"),
+        ("Skills",        skill_tokens, "yellow"),
+        ("Messages",      msg_tokens,   "green"),
+    ]
+    used = sum(t for _, t, _ in cats)
+    free = max(0, ctx_limit - used) if ctx_limit else 0
+
+    # ── Build the cell grid ─────────────────────────────────────────────────
+    utf8 = "utf" in (getattr(_sys.stdout, "encoding", "") or "").lower()
+    FULL, EMPTY = ("⛁", "⛶") if utf8 else ("#", ".")
+    COLS, ROWS = 20, 10
+    total_cells = COLS * ROWS
+    per_cell = (ctx_limit / total_cells) if ctx_limit else 0
+
+    cells: list[tuple[str, str]] = []
+    if per_cell:
+        for _name, tok, color in cats:
+            n = int(round(tok / per_cell))
+            cells.extend([(FULL, color)] * n)
+    cells = cells[:total_cells]
+    cells.extend([(EMPTY, "dim")] * (total_cells - len(cells)))
+
+    # ── Render ──────────────────────────────────────────────────────────────
+    print(clr("  Context Usage", "bold"))
+    for r in range(ROWS):
+        row = cells[r * COLS:(r + 1) * COLS]
+        print("  " + " ".join(clr(g, c) for g, c in row))
+
+    pct = (used / ctx_limit * 100) if ctx_limit else 0
+    print()
+    print(f"  {clr(model, 'bold')}" + (f"  ·  {provider}" if provider else ""))
+    if ctx_limit:
+        print(f"  {_fmt_tokens(used)}/{_fmt_tokens(ctx_limit)} tokens ({pct:.1f}%)")
+    else:
+        print(f"  {_fmt_tokens(used)} tokens (context limit unknown)")
+
+    print()
+    print(clr("  Estimated usage by category", "dim"))
+    for name, tok, color in cats:
+        p = (tok / ctx_limit * 100) if ctx_limit else 0
+        print(f"  {clr(FULL, color)} {name + ':':<15} {_fmt_tokens(tok):>7} tokens ({p:.1f}%)"
+              + (f"  [{msg_count} msgs]" if name == "Messages" else ""))
+    fp = (free / ctx_limit * 100) if ctx_limit else 0
+    print(f"  {clr(EMPTY, 'dim')} {'Free space:':<15} {_fmt_tokens(free):>7} tokens ({fp:.1f}%)")
     return True
 
 
diff --git a/docs/guides/features.md b/docs/guides/features.md
index f159719..2f228df 100644
--- a/docs/guides/features.md
+++ b/docs/guides/features.md
@@ -47,7 +47,7 @@ and indexed in the [README Documentation section](../../README.md#documentation)
 | Shell escape | Type `!command` in the REPL to execute any shell command directly without AI involvement (`!git status`, `!ls`, `!python --version`). Output prints inline. |
 | Proactive monitoring | `/proactive [duration]` starts a background sentinel daemon; agent wakes automatically after inactivity, enabling continuous monitoring loops without user prompts |
 | Force quit | 3× Ctrl+C within 2 seconds triggers `os._exit(1)` — kills the process immediately regardless of blocking I/O |
-| Rich Live streaming | When `rich` is installed, responses render as live-updating Markdown in place. Long responses that would overflow the terminal keep rendering live but show only the most recent screenful (a bounded tail window, terminal-height / wrap aware); the complete output is committed when the response finishes, preventing the duplicate or stale frames some terminals leave behind. Plain streaming is used only as a fallback. Auto-disabled in SSH sessions; override with `/config rich_live=false`. |
+| Adaptive Markdown streaming | Responses render as live Markdown, with the rendering tier **auto-selected per device** so streaming stays correct everywhere — no duplicated/stale frames over SSH, on macOS Terminal, or with wide (CJK / emoji) text. Three tiers: **`live`** — full in-place Rich redraw, used on terminals known to handle cursor-up reliably (local TTYs and modern emulators incl. over SSH: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty); **`commit`** — append-only progressive Markdown, the safe default for SSH/Apple Terminal/pipes, where each completed block renders and prints **permanently** (zero cursor movement, so it can never leave a duplicate frame) while the in-progress block appears when it closes; **`plain`** — raw tokens, only when `rich` is unavailable. Detection lives in `ui.render.auto_stream_mode`; override with `/config stream_mode=live\|commit\|plain` (legacy `/config rich_live=true\|false` still works). |
 | Spinner tips | While the model works, the spinner shows an elapsed timer plus a rotating Claude-Code-style "Tip:" line surfacing handy commands (`/compact`, `/checkpoint`, `/research`, …). Auto-disabled on dumb / macOS Terminal where multi-line cursor moves misbehave; toggle with `/config spinner_tips=false`. |
 | Compact tool display | Claude-Code-style quiet output (on by default). Instead of printing a `⚙ Tool(...)` line and a `✓ → N lines` line for every tool call, the per-tool execution is hidden — the spinner conveys live activity and one summary line is emitted at the tool→text boundary (`Read 2 files, ran 3 shell commands`), sitting just above the reply. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the permission prompt also collapses a multi-line command to one line (`Run: python3 << 'PYEOF'  … (+59 行)`) instead of dumping the whole script. `/verbose` overrides it (full per-tool lines + inputs + token counts); toggle with `/quiet` or launch with `--show-tools` (alias `--no-quiet`). The banner shows the active mode as `Output: quiet` / `Output: full`. **Live status:** while the model works, the spinner shows elapsed time plus a running token estimate (`Thinking… (7s · ↓ 435 tokens)`, char-based since providers only report real usage at the end); each quiet turn then closes with a real-usage footer — `✻ Worked for 7.2s · ↑ 1.2k · ↓ 435` — using the true counts from `TurnDone`. |
 | Context injection | Auto-loads `CLAUDE.md`, git status, cwd, persistent memory |
diff --git a/docs/guides/reference.md b/docs/guides/reference.md
index 73e1b98..7faa5a0 100644
--- a/docs/guides/reference.md
+++ b/docs/guides/reference.md
@@ -54,6 +54,7 @@ Type `/` and press **Tab** to see all commands with descriptions. Continue typin
 | `/config` | Show all current config values |
 | `/config key=value` | Set a config value (persisted to disk). v3.05.78+ parses JSON values: `["a","b"]`, `{"k":"v"}`, signed numbers, quoted strings — list/dict configs no longer get silently saved as literal strings. |
 | `/config context_window=<N>` | Override the context window (tokens) for the session. `0` = use the model's default. Drives the prompt `%` indicator, `/context`, the compaction trigger, **and** the per-call output-token cap — all consistently. Distinct from `max_tokens` (which is the **output** cap, not the window). Bidirectional: a smaller value forces earlier compaction; a larger value corrects a stale default. Read live, so it takes effect on the next prompt (no restart). Warns if set above the model's real window (that would disable compaction and the API may reject oversized prompts). |
+| `/config stream_mode=<mode>` | Force the Markdown streaming tier: `live` (full in-place Rich redraw), `commit` (append-only progressive Markdown — safe over SSH / Apple Terminal / pipes), or `plain` (raw tokens). Unset = auto-detected per device (`ui.render.auto_stream_mode`). Legacy `/config rich_live=true\|false` still works (`true`→`live`, `false`→`commit`). |
 | `/save` | Save session (auto-named by timestamp) |
 | `/save <filename>` | Save session to named file |
 | `/load` | Interactive list grouped by date; enter number, `1,2,3` to merge, or `H` for full history |
@@ -61,7 +62,7 @@ Type `/` and press **Tab** to see all commands with descriptions. Continue typin
 | `/resume` | Restore the last auto-saved session (`mr_sessions/session_latest.json`) |
 | `/resume <filename>` | Load a specific file from `mr_sessions/` (or absolute path) |
 | `/history` | Print full conversation history |
-| `/context` | Show message count, token estimate, and context-window usage (honors a `context_window` override) |
+| `/context` | Visualize context-window usage as a Claude-Code-style cell grid, broken down by category (system prompt, system tools, memory files, skills, messages, free space) with per-category token counts and percentages. Honors a `context_window` override; falls back to `#`/`.` when the terminal isn't UTF-8. |
 | `/cost` | Show token usage and estimated USD cost |
 | `/verbose` | Toggle verbose mode (tokens + thinking) |
 | `/quiet` | Toggle compact tool display — hide per-tool execution lines and show one summary line per turn (on by default; `/verbose` overrides it) |
@@ -328,6 +329,7 @@ Keys are saved to `~/.cheetahclaws/config.json` and loaded automatically on next
   "verbose": false,
   "quiet": true,
   "thinking": false,
+  "stream_mode": null,
   "qwen_api_key": "sk-...",
   "kimi_api_key": "sk-...",
   "deepseek_api_key": "sk-...",
diff --git a/docs/news.md b/docs/news.md
index a3fd1b3..503e200 100644
--- a/docs/news.md
+++ b/docs/news.md
@@ -3,7 +3,8 @@
 ## 🔥🔥🔥 News (Pacific Time)
 
 
-- June 4, 2026 (**v3.05.81**) (latest): **Claude-Code-style quiet output — hide tool execution, show one summary line per turn.** Long analysis turns used to scroll the terminal with a `⚙ Bash(...)` line and a `✓ → N lines (… chars)` line for *every* tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line `python3 << 'PYEOF'` heredoc). A new **quiet mode (on by default)** suppresses the per-tool lines — the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (`Read 2 files, ran 3 shell commands`), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the **permission prompt also collapses** a multi-line command to one line (`Run: python3 << 'PYEOF'  … (+59 行)`) instead of printing the whole script. `/verbose` overrides quiet (full per-tool lines + inputs + token counts); toggle with **`/quiet`**, or launch with **`--show-tools`** (alias `--no-quiet`). The startup banner gains an **`Output: quiet` / `Output: full`** line so the active mode is visible at a glance. **Live status line:** the spinner now shows elapsed time plus a running output-token estimate (`Thinking… (7s · ↓ 435 tokens)`) — char-based, since providers only report real usage at the end — and each quiet turn closes with a real-usage footer **`✻ Worked for 7.2s · ↑ 1.2k · ↓ 435`** built from the true `TurnDone` counts. Implemented in `ui/render.py` (turn-level tool accumulator + `turn_summary_line()`, spinner token meter, `print_turn_stats()`), wired through the REPL event loop in `cheetahclaws.py`, with the `/quiet` toggle in `commands/config_cmd.py`. See [docs/guides/features.md](guides/features.md).
+- June 5, 2026 (**v3.05.82**) (latest): **Adaptive Markdown streaming — live output that stays correct on every device.** In-place Rich Live redraw is great on capable terminals but breaks elsewhere: it was disabled wholesale over SSH (so SSH users got raw tokens with no formatting), and where it *did* run it could leave **duplicate or stale frames** — on macOS Terminal (which can't erase above the scroll boundary), over laggy network PTYs, or with **wide CJK / emoji text** whose display width a naive line-count gets wrong. The renderer now selects a **streaming tier per device** in `ui.render.auto_stream_mode(config)`: **`live`** — full in-place redraw, only on terminals known to handle cursor-up (local TTYs, and modern emulators *even over SSH*: iTerm2, WezTerm, Windows Terminal, VSCode, kitty, Alacritty, Ghostty, detected via `TERM_PROGRAM` / `TERM` / `WT_SESSION` / `KITTY_WINDOW_ID` / `ALACRITTY_WINDOW_ID` / `WEZTERM_PANE`); **`commit`** — **append-only progressive Markdown**, the safe default for unknown-SSH / Apple Terminal / pipes / non-TTY, where each completed block (split on blank lines, respecting open code fences so a fenced block renders atomically) is rendered and printed **permanently** and the cursor is **never moved**, making a duplicate frame structurally impossible regardless of terminal, latency, or character width; **`plain`** — raw tokens, only when `rich` is unavailable. The append-only floor is provably duplication-free; `live` is progressive enhancement on top. Override with **`/config stream_mode=live|commit|plain`** (legacy boolean **`/config rich_live=true|false`** still works → `live`/`commit`). Implemented in `ui/render.py` (`set_stream_mode` / `auto_stream_mode` / `_safe_commit_point` / `_commit_stream` / `_commit_flush`), wired in at REPL start in `cheetahclaws.py`, with a 26-case test suite in `tests/test_stream_modes.py` (device routing, code-fence-aware block boundaries, append-only commit, and a regression asserting commit mode emits **zero** cursor sequences even on a TTY with CJK text). Two related UX items shipped alongside: **`/context` is now a visual grid** — a Claude-Code-style 20×10 cell grid of context-window usage, colored and broken down by category (system prompt / system tools / memory files / skills / messages / free space) with per-category token counts and percentages, adapting to the model's real context window and falling back to `#`/`.` on non-UTF-8 terminals (`commands/core.py:cmd_context`); and **`deepseek-v4-flash` is registered at its 1M context window** in `providers._MODEL_CONTEXT_LIMITS` (overriding the 128K deepseek provider default, which still applies to `deepseek-chat` / `deepseek-v4-pro`), so the prompt `%`, `/context`, and the compaction trigger all reflect the true 1M window. See [docs/guides/features.md](guides/features.md) · [docs/guides/reference.md](guides/reference.md).
+- June 4, 2026 (**v3.05.81**): **Claude-Code-style quiet output — hide tool execution, show one summary line per turn.** Long analysis turns used to scroll the terminal with a `⚙ Bash(...)` line and a `✓ → N lines (… chars)` line for *every* tool call, and the permission prompt dumped the entire inline script (e.g. a 60-line `python3 << 'PYEOF'` heredoc). A new **quiet mode (on by default)** suppresses the per-tool lines — the spinner conveys live activity and a single summary line is emitted at the tool→text boundary, sitting just above the reply (`Read 2 files, ran 3 shell commands`), the way Claude Code does. Errors and denials still surface so a mid-turn failure is never silent. In quiet mode the **permission prompt also collapses** a multi-line command to one line (`Run: python3 << 'PYEOF'  … (+59 行)`) instead of printing the whole script. `/verbose` overrides quiet (full per-tool lines + inputs + token counts); toggle with **`/quiet`**, or launch with **`--show-tools`** (alias `--no-quiet`). The startup banner gains an **`Output: quiet` / `Output: full`** line so the active mode is visible at a glance. **Live status line:** the spinner now shows elapsed time plus a running output-token estimate (`Thinking… (7s · ↓ 435 tokens)`) — char-based, since providers only report real usage at the end — and each quiet turn closes with a real-usage footer **`✻ Worked for 7.2s · ↑ 1.2k · ↓ 435`** built from the true `TurnDone` counts. Implemented in `ui/render.py` (turn-level tool accumulator + `turn_summary_line()`, spinner token meter, `print_turn_stats()`), wired through the REPL event loop in `cheetahclaws.py`, with the `/quiet` toggle in `commands/config_cmd.py`. See [docs/guides/features.md](guides/features.md).
 - June 4, 2026: **Context-window override — the prompt % and compaction now follow a settable context length.** The prompt's context-usage `%` (and the compaction trigger) derive from the model's context window, which previously could only be a hardcoded provider default — and `max_tokens` (the OUTPUT cap) doesn't change it, so `/config max_tokens=…` left the `%` unchanged (a common point of confusion). New per-session key **`context_window`** (`/config context_window=<N>`, `0` = model default) overrides it, kept deliberately distinct from `max_tokens`. A single parser (`providers.context_window_override`) feeds the prompt `%`, `/context`, the compaction trigger, **and** the per-call output-token cap, so all four stay consistent; it is bidirectional — a smaller value forces earlier compaction, a larger value corrects a stale default. The value is read live each prompt, so switching model **or** `context_window` updates the `%` with no restart. `/config` warns when the value exceeds the model's real window (which would disable compaction and let the API reject oversized prompts). No-op when unset, so existing behavior is unchanged. See [docs/guides/reference.md](guides/reference.md).
 - June 4, 2026: **Rich Live streaming — long responses stay live via a bounded tail window.** Large streamed responses that would overflow the terminal's redraw area could leave duplicate or stale frames behind on some emulators (macOS Terminal, etc.), because Rich Live redraws the whole accumulated output in place and the cursor can't reach content that has scrolled into the scrollback. Building on the per-response fallback from PR #133, Rich Live now keeps the live region **bounded to the viewport**: a short response is shown in full, but once it would overflow, only the **last screenful of rendered lines (a tail window) is redrawn** — so the Live region can never exceed the terminal and cannot leave stale frames. The complete output is committed once when the response finishes (including on Ctrl-C, since the REPL flushes on interrupt), so the head that scrolled out of the window is never lost. Plain streaming is kept only as a safety net (precise render failed, or the terminal is too small to bound a window). A cheap per-line wrap estimate short-circuits the expensive full `render_lines()` measurement while a response stays well under the limit, so normal responses pay no extra Markdown re-render per chunk. Adds focused tests covering full-frame streaming, the full→tail transition, tail-window commit-on-flush, real `Segments` rendering, and both safety-net fallbacks. See [docs/guides/features.md](guides/features.md).
 
diff --git a/providers.py b/providers.py
index 534365b..df7537d 100644
--- a/providers.py
+++ b/providers.py
@@ -369,6 +369,10 @@ def nim_next_model(current: str) -> str | None:
     "gemma-2-27b-it":              8192,
     "gemma3":                      8192,
     "gemma4":                      8192,
+    # deepseek-v4-flash ships a 1M context window. Per-model entry overrides
+    # the deepseek provider default (128k), which still applies to v4-pro and
+    # the older deepseek-chat / deepseek-reasoner API models.
+    "deepseek-v4-flash":           1000000,
     # DeepSeek local variants
     "deepseek-r1":                 65536,
     "deepseek-coder-v2":           128000,
diff --git a/pyproject.toml b/pyproject.toml
index 7f0bb0d..b5d0aeb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "cheetahclaws"
-version = "3.05.81"
+version = "3.05.82"
 description = "CheetahClaws: An Extensible, Python-Native Agent System for Autonomous Multi-Model Workflows"
 readme = "README.md"
 requires-python = ">=3.10"
diff --git a/tests/fixtures/golden_default_prompt.txt b/tests/fixtures/golden_default_prompt.txt
index b19014b..9b8a42a 100644
--- a/tests/fixtures/golden_default_prompt.txt
+++ b/tests/fixtures/golden_default_prompt.txt
@@ -115,7 +115,7 @@ These commands the **user** can invoke at the REPL prompt — they are NOT tools
 - `/cloudsave` `[setup | auto | list | load | push]` — Cloud-sync sessions to GitHub Gist
 - `/compact` — Compact conversation history
 - `/config` — Show / set config key=value
-- `/context` — Show token-context usage
+- `/context` — Visualize context-window usage by category
 - `/copy` — Copy last response to clipboard
 - `/cost` — Show cost estimate
 - `/cwd` — Show / change working directory
@@ -138,6 +138,7 @@ These commands the **user** can invoke at the REPL prompt — they are NOT tools
 - `/plugin` `[install | uninstall | enable | disable | disable-all | update | recommend | info]` — Manage plugins
 - `/proactive` `[off]` — Manage proactive background watcher
 - `/qq` `[<appid> <secret> | stop | status]` — QQ bot bridge (botpy SDK)
+- `/quiet` — Toggle compact tool display
 - `/quit` — Exit (alias for /exit)
 - `/resume` — Resume last session
 - `/rewind` `[clear]` — Rewind to checkpoint (alias)
diff --git a/tests/test_compaction.py b/tests/test_compaction.py
index 23fe2a1..548755e 100644
--- a/tests/test_compaction.py
+++ b/tests/test_compaction.py
@@ -104,11 +104,11 @@ def test_gemini(self):
         assert get_context_limit("gemini-2.0-flash") == 1000000
 
     def test_deepseek(self):
-        # Raised to 128K on the v4 update — DeepSeek's real context window
-        # has been 128K since v3, and v4 keeps that.
+        # deepseek-chat / v4-pro stay at the 128K provider default; v4-flash
+        # ships a 1M context window (per-model registry override).
         assert get_context_limit("deepseek-chat") == 128000
         assert get_context_limit("deepseek-v4-pro") == 128000
-        assert get_context_limit("deepseek-v4-flash") == 128000
+        assert get_context_limit("deepseek-v4-flash") == 1000000
 
     def test_openai(self):
         assert get_context_limit("gpt-4o") == 128000
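The lookup order that the test_deepseek comment describes (a per-model registry entry wins, otherwise the provider default applies) can be sketched as below. This is only an illustration under stated assumptions, not the code in providers.py: `MODEL_LIMITS`, `PROVIDER_DEFAULTS`, `FALLBACK_LIMIT`, and `lookup_context_limit` are hypothetical names, and the 8192 fallback is an assumed value.

```python
# Hypothetical sketch of "per-model entry overrides the provider default".
# None of these names are taken from providers.py.
MODEL_LIMITS = {"deepseek-v4-flash": 1_000_000}   # per-model overrides
PROVIDER_DEFAULTS = {"deepseek": 128_000}         # one default per provider
FALLBACK_LIMIT = 8_192                            # assumed last-resort value


def lookup_context_limit(model: str, provider: str) -> int:
    """Return the context window for `model`, preferring a per-model entry."""
    if model in MODEL_LIMITS:
        return MODEL_LIMITS[model]
    # No per-model entry: fall back to the provider-wide default.
    return PROVIDER_DEFAULTS.get(provider, FALLBACK_LIMIT)
```

With this ordering, adding one registry entry changes a single model's window while every other model of the same provider keeps the default, which is the behavior the updated assertions check.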
diff --git a/tests/test_stream_modes.py b/tests/test_stream_modes.py
new file mode 100644
index 0000000..78c89b1
--- /dev/null
+++ b/tests/test_stream_modes.py
@@ -0,0 +1,182 @@
+"""Tests for the adaptive streaming tiers in ui.render.
+
+Covers:
+  - auto_stream_mode device routing (live / commit / plain)
+  - _safe_commit_point block detection (incl. code fences)
+  - commit-mode stream_text / flush_response (append-only progressive Markdown)
+  - the bounded, self-healing in-progress preview
+"""
+import platform
+
+import pytest
+
+import ui.render as render
+
+
+# ── auto_stream_mode routing ────────────────────────────────────────────────
+
+class _Console:
+    def __init__(self, is_terminal=True, is_dumb_terminal=False, height=100, width=80):
+        self.is_terminal = is_terminal
+        self.is_dumb_terminal = is_dumb_terminal
+        self.height = height
+        self.width = width
+        self.printed = []
+
+    def print(self, value):
+        self.printed.append(value)
+
+
+@pytest.fixture
+def clean_env(monkeypatch):
+    """A real-TTY console + a baseline env with no terminal-identifying vars."""
+    for var in ("SSH_CLIENT", "SSH_TTY", "TERM_PROGRAM", "TERM", "WT_SESSION",
+                "KITTY_WINDOW_ID", "ALACRITTY_WINDOW_ID", "WEZTERM_PANE"):
+        monkeypatch.delenv(var, raising=False)
+    monkeypatch.setattr(render, "_RICH", True)
+    monkeypatch.setattr(render, "console", _Console())
+    monkeypatch.setattr(platform, "system", lambda: "Linux")
+    return monkeypatch
+
+
+def test_explicit_stream_mode_wins(clean_env):
+    assert render.auto_stream_mode({"stream_mode": "plain"}) == "plain"
+    assert render.auto_stream_mode({"stream_mode": "commit"}) == "commit"
+    assert render.auto_stream_mode({"stream_mode": "live"}) == "live"
+
+
+def test_legacy_rich_live_flag(clean_env):
+    assert render.auto_stream_mode({"rich_live": True}) == "live"
+    # Legacy False now maps to the rich append-only tier, not raw plain.
+    assert render.auto_stream_mode({"rich_live": False}) == "commit"
+
+
+def test_no_rich_is_plain(clean_env):
+    clean_env.setattr(render, "_RICH", False)
+    assert render.auto_stream_mode({}) == "plain"
+
+
+def test_local_tty_gets_live(clean_env):
+    assert render.auto_stream_mode({}) == "live"
+
+
+def test_dumb_terminal_gets_commit(clean_env):
+    clean_env.setattr(render, "console", _Console(is_dumb_terminal=True))
+    assert render.auto_stream_mode({}) == "commit"
+
+
+def test_non_tty_gets_commit(clean_env):
+    clean_env.setattr(render, "console", _Console(is_terminal=False))
+    assert render.auto_stream_mode({}) == "commit"
+
+
+def test_unknown_ssh_terminal_gets_commit(clean_env):
+    clean_env.setenv("SSH_CLIENT", "1.2.3.4 5555 22")
+    assert render.auto_stream_mode({}) == "commit"
+
+
+def test_modern_terminal_over_ssh_gets_live(clean_env):
+    clean_env.setenv("SSH_CLIENT", "1.2.3.4 5555 22")
+    clean_env.setenv("TERM_PROGRAM", "vscode")
+    assert render.auto_stream_mode({}) == "live"
+
+
+def test_windows_terminal_over_ssh_gets_live(clean_env):
+    clean_env.setenv("SSH_TTY", "/dev/pts/0")
+    clean_env.setenv("WT_SESSION", "abc-123")
+    assert render.auto_stream_mode({}) == "live"
+
+
+def test_apple_terminal_gets_commit(clean_env):
+    clean_env.setattr(platform, "system", lambda: "Darwin")
+    clean_env.setenv("TERM_PROGRAM", "Apple_Terminal")
+    assert render.auto_stream_mode({}) == "commit"
+
+
+def test_iterm_on_macos_gets_live(clean_env):
+    clean_env.setattr(platform, "system", lambda: "Darwin")
+    clean_env.setenv("TERM_PROGRAM", "iTerm.app")
+    assert render.auto_stream_mode({}) == "live"
+
+
+# ── _safe_commit_point ──────────────────────────────────────────────────────
+
+def test_commit_point_no_complete_block():
+    text = "still typing the first paragraph"
+    assert render._safe_commit_point(text, 0) == 0
+
+
+def test_commit_point_commits_completed_paragraph():
+    text = "first paragraph\n\nsecond, in progress"
+    # Boundary is just after the "\n\n".
+    assert render._safe_commit_point(text, 0) == len("first paragraph\n\n")
+
+
+def test_commit_point_does_not_split_open_code_fence():
+    # A blank line INSIDE an unclosed ``` fence must not be a commit point.
+    text = "intro\n\n```python\ncode line\n\nmore code"
+    assert render._safe_commit_point(text, 0) == len("intro\n\n")
+
+
+def test_commit_point_commits_after_fence_closes():
+    text = "intro\n\n```python\ncode\n\nmore\n```\n\nafter"
+    point = render._safe_commit_point(text, 0)
+    # Everything up to and including the blank line after the closing fence.
+    assert text[:point].endswith("```\n\n")
+    assert "after" not in text[:point]
+
+
+# ── commit-mode streaming ───────────────────────────────────────────────────
+
+@pytest.fixture
+def commit_mode(monkeypatch):
+    fake = _Console(is_terminal=True, height=40)   # even on a TTY, commit is append-only
+    monkeypatch.setattr(render, "_RICH", True)
+    monkeypatch.setattr(render, "console", fake)
+    monkeypatch.setattr(render, "_STREAM_MODE", "commit")
+    monkeypatch.setattr(render, "_make_renderable", lambda text: text)
+    monkeypatch.setattr(render, "_accumulated_text", [])
+    monkeypatch.setattr(render, "_commit_idx", 0)
+    return fake
+
+
+def test_commit_mode_commits_blocks_appendonly(commit_mode):
+    render.stream_text("# Title\n\n")        # completes a block → committed
+    render.stream_text("body still going")   # incomplete → buffered, no commit
+    assert commit_mode.printed == ["# Title"]
+
+    render.flush_response()
+    assert commit_mode.printed == ["# Title", "body still going"]
+    assert render._commit_idx == 0           # state reset after flush
+
+
+def test_commit_mode_emits_no_cursor_sequences(commit_mode, capsys):
+    """Regression: commit mode must NEVER issue cursor-up / erase ANSI, even on a
+    TTY — that was the source of duplicated frames over SSH / with CJK text."""
+    for chunk in ["第一段，正在", "输入中的内容", "\n\n", "第二段也在写", "更多文字"]:
+        render.stream_text(chunk)
+    render.flush_response()
+    out = capsys.readouterr().out
+    assert "\x1b[" not in out                 # no cursor control of any kind
+    # Each block rendered exactly once → no duplication.
+    assert commit_mode.printed == ["第一段，正在输入中的内容", "第二段也在写更多文字"]
+
+
+def test_commit_mode_streaming_chunks_commit_each_block_once(commit_mode):
+    """A long block streamed token-by-token commits exactly once when it closes
+    (not re-emitted on every chunk)."""
+    text = "这是一个很长的段落" * 20 + "\n\n尾巴"
+    for ch in text:                           # one char at a time, like a real stream
+        render.stream_text(ch)
+    render.flush_response()
+    assert commit_mode.printed == ["这是一个很长的段落" * 20, "尾巴"]
+
+
+def test_commit_mode_renders_full_fenced_block_atomically(commit_mode):
+    for chunk in ["```py\n", "x = 1\n", "\n", "y = 2\n", "```\n\n", "done"]:
+        render.stream_text(chunk)
+    render.flush_response()
+    # The whole code fence is one committed block; "done" is the trailing block.
+    assert commit_mode.printed[0].startswith("```py")
+    assert commit_mode.printed[0].rstrip().endswith("```")
+    assert commit_mode.printed[-1] == "done"
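The behavior these commit-mode tests exercise can be modelled without Rich. The sketch below is a standalone approximation, assuming a block ends at a blank line outside an unclosed ``` fence: `AppendOnlyStream`, `safe_commit_point`, and `demo` are hypothetical names, and an `io.StringIO` stands in for the console. It is not the implementation in ui/render.py.

```python
import io


def safe_commit_point(text: str, start: int) -> int:
    """Index just after the last blank line at/after `start` outside a ``` fence."""
    best = start
    i = text.find("\n\n", start)
    while i != -1:
        end = i + 2
        # An even number of fence markers before `end` means no fence is open.
        if text.count("```", 0, end) % 2 == 0:
            best = end
        i = text.find("\n\n", i + 1)
    return best


class AppendOnlyStream:
    """Hypothetical stand-in for the commit tier: each finished span is written once."""

    def __init__(self, out):
        self.out = out
        self.buf = ""
        self.committed = 0      # chars of buf already written
        self.blocks = []        # what was written, in order

    def _emit(self, block: str) -> None:
        block = block.strip("\n")
        if block.strip():
            self.blocks.append(block)
            self.out.write(block + "\n")   # plain append, no cursor movement

    def feed(self, chunk: str) -> None:
        self.buf += chunk
        point = safe_commit_point(self.buf, self.committed)
        if point > self.committed:
            self._emit(self.buf[self.committed:point])
            self.committed = point

    def flush(self) -> None:
        # Write whatever is still buffered, then reset for the next response.
        self._emit(self.buf[self.committed:])
        self.buf, self.committed = "", 0


def demo() -> tuple[list[str], str]:
    out = io.StringIO()
    stream = AppendOnlyStream(out)
    for chunk in ["```py\n", "x = 1\n", "\n", "y = 2\n", "```\n\n", "done"]:
        stream.feed(chunk)
    stream.flush()
    return stream.blocks, out.getvalue()
```

Calling `demo()` feeds the same chunks as the fenced-block test: the blank line inside the open fence is skipped as a commit point, so the fence is written as one block and "done" follows at flush. Because the writer only appends, the captured text contains no escape sequences.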
diff --git a/ui/__init__.py b/ui/__init__.py
index d2d2004..1fdec41 100644
--- a/ui/__init__.py
+++ b/ui/__init__.py
@@ -7,5 +7,5 @@
     _TOOL_SPINNER_PHRASES, _DEBATE_SPINNER_PHRASES,
     _start_tool_spinner, _stop_tool_spinner, _change_spinner_phrase,
     print_tool_start, print_tool_end, _tool_desc,
-    set_rich_live, set_spinner_tips,
+    set_rich_live, set_stream_mode, auto_stream_mode, set_spinner_tips,
 )
diff --git a/ui/render.py b/ui/render.py
index 1ab01b1..c10069c 100644
--- a/ui/render.py
+++ b/ui/render.py
@@ -154,14 +154,104 @@ def _has_diff(text: str) -> bool:
 
 _accumulated_text: list[str] = []   # buffer text during streaming
 _current_live = None                # active Rich Live instance (one at a time)
-_RICH_LIVE = True                   # set False (via config rich_live=false) to disable
+_RICH_LIVE = True                   # True only in "live" mode (in-place redraw)
 _plain_streaming_response = False   # current response has fallen back from Live
 _live_shows_full = False            # True when the live frame holds the whole response (not a tail window)
 
+# ── Adaptive streaming mode ────────────────────────────────────────────────
+# Three tiers, chosen per-device (see auto_stream_mode):
+#   "live"   — full in-place Rich Live redraw. Best experience, but the
+#              cursor-up rewrite breaks on some terminals (Apple Terminal can't
+#              erase above the scroll boundary; flaky network PTYs duplicate
+#              frames), so it is reserved for terminals known to handle it.
+#   "commit" — append-only progressive Markdown. Completed blocks are rendered
+#              and printed permanently (never redrawn). Pure append-only: it
+#              issues NO cursor-up / erase sequences at all, so it can never
+#              leave duplicate frames — correct over SSH / Apple Terminal /
+#              pipes / CJK-wide text alike, while still showing rich Markdown
+#              block by block. The universal default for non-"live" terminals.
+#   "plain"  — raw token stream (only when Rich is unavailable).
+_STREAM_MODE = "live" if _RICH else "plain"
+_commit_idx = 0                     # chars of the response already committed (rendered + printed)
+
+
+def set_stream_mode(mode: str) -> None:
+    """Select the streaming tier ('live' | 'commit' | 'plain')."""
+    global _STREAM_MODE, _RICH_LIVE
+    if not _RICH or mode not in ("live", "commit", "plain"):
+        mode = "commit" if _RICH else "plain"   # no Rich → plain; unknown mode → commit
+    _STREAM_MODE = mode
+    _RICH_LIVE = (mode == "live")
+
+
 def set_rich_live(enabled: bool) -> None:
-    """Called from repl.py to apply the rich_live config setting."""
-    global _RICH_LIVE
-    _RICH_LIVE = _RICH and enabled
+    """Back-compat shim for the old boolean rich_live config.
+
+    True  → full in-place Live. False → 'commit' (still rich, just append-only
+    instead of plain raw tokens, which is a strict UX upgrade over the old
+    behaviour). New code should call set_stream_mode / auto_stream_mode."""
+    set_stream_mode("live" if (enabled and _RICH) else "commit")
+
+
+# Terminal emulators known to handle in-place cursor-up redraw reliably, even
+# over SSH. Detected via TERM_PROGRAM, TERM, or an emulator-specific env var.
+_GOOD_TERM_PROGRAMS = {
+    "iTerm.app", "WezTerm", "vscode", "ghostty", "rio", "Tabby", "Hyper",
+    "Warp", "kitty",
+}
+
+
+def auto_stream_mode(config: dict | None = None) -> str:
+    """Pick the best streaming tier for the current device.
+
+    Priority: explicit config override → capability detection. Capable
+    terminals (local TTYs and modern emulators, incl. over SSH) get 'live';
+    everything else with Rich gets the safe-but-rich 'commit' tier; only a
+    missing Rich install falls all the way back to 'plain'.
+    """
+    import os as _os
+    import platform as _plat
+
+    cfg = config or {}
+    explicit = cfg.get("stream_mode")
+    if explicit in ("live", "commit", "plain"):
+        return explicit
+    rl = cfg.get("rich_live")
+    if rl is True:
+        return "live"
+    if rl is False:
+        return "commit"
+
+    if not _RICH or console is None:
+        return "plain"
+    if getattr(console, "is_dumb_terminal", False):
+        return "commit"
+    # Not a real TTY (piped / redirected / captured): append-only, no cursor games.
+    if not getattr(console, "is_terminal", False):
+        return "commit"
+
+    term = _os.environ.get("TERM", "") or ""
+    term_program = _os.environ.get("TERM_PROGRAM", "") or ""
+    in_ssh = bool(_os.environ.get("SSH_CLIENT") or _os.environ.get("SSH_TTY"))
+    is_apple_terminal = (_plat.system() == "Darwin"
+                         and term_program in ("Apple_Terminal", ""))  # unset TERM_PROGRAM on macOS is treated as Apple Terminal too
+    modern = (
+        term_program in _GOOD_TERM_PROGRAMS
+        or "kitty" in term
+        or "alacritty" in term
+        or bool(_os.environ.get("WT_SESSION"))          # Windows Terminal
+        or bool(_os.environ.get("KITTY_WINDOW_ID"))
+        or bool(_os.environ.get("ALACRITTY_WINDOW_ID"))
+        or bool(_os.environ.get("WEZTERM_PANE"))
+    )
+
+    # Apple Terminal has a real cursor-erase bug → never full Live.
+    if is_apple_terminal:
+        return "commit"
+    # Untrusted network terminal → safe rich commit instead of risky redraw.
+    if in_ssh and not modern:
+        return "commit"
+    return "live"
 
 def _make_renderable(text: str):
     """Return a Rich renderable: Markdown if text contains markup, else plain."""
@@ -257,6 +347,62 @@ def _stop_live(clear: bool = False) -> None:
     _current_live = None
 
 
+# ── Commit-mode streaming (append-only progressive Markdown) ────────────────
+
+def _safe_commit_point(text: str, start: int) -> int:
+    """Return the index just after the last *completed* block at/after `start`.
+
+    A block ends at a blank line ("\\n\\n") that is NOT inside an unclosed code
+    fence. Counting ``` markers in the prefix tells us the fence state, so a
+    fenced code block (which may itself contain blank lines) is only ever
+    committed as a whole once its closing fence arrives — never rendered
+    half-open. Returns `start` when no new complete block is available yet.
+    """
+    best = start
+    i = text.find("\n\n", start)
+    while i != -1:
+        candidate = i + 2
+        if text.count("```", 0, candidate) % 2 == 0:   # fence is closed here
+            best = candidate
+        i = text.find("\n\n", i + 1)
+    return best
+
+
+def _commit_stream() -> None:
+    """Render + permanently print any newly-completed blocks (append-only).
+
+    Issues no cursor movement whatsoever: each completed block is printed once
+    and never touched again, so there is no way to leave a duplicate or stale
+    frame regardless of terminal, network latency, or wide (CJK/emoji) text. The
+    still-incomplete trailing block stays buffered and appears when it closes (or
+    at flush); the spinner conveys liveness in the meantime."""
+    global _commit_idx
+    full = "".join(_accumulated_text)
+    point = _safe_commit_point(full, _commit_idx)
+    if point > _commit_idx:
+        block = full[_commit_idx:point].strip("\n")
+        if block.strip():
+            try:
+                console.print(_make_renderable(block))
+            except Exception:
+                print(block)
+        _commit_idx = point
+
+
+def _commit_flush() -> None:
+    """Render+commit the final trailing block and reset commit state."""
+    global _commit_idx
+    full = "".join(_accumulated_text)
+    tail = full[_commit_idx:].strip("\n")
+    if tail.strip():
+        try:
+            console.print(_make_renderable(tail))
+        except Exception:
+            print(tail)
+    _accumulated_text.clear()
+    _commit_idx = 0
+
+
 def stream_text(chunk: str) -> None:
     """Buffer chunk; update Live in-place when Rich available, else print directly.
 
@@ -274,7 +420,20 @@ def stream_text(chunk: str) -> None:
     the scrollback. It is re-committed in full when the response finishes — including
     on Ctrl-C, since the REPL flushes on interrupt — so nothing is ever lost, it is
     just not visible live until completion.
+
+    Mode dispatch: "plain" prints raw tokens, "commit" delegates to the
+    append-only progressive-Markdown renderer, and "live" (below) does the
+    in-place Rich Live redraw described above.
     """
+    if not _RICH or _STREAM_MODE == "plain":
+        print(chunk, end="", flush=True)
+        return
+
+    if _STREAM_MODE == "commit":
+        _accumulated_text.append(chunk)
+        _commit_stream()
+        return
+
     if _plain_streaming_response:
         print(chunk, end="", flush=True)
         return
@@ -325,6 +484,9 @@ def stream_thinking(chunk: str, verbose: bool):
 def flush_response() -> None:
     """Commit buffered text to screen, then reset per-response streaming state."""
     global _plain_streaming_response, _live_shows_full
+    if _STREAM_MODE == "commit":
+        _commit_flush()
+        return
     full = "".join(_accumulated_text)
     _accumulated_text.clear()
     if _current_live is not None: