MoonshotAI · RealKai42 · Apr 2, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,6 +11,7 @@ Only write entries that are worth mentioning to users.
 
 ## Unreleased
 
+- Core: Improve session startup resilience — `--continue`/`--resume` now tolerate malformed `context.jsonl` records and corrupted subagent, background-task, or notification artifacts; the CLI skips invalid persisted state where possible instead of failing to restore the session
 - Grep: Add `include_ignored` parameter to search files excluded by `.gitignore` — when set to `true`, ripgrep's `--no-ignore` flag is enabled, allowing searches in gitignored artifacts such as build outputs or `node_modules`; sensitive files (like `.env`) remain filtered by the sensitive-file protection layer; defaults to `false` to preserve existing behavior
 - Core: Add sensitive file protection to Grep and Read tools — `.env`, SSH private keys (`id_rsa`, `id_ed25519`, `id_ecdsa`), and cloud credentials (`.aws/credentials`, `.gcp/credentials`) are now detected and blocked; Grep filters them from results with a warning, Read rejects them outright; `.env.example`/`.env.sample`/`.env.template` are exempted
 - Core: Fix parallel foreground subagent approval requests hanging the session — in interactive shell mode, `_set_active_approval_sink` no longer flushes pending approval requests to the live view sink (which cannot render approval modals); requests stay in the pending queue for the prompt modal path; also adds a 300-second timeout to `wait_for_response` so that any unresolved approval request eventually raises `ApprovalCancelledError` instead of hanging forever
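The sensitive-file entry describes a name-based filter shared by Grep and Read. Below is a minimal sketch of how such a check could work, assuming matching on the file name and the last two path components only. `is_sensitive`, `filter_grep_results`, and the treatment of `.env.<variant>` files such as `.env.local` are hypothetical and not taken from kimi_cli.

```python
from pathlib import PurePosixPath

# Hypothetical rules reconstructed from the changelog entry; kimi_cli's real matcher may differ.
SENSITIVE_NAMES = {"id_rsa", "id_ed25519", "id_ecdsa"}
SENSITIVE_TAILS = {(".aws", "credentials"), (".gcp", "credentials")}
EXEMPT_ENV_NAMES = {".env.example", ".env.sample", ".env.template"}


def is_sensitive(path: str) -> bool:
    """Return True if `path` looks like a secret that Grep/Read should not expose."""
    p = PurePosixPath(path)
    if p.name in EXEMPT_ENV_NAMES:
        return False
    # Assumption: `.env.<variant>` files (e.g. `.env.local`) are treated like `.env`.
    if p.name == ".env" or p.name.startswith(".env."):
        return True
    if p.name in SENSITIVE_NAMES:
        return True
    # Compare the last two path components, e.g. `/home/u/.aws/credentials`.
    return tuple(p.parts[-2:]) in SENSITIVE_TAILS


def filter_grep_results(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split Grep hits into (kept, blocked); a caller would warn about `blocked`."""
    kept: list[str] = []
    blocked: list[str] = []
    for path in paths:
        (blocked if is_sensitive(path) else kept).append(path)
    return kept, blocked


kept, blocked = filter_grep_results(
    ["src/app.py", ".env", ".env.example", "/home/u/.aws/credentials"]
)
print(kept)     # ['src/app.py', '.env.example']
print(blocked)  # ['.env', '/home/u/.aws/credentials']
```

Matching on exact names keeps public keys such as `id_rsa.pub` readable, since only the private-key file names are listed.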

diff --git a/docs/en/release-notes/changelog.md b/docs/en/release-notes/changelog.md
@@ -4,8 +4,9 @@ This page documents the changes in each Kimi Code CLI release.
 
 ## Unreleased
 
+- Core: Improve session startup resilience — `--continue`/`--resume` now tolerate malformed `context.jsonl` records and corrupted subagent, background-task, or notification artifacts; the CLI skips invalid persisted state where possible instead of failing to restore the session
+- Grep: Add `include_ignored` parameter to search files excluded by `.gitignore` — when set to `true`, ripgrep's `--no-ignore` flag is enabled, allowing searches in gitignored artifacts such as build outputs or `node_modules`; sensitive files (like `.env`) remain filtered by the sensitive-file protection layer; defaults to `false` to preserve existing behavior
 - CLI: Improve `kimi export` session export UX — `kimi export` now previews the previous session for the current working directory and asks for confirmation, showing the session ID, title, and last user-message time; adds `--yes` to skip confirmation; also fixes explicit session-ID invocations where `--output` after the argument was incorrectly parsed as a subcommand
-- Grep: Add `include_ignored` parameter to search files excluded by `.gitignore` — when set to `true`, ripgrep's `--no-ignore` flag is enabled, allowing searches in files like `.env` or `node_modules` that are normally gitignored; defaults to `false` to preserve existing behavior
 - Core: Add sensitive file protection to Grep and Read tools — `.env`, SSH private keys (`id_rsa`, `id_ed25519`, `id_ecdsa`), and cloud credentials (`.aws/credentials`, `.gcp/credentials`) are now detected and blocked; Grep filters them from results with a warning, Read rejects them outright; `.env.example`/`.env.sample`/`.env.template` are exempted
 - Core: Fix parallel foreground subagent approval requests hanging the session — in interactive shell mode, `_set_active_approval_sink` no longer flushes pending approval requests to the live view sink (which cannot render approval modals); requests stay in the pending queue for the prompt modal path; also adds a 300-second timeout to `wait_for_response` so that any unresolved approval request eventually raises `ApprovalCancelledError` instead of hanging forever
 - CLI: Add `--session`/`--resume` (`-S`/`-r`) flag to resume sessions — without an argument opens an interactive session picker (shell UI only); with a session ID resumes that specific session; replaces the reverted `--pick-session`/`--list-sessions` design with a unified optional-value flag
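The 300-second `wait_for_response` timeout in the approval fix can be pictured with `asyncio.wait_for`. This is a minimal sketch, assuming an approval request is backed by an `asyncio.Future` that the UI resolves; `PendingApproval` is hypothetical, and `ApprovalCancelledError` here is a local stand-in for the kimi_cli exception of the same name.

```python
import asyncio


class ApprovalCancelledError(Exception):
    """Local stand-in for kimi_cli's exception of the same name."""


class PendingApproval:
    """Hypothetical approval request resolved later by a UI sink."""

    def __init__(self) -> None:
        # Must be constructed while an event loop is running.
        self._future: asyncio.Future[bool] = asyncio.get_running_loop().create_future()

    def resolve(self, approved: bool) -> None:
        # Ignore late answers once the request has timed out (future cancelled).
        if not self._future.done():
            self._future.set_result(approved)

    async def wait_for_response(self, timeout: float = 300.0) -> bool:
        try:
            # wait_for cancels the future on timeout, so the request cannot linger.
            return await asyncio.wait_for(self._future, timeout)
        except asyncio.TimeoutError:
            raise ApprovalCancelledError(f"no response within {timeout} seconds") from None


async def demo() -> None:
    answered = PendingApproval()
    asyncio.get_running_loop().call_later(0.01, answered.resolve, True)
    print(await answered.wait_for_response(timeout=1.0))  # True

    stuck = PendingApproval()  # nobody ever calls resolve()
    try:
        await stuck.wait_for_response(timeout=0.05)
    except ApprovalCancelledError as exc:
        print("cancelled:", exc)  # cancelled: no response within 0.05 seconds


asyncio.run(demo())
```

Because `wait_for` cancels the underlying future on timeout, a late `resolve` call becomes a no-op instead of raising `InvalidStateError`.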

diff --git a/docs/zh/release-notes/changelog.md b/docs/zh/release-notes/changelog.md
@@ -4,6 +4,7 @@
 
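 ## Unreleased
 
+- Core: Improve the robustness of session startup restoration. `--continue`/`--resume` now tolerate corrupted `context.jsonl` records as well as corrupted subagent, background-task, or notification artifacts; the CLI skips invalid state where possible and continues restoring the session instead of failing at startup
 - CLI: Improve the `kimi export` session export experience. `kimi export` now previews the previous session for the current working directory by default and asks for confirmation, showing the session ID, title, and the time of the last user message; adds `--yes` to skip confirmation; also fixes explicit session-ID invocations where `--output` placed after the argument was incorrectly parsed as a subcommand
 - Grep: Add the `include_ignored` parameter to search files excluded by `.gitignore`. When set to `true`, ripgrep's `--no-ignore` flag is enabled, allowing searches in normally ignored files such as build outputs or `node_modules`; sensitive files (such as `.env`) are still filtered by the sensitive-file protection layer; defaults to `false`, leaving existing behavior unchanged
 - Core: Add sensitive file protection to the Grep and Read tools. `.env`, SSH private keys (`id_rsa`, `id_ed25519`, `id_ecdsa`), and cloud credentials (`.aws/credentials`, `.gcp/credentials`) are detected and blocked; Grep filters them from results and shows a warning, while Read refuses to read them outright; `.env.example`/`.env.sample`/`.env.template` are not affected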
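On the ripgrep side, `include_ignored` amounts to one extra flag. A sketch of the argument assembly follows; `build_rg_args` and the other flags it passes are illustrative assumptions, and only `--no-ignore` comes from the changelog.

```python
def build_rg_args(pattern: str, path: str = ".", *, include_ignored: bool = False) -> list[str]:
    """Assemble a ripgrep command line (illustrative helper, not kimi_cli's code)."""
    args = ["rg", "--line-number", "--no-heading"]
    if include_ignored:
        # Stop honoring ignore files (.gitignore, .ignore, .rgignore, ...).
        args.append("--no-ignore")
    # `--` keeps a pattern that starts with '-' from being parsed as a flag.
    args += ["--", pattern, path]
    return args


print(build_rg_args("TODO", "src"))
print(build_rg_args("TODO", "src", include_ignored=True))
```

`--no-ignore` only disables ignore-file rules; ripgrep still skips hidden files and directories unless `--hidden` is also passed.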

diff --git a/src/kimi_cli/background/store.py b/src/kimi_cli/background/store.py
@@ -4,7 +4,10 @@
 import re
 from pathlib import Path
 
+from pydantic import BaseModel, ValidationError
+
 from kimi_cli.utils.io import atomic_json_write
+from kimi_cli.utils.logging import logger
 
 from .models import (
     TaskConsumerState,
@@ -104,7 +107,12 @@ def read_runtime(self, task_id: str) -> TaskRuntime:
         path = self.runtime_path(task_id)
         if not path.exists():
             return TaskRuntime()
-        return TaskRuntime.model_validate_json(path.read_text(encoding="utf-8"))
+        return _read_json_model(
+            path,
+            TaskRuntime,
+            fallback=TaskRuntime(updated_at=0),
+            artifact="task runtime",
+        )
 
     def write_control(self, task_id: str, control: TaskControl) -> None:
         atomic_json_write(control.model_dump(mode="json"), self.control_path(task_id))
@@ -113,7 +121,12 @@ def read_control(self, task_id: str) -> TaskControl:
         path = self.control_path(task_id)
         if not path.exists():
             return TaskControl()
-        return TaskControl.model_validate_json(path.read_text(encoding="utf-8"))
+        return _read_json_model(
+            path,
+            TaskControl,
+            fallback=TaskControl(),
+            artifact="task control",
+        )
 
     def write_consumer(self, task_id: str, consumer: TaskConsumerState) -> None:
         atomic_json_write(consumer.model_dump(mode="json"), self.consumer_path(task_id))
@@ -122,7 +135,12 @@ def read_consumer(self, task_id: str) -> TaskConsumerState:
         path = self.consumer_path(task_id)
         if not path.exists():
             return TaskConsumerState()
-        return TaskConsumerState.model_validate_json(path.read_text(encoding="utf-8"))
+        return _read_json_model(
+            path,
+            TaskConsumerState,
+            fallback=TaskConsumerState(),
+            artifact="task consumer state",
+        )
 
     def merged_view(self, task_id: str) -> TaskView:
         return TaskView(
@@ -133,7 +151,17 @@ def merged_view(self, task_id: str) -> TaskView:
         )
 
     def list_views(self) -> list[TaskView]:
-        views = [self.merged_view(task_id) for task_id in self.list_task_ids()]
+        views: list[TaskView] = []
+        for task_id in self.list_task_ids():
+            try:
+                views.append(self.merged_view(task_id))
+            except (OSError, ValidationError, ValueError, UnicodeDecodeError) as exc:
+                logger.warning(
+                    "Skipping invalid background task {task_id} from {path}: {error}",
+                    task_id=task_id,
+                    path=self.root / task_id / self.SPEC_FILE,
+                    error=exc,
+                )
         views.sort(
             key=lambda view: view.runtime.updated_at or view.spec.created_at,
             reverse=True,
@@ -194,3 +222,16 @@ def tail_output(self, task_id: str, max_bytes: int, max_lines: int) -> str:
         if len(lines) > max_lines:
             lines = lines[-max_lines:]
         return "\n".join(lines)
+
+
+def _read_json_model[T: BaseModel](path: Path, model: type[T], *, fallback: T, artifact: str) -> T:
+    try:
+        return model.model_validate_json(path.read_text(encoding="utf-8"))
+    except (OSError, ValidationError, ValueError, UnicodeDecodeError) as exc:
+        logger.warning(
+            "Failed to read {artifact} from {path}; using defaults: {error}",
+            artifact=artifact,
+            path=path,
+            error=exc,
+        )
+        return fallback
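`_read_json_model` replaces each hard `model_validate_json` call with a read that warns and falls back. The same pattern without pydantic, as a stdlib-only sketch: `read_json_or_default` and `ControlState` (with its `kill_requested` field) are hypothetical stand-ins for the helper and for `TaskControl`, whose real fields are not shown in this diff.

```python
import json
import logging
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path
from typing import Any, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


@dataclass
class ControlState:
    """Hypothetical stand-in for TaskControl; its real fields are not in the diff."""

    kill_requested: bool = False

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ControlState":
        value = data.get("kill_requested", False)
        if not isinstance(value, bool):
            raise ValueError(f"kill_requested must be a bool, got {value!r}")
        return cls(kill_requested=value)


def read_json_or_default(
    path: Path,
    parse: Callable[[dict[str, Any]], T],
    *,
    fallback: T,
    artifact: str,
) -> T:
    """Parse a JSON artifact, logging and returning `fallback` on any read/parse error."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        return parse(data)
    except (OSError, ValueError) as exc:
        # JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        logger.warning("Failed to read %s from %s; using defaults: %s", artifact, path, exc)
        return fallback
```

Unlike the store methods in the diff, which check `path.exists()` first and return defaults silently, this helper also logs a warning when the file is missing.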
diff --git a/src/kimi_cli/notifications/store.py b/src/kimi_cli/notifications/store.py
@@ -3,7 +3,10 @@
 import re
 from pathlib import Path
 
+from pydantic import ValidationError
+
 from kimi_cli.utils.io import atomic_json_write
+from kimi_cli.utils.logging import logger
 
 from .models import NotificationDelivery, NotificationEvent, NotificationView
 
@@ -80,7 +83,15 @@ def read_delivery(self, notification_id: str) -> NotificationDelivery:
         path = self.delivery_path(notification_id)
         if not path.exists():
             return NotificationDelivery()
-        return NotificationDelivery.model_validate_json(path.read_text(encoding="utf-8"))
+        try:
+            return NotificationDelivery.model_validate_json(path.read_text(encoding="utf-8"))
+        except (OSError, ValidationError, ValueError, UnicodeDecodeError) as exc:
+            logger.warning(
+                "Failed to read notification delivery {path}; using defaults: {error}",
+                path=path,
+                error=exc,
+            )
+            return NotificationDelivery()
 
     def write_delivery(self, notification_id: str, delivery: NotificationDelivery) -> None:
         atomic_json_write(delivery.model_dump(mode="json"), self.delivery_path(notification_id))
@@ -92,8 +103,16 @@ def merged_view(self, notification_id: str) -> NotificationView:
         )
 
     def list_views(self) -> list[NotificationView]:
-        views = [
-            self.merged_view(notification_id) for notification_id in self.list_notification_ids()
-        ]
+        views: list[NotificationView] = []
+        for notification_id in self.list_notification_ids():
+            try:
+                views.append(self.merged_view(notification_id))
+            except (OSError, ValidationError, ValueError, UnicodeDecodeError) as exc:
+                logger.warning(
+                    "Skipping invalid notification {notification_id} from {path}: {error}",
+                    notification_id=notification_id,
+                    path=self.root / notification_id / self.EVENT_FILE,
+                    error=exc,
+                )
         views.sort(key=lambda view: view.event.created_at, reverse=True)
         return views
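Both `list_views` rewrites replace a list comprehension, where one bad entry aborts the whole listing, with a loop that logs and skips. A stdlib sketch of that loop, assuming a hypothetical `<root>/<id>/event.json` layout with a numeric `created_at` field; `list_events` is not part of kimi_cli.

```python
import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def list_events(root: Path, event_file: str = "event.json") -> list[dict[str, Any]]:
    """Load every <root>/<id>/<event_file>, skipping entries that fail to load."""
    views: list[dict[str, Any]] = []
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue
        path = entry / event_file
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
            if not isinstance(data, dict) or not isinstance(data.get("created_at"), (int, float)):
                raise ValueError("missing numeric created_at")
        except (OSError, ValueError) as exc:
            # One corrupted or missing entry is logged, not fatal for the listing.
            logger.warning("Skipping invalid notification %s from %s: %s", entry.name, path, exc)
            continue
        views.append(data)
    # Newest first; sorting after filtering means the key only sees valid entries.
    views.sort(key=lambda view: view["created_at"], reverse=True)
    return views
```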
diff --git a/src/kimi_cli/soul/context.py b/src/kimi_cli/soul/context.py
@@ -4,10 +4,12 @@
 import json
 from collections.abc import Sequence
 from pathlib import Path
+from typing import Any, cast
 
 import aiofiles
 import aiofiles.os
 from kosong.message import Message
+from pydantic import ValidationError
 
 from kimi_cli.soul.compaction import estimate_text_tokens
 from kimi_cli.soul.message import system
@@ -38,24 +40,26 @@ async def restore(self) -> bool:
             return False
 
         messages_after_last_usage: list[Message] = []
-        async with aiofiles.open(self._file_backend, encoding="utf-8") as f:
+        async with aiofiles.open(self._file_backend, encoding="utf-8", errors="replace") as f:
+            line_no = 0
             async for line in f:
+                line_no += 1
                 if not line.strip():
                     continue
-                line_json = json.loads(line, strict=False)
-                if line_json["role"] == "_system_prompt":
-                    self._system_prompt = line_json["content"]
+                line_json = self._parse_context_line(
+                    line,
+                    file_backend=self._file_backend,
+                    line_no=line_no,
+                )
+                if line_json is None:
                     continue
-                if line_json["role"] == "_usage":
-                    self._token_count = line_json["token_count"]
-                    messages_after_last_usage.clear()
-                    continue
-                if line_json["role"] == "_checkpoint":
-                    self._next_checkpoint_id = line_json["id"] + 1
-                    continue
-                message = Message.model_validate(line_json)
-                self._history.append(message)
-                messages_after_last_usage.append(message)
+                self._apply_context_record(
+                    line_json,
+                    history=self._history,
+                    messages_after_last_usage=messages_after_last_usage,
+                    file_backend=self._file_backend,
+                    line_no=line_no,
+                )
 
         self._pending_token_estimate = estimate_text_tokens(messages_after_last_usage)
         return True
@@ -164,29 +168,34 @@ async def revert_to(self, checkpoint_id: int):
         self._system_prompt = None
         messages_after_last_usage: list[Message] = []
         async with (
-            aiofiles.open(rotated_file_path, encoding="utf-8") as old_file,
+            aiofiles.open(rotated_file_path, encoding="utf-8", errors="replace") as old_file,
             aiofiles.open(self._file_backend, "w", encoding="utf-8") as new_file,
         ):
+            line_no = 0
             async for line in old_file:
+                line_no += 1
                 if not line.strip():
                     continue
 
-                line_json = json.loads(line, strict=False)
-                if line_json["role"] == "_checkpoint" and line_json["id"] == checkpoint_id:
+                line_json = self._parse_context_line(
+                    line,
+                    file_backend=rotated_file_path,
+                    line_no=line_no,
+                )
+                if line_json is None:
+                    continue
+                if line_json.get("role") == "_checkpoint" and line_json.get("id") == checkpoint_id:
                     break
 
-                await new_file.write(line)
-                if line_json["role"] == "_system_prompt":
-                    self._system_prompt = line_json["content"]
-                elif line_json["role"] == "_usage":
-                    self._token_count = line_json["token_count"]
-                    messages_after_last_usage.clear()
-                elif line_json["role"] == "_checkpoint":
-                    self._next_checkpoint_id = line_json["id"] + 1
-                else:
-                    message = Message.model_validate(line_json)
-                    self._history.append(message)
-                    messages_after_last_usage.append(message)
+                keep_line = self._apply_context_record(
+                    line_json,
+                    history=self._history,
+                    messages_after_last_usage=messages_after_last_usage,
+                    file_backend=rotated_file_path,
+                    line_no=line_no,
+                )
+                if keep_line:
+                    await new_file.write(line)
 
         self._pending_token_estimate = estimate_text_tokens(messages_after_last_usage)
 
@@ -237,3 +246,94 @@ async def update_token_count(self, token_count: int):
 
         async with aiofiles.open(self._file_backend, "a", encoding="utf-8") as f:
             await f.write(json.dumps({"role": "_usage", "token_count": token_count}) + "\n")
+
+    def _parse_context_line(
+        self,
+        line: str,
+        *,
+        file_backend: Path,
+        line_no: int,
+    ) -> dict[str, Any] | None:
+        try:
+            line_json = json.loads(line, strict=False)
+        except json.JSONDecodeError as exc:
+            logger.warning(
+                "Skipping malformed context line {line_no} in {file}: {error}",
+                line_no=line_no,
+                file=file_backend,
+                error=exc,
+            )
+            return None
+        if not isinstance(line_json, dict):
+            logger.warning(
+                "Skipping non-object context line {line_no} in {file}",
+                line_no=line_no,
+                file=file_backend,
+            )
+            return None
+        return cast(dict[str, Any], line_json)
+
+    def _apply_context_record(
+        self,
+        line_json: dict[str, Any],
+        *,
+        history: list[Message],
+        messages_after_last_usage: list[Message],
+        file_backend: Path,
+        line_no: int,
+    ) -> bool:
+        role = line_json.get("role")
+        if not isinstance(role, str):
+            logger.warning(
+                "Skipping context line {line_no} in {file}: missing or invalid role",
+                line_no=line_no,
+                file=file_backend,
+            )
+            return False
+        if role == "_system_prompt":
+            content = line_json.get("content")
+            if not isinstance(content, str):
+                logger.warning(
+                    "Skipping invalid system prompt line {line_no} in {file}",
+                    line_no=line_no,
+                    file=file_backend,
+                )
+                return False
+            self._system_prompt = content
+            return True
+        if role == "_usage":
+            token_count = line_json.get("token_count")
+            if not isinstance(token_count, int):
+                logger.warning(
+                    "Skipping invalid usage line {line_no} in {file}",
+                    line_no=line_no,
+                    file=file_backend,
+                )
+                return False
+            self._token_count = token_count
+            messages_after_last_usage.clear()
+            return True
+        if role == "_checkpoint":
+            checkpoint_id = line_json.get("id")
+            if not isinstance(checkpoint_id, int):
+                logger.warning(
+                    "Skipping invalid checkpoint line {line_no} in {file}",
+                    line_no=line_no,
+                    file=file_backend,
+                )
+                return False
+            self._next_checkpoint_id = checkpoint_id + 1
+            return True
+        try:
+            message = Message.model_validate(line_json)
+        except ValidationError as exc:
+            logger.warning(
+                "Skipping invalid context message line {line_no} in {file}: {error}",
+                line_no=line_no,
+                file=file_backend,
+                error=exc,
+            )
+            return False
+        history.append(message)
+        messages_after_last_usage.append(message)
+        return True