fix(ckpt): refuse recover when regular directory occupies workspace path#860

Open
Ziqi002 wants to merge 1 commit into alibaba:main from Ziqi002:fix/ckpt-issue-694

Ziqi002 (Collaborator) commented Jun 12, 2026

Description

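  • init: when the workspace symlink has been replaced by a regular directory, append a hint telling the user to move the directory away before running recover
  • recover: check original_path before running rsync and refuse outright if it is not a symlink, to prevent data loss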

init: improved error hint

When the symlink health check from issue #694 finds that the path is not a symlink, it now appends
note: path is currently a regular directory — move or rename it before running recover to avoid data loss
on a separate line, so the messages do not run together.
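
The check described above can be sketched in shell as follows. This is only an illustration under stated assumptions: the real check lives in the Rust daemon and may differ, the function name `symlink_health_message` is hypothetical, and the hint string is a paraphrase of the PR's message. The "symlink missing or broken" wording is taken from what the test script greps for.

```shell
# Hypothetical sketch, not the daemon's actual code.
# Prints nothing and returns 0 when the path is a symlink with a live target.
# Otherwise prints an error, plus a hint on its own line when a regular
# directory now occupies the path, and returns 1.
symlink_health_message() {
    local path="$1"

    # Healthy: the path is a symlink and its target exists.
    if [ -L "$path" ] && [ -e "$path" ]; then
        return 0
    fi

    printf 'symlink missing or broken: %s\n' "$path"

    # Only add the hint when a real directory (not a symlink) sits at the path.
    if [ ! -L "$path" ] && [ -d "$path" ]; then
        # Paraphrase of the PR's hint text.
        printf 'note: path is currently a regular directory; move or rename it before running recover\n'
    fi
    return 1
}
```

Printing the hint as its own line keeps the original error greppable while still pointing the user at the safe next step.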

recover: refuse to overwrite a regular directory

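In step 1 of recover_workspace in both backends, btrfs_base and btrfs_loop,
if original_path exists and is not a symlink, recover now bails out immediately.
Previously it skipped the deletion and then let rsync --delete silently overwrite the user's data.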
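
A minimal shell sketch of the step-1 guard, assuming the behavior described above. The function name `recover_guard` and the exact error wording are hypothetical; the real logic is in the Rust backends.

```shell
# Hypothetical sketch of the guard that runs before rsync.
# Returns 0 when it is safe to repopulate original_path, 1 when recover must stop.
recover_guard() {
    local original_path="$1"

    if [ -L "$original_path" ]; then
        # Removes only the link itself, never the data it points to.
        rm -- "$original_path"
    elif [ -e "$original_path" ]; then
        # Something real occupies the path: stop before rsync --delete can touch it.
        printf 'refusing to recover: %s is a regular directory or file, not a symlink; move or rename it first\n' \
            "$original_path" >&2
        return 1
    fi

    # The path is now absent, so a later rsync step cannot clobber user data.
    return 0
}
```

The important design point is the middle branch: an existing non-symlink path is treated as user data and causes a failure rather than being skipped.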

Related Issue

closes #694

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional change)
  • Performance improvement
  • CI/CD or build changes

Scope

  • cosh (copilot-shell)
  • sec-core (agent-sec-core)
  • skill (os-skills)
  • sight (agentsight)
  • tokenless (tokenless)
  • memory (agent-memory)
  • ckpt (ws-ckpt)
  • Multiple / Project-wide

Checklist

  • I have read the Contributing Guide
  • My code follows the project's code style
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the documentation accordingly
  • For cosh: Lint passes, type check passes, and tests pass
  • For sec-core (Rust): cargo clippy -- -D warnings and cargo fmt --check pass
  • For sec-core (Python): Ruff format and pytest pass
  • For skill: Skill directory structure is valid and shell scripts pass syntax check
  • For sight: cargo clippy -- -D warnings and cargo fmt --check pass
  • For tokenless: cargo clippy -- -D warnings and cargo fmt --check pass
  • For memory (Linux only): cargo clippy --all-targets -- -D warnings, cargo fmt --check, and cargo test pass
  • Lock files are up to date (package-lock.json / Cargo.lock)

Testing

All 27 tests pass under cargo test -p ws-ckpt-daemon -- workspace_mgr,
including the newly added init_registered_but_regular_dir_returns_error_with_hint.

Test script:

#!/usr/bin/env bash
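# 10-init-recover-guard.sh
# Verifies how init/recover guard against a broken workspace symlink
#
# Scenario 1: init fails with a hint (symlink replaced by a regular directory)
# Scenario 2: recover refuses to overwrite a regular directory
# Scenario 3: recover works once the directory has been moved away
# Scenario 4: recover works when the symlink is missing (path already deleted)
#
# Usage: bash 10-init-recover-guard.sh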

set -euo pipefail

WS_CKPT="${WS_CKPT:-$(command -v ws-ckpt 2>/dev/null || echo /usr/local/bin/ws-ckpt)}"
WS1="/tmp/test-ws-guard1"
WS2="/tmp/test-ws-guard2"

RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; NC='\033[0m'
PASS=0; FAIL=0

ok()   { echo -e "  ${GREEN}PASS${NC}: $*"; PASS=$((PASS+1)); }
bad()  { echo -e "  ${RED}FAIL${NC}: $*"; FAIL=$((FAIL+1)); }
log()  { echo -e "\n${YELLOW}=== $* ===${NC}"; }

# ── Cleanup ───────────────────────────────────────────────────────
cleanup() {
    "$WS_CKPT" recover -w "$WS1" --force >/dev/null 2>&1 || true
    "$WS_CKPT" recover -w "$WS2" --force >/dev/null 2>&1 || true
    rm -rf "$WS1" "$WS2" "${WS1}.bak" "${WS2}.bak" 2>/dev/null || true
}
trap cleanup EXIT
cleanup   # clear leftovers from a previous run first

# ── Scenario 1: init fails with a hint ────────────────────────────
log "Scenario 1: init fails with a hint (symlink → regular directory)"

# 1a. Create the directory, then run a normal init
mkdir -p "$WS1" && echo "seed" > "$WS1/f.txt"
"$WS_CKPT" init -w "$WS1" >/dev/null 2>&1

# 1b. Confirm the path is now a symlink
if [ -L "$WS1" ]; then
    ok "path is a symlink after init"
else
    bad "path is not a symlink after init; later scenarios cannot be verified"
fi

# 1c. Break it: delete the symlink, then create a regular directory with the same name
rm "$WS1"
mkdir -p "$WS1"

# 1d. Re-run init; expect a failure whose output contains the keywords
set +e
OUT=$("$WS_CKPT" init -w "$WS1" 2>&1)
RC=$?
set -e

if [ "$RC" -ne 0 ]; then
    ok "re-init returned non-zero (rc=$RC)"
else
    bad "re-init should have failed but returned 0"
fi

if echo "$OUT" | grep -qi "symlink missing or broken"; then
    ok "error message contains symlink missing or broken"
else
    bad "error message lacks symlink missing or broken, got: $OUT"
fi

if echo "$OUT" | grep -qi "regular directory"; then
    ok "hint mentions regular directory"
else
    bad "hint lacks the regular directory note, got: $OUT"
fi

if echo "$OUT" | grep -qi "recover"; then
    ok "hint mentions recover"
else
    bad "hint lacks the recover guidance, got: $OUT"
fi

# ── Scenario 2: recover refuses to overwrite a regular directory ──
log "Scenario 2: recover refuses to overwrite a regular directory"

# WS1 is still a regular directory at this point
set +e
OUT=$("$WS_CKPT" recover -w "$WS1" --force 2>&1)
RC=$?
set -e

if [ "$RC" -ne 0 ]; then
    ok "recover returned non-zero (rc=$RC)"
else
    bad "recover should have failed but returned 0"
fi

if echo "$OUT" | grep -qi "regular directory"; then
    ok "error message contains regular directory"
else
    bad "error message lacks regular directory, got: $OUT"
fi

if echo "$OUT" | grep -qi "move or rename"; then
    ok "error message suggests move or rename"
else
    bad "error message lacks the move or rename guidance, got: $OUT"
fi

# ── Scenario 3: recover works once the directory is moved away ────
log "Scenario 3: recover works once the directory is moved away"

mv "$WS1" "${WS1}.bak"

set +e
OUT=$("$WS_CKPT" recover -w "$WS1" --force 2>&1)
RC=$?
set -e

if [ "$RC" -eq 0 ]; then
    ok "recover succeeded (rc=0)"
else
    bad "recover should have succeeded but rc=$RC, output: $OUT"
fi

if [ -d "$WS1" ]; then
    ok "path exists after recover (directory)"
else
    bad "path does not exist after recover"
fi

# After recover the path should be a regular directory, not a symlink
if [ -d "$WS1" ] && [ ! -L "$WS1" ]; then
    ok "path is a regular directory after recover (not a symlink)"
else
    bad "unexpected path type after recover (symlink=$([ -L "$WS1" ] && echo yes || echo no))"
fi

# ── Scenario 4: recover works when the symlink is missing ─────────
log "Scenario 4: recover works when the symlink is missing (path deleted)"

# Fresh init of WS2
mkdir -p "$WS2" && echo "seed" > "$WS2/f.txt"
"$WS_CKPT" init -w "$WS2" >/dev/null 2>&1

# Delete only the symlink; do not recreate anything in its place
if [ -L "$WS2" ]; then
    rm "$WS2"
else
    bad "WS2 is not a symlink after init; scenario 4 cannot be verified"
fi

set +e
OUT=$("$WS_CKPT" recover -w "$WS2" --force 2>&1)
RC=$?
set -e

if [ "$RC" -eq 0 ]; then
    ok "recover succeeded (rc=0)"
else
    bad "recover should have succeeded but rc=$RC, output: $OUT"
fi

if [ -d "$WS2" ]; then
    ok "path was recreated by recover"
else
    bad "path does not exist after recover"
fi

# ── Summary ───────────────────────────────────────────────────────
echo ""
TOTAL=$((PASS + FAIL))
echo -e "Result: ${PASS}/${TOTAL} passed, ${FAIL} failed"
if [ "$FAIL" -gt 0 ]; then
    echo -e "${RED}VERDICT: FAIL${NC}"
    exit 1
else
    echo -e "${GREEN}VERDICT: PASS${NC}"
    exit 0
fi

Additional Notes

- init: append hint when symlink replaced by regular directory (alibaba#694)
- recover: bail if original_path is not a symlink to prevent data loss
- update and add tests for both paths

Signed-off-by: Ziqi Huang <ziqi02@alibaba-inc.com>
@Ziqi002 Ziqi002 force-pushed the fix/ckpt-issue-694 branch from bac08cd to 18b47ed Compare June 12, 2026 06:15
Development

Successfully merging this pull request may close these issues.

[ckpt] bug: after the workspace symlink is deleted externally, ws-ckpt init silently succeeds without recreating the symlink, so checkpoint/rollback stops working