
feat: multi-store write path (sync/bucket switch), adapting to Codex 26.609, PR2 #17

Closed
Wangnov wants to merge 2 commits into feat/multi-store-26609 from feat/multi-store-write

Wangnov (Owner) commented Jun 13, 2026

Stacked on #16 (base points to feat/multi-store-26609). This PR contains only the write-path changes. Once #16 is merged, the base will retarget to main automatically.

Background

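PR1 (#16) added read-only discovery of multiple stores and multi-store status. PR2 makes the write path of the one-shot write commands sync and bucket switch multi-store aware. The provider is normalized in both coexisting state_5.sqlite databases (CLI and Codex App), and the shared rollout JSONL is rewritten in a single pass. See #14 for the agreed design.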

Changes

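  • reconcile_all_stores_with_backup: discovers all stores → collects rollout targets across stores (deduplicated by canonical path) and rewrites them once → backs up each store and runs reconcile_sqlite_in_place on it. A failure in one store is contained as StoreOutcome::Failed and does not abort the healthy stores.
  • Data-safety invariant: rollouts are rewritten before any DB is flipped. If a store's rollout targets cannot be read, that store is marked Failed and its DB write is skipped. A DB is never flipped while its rollouts remain unchanged.
  • Per-store namespaced backups: <db_parent>/backups/<slug>/state_5.<ts>.bak, so CLI and App backups never overwrite each other. The backup is always taken before the write.
  • Exit codes: main now returns ExitCode, with Full→0 / Partial→2 / Failed→1.
  • --sqlite-only with the App store: prints a durability warning. The rollout is the source of truth, so SQLite-only changes may be reverted by backfill.
  • watch still goes through the single-store reconcile_once, because its MismatchedRows followup semantics are handled separately (PR3). The multi-store path adds debug_assert!(scope != MismatchedRows) to pin down that precondition. The 3 obsolete single-store backup functions were removed, leaving no dead code.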
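The exit-code rule above can be sketched as follows. This is a minimal illustration, not the PR's code. The StoreOutcome::Updated variant, the RoundStatus enum, and the round_status / exit_code helpers are hypothetical names. The aggregation rule is also an assumption inferred from the Full/Partial/Failed labels: every store updated → Full, some updated → Partial, none updated → Failed.

```rust
use std::process::ExitCode;

/// Per-store result of one write round. `Updated` is a hypothetical name;
/// `Failed` mirrors StoreOutcome::Failed from the PR text.
#[derive(Debug, Clone, PartialEq, Eq)]
enum StoreOutcome {
    Updated,
    Failed(String), // contained per store instead of aborting the round
}

/// Overall result of a round (hypothetical type for the Full/Partial/Failed labels).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RoundStatus {
    Full,
    Partial,
    Failed,
}

/// Assumed aggregation: all stores updated → Full, some → Partial, none → Failed.
fn round_status(outcomes: &[StoreOutcome]) -> RoundStatus {
    let updated = outcomes
        .iter()
        .filter(|o| matches!(o, StoreOutcome::Updated))
        .count();
    if updated == 0 {
        RoundStatus::Failed // nothing was written (also covers "no stores found")
    } else if updated == outcomes.len() {
        RoundStatus::Full
    } else {
        RoundStatus::Partial
    }
}

/// Exit codes as listed in the PR: Full→0, Partial→2, Failed→1.
fn exit_code(status: RoundStatus) -> u8 {
    match status {
        RoundStatus::Full => 0,
        RoundStatus::Partial => 2,
        RoundStatus::Failed => 1,
    }
}

fn main() -> ExitCode {
    let outcomes = vec![
        StoreOutcome::Updated,
        StoreOutcome::Failed("database is locked".into()),
    ];
    for outcome in &outcomes {
        if let StoreOutcome::Failed(reason) = outcome {
            eprintln!("store failed: {reason}");
        }
    }
    let status = round_status(&outcomes);
    eprintln!("round status: {status:?}");
    ExitCode::from(exit_code(status))
}
```

Reserving 2 for Partial lets scripts tell "some stores were skipped or failed" apart from a total failure, while 0 still means every store was updated.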

Parallel review (Codex × Claude subagent)

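  • Codex caught a real data-consistency bug 🔴 that the subagent missed. A rollout-collection failure was swallowed, yet that store's UPDATE model_provider could still succeed. The round was then reported as Full while the rollout was unchanged and the DB was changed, which violates the rule that the rollout is the source of truth.
  • Fixed: reconcile_rollouts_for_stores now returns the list of failed stores, and the caller marks those stores Failed and skips their DB writes. Added the regression test reconcile_all_stores_fails_store_when_rollout_targets_unreadable, which fails before the fix and passes after it.
  • The MismatchedRows guard that both reviewers asked for has been added.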
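A sketch of the rollout-first invariant and the fix follows. It assumes each store's rollout targets have already been collected and canonicalized. Shared files are deduplicated and rewritten once, and a store whose targets could not be read is marked Failed and never gets a DB write. StoreRollouts, RoundPlan, plan_round, the slugs, and the paths are all hypothetical. In the PR, the failed-store list comes from reconcile_rollouts_for_stores.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;

/// Hypothetical view of one discovered store: its backup slug and the rollout
/// JSONL files its rows point at (Err when the targets could not be read).
struct StoreRollouts {
    slug: String,
    targets: Result<Vec<PathBuf>, String>,
}

/// What one round should do, decided before any file or DB is touched.
#[derive(Debug, Default, PartialEq)]
struct RoundPlan {
    /// Shared rollout files to rewrite exactly once, deduplicated.
    rollouts: Vec<PathBuf>,
    /// Stores whose DB may be flipped after the rollouts are rewritten.
    db_writes: Vec<String>,
    /// Stores reported as Failed: their DB write is skipped entirely.
    failed: Vec<String>,
}

fn plan_round(stores: &[StoreRollouts]) -> RoundPlan {
    // Assumes paths were canonicalized upstream (e.g. with std::fs::canonicalize),
    // so CLI and App rows that point at the same file collapse here.
    let mut unique = BTreeSet::new();
    let mut plan = RoundPlan::default();
    for store in stores {
        match &store.targets {
            Ok(paths) => {
                unique.extend(paths.iter().cloned());
                plan.db_writes.push(store.slug.clone());
            }
            // Invariant: never flip a DB whose rollouts were not rewritten.
            Err(_) => plan.failed.push(store.slug.clone()),
        }
    }
    plan.rollouts = unique.into_iter().collect();
    plan
}

fn main() {
    // The CLI and App stores reference the same rollout file.
    let shared = PathBuf::from("/home/me/.codex/sessions/rollout-a.jsonl");
    let stores = vec![
        StoreRollouts { slug: "cli".into(), targets: Ok(vec![shared.clone()]) },
        StoreRollouts { slug: "app".into(), targets: Ok(vec![shared]) },
        StoreRollouts { slug: "broken".into(), targets: Err("permission denied".into()) },
    ];
    let plan = plan_round(&stores);
    // Order of work: rewrite plan.rollouts, then back up and reconcile
    // each store in plan.db_writes, then report plan.failed.
    println!("{plan:#?}");
}
```

Making the plan before any write keeps the ordering explicit: the DB step only ever sees stores whose rollouts are known to be rewritable.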

Tests / quality

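  • cargo test: 50 passed. There are 3 new write-path integration tests: both stores fully updated with the rollout deduplicated and rewritten once; a broken store reports Partial; a store whose rollout read fails reports Failed and its DB is left untouched.
  • cargo clippy --all-targets: 0 warnings; cargo fmt --check: clean.
  • Manual run of sync with CODEX_HOME …: exit code 1. The read-only status regression against the real stores passed.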

Known minor issue (not introduced by this PR)

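Backup names use a millisecond timestamp. Two syncs of the same store within 1 ms could therefore, in theory, overwrite the earlier backup. This behavior predates PR1 and is left for a separate small fix.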
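The per-store backup layout (<db_parent>/backups/<slug>/state_5.<ts>.bak) and the millisecond collision described above can be illustrated with a small path builder. backup_path is a hypothetical helper. Treating <ts> as raw Unix epoch milliseconds is an assumption, and the tool's actual timestamp format may differ.

```rust
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: <db_parent>/backups/<slug>/state_5.<ts>.bak,
/// assuming <ts> is epoch milliseconds.
fn backup_path(db_path: &Path, slug: &str, ts_millis: u128) -> PathBuf {
    let parent = db_path.parent().unwrap_or_else(|| Path::new("."));
    parent
        .join("backups")
        .join(slug)
        .join(format!("state_5.{ts_millis}.bak"))
}

fn main() {
    let db = Path::new("/home/me/.codex/state_5.sqlite");
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_millis();

    let first = backup_path(db, "cli", ts);
    // A second sync of the same store within the same millisecond.
    let second = backup_path(db, "cli", ts);
    let app = backup_path(db, "app", ts);
    println!("{}", first.display());

    // Same store, same millisecond: identical name, so the later copy overwrites the earlier one.
    assert_eq!(first, second);
    // Per-store slug directories keep CLI and App backups apart.
    assert_ne!(first, app);
}
```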

@chatgpt-codex-connector


You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Wangnov force-pushed the feat/multi-store-write branch from 078dd5f to bb50bfa on June 13, 2026 05:56
Wangnov added a commit that referenced this pull request Jun 13, 2026
PR3 of the multi-store adaptation (builds on #17). Before writing, the
multi-store path now waits (bounded) for each store's Codex startup-backfill to
finish, so threadripper never races Codex's rebuild.

- wait_for_store_backfill polls backfill_state (read-only). A store with no
  backfill_state table, or one whose status is `complete`, is ready immediately;
  otherwise it waits up to backfill_wait (default 10s) and then reports the
  store as busy.
- A busy store is reported as StoreOutcome::Skipped, and neither its rollout
  targets nor its DB are touched.
- Because rewriting the shared rollout JSONL would race a running backfill's
  reads (the rollout files are its source of truth), a rollout-rewriting scope
  (AllRows) combined with any in-progress backfill skips the whole round;
  --sqlite-only (RolloutScope::None) still writes its ready stores.
- main returns Partial(2) / Failed(1) accordingly; the --sqlite-only App warning
  now fires only when the App store was actually updated.

Reviewed in parallel by Codex and a Claude subagent. Codex caught that the first
cut still rewrote the shared rollout JSONL (racing the backfill) even while
skipping busy stores' DBs; fixed with the whole-round skip above, plus a
regression test asserting that a shared rollout and both DBs stay untouched while
a backfill runs. 53 tests, clippy and fmt clean.
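The bounded wait and the whole-round skip rule can be sketched as follows. This is an illustration built on assumptions, not the commit's implementation. The BackfillStatus enum, the probe closure standing in for the read-only backfill_state query, the polling interval, and writable_stores are all hypothetical. Only the ready/busy semantics and the AllRows vs. RolloutScope::None behavior come from the commit message, and the real RolloutScope may have more variants (e.g. MismatchedRows).

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical result of reading a store's backfill_state (read-only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackfillStatus {
    NoTable,
    Complete,
    InProgress,
}

/// Only the two scopes relevant here; the real enum may have more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RolloutScope {
    AllRows, // rewrites the shared rollout JSONL
    None,    // --sqlite-only
}

/// Polls `probe` until the store is ready or `wait` elapses. Returns true when ready.
fn wait_for_store_backfill(
    mut probe: impl FnMut() -> BackfillStatus,
    wait: Duration,
    poll_every: Duration,
) -> bool {
    let deadline = Instant::now() + wait;
    loop {
        match probe() {
            BackfillStatus::NoTable | BackfillStatus::Complete => return true,
            BackfillStatus::InProgress => {
                let now = Instant::now();
                if now >= deadline {
                    return false; // busy: the caller reports the store as Skipped
                }
                // Never sleep past the deadline.
                thread::sleep(poll_every.min(deadline - now));
            }
        }
    }
}

/// Indices of the stores that may be written this round. With a rollout-rewriting
/// scope, any busy store skips the whole round, because rewriting the shared
/// rollout JSONL would race that store's running backfill.
fn writable_stores(scope: RolloutScope, ready: &[bool]) -> Vec<usize> {
    let any_busy = ready.iter().any(|&r| !r);
    if scope == RolloutScope::AllRows && any_busy {
        return Vec::new();
    }
    ready.iter().enumerate().filter(|&(_, &r)| r).map(|(i, _)| i).collect()
}

fn main() {
    // A store whose backfill finishes on the third poll.
    let mut polls = 0;
    let cli_ready = wait_for_store_backfill(
        || {
            polls += 1;
            if polls < 3 { BackfillStatus::InProgress } else { BackfillStatus::Complete }
        },
        Duration::from_millis(200),
        Duration::from_millis(10),
    );
    // A store that stays busy past its (shortened) wait budget.
    let app_ready = wait_for_store_backfill(
        || BackfillStatus::InProgress,
        Duration::from_millis(50),
        Duration::from_millis(10),
    );
    // A store without a backfill_state table is ready at once.
    let legacy_ready = wait_for_store_backfill(
        || BackfillStatus::NoTable,
        Duration::ZERO,
        Duration::from_millis(10),
    );
    let ready = [cli_ready, app_ready, legacy_ready];
    println!("AllRows writes: {:?}", writable_stores(RolloutScope::AllRows, &ready));
    println!("--sqlite-only writes: {:?}", writable_stores(RolloutScope::None, &ready));
}
```

In the real command the wait budget would be backfill_wait (10s by default). The example shortens it so it finishes quickly.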
Wangnov force-pushed the feat/multi-store-write branch from bb50bfa to abcee40 on June 13, 2026 06:11
Wangnov force-pushed the feat/multi-store-write branch from abcee40 to fcffc5e on June 13, 2026 06:34
Wangnov force-pushed the feat/multi-store-write branch from fcffc5e to 64f5c8c on June 13, 2026 06:44
PR2 of the multi-store adaptation (builds on #16). The one-shot write commands
now reconcile every discovered store plus the shared rollout JSONL, instead of a
single resolved DB.

- sync.rs: reconcile_all_stores_with_backup discovers all stores, rewrites the
  shared rollout JSONL once (deduped by canonical path, before any DB row is
  flipped), then backs up and reconciles each store's SQLite. A per-store
  failure is reported (StoreOutcome::Failed) without aborting healthy stores.
- A store whose rollout targets cannot be read is marked Failed and its DB is
  left untouched, so a DB is never flipped while its rollouts stay stale.
- Backups are namespaced per store: <db_parent>/backups/<slug>/.
- main returns ExitCode: Full -> 0, Partial -> 2, Failed -> 1.
- --sqlite-only warns that App-store SQLite-only edits may be reverted by
  Codex's rollout backfill (rollout JSONL is the source of truth).
- watch still uses the single-store path (MismatchedRows followup); a
  debug_assert guards the multi-store path against MismatchedRows misuse.
- Remove the now-unused single-store backup helpers.

Reviewed in parallel by Codex and a Claude subagent. Codex caught a real
consistency hole (a rollout-collection failure was swallowed while the DB write
could still succeed, reporting Full while the rollout stayed stale); fixed by
failing such stores and skipping their DB write, with a regression test.
50 tests, clippy and fmt clean.
Wangnov force-pushed the feat/multi-store-write branch from 64f5c8c to 268baf2 on June 13, 2026 06:58
Wangnov force-pushed the feat/multi-store-write branch from 20e1a7e to b6dfe0b on June 13, 2026 07:48
Wangnov (Owner, Author) commented Jun 13, 2026

✅ Landed on main together with the rest of this stack via a fast-forward merge (this PR's commit b6dfe0b is already on main, HEAD=ebc2902). Closing as merged.

Wangnov closed this on Jun 13, 2026
Wangnov deleted the feat/multi-store-write branch on June 13, 2026 08:15