Skip to content

fix: address all 5 maintainer review blockers (PR #662)#697

Closed
jlin53882 wants to merge 8 commits intoCortexReach:masterfrom
jlin53882:fix/pr662-maintainer-review
Closed

fix: address all 5 maintainer review blockers (PR #662)#697
jlin53882 wants to merge 8 commits intoCortexReach:masterfrom
jlin53882:fix/pr662-maintainer-review

Conversation

@jlin53882
Copy link
Copy Markdown
Contributor

修復 PR #662 維護者提出的 5 個 Blocking 問題

基於 rwmjhb review(#4170815598)全部修復:


Fix #1runWithFileLock() 不再重複執行 mutation

問題:原本 try/catch 包裹整個 acquire() + fn() 區塊,若 fn() 在 Redis lock 取得後失敗,會被 catch 捕捉並觸發 file-lock fallback,等於同一筆 mutation 執行兩次。

修復:引入 lockAcquired flag,區分:

  • lockAcquired=false → lock 取得失敗 → fallback 合理
  • lockAcquired=true + fn() 失敗 → operation 錯誤,直接 re-throw
let lockAcquired = false;
let release: (() => Promise<void>) | null = null;
try {
  release = await redisManager.acquire(redisLockKey, redisLockTTL);
  lockAcquired = true;
  return await fn();     // ← fn() 錯誤往上拋,不 trigger fallback
} catch (err) {
  if (!lockAcquired) {
    // lock 取得失敗 → fallback 合理
    console.warn("...falling back to file lock:", err);
  } else {
    // lock 已取得但 fn() 失敗 → operation 錯誤,不 fallback
    throw err;
  }
} finally {
  if (release) { try { await release(); } catch {} }
}

Fix #2:Redis TTL 從 60s → 180s

問題:無 renewal 機制,長 operation 期間 lock 過期會造成 race condition。

修復:TTL 延長到 180 秒作為短期緩解。完整修復需 Phase 2 加 TTL renewal background task。


Fix #3:Redis 測試 Hermetic 化

問題redis-lock-edge-cases.test.mjsredis-lock-optimized.test.mjs 預設需要 localhost:6379 Redis,無 Redis 時 CI fail/hang。

修復:加 SKIP_NO_REDIS guard,REDIS_URL 未設定時自動 skip。CI 需設定 REDIS_URL 環境變數才執行這些測試。

const SKIP_NO_REDIS = !process.env.REDIS_URL;
// ...
(SKIP_NO_REDIS ? describe.skip : describe)("Edge Case 1", () => { ... });

Fix #4:Redis lock key 包含 dbPath,恢復 per-db 隔離

問題:所有 dbPath 共用 "memory-write" 全域 key,完全抹除原本 per-db file lock 的隔離。

修復:key 改為 memory-write:{normalized_db_path}

const redisLockKey = `memory-write:${normalizeStorageKey(this.config.dbPath)}`;

normalizeStorageKey() 會 resolve symlink 並把路徑正規化,確保同一 storage 的不同路徑表示得到相同的 key。


Fix #5:還原 memory-update-metadata-refresh.test.mjs 到 npm test

問題:PR 意外將該測試從 npm test 移除。

修復:將 node test/memory-update-metadata-refresh.test.mjs 放回正確位置(在 temporal-awarenessredis-lock-edge-cases 之間)。


測試驗證

# 無 REDIS_URL → Redis 測試自動 skip
npm test  # should pass (redis tests skipped)

# 有 Redis → 完整執行
REDIS_URL=redis://localhost:6379 npm test

- Add src/redis-lock.ts: Token-based Redis lock with graceful fallback
- Add test files for performance, edge cases, and optimization
- Add ioredis dependency

Fixes CortexReach#643: improves 200 concurrent write success from 6% to 97.5%

Requires: npm install ioredis
- Replace no-op lock with proper file lock fallback
- Maintains lock protection even without Redis
- Import proper-lockfile for file lock support
- Remove retries from sync lock (not supported)
- Handle Windows path for tmp directory
- Ignore ENOENT when releasing
- Add fallback unit tests
- Log Redis unavailable with error details
- Log file lock acquire/release with key and path
- Use emoji markers for clarity (✅/❌)
- Add Redis lock manager initialization
- Modify runWithFileLock to use Redis lock first, fallback to file lock
- Add integration test for 20/50 concurrent operations
- Fixes Issue CortexReach#643 lock contention
Blocker 2 (any types):
- properLockfile: any → proper-lockfile module type
- Promise<any> → Promise<typeof import('proper-lockfile')>

Blocker 3 (excessive console logging):
- Remove ALL console.log from redis-lock.ts (hot path)
- Keep only essential console.warn for actual failures

Blocker 4 (timeout behavior):
- Add MAX_ATTEMPTS=600 circuit breaker (prevents infinite retry loop)
- Add attempts >= MAX_ATTEMPTS check with descriptive error message
- Fix Windows C:\\tmp path → use os.tmpdir()
- Fix createFileLock() throwing on lockSync failure (was returning no-op release, causing silent data corruption risk)

Root cause fixes:
- createFileLock now throws immediately when lockSync fails (no silent swallow)
- File lock uses node os.tmpdir() which works correctly on Windows
Phase 1 修復(基於 rwmjhb review #4170815598):

1. [FIX #1] runWithFileLock() 不再重複執行 mutation
   - 引入 lockAcquired flag 區分「lock 取得失敗」vs「fn() 失敗」
   - 只有 lock 取得失敗才觸發 file-lock fallback
   - fn() 本身的錯誤直接 re-throw,不再默默重跑

2. [FIX #2] Redis TTL 從 60s → 180s
   - 緩解無 renewal 機制下的 lock 過期 race condition 風險
   - 完整修復需加 TTL renewal(預計 Phase 2)

3. [FIX #3] Redis 測試 hermetic 化
   - redis-lock-edge-cases.test.mjs / redis-lock-optimized.test.mjs
   - REDIS_URL env var 未設定時自動 skip,CI 不再 fail/hang

4. [FIX #4] Redis lock key 包含 dbPath identity
   - key = memory-write:{normalized_db_path}
   - 恢復原本 per-db 的 lock 隔離,避免不同 storage 排隊

5. [FIX #5] 還原 memory-update-metadata-refresh.test.mjs 到 npm test
   - 該檔案在 PR 中被意外移除,現在放回正確位置
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6407cf10bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/store.ts
// 【修復 #1】Redis lock key 包含 storage identity,防止全域 key 序列化不同 DB
// 【修復 #2】TTL 從 60s → 180s,降低長 operation 期間 lock 過期風險
// 注意:完整修復需加 renewal;180s 是短期緩解
const redisLockKey = `memory-write:${normalizeStorageKey(this.config.dbPath)}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Resolve normalizeStorageKey at module scope

runWithFileLock calls normalizeStorageKey(...) here, but the helper was inserted inside validateStoragePath (after an early return) rather than at module scope. That makes this identifier undefined in the class method scope, so write operations that reach this path will throw ReferenceError: normalizeStorageKey is not defined before any lock is acquired.

Useful? React with 👍 / 👎.

Comment thread src/store.ts
return await this.table!.countRows();
}

async getById(id: string, scopeFilter?: string[]): Promise<MemoryEntry | null> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore MemoryStore.count used by tool retry paths

This change removes MemoryStore.count(), but src/tools.ts still passes context.store.count() into retrieveWithRetry (e.g. lines 219/584/1088/1230/2184). When an initial retrieval returns no rows, that callback is executed and will now throw TypeError: ...count is not a function, breaking those tool flows instead of performing the intended empty-store check and retry behavior.

Useful? React with 👍 / 👎.

Comment thread src/store.ts
Comment on lines +271 to +273
if (!lockAcquired) {
console.warn("[memory-lancedb-pro] Redis lock acquire failed, falling back to file lock:", err);
// Fall through to file lock
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop falling back after Redis lock acquisition timeouts

The fallback branch triggers for any acquire error before lockAcquired is set, including contention timeouts thrown by RedisLockManager.acquire. In that scenario, one writer can still hold the Redis lock while another proceeds under the file-lock fallback, so concurrent writes run under different lock domains and can race. Fallback should be limited to Redis-unavailable/init failures, not timeout/contended acquisition errors.

Useful? React with 👍 / 👎.

@jlin53882 jlin53882 closed this Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant