fix(e2e): resolve test failures from issue #244 by visahak · Pull Request #245 · AgentToolkit/altk-evolve

visahak · 2026-05-01T20:04:00Z

Description:

Summary

Fix 3 categories of e2e test failures reported in E2E suite still has 6 failures and 32 errors after installing examples/milvus extras #244 plus 2 sharing tests discovered in the full-suite run
Hardened filesystem backend against interrupted writes (was causing 0-byte namespace files and cascading test failures)
Test-side fixes for assertion bugs and LLM variance

Changes

altk_evolve/backend/filesystem.py — atomic writes via tmp + os.replace; graceful recovery from empty/corrupt JSON
altk_evolve/backend/filesystem.py — atomic writes via tmp + os.replace; graceful recovery from empty/corrupt JSON
platform-integrations/codex/.../subscribe.py — audit-append failure now warns and continues instead of rolling back with exit 1
platform-integrations/{claude,claw-code}/.../lib/config.py — invalid-repo-name warning prints to stdout with "(skipped - invalid subscription name)" wording
tests/e2e/test_e2e_segmentation.py — @pytest.mark.flaky(retries=2, delay=1) to absorb LLM variance
tests/e2e/test_sharing.py — assertion now checks leaked entity content + [public: alice] marker instead of the query term, which always echoes in the response header
.gitignore — exclude .codex

Test plan

Group A: 3 Phoenix-sync e2e pipeline tests pass
Group B: 4 invalid-subscription-name tests pass across bob/codex/claude
Group C: test_subscribe_warns_when_audit_write_fails passes; full test_codex_sharing.py green

Addresses issue #244

Summary by CodeRabbit

Bug Fixes
- Safer namespace file handling with atomic saves and recovery for empty or corrupt files
- Subscription flow now tolerates audit-log write failures (continues with a warning)
Tests
- Added unit tests for filesystem namespace recovery and save robustness
- Added retry for a flaky segmentation test; strengthened cross-namespace discovery assertions
Chores
- Added .codex to ignore list and clarified subscription validation/log messages

- filesystem backend: atomic writes + graceful recovery from corrupt JSON - codex subscribe: warn on audit-append failure instead of rolling back - shared config: invalid-repo-name warning goes to stdout - segmentation test: mark flaky to absorb LLM variance - cross-namespace sharing test: check leaked content, not echoed query - .gitignore: exclude .codex

coderabbitai · 2026-05-01T20:04:13Z

Warning

Rate limit exceeded

@visahak has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 1 second before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0291860-8fac-4bf0-bfcf-eb0914ad39b3

📥 Commits

Reviewing files that changed from the base of the PR and between da23440 and c2397e3.

📒 Files selected for processing (2)

altk_evolve/backend/filesystem.py
tests/unit/test_filesystem_backend.py

📝 Walkthrough

Walkthrough

Namespace persistence now uses atomic temp-file writes and stricter JSON handling; empty or corrupt namespace files are treated as missing and recovered. Added tests for these behaviors. Minor logging/message wording changes in two plugin configs and tests updated to expect messages on stderr. .codex added to .gitignore. Subscription audit-log failures are downgraded from fatal to warnings.

Changes

Git ignore

Layer / File(s)	Summary
Config `.gitignore`	Add `.codex` pattern to ignore list.

Filesystem namespace backend (atomic-save and recovery)

Layer / File(s)	Summary
Imports / Helpers `altk_evolve/backend/filesystem.py`	Add `os` import to support atomic replace operations.
Read / Validation `altk_evolve/backend/filesystem.py`	`_load_namespace_data()` now reads namespace JSON via `read_text()`, treats empty/whitespace-only files as invalid (logs warning, unlinks file, raises `NamespaceNotFoundException`), and treats JSON decode errors similarly (warning, unlink, raise with decode error chained).
Atomic Write `altk_evolve/backend/filesystem.py`	`_save_namespace_data()` writes JSON to a UUID-suffixed temporary sibling file, then atomically replaces the target namespace file with `os.replace`; temp-file cleanup on failure is implemented.
Unit tests `tests/unit/test_filesystem_backend.py`	New tests/fixtures added to verify recovery from zero-byte and corrupt namespace files and tolerance of stale `.tmp` files during save.

Plugin config validation message changes

Layer / File(s)	Summary
Validation messaging `platform-integrations/claude/plugins/evolve-lite/lib/config.py`	Update `_coerce_repo` invalid-name message wording; print still written to `stderr`.
Validation messaging `platform-integrations/claw-code/plugins/evolve-lite/lib/config.py`	Update `_coerce_repo` invalid-name message wording and remove explicit `file=sys.stderr` from `print`.

Subscription audit logging behavior

Layer / File(s)	Summary
Error handling change `platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py`	On audit-log append failure, change from fatal rollback+exit to printing a warning and continuing (no rollback/deletion).

Tests — expectations and flakiness handling

Layer / File(s)	Summary
Flakiness handling `tests/e2e/test_e2e_segmentation.py`	Mark `test_segment_trajectory_min_subtasks` as flaky — up to 2 retries with 1s delay.
Assertion strengthening `tests/e2e/test_sharing.py`	Tighten `test_cross_namespace_public_discovery` to assert full content string and absence of public owner marker when appropriate.
StdErr expectations `tests/platform_integrations/test_bob_sharing.py`, `tests/platform_integrations/test_codex_sharing.py`, `tests/platform_integrations/test_sync.py`	Update tests to expect the “invalid subscription name” skip message on `stderr` instead of `stdout`.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Backend
    participant OS
    Client->>Backend: ensure_namespace(ns)
    Backend->>OS: read_text(ns.json)
    OS-->>Backend: file content (empty / corrupt / valid)
    alt empty or corrupt content
        Backend->>OS: unlink(ns.json)
        Backend-->>Client: raise NamespaceNotFoundException
        Client->>Backend: create_namespace(ns) / recover
    end
    Client->>Backend: save namespace update
    Backend->>OS: write temp file (ns.json.<uuid>.tmp)
    OS-->>Backend: temp file written
    Backend->>OS: os.replace(temp -> ns.json) (atomic swap)
    OS-->>Backend: replace success
    Backend-->>Client: save confirmed

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

gaodan-fang
illeatmyhat

Poem

🐰 I nibble at files and mend the fray,
Temp names hop in, then swap away,
Warnings whisper where errors slept,
Tests retry, and stale crumbs are swept,
A tidy burrow — code safe for play.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(e2e): resolve test failures from issue `#244`' accurately summarizes the main objective of the PR, which addresses end-to-end test failures from issue `#244` through multiple fixes across the codebase.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Review rate limit: 0/1 reviews remaining, refill in 43 minutes and 1 second.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

platform-integrations/claw-code/plugins/evolve-lite/lib/config.py (1)
339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent output streams for sibling warnings in _coerce_repo.

The invalid-name warning (lines 339–341) now writes to stdout, while the invalid-scope warning (lines 349–352) still writes to stderr. Both are diagnostic/warning messages emitted from the same function for invalid config entries — they should target the same stream.

If the change was driven by a test that captures stdout, consider capturing stderr in the test instead (which is the conventional target for warnings) rather than routing a warning to stdout.
🔧 Proposed fix — restore `stderr` for the invalid-name path
     if not is_valid_repo_name(name.strip()):
         print(
-            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed"
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
         )
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claw-code/plugins/evolve-lite/lib/config.py` around
lines 339 - 352, The invalid-name diagnostic in _coerce_repo currently prints to
stdout while the invalid-scope diagnostic prints to stderr, causing inconsistent
output streams; change the print in the invalid-name branch that references name
to write to sys.stderr (matching the other warning), keep the return None
behavior, and ensure you import or reference sys the same way as the scope
branch so both warnings (the print that mentions "skipped - invalid subscription
name" and the print that mentions "unknown scope") go to stderr; symbols to
locate: function _coerce_repo, variables name/remote/scope, VALID_SCOPES, and
the two print calls.
platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)
339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same inconsistent output-stream issue as in claw-code/plugins/evolve-lite/lib/config.py.

Identical concern: the invalid-name print (lines 339–341) targets stdout while the invalid-scope print (lines 349–352) targets stderr. Both paths should use the same stream. See the proposed fix in the claw-code file review above and apply it here as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
339 - 352, The two diagnostic print calls in this module are inconsistent: the
invalid-name print (using name) currently writes to stdout while the
invalid-scope print (using entry.get("scope") and VALID_SCOPES) writes to
sys.stderr; change the invalid-name print so it also writes to sys.stderr (i.e.,
add file=sys.stderr) to match the invalid-scope branch and ensure both error
messages go to the same stream; locate the prints around the name validation and
the scope validation (they reference name, entry, scope and VALID_SCOPES) and
modify only the print for the invalid-name case to include file=sys.stderr.

🧹 Nitpick comments (1)

altk_evolve/backend/filesystem.py (1)
73-75: ⚡ Quick win

Use a unique tmp filename per write to be safe under multi-process deployments.

The deterministic {namespace_id}.json.tmp name is protected by threading.Lock within a single process, but not across OS-level workers. If two processes ever write the same namespace concurrently (e.g., uvicorn with --workers > 1), Writer B can overwrite the tmp file after Writer A wrote it but before Writer A calls os.replace, causing Writer A to atomically commit Writer B's (possibly partial) content.

Using a uniquely-named tmp file keeps the same-directory guarantee that os.replace needs while eliminating the race:
♻️ Proposed fix using `tempfile`
+import tempfile
 ...
     def _save_namespace_data(self, namespace_id: str, data: FilesystemNamespace):
         """Save namespace data to JSON file atomically. ..."""
         file_path = self._namespace_file(namespace_id)
-        tmp_path = file_path.with_suffix(file_path.suffix + ".tmp")
-        tmp_path.write_text(data.model_dump_json(indent=2))
-        os.replace(tmp_path, file_path)
+        fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".tmp")
+        try:
+            with os.fdopen(fd, "w", encoding="utf-8") as f:
+                f.write(data.model_dump_json(indent=2))
+            os.replace(tmp_name, file_path)
+        except Exception:
+            os.unlink(tmp_name)
+            raise
This also adds the encoding="utf-8" spec missing from both write_text and read_text (line 55) calls — relevant if any entity content contains non-ASCII characters on Windows where the default locale encoding may not be UTF-8.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@altk_evolve/backend/filesystem.py` around lines 73 - 75, The current write
uses a deterministic tmp file (tmp_path = file_path.with_suffix(...)) which
races across processes; change the write in the function/method that uses
tmp_path/file_path so it creates a unique temp file in the same directory (e.g.,
use tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=file_path.parent
and a unique suffix), write the JSON with encoding="utf-8" (use write() or
write_text(..., encoding="utf-8")), flush/close the temp file, then call
os.replace(temp_path, file_path) to atomically commit; also update the
corresponding read_text call to include encoding="utf-8". Ensure the temp file
is removed on exceptions if not replaced.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py`:
- Around line 339-352: The two diagnostic print calls in this module are
inconsistent: the invalid-name print (using name) currently writes to stdout
while the invalid-scope print (using entry.get("scope") and VALID_SCOPES) writes
to sys.stderr; change the invalid-name print so it also writes to sys.stderr
(i.e., add file=sys.stderr) to match the invalid-scope branch and ensure both
error messages go to the same stream; locate the prints around the name
validation and the scope validation (they reference name, entry, scope and
VALID_SCOPES) and modify only the print for the invalid-name case to include
file=sys.stderr.

In `@platform-integrations/claw-code/plugins/evolve-lite/lib/config.py`:
- Around line 339-352: The invalid-name diagnostic in _coerce_repo currently
prints to stdout while the invalid-scope diagnostic prints to stderr, causing
inconsistent output streams; change the print in the invalid-name branch that
references name to write to sys.stderr (matching the other warning), keep the
return None behavior, and ensure you import or reference sys the same way as the
scope branch so both warnings (the print that mentions "skipped - invalid
subscription name" and the print that mentions "unknown scope") go to stderr;
symbols to locate: function _coerce_repo, variables name/remote/scope,
VALID_SCOPES, and the two print calls.

---

Nitpick comments:
In `@altk_evolve/backend/filesystem.py`:
- Around line 73-75: The current write uses a deterministic tmp file (tmp_path =
file_path.with_suffix(...)) which races across processes; change the write in
the function/method that uses tmp_path/file_path so it creates a unique temp
file in the same directory (e.g., use tempfile.NamedTemporaryFile or
tempfile.mkstemp with dir=file_path.parent and a unique suffix), write the JSON
with encoding="utf-8" (use write() or write_text(..., encoding="utf-8")),
flush/close the temp file, then call os.replace(temp_path, file_path) to
atomically commit; also update the corresponding read_text call to include
encoding="utf-8". Ensure the temp file is removed on exceptions if not replaced.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff5fdb0b-f09a-4c15-8202-8ffb15eee833

📥 Commits

Reviewing files that changed from the base of the PR and between d8b9ac0 and 83c55f6.

📒 Files selected for processing (7)

.gitignore
altk_evolve/backend/filesystem.py
platform-integrations/claude/plugins/evolve-lite/lib/config.py
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py
tests/e2e/test_e2e_segmentation.py
tests/e2e/test_sharing.py

coderabbitai

🧹 Nitpick comments (1)

platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)
338-351: ⚡ Quick win

Inconsistent output streams for sibling warnings within _coerce_repo.

The invalid-name path (line 339) now emits to stdout via bare print(), while the invalid-scope path (lines 347-350) still emits to stderr via print(..., file=sys.stderr). Both are diagnostics for malformed config entries in the same function; routing them to different streams is surprising to callers.

As noted in the relevant context, sync.py already uses capture_output=True for subprocess calls. If any parent process captures these script invocations with stdout capture, the invalid-name warning will be inadvertently swallowed into the captured output buffer rather than flowing to the diagnostic stream.

If the test suite change that prompted this was asserting on stdout, consider redirecting both warnings to stderr and updating the tests to check capsys.readouterr().err (or equivalent), which keeps all config diagnostics on the standard diagnostic channel.
♻️ Proposed alignment (both warnings → stderr)
     if not is_valid_repo_name(name.strip()):
-        print(f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed")
+        print(
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
+        )
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
338 - 351, In _coerce_repo, the invalid-name warning currently uses print(...)
to stdout while the invalid-scope warning prints to stderr; change the
invalid-name branch that prints "evolve-lite: {name!r} (skipped - invalid
subscription name) ..." to write to stderr (e.g., use print(...,
file=sys.stderr)) so both diagnostics go to stderr, and update any tests that
asserted on stdout to assert on stderr (capsys.readouterr().err) as needed;
reference symbols: function _coerce_repo, variables name and entry, and
VALID_SCOPES.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py`:
- Around line 338-351: In _coerce_repo, the invalid-name warning currently uses
print(...) to stdout while the invalid-scope warning prints to stderr; change
the invalid-name branch that prints "evolve-lite: {name!r} (skipped - invalid
subscription name) ..." to write to stderr (e.g., use print(...,
file=sys.stderr)) so both diagnostics go to stderr, and update any tests that
asserted on stdout to assert on stderr (capsys.readouterr().err) as needed;
reference symbols: function _coerce_repo, variables name and entry, and
VALID_SCOPES.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a4305af-488c-4d5a-a4b3-343458c79cd7

📥 Commits

Reviewing files that changed from the base of the PR and between 83c55f6 and 904c917.

📒 Files selected for processing (2)

platform-integrations/claude/plugins/evolve-lite/lib/config.py
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py

✅ Files skipped from review due to trivial changes (1)

platform-integrations/claw-code/plugins/evolve-lite/lib/config.py

gaodan-fang · 2026-05-04T15:03:41Z

Codex Review

Target: branch diff against main

The filesystem backend hardening introduces recovery/concurrency gaps around namespace writes. These can still break startup or writes in realistic interrupted-write or multi-process scenarios.

Full review comments

[P2] Use unique temp files for namespace saves — altk_evolve/backend/filesystem.py:73-75
When two processes or two FilesystemEntityBackend instances write the same namespace concurrently, they both use the same <namespace>.json.tmp path; one writer can replace the other writer's temp file, causing FileNotFoundError or committing the wrong payload. The in-process lock only protects a single backend object, so CLI/MCP processes sharing data_dir still hit this. Use a uniquely-created sibling temp file before os.replace.
[P2] Clear corrupt namespace files before recreating — altk_evolve/backend/filesystem.py:56-58
When an interrupted write leaves a 0-byte namespace file, this now raises NamespaceNotFoundException, but ensure_namespace() will then call create_namespace(), which still sees the existing file and raises NamespaceAlreadyExistsException before any replacement. The default startup path remains stuck on the bad file despite "treating as missing"; remove/rename the corrupt file or let namespace creation overwrite files that were classified as corrupt.

Generated by Codex (session 019df380-a019-7521-b0ef-3362ce6cf7d9).

- Unique tmp filenames for namespace saves: concurrent CLI/MCP writers sharing data_dir no longer clobber each other's <ns>.json.tmp, which caused FileNotFoundError at os.replace time. - Unlink empty/corrupt namespace files when classifying as missing, so the create_namespace() call inside ensure_namespace() doesn't trip on the stale file and raise NamespaceAlreadyExistsException, which left startup wedged despite the "treat as missing" recovery branch. Adds tests/unit/test_filesystem_backend.py covering both paths.

visahak · 2026-05-04T15:41:08Z

Codex Review

Target: branch diff against main

The filesystem backend hardening introduces recovery/concurrency gaps around namespace writes. These can still break startup or writes in realistic interrupted-write or multi-process scenarios.

Full review comments

[P2] Use unique temp files for namespace saves — altk_evolve/backend/filesystem.py:73-75
When two processes or two FilesystemEntityBackend instances write the same namespace concurrently, they both use the same <namespace>.json.tmp path; one writer can replace the other writer's temp file, causing FileNotFoundError or committing the wrong payload. The in-process lock only protects a single backend object, so CLI/MCP processes sharing data_dir still hit this. Use a uniquely-created sibling temp file before os.replace.

[P2] Clear corrupt namespace files before recreating — altk_evolve/backend/filesystem.py:56-58
When an interrupted write leaves a 0-byte namespace file, this now raises NamespaceNotFoundException, but ensure_namespace() will then call create_namespace(), which still sees the existing file and raises NamespaceAlreadyExistsException before any replacement. The default startup path remains stuck on the bad file despite "treating as missing"; remove/rename the corrupt file or let namespace creation overwrite files that were classified as corrupt.

Generated by Codex (session 019df380-a019-7521-b0ef-3362ce6cf7d9).

@gaodan-fang Addressed issues in the latest commit 97a3e02

gaodan-fang · 2026-05-04T15:55:25Z

Codex Re-review (post-update)

Target: branch diff against main (head 97a3e02, "fix(filesystem): address Codex review on interrupted-write recovery")

Prior comments status

✅ [P2] Unique temp files for namespace saves (filesystem.py:73-75) — addressed; not reflagged.
✅ [P2] Clear corrupt namespace files before recreating (filesystem.py:56-58) — addressed; not reflagged.

New issue found

[P2] Route invalid config warnings to stderr — platform-integrations/claude/plugins/evolve-lite/lib/config.py:339
When evolve.config.yaml contains an invalid repo name, normalize_repos() now writes this warning to stdout. Commands such as unsubscribe --list emit machine-readable JSON on stdout after calling normalize_repos(), so a stale bad entry will prefix the JSON with this warning and break consumers. The previous code correctly sent the warning to stderr — please restore that. @visahak this comment seems like align with fix(e2e): resolve test failures from issue #244 #245 (review)

No additional blocking issues surfaced in the inspected areas.

Generated by Codex (session 019df3b0-5126-7ac0-8e6e-4b0ceb15264f).

gaodan-fang · 2026-05-04T17:12:53Z

Codex Review (round 3)

Target: branch diff against main (head 97a3e02)

Two P2 issues remain — one new, one still unaddressed from the prior round.

Findings

[P2 — NEW] Resolve the saved transcript path from EVOLVE_DIR — platform-integrations/claude/plugins/evolve-lite/skills/learn/scripts/on_stop.py:23
When users configure EVOLVE_DIR to a non-default location, the save-trajectory hook writes the copied transcript under $EVOLVE_DIR/trajectories, but this stop hook always tells the learn skill to read .evolve/trajectories/.... In that setup the learn skill will not find the saved transcript and will exit without extracting guidelines or influence records. Build this path with the same directory-resolution logic used by the save-trajectory hook.
[P2 — STILL OPEN] Keep invalid-config warnings off stdout — platform-integrations/claude/plugins/evolve-lite/lib/config.py:339
Previously flagged. Still prints to stdout. Commands that call normalize_repos() and produce machine-readable stdout (e.g. unsubscribe --list) will prepend this warning before their JSON output and make it unparsable. The duplicated Codex-side config has the same regression. Please restore stderr routing on both copies.

No additional blocking issues surfaced this pass.

Generated by Codex (session 019df3f7-ef9b-7d00-95ca-2dfc455ec1c7).

visahak · 2026-05-04T17:15:29Z

comment #245 (comment) is addressed.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/unit/test_filesystem_backend.py (1)
33-40: ⚡ Quick win

Consider adding a ValidationError recovery test to complement the corrupt-JSON case.

The existing test_ensure_namespace_recovers_from_corrupt_json covers syntactically invalid JSON ({not json). Once the pydantic.ValidationError gap in _load_namespace_data is fixed (see the comment on filesystem.py), add a parallel test for structurally-invalid-but-valid-JSON (e.g. {"id": "ns_valid_json", "entities": "not_a_list"}) to lock in that recovery path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_filesystem_backend.py` around lines 33 - 40, Add a new unit
test alongside test_ensure_namespace_recovers_from_corrupt_json that creates a
namespace JSON file containing syntactically valid JSON but wrong schema (e.g.
{"id": "ns_valid_json", "entities": "not_a_list"}) to exercise the
pydantic.ValidationError path in _load_namespace_data; then call
client.ensure_namespace("ns_valid_json") and assert the returned namespace has
id == "ns_valid_json" and num_entities == 0, ensuring the code recovers from
schema validation errors the same way it recovers from corrupt JSON.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@altk_evolve/backend/filesystem.py`:
- Around line 65-70: The code currently only catches json.JSONDecodeError when
calling FilesystemNamespace.model_validate(json.loads(raw)), but
pydantic.ValidationError can also be raised for structurally invalid (but
syntactically valid) JSON; update the except block around
FilesystemNamespace.model_validate to catch pydantic.ValidationError as well,
log the same warning (including the exception), unlink the file_path, and raise
NamespaceNotFoundException(f"Namespace `{namespace_id}` not found") from the
caught ValidationError so ensure_namespace in evolve_client.py can perform the
intended recovery path.

In `@tests/unit/test_filesystem_backend.py`:
- Around line 21-53: Add the pytest "unit" marker to the three tests by
decorating each function (test_ensure_namespace_recovers_from_zero_byte_file,
test_ensure_namespace_recovers_from_corrupt_json,
test_save_tolerates_stale_shared_tmp) with `@pytest.mark.unit`; if pytest is not
already imported in the module, add "import pytest" at the top so the decorator
resolves. Ensure the decorator is placed immediately above each def line and
keep existing docstrings and asserts unchanged.

---

Nitpick comments:
In `@tests/unit/test_filesystem_backend.py`:
- Around line 33-40: Add a new unit test alongside
test_ensure_namespace_recovers_from_corrupt_json that creates a namespace JSON
file containing syntactically valid JSON but wrong schema (e.g. {"id":
"ns_valid_json", "entities": "not_a_list"}) to exercise the
pydantic.ValidationError path in _load_namespace_data; then call
client.ensure_namespace("ns_valid_json") and assert the returned namespace has
id == "ns_valid_json" and num_entities == 0, ensuring the code recovers from
schema validation errors the same way it recovers from corrupt JSON.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62d2fa42-654e-47f5-8f6a-96c3b9464c6a

📥 Commits

Reviewing files that changed from the base of the PR and between 904c917 and da23440.

📒 Files selected for processing (7)

altk_evolve/backend/filesystem.py
platform-integrations/claude/plugins/evolve-lite/lib/config.py
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
tests/platform_integrations/test_bob_sharing.py
tests/platform_integrations/test_codex_sharing.py
tests/platform_integrations/test_sync.py
tests/unit/test_filesystem_backend.py

✅ Files skipped from review due to trivial changes (1)

platform-integrations/claw-code/plugins/evolve-lite/lib/config.py

🚧 Files skipped from review as they are similar to previous changes (1)

platform-integrations/claude/plugins/evolve-lite/lib/config.py

visahak · 2026-05-04T17:24:28Z

[P2 — NEW] Resolve the saved transcript path from EVOLVE_DIR — platform-integrations/claude/plugins/evolve-lite/skills/learn/scripts/on_stop.py:23
When users configure EVOLVE_DIR to a non-default location, the save-trajectory hook writes the copied transcript under $EVOLVE_DIR/trajectories, but this stop hook always tells the learn skill to read .evolve/trajectories/.... In that setup the learn skill will not find the saved transcript and will exit without extracting guidelines or influence records. Build this path with the same directory-resolution logic used by the save-trajectory hook.

Created issue #246 for this bug.

gaodan-fang

LGTM

Conflict-resolution notes (visahak's "address conflicts" ask on PR #235): * Main's PR #245 ("fix(e2e): resolve test failures from issue #244") modified the rendered platform-integrations/<claude,claw-code>/lib/ config.py and platform-integrations/codex/.../subscribe.py directly. Under this branch's unified layout these are generated artifacts — the fixes belong in plugin-source/ so they propagate to every platform on render. Two changes re-applied: 1. plugin-source/lib/config.py — print a stderr warning when _coerce_repo() rejects an invalid repo name (was silent). Updated message wording to match #245. Re-rendered to all four platforms (claude/claw-code/codex/bob); codex+bob were never changed by main but inherit consistent behavior under unify. 2. plugin-source/skills/evolve-lite/subscribe/scripts/subscribe.py — drop the rollback-on-audit-failure block; warn instead. Audit logging is best-effort, a failed append shouldn't undo a successful subscribe. Re-rendered to all four platforms; previously only codex had this fix. * .gitignore — kept both new entries (`example-guidelines` from this branch, `.codex` from main). * altk_evolve/backend/filesystem.py + tests/unit/test_filesystem_backend.py — accepted from main as-is (atomic writes + corrupt-JSON recovery for the filesystem backend). * tests/e2e/, tests/platform_integrations/test_{bob,codex}_sharing.py, tests/platform_integrations/test_sync.py — auto-merged cleanly, inspected, accepted. Verified: render-then-check is clean; 186 platform_integrations tests pass; the 4 new filesystem-backend unit tests pass.

coderabbitai Bot reviewed May 1, 2026

View reviewed changes

visahak and others added 2 commits May 1, 2026 22:11

fix ruff formatting

253338f

Merge branch 'main' into fix/e2e-244-test-failures

904c917

coderabbitai Bot reviewed May 2, 2026

View reviewed changes

visahak requested review from gaodan-fang, illeatmyhat and vinodmut May 3, 2026 00:40

addressing comments

da23440

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Comment thread altk_evolve/backend/filesystem.py

Comment thread tests/unit/test_filesystem_backend.py

visahak mentioned this pull request May 4, 2026

bug(learn-hook): on_stop ignores EVOLVE_DIR when pointing learn skill at saved transcript #246

Closed

3 tasks

addressing code rabbit issues

c2397e3

gaodan-fang approved these changes May 4, 2026

View reviewed changes

visahak merged commit a2ab28c into AgentToolkit:main May 4, 2026
16 checks passed

visahak deleted the fix/e2e-244-test-failures branch May 4, 2026 17:50

Conversation

visahak commented May 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gaodan-fang commented May 4, 2026

Codex Review

Full review comments

Uh oh!

visahak commented May 4, 2026

Codex Review

Full review comments

Uh oh!

gaodan-fang commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codex Re-review (post-update)

Prior comments status

New issue found

Uh oh!

gaodan-fang commented May 4, 2026

Codex Review (round 3)

Findings

Uh oh!

visahak commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

visahak commented May 4, 2026

Uh oh!

gaodan-fang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

visahak commented May 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 1, 2026 •

edited

Loading

gaodan-fang commented May 4, 2026 •

edited

Loading

visahak commented May 4, 2026 •

edited

Loading