Skip to content

fix(e2e): resolve test failures from issue #244#245

Merged
visahak merged 6 commits into
AgentToolkit:mainfrom
visahak:fix/e2e-244-test-failures
May 4, 2026
Merged

fix(e2e): resolve test failures from issue #244#245
visahak merged 6 commits into
AgentToolkit:mainfrom
visahak:fix/e2e-244-test-failures

Conversation

@visahak
Copy link
Copy Markdown
Collaborator

@visahak visahak commented May 1, 2026

Description:

Summary

Changes

  • altk_evolve/backend/filesystem.py — atomic writes via tmp + os.replace; graceful recovery from empty/corrupt JSON
  • altk_evolve/backend/filesystem.py — atomic writes via tmp + os.replace; graceful recovery from empty/corrupt JSON
  • platform-integrations/codex/.../subscribe.py — audit-append failure now warns and continues instead of rolling back with exit 1
  • platform-integrations/{claude,claw-code}/.../lib/config.py — invalid-repo-name warning prints to stdout with "(skipped - invalid subscription name)" wording
  • tests/e2e/test_e2e_segmentation.py@pytest.mark.flaky(retries=2, delay=1) to absorb LLM variance
  • tests/e2e/test_sharing.py — assertion now checks leaked entity content + [public: alice] marker instead of the query term, which always echoes in the response header
  • .gitignore — exclude .codex

Test plan

  • Group A: 3 Phoenix-sync e2e pipeline tests pass
  • Group B: 4 invalid-subscription-name tests pass across bob/codex/claude
  • Group C: test_subscribe_warns_when_audit_write_fails passes; full test_codex_sharing.py green

Addresses issue #244

Summary by CodeRabbit

  • Bug Fixes

    • Safer namespace file handling with atomic saves and recovery for empty or corrupt files
    • Subscription flow now tolerates audit-log write failures (continues with a warning)
  • Tests

    • Added unit tests for filesystem namespace recovery and save robustness
    • Added retry for a flaky segmentation test; strengthened cross-namespace discovery assertions
  • Chores

    • Added .codex to ignore list and clarified subscription validation/log messages

  - filesystem backend: atomic writes + graceful recovery from corrupt JSON
  - codex subscribe: warn on audit-append failure instead of rolling back
  - shared config: invalid-repo-name warning goes to stdout
  - segmentation test: mark flaky to absorb LLM variance
  - cross-namespace sharing test: check leaked content, not echoed query
  - .gitignore: exclude .codex
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 1, 2026

Warning

Rate limit exceeded

@visahak has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 1 second before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0291860-8fac-4bf0-bfcf-eb0914ad39b3

📥 Commits

Reviewing files that changed from the base of the PR and between da23440 and c2397e3.

📒 Files selected for processing (2)
  • altk_evolve/backend/filesystem.py
  • tests/unit/test_filesystem_backend.py
📝 Walkthrough

Walkthrough

Namespace persistence now uses atomic temp-file writes and stricter JSON handling; empty or corrupt namespace files are treated as missing and recovered. Added tests for these behaviors. Minor logging/message wording changes in two plugin configs and tests updated to expect messages on stderr. .codex added to .gitignore. Subscription audit-log failures are downgraded from fatal to warnings.

Changes

Git ignore

Layer / File(s) Summary
Config
.gitignore
Add .codex pattern to ignore list.

Filesystem namespace backend (atomic-save and recovery)

Layer / File(s) Summary
Imports / Helpers
altk_evolve/backend/filesystem.py
Add os import to support atomic replace operations.
Read / Validation
altk_evolve/backend/filesystem.py
_load_namespace_data() now reads namespace JSON via read_text(), treats empty/whitespace-only files as invalid (logs warning, unlinks file, raises NamespaceNotFoundException), and treats JSON decode errors similarly (warning, unlink, raise with decode error chained).
Atomic Write
altk_evolve/backend/filesystem.py
_save_namespace_data() writes JSON to a UUID-suffixed temporary sibling file, then atomically replaces the target namespace file with os.replace; temp-file cleanup on failure is implemented.
Unit tests
tests/unit/test_filesystem_backend.py
New tests/fixtures added to verify recovery from zero-byte and corrupt namespace files and tolerance of stale .tmp files during save.

Plugin config validation message changes

Layer / File(s) Summary
Validation messaging
platform-integrations/claude/plugins/evolve-lite/lib/config.py
Update _coerce_repo invalid-name message wording; print still written to stderr.
Validation messaging
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
Update _coerce_repo invalid-name message wording and remove explicit file=sys.stderr from print.

Subscription audit logging behavior

Layer / File(s) Summary
Error handling change
platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py
On audit-log append failure, change from fatal rollback+exit to printing a warning and continuing (no rollback/deletion).

Tests — expectations and flakiness handling

Layer / File(s) Summary
Flakiness handling
tests/e2e/test_e2e_segmentation.py
Mark test_segment_trajectory_min_subtasks as flaky — up to 2 retries with 1s delay.
Assertion strengthening
tests/e2e/test_sharing.py
Tighten test_cross_namespace_public_discovery to assert full content string and absence of public owner marker when appropriate.
StdErr expectations
tests/platform_integrations/test_bob_sharing.py, tests/platform_integrations/test_codex_sharing.py, tests/platform_integrations/test_sync.py
Update tests to expect the “invalid subscription name” skip message on stderr instead of stdout.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Backend
    participant OS
    Client->>Backend: ensure_namespace(ns)
    Backend->>OS: read_text(ns.json)
    OS-->>Backend: file content (empty / corrupt / valid)
    alt empty or corrupt content
        Backend->>OS: unlink(ns.json)
        Backend-->>Client: raise NamespaceNotFoundException
        Client->>Backend: create_namespace(ns) / recover
    end
    Client->>Backend: save namespace update
    Backend->>OS: write temp file (ns.json.<uuid>.tmp)
    OS-->>Backend: temp file written
    Backend->>OS: os.replace(temp -> ns.json) (atomic swap)
    OS-->>Backend: replace success
    Backend-->>Client: save confirmed
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • gaodan-fang
  • illeatmyhat

Poem

🐰 I nibble at files and mend the fray,
Temp names hop in, then swap away,
Warnings whisper where errors slept,
Tests retry, and stale crumbs are swept,
A tidy burrow — code safe for play.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(e2e): resolve test failures from issue #244' accurately summarizes the main objective of the PR, which addresses end-to-end test failures from issue #244 through multiple fixes across the codebase.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 43 minutes and 1 second.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
platform-integrations/claw-code/plugins/evolve-lite/lib/config.py (1)

339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Inconsistent output streams for sibling warnings in _coerce_repo.

The invalid-name warning (lines 339–341) now writes to stdout, while the invalid-scope warning (lines 349–352) still writes to stderr. Both are diagnostic/warning messages emitted from the same function for invalid config entries — they should target the same stream.

If the change was driven by a test that captures stdout, consider capturing stderr in the test instead (which is the conventional target for warnings) rather than routing a warning to stdout.

🔧 Proposed fix — restore `stderr` for the invalid-name path
     if not is_valid_repo_name(name.strip()):
         print(
-            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed"
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
         )
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claw-code/plugins/evolve-lite/lib/config.py` around
lines 339 - 352, The invalid-name diagnostic in _coerce_repo currently prints to
stdout while the invalid-scope diagnostic prints to stderr, causing inconsistent
output streams; change the print in the invalid-name branch that references name
to write to sys.stderr (matching the other warning), keep the return None
behavior, and ensure you import or reference sys the same way as the scope
branch so both warnings (the print that mentions "skipped - invalid subscription
name" and the print that mentions "unknown scope") go to stderr; symbols to
locate: function _coerce_repo, variables name/remote/scope, VALID_SCOPES, and
the two print calls.
platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)

339-352: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Same inconsistent output-stream issue as in claw-code/plugins/evolve-lite/lib/config.py.

Identical concern: the invalid-name print (lines 339–341) targets stdout while the invalid-scope print (lines 349–352) targets stderr. Both paths should use the same stream. See the proposed fix in the claw-code file review above and apply it here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
339 - 352, The two diagnostic print calls in this module are inconsistent: the
invalid-name print (using name) currently writes to stdout while the
invalid-scope print (using entry.get("scope") and VALID_SCOPES) writes to
sys.stderr; change the invalid-name print so it also writes to sys.stderr (i.e.,
add file=sys.stderr) to match the invalid-scope branch and ensure both error
messages go to the same stream; locate the prints around the name validation and
the scope validation (they reference name, entry, scope and VALID_SCOPES) and
modify only the print for the invalid-name case to include file=sys.stderr.
🧹 Nitpick comments (1)
altk_evolve/backend/filesystem.py (1)

73-75: ⚡ Quick win

Use a unique tmp filename per write to be safe under multi-process deployments.

The deterministic {namespace_id}.json.tmp name is protected by threading.Lock within a single process, but not across OS-level workers. If two processes ever write the same namespace concurrently (e.g., uvicorn with --workers > 1), Writer B can overwrite the tmp file after Writer A wrote it but before Writer A calls os.replace, causing Writer A to atomically commit Writer B's (possibly partial) content.

Using a uniquely-named tmp file keeps the same-directory guarantee that os.replace needs while eliminating the race:

♻️ Proposed fix using `tempfile`
+import tempfile
 ...
     def _save_namespace_data(self, namespace_id: str, data: FilesystemNamespace):
         """Save namespace data to JSON file atomically. ..."""
         file_path = self._namespace_file(namespace_id)
-        tmp_path = file_path.with_suffix(file_path.suffix + ".tmp")
-        tmp_path.write_text(data.model_dump_json(indent=2))
-        os.replace(tmp_path, file_path)
+        fd, tmp_name = tempfile.mkstemp(dir=file_path.parent, suffix=".tmp")
+        try:
+            with os.fdopen(fd, "w", encoding="utf-8") as f:
+                f.write(data.model_dump_json(indent=2))
+            os.replace(tmp_name, file_path)
+        except Exception:
+            os.unlink(tmp_name)
+            raise

This also adds the encoding="utf-8" spec missing from both write_text and read_text (line 55) calls — relevant if any entity content contains non-ASCII characters on Windows where the default locale encoding may not be UTF-8.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@altk_evolve/backend/filesystem.py` around lines 73 - 75, The current write
uses a deterministic tmp file (tmp_path = file_path.with_suffix(...)) which
races across processes; change the write in the function/method that uses
tmp_path/file_path so it creates a unique temp file in the same directory (e.g.,
use tempfile.NamedTemporaryFile or tempfile.mkstemp with dir=file_path.parent
and a unique suffix), write the JSON with encoding="utf-8" (use write() or
write_text(..., encoding="utf-8")), flush/close the temp file, then call
os.replace(temp_path, file_path) to atomically commit; also update the
corresponding read_text call to include encoding="utf-8". Ensure the temp file
is removed on exceptions if not replaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py`:
- Around line 339-352: The two diagnostic print calls in this module are
inconsistent: the invalid-name print (using name) currently writes to stdout
while the invalid-scope print (using entry.get("scope") and VALID_SCOPES) writes
to sys.stderr; change the invalid-name print so it also writes to sys.stderr
(i.e., add file=sys.stderr) to match the invalid-scope branch and ensure both
error messages go to the same stream; locate the prints around the name
validation and the scope validation (they reference name, entry, scope and
VALID_SCOPES) and modify only the print for the invalid-name case to include
file=sys.stderr.

In `@platform-integrations/claw-code/plugins/evolve-lite/lib/config.py`:
- Around line 339-352: The invalid-name diagnostic in _coerce_repo currently
prints to stdout while the invalid-scope diagnostic prints to stderr, causing
inconsistent output streams; change the print in the invalid-name branch that
references name to write to sys.stderr (matching the other warning), keep the
return None behavior, and ensure you import or reference sys the same way as the
scope branch so both warnings (the print that mentions "skipped - invalid
subscription name" and the print that mentions "unknown scope") go to stderr;
symbols to locate: function _coerce_repo, variables name/remote/scope,
VALID_SCOPES, and the two print calls.

---

Nitpick comments:
In `@altk_evolve/backend/filesystem.py`:
- Around line 73-75: The current write uses a deterministic tmp file (tmp_path =
file_path.with_suffix(...)) which races across processes; change the write in
the function/method that uses tmp_path/file_path so it creates a unique temp
file in the same directory (e.g., use tempfile.NamedTemporaryFile or
tempfile.mkstemp with dir=file_path.parent and a unique suffix), write the JSON
with encoding="utf-8" (use write() or write_text(..., encoding="utf-8")),
flush/close the temp file, then call os.replace(temp_path, file_path) to
atomically commit; also update the corresponding read_text call to include
encoding="utf-8". Ensure the temp file is removed on exceptions if not replaced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff5fdb0b-f09a-4c15-8202-8ffb15eee833

📥 Commits

Reviewing files that changed from the base of the PR and between d8b9ac0 and 83c55f6.

📒 Files selected for processing (7)
  • .gitignore
  • altk_evolve/backend/filesystem.py
  • platform-integrations/claude/plugins/evolve-lite/lib/config.py
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
  • platform-integrations/codex/plugins/evolve-lite/skills/subscribe/scripts/subscribe.py
  • tests/e2e/test_e2e_segmentation.py
  • tests/e2e/test_sharing.py

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
platform-integrations/claude/plugins/evolve-lite/lib/config.py (1)

338-351: ⚡ Quick win

Inconsistent output streams for sibling warnings within _coerce_repo.

The invalid-name path (line 339) now emits to stdout via bare print(), while the invalid-scope path (lines 347-350) still emits to stderr via print(..., file=sys.stderr). Both are diagnostics for malformed config entries in the same function; routing them to different streams is surprising to callers.

As noted in the relevant context, sync.py already uses capture_output=True for subprocess calls. If any parent process captures these script invocations with stdout capture, the invalid-name warning will be inadvertently swallowed into the captured output buffer rather than flowing to the diagnostic stream.

If the test suite change that prompted this was asserting on stdout, consider redirecting both warnings to stderr and updating the tests to check capsys.readouterr().err (or equivalent), which keeps all config diagnostics on the standard diagnostic channel.

♻️ Proposed alignment (both warnings → stderr)
     if not is_valid_repo_name(name.strip()):
-        print(f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed")
+        print(
+            f"evolve-lite: {name!r} (skipped - invalid subscription name) — only A-Z, a-z, 0-9, '.', '_', '-' allowed",
+            file=sys.stderr,
+        )
         return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py` around lines
338 - 351, In _coerce_repo, the invalid-name warning currently uses print(...)
to stdout while the invalid-scope warning prints to stderr; change the
invalid-name branch that prints "evolve-lite: {name!r} (skipped - invalid
subscription name) ..." to write to stderr (e.g., use print(...,
file=sys.stderr)) so both diagnostics go to stderr, and update any tests that
asserted on stdout to assert on stderr (capsys.readouterr().err) as needed;
reference symbols: function _coerce_repo, variables name and entry, and
VALID_SCOPES.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@platform-integrations/claude/plugins/evolve-lite/lib/config.py`:
- Around line 338-351: In _coerce_repo, the invalid-name warning currently uses
print(...) to stdout while the invalid-scope warning prints to stderr; change
the invalid-name branch that prints "evolve-lite: {name!r} (skipped - invalid
subscription name) ..." to write to stderr (e.g., use print(...,
file=sys.stderr)) so both diagnostics go to stderr, and update any tests that
asserted on stdout to assert on stderr (capsys.readouterr().err) as needed;
reference symbols: function _coerce_repo, variables name and entry, and
VALID_SCOPES.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7a4305af-488c-4d5a-a4b3-343458c79cd7

📥 Commits

Reviewing files that changed from the base of the PR and between 83c55f6 and 904c917.

📒 Files selected for processing (2)
  • platform-integrations/claude/plugins/evolve-lite/lib/config.py
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
✅ Files skipped from review due to trivial changes (1)
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py

@gaodan-fang
Copy link
Copy Markdown
Collaborator

Codex Review

Target: branch diff against main

The filesystem backend hardening introduces recovery/concurrency gaps around namespace writes. These can still break startup or writes in realistic interrupted-write or multi-process scenarios.

Full review comments

  • [P2] Use unique temp files for namespace savesaltk_evolve/backend/filesystem.py:73-75
    When two processes or two FilesystemEntityBackend instances write the same namespace concurrently, they both use the same <namespace>.json.tmp path; one writer can replace the other writer's temp file, causing FileNotFoundError or committing the wrong payload. The in-process lock only protects a single backend object, so CLI/MCP processes sharing data_dir still hit this. Use a uniquely-created sibling temp file before os.replace.

  • [P2] Clear corrupt namespace files before recreatingaltk_evolve/backend/filesystem.py:56-58
    When an interrupted write leaves a 0-byte namespace file, this now raises NamespaceNotFoundException, but ensure_namespace() will then call create_namespace(), which still sees the existing file and raises NamespaceAlreadyExistsException before any replacement. The default startup path remains stuck on the bad file despite "treating as missing"; remove/rename the corrupt file or let namespace creation overwrite files that were classified as corrupt.


Generated by Codex (session 019df380-a019-7521-b0ef-3362ce6cf7d9).

- Unique tmp filenames for namespace saves: concurrent CLI/MCP writers
  sharing data_dir no longer clobber each other's <ns>.json.tmp, which
  caused FileNotFoundError at os.replace time.
- Unlink empty/corrupt namespace files when classifying as missing, so
  the create_namespace() call inside ensure_namespace() doesn't trip on
  the stale file and raise NamespaceAlreadyExistsException, which left
  startup wedged despite the "treat as missing" recovery branch.

Adds tests/unit/test_filesystem_backend.py covering both paths.
@visahak
Copy link
Copy Markdown
Collaborator Author

visahak commented May 4, 2026

Codex Review

Target: branch diff against main

The filesystem backend hardening introduces recovery/concurrency gaps around namespace writes. These can still break startup or writes in realistic interrupted-write or multi-process scenarios.

Full review comments

  • [P2] Use unique temp files for namespace savesaltk_evolve/backend/filesystem.py:73-75
    When two processes or two FilesystemEntityBackend instances write the same namespace concurrently, they both use the same <namespace>.json.tmp path; one writer can replace the other writer's temp file, causing FileNotFoundError or committing the wrong payload. The in-process lock only protects a single backend object, so CLI/MCP processes sharing data_dir still hit this. Use a uniquely-created sibling temp file before os.replace.
  • [P2] Clear corrupt namespace files before recreatingaltk_evolve/backend/filesystem.py:56-58
    When an interrupted write leaves a 0-byte namespace file, this now raises NamespaceNotFoundException, but ensure_namespace() will then call create_namespace(), which still sees the existing file and raises NamespaceAlreadyExistsException before any replacement. The default startup path remains stuck on the bad file despite "treating as missing"; remove/rename the corrupt file or let namespace creation overwrite files that were classified as corrupt.

Generated by Codex (session 019df380-a019-7521-b0ef-3362ce6cf7d9).

@gaodan-fang Addressed issues in the latest commit 97a3e02

@gaodan-fang
Copy link
Copy Markdown
Collaborator

gaodan-fang commented May 4, 2026

Codex Re-review (post-update)

Target: branch diff against main (head 97a3e02, "fix(filesystem): address Codex review on interrupted-write recovery")

Prior comments status

  • [P2] Unique temp files for namespace saves (filesystem.py:73-75) — addressed; not reflagged.
  • [P2] Clear corrupt namespace files before recreating (filesystem.py:56-58) — addressed; not reflagged.

New issue found

  • [P2] Route invalid config warnings to stderrplatform-integrations/claude/plugins/evolve-lite/lib/config.py:339
    When evolve.config.yaml contains an invalid repo name, normalize_repos() now writes this warning to stdout. Commands such as unsubscribe --list emit machine-readable JSON on stdout after calling normalize_repos(), so a stale bad entry will prefix the JSON with this warning and break consumers. The previous code correctly sent the warning to stderr — please restore that. @visahak this comment seems like align with fix(e2e): resolve test failures from issue #244 #245 (review)

No additional blocking issues surfaced in the inspected areas.


Generated by Codex (session 019df3b0-5126-7ac0-8e6e-4b0ceb15264f).

@gaodan-fang
Copy link
Copy Markdown
Collaborator

Codex Review (round 3)

Target: branch diff against main (head 97a3e02)

Two P2 issues remain — one new, one still unaddressed from the prior round.

Findings

  • [P2 — NEW] Resolve the saved transcript path from EVOLVE_DIRplatform-integrations/claude/plugins/evolve-lite/skills/learn/scripts/on_stop.py:23
    When users configure EVOLVE_DIR to a non-default location, the save-trajectory hook writes the copied transcript under $EVOLVE_DIR/trajectories, but this stop hook always tells the learn skill to read .evolve/trajectories/.... In that setup the learn skill will not find the saved transcript and will exit without extracting guidelines or influence records. Build this path with the same directory-resolution logic used by the save-trajectory hook.

  • [P2 — STILL OPEN] Keep invalid-config warnings off stdoutplatform-integrations/claude/plugins/evolve-lite/lib/config.py:339
    Previously flagged. Still prints to stdout. Commands that call normalize_repos() and produce machine-readable stdout (e.g. unsubscribe --list) will prepend this warning before their JSON output and make it unparsable. The duplicated Codex-side config has the same regression. Please restore stderr routing on both copies.

No additional blocking issues surfaced this pass.


Generated by Codex (session 019df3f7-ef9b-7d00-95ca-2dfc455ec1c7).

@visahak
Copy link
Copy Markdown
Collaborator Author

visahak commented May 4, 2026

comment #245 (comment) is addressed.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/unit/test_filesystem_backend.py (1)

33-40: ⚡ Quick win

Consider adding a ValidationError recovery test to complement the corrupt-JSON case.

The existing test_ensure_namespace_recovers_from_corrupt_json covers syntactically invalid JSON ({not json). Once the pydantic.ValidationError gap in _load_namespace_data is fixed (see the comment on filesystem.py), add a parallel test for structurally-invalid-but-valid-JSON (e.g. {"id": "ns_valid_json", "entities": "not_a_list"}) to lock in that recovery path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_filesystem_backend.py` around lines 33 - 40, Add a new unit
test alongside test_ensure_namespace_recovers_from_corrupt_json that creates a
namespace JSON file containing syntactically valid JSON but wrong schema (e.g.
{"id": "ns_valid_json", "entities": "not_a_list"}) to exercise the
pydantic.ValidationError path in _load_namespace_data; then call
client.ensure_namespace("ns_valid_json") and assert the returned namespace has
id == "ns_valid_json" and num_entities == 0, ensuring the code recovers from
schema validation errors the same way it recovers from corrupt JSON.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@altk_evolve/backend/filesystem.py`:
- Around line 65-70: The code currently only catches json.JSONDecodeError when
calling FilesystemNamespace.model_validate(json.loads(raw)), but
pydantic.ValidationError can also be raised for structurally invalid (but
syntactically valid) JSON; update the except block around
FilesystemNamespace.model_validate to catch pydantic.ValidationError as well,
log the same warning (including the exception), unlink the file_path, and raise
NamespaceNotFoundException(f"Namespace `{namespace_id}` not found") from the
caught ValidationError so ensure_namespace in evolve_client.py can perform the
intended recovery path.

In `@tests/unit/test_filesystem_backend.py`:
- Around line 21-53: Add the pytest "unit" marker to the three tests by
decorating each function (test_ensure_namespace_recovers_from_zero_byte_file,
test_ensure_namespace_recovers_from_corrupt_json,
test_save_tolerates_stale_shared_tmp) with `@pytest.mark.unit`; if pytest is not
already imported in the module, add "import pytest" at the top so the decorator
resolves. Ensure the decorator is placed immediately above each def line and
keep existing docstrings and asserts unchanged.

---

Nitpick comments:
In `@tests/unit/test_filesystem_backend.py`:
- Around line 33-40: Add a new unit test alongside
test_ensure_namespace_recovers_from_corrupt_json that creates a namespace JSON
file containing syntactically valid JSON but wrong schema (e.g. {"id":
"ns_valid_json", "entities": "not_a_list"}) to exercise the
pydantic.ValidationError path in _load_namespace_data; then call
client.ensure_namespace("ns_valid_json") and assert the returned namespace has
id == "ns_valid_json" and num_entities == 0, ensuring the code recovers from
schema validation errors the same way it recovers from corrupt JSON.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62d2fa42-654e-47f5-8f6a-96c3b9464c6a

📥 Commits

Reviewing files that changed from the base of the PR and between 904c917 and da23440.

📒 Files selected for processing (7)
  • altk_evolve/backend/filesystem.py
  • platform-integrations/claude/plugins/evolve-lite/lib/config.py
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
  • tests/platform_integrations/test_bob_sharing.py
  • tests/platform_integrations/test_codex_sharing.py
  • tests/platform_integrations/test_sync.py
  • tests/unit/test_filesystem_backend.py
✅ Files skipped from review due to trivial changes (1)
  • platform-integrations/claw-code/plugins/evolve-lite/lib/config.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • platform-integrations/claude/plugins/evolve-lite/lib/config.py

Comment thread altk_evolve/backend/filesystem.py
Comment thread tests/unit/test_filesystem_backend.py
@visahak
Copy link
Copy Markdown
Collaborator Author

visahak commented May 4, 2026

  • [P2 — NEW] Resolve the saved transcript path from EVOLVE_DIRplatform-integrations/claude/plugins/evolve-lite/skills/learn/scripts/on_stop.py:23
    When users configure EVOLVE_DIR to a non-default location, the save-trajectory hook writes the copied transcript under $EVOLVE_DIR/trajectories, but this stop hook always tells the learn skill to read .evolve/trajectories/.... In that setup the learn skill will not find the saved transcript and will exit without extracting guidelines or influence records. Build this path with the same directory-resolution logic used by the save-trajectory hook.

Created issue #246 for this bug.

Copy link
Copy Markdown
Collaborator

@gaodan-fang gaodan-fang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@visahak visahak merged commit a2ab28c into AgentToolkit:main May 4, 2026
16 checks passed
@visahak visahak deleted the fix/e2e-244-test-failures branch May 4, 2026 17:50
illeatmyhat added a commit that referenced this pull request May 4, 2026
Conflict-resolution notes (visahak's "address conflicts" ask on PR #235):

* Main's PR #245 ("fix(e2e): resolve test failures from issue #244")
  modified the rendered platform-integrations/<claude,claw-code>/lib/
  config.py and platform-integrations/codex/.../subscribe.py directly.
  Under this branch's unified layout these are generated artifacts —
  the fixes belong in plugin-source/ so they propagate to every
  platform on render. Two changes re-applied:

    1. plugin-source/lib/config.py — print a stderr warning when
       _coerce_repo() rejects an invalid repo name (was silent).
       Updated message wording to match #245. Re-rendered to all four
       platforms (claude/claw-code/codex/bob); codex+bob were never
       changed by main but inherit consistent behavior under unify.

    2. plugin-source/skills/evolve-lite/subscribe/scripts/subscribe.py
       — drop the rollback-on-audit-failure block; warn instead.
       Audit logging is best-effort, a failed append shouldn't undo a
       successful subscribe. Re-rendered to all four platforms;
       previously only codex had this fix.

* .gitignore — kept both new entries (`example-guidelines` from this
  branch, `.codex` from main).

* altk_evolve/backend/filesystem.py + tests/unit/test_filesystem_backend.py
  — accepted from main as-is (atomic writes + corrupt-JSON recovery
  for the filesystem backend).

* tests/e2e/, tests/platform_integrations/test_{bob,codex}_sharing.py,
  tests/platform_integrations/test_sync.py — auto-merged cleanly,
  inspected, accepted.

Verified: render-then-check is clean; 186 platform_integrations tests
pass; the 4 new filesystem-backend unit tests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants