Skip to content

feat: report session source availability#66

Closed
steezkelly wants to merge 1 commit into
NousResearch:mainfrom
steezkelly:feat/54-source-availability
Closed

feat: report session source availability#66
steezkelly wants to merge 1 commit into
NousResearch:mainfrom
steezkelly:feat/54-source-availability

Conversation

@steezkelly
Copy link
Copy Markdown

Summary

Partially implements issue #54 step 1 by making all-source dry runs report explicit per-source availability and candidate counts.

Adds:

  • describe_source_availability(sources, importers)
  • dry-run output of available / unavailable, candidate count, and extraction error if any
  • tests for available, empty, and erroring source importers

This turns the existing dry-run count display into an explicit availability boundary, so --source all --dry-run no longer silently treats empty/erroring sources as just zero messages.

Example behavior

python -m evolution.core.external_importers --source all --skill <skill> --dry-run

Now prints each source as:

claude-code: available, candidates=12
copilot: unavailable, candidates=0
hermes: unavailable, candidates=0 error=<reason>

Scope / non-goals

This is the smallest safe ingestion-boundary slice. It does not yet implement the full canonical event/message schema or persist all source metadata into EvalExample; those should follow as separate PRs because they touch dataset serialization and deduplication behavior.

Test Plan

  • RED first: pytest tests/core/test_external_importer_availability.py -q failed because describe_source_availability did not exist
  • pytest tests/core/test_external_importer_availability.py -q
  • pytest -q
  • static added-line security scan
  • git diff --check

Result: 140 passed, 11 warnings (DSPy deprecation warnings only).

Partially addresses #54.

@steezkelly
Copy link
Copy Markdown
Author

Closing this split PR in favor of consolidated PR #67. Local integration found review/merge overhead across the stack (notably #61/#64 overlap in evolution/skills/evolve_skill.py), and #67 preserves the combined local test evidence: targeted stack tests 21 passed; full suite 160 passed; GitHub checks were absent on the split PRs. Review #67 instead.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

Verdict: Comment — Clean, well-scoped addition. Two non-blocking suggestions below.

💡 Suggestions

  • evolution/core/external_importers.py:737available flag conflates reachability with data presence; see inline.
  • tests/core/test_external_importer_availability.py:27 — Three cases in one assertion; worth splitting for clearer pytest output.

✅ Looks Good

  • Narrow, focused scope — does exactly what the PR description promises
  • Bare-except is scoped to Exception (not BaseException), which is appropriate here
  • Dry-run path now gives genuinely actionable output instead of a silent zero
  • Test fixtures are minimal and self-explanatory
  • RED-first test process documented in the PR description

Reviewed by Hermes Agent

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The available flag conflates reachability with data presence. A source that succeeds but returns an empty list (e.g. the user simply has no history yet) gets marked available: False, which could mislead callers into thinking the source is broken.

Consider tracking reachability separately:

report[source] = {
    "reachable": True,          # True whenever no exception is raised
    "has_candidates": candidate_count > 0,
    "candidate_count": candidate_count,
    "error": None,
}

This way callers (and the dry-run display) can distinguish "the source works but is empty right now" from "the source errored".

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Three distinct behaviours are asserted in a single test function. If only the error branch breaks, pytest still reports the whole function as failed with a large diff, making it harder to pinpoint the regression.

Consider splitting into focused tests:

def test_available_source_reports_count():
    report = describe_source_availability(["a"], {"a": AvailableImporter})
    assert report["a"]["available"] is True
    assert report["a"]["candidate_count"] == 2

def test_empty_source_is_unavailable():
    report = describe_source_availability(["e"], {"e": EmptyImporter})
    assert report["e"]["available"] is False
    assert report["e"]["error"] is None

def test_erroring_source_is_unavailable():
    report = describe_source_availability(["x"], {"x": ErrorImporter})
    assert report["x"]["available"] is False
    assert "unreadable" in report["x"]["error"]

The original single-function form is fine to keep as an integration-style smoke test; the split tests complement it rather than replace it.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Clean, focused slice — good scoping decision to keep this PR small. Two minor suggestions below (nothing blocking).

✅ Looks Good

  • Well-isolated function with a clear single responsibility
  • Dry-run output is now meaningfully differentiated between empty and erroring sources
  • Test stubs cover all three branches (available, empty, error)

💡 Suggestions

  • external_importers.py:737available: candidate_count > 0 conflates a reachable-but-empty source with a broken one
  • test_external_importer_availability.py:27 — one omnibus test covers three distinct scenarios; splitting would make failures self-documenting

Reviewed by Hermes Agent

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available: candidate_count > 0 silently treats a reachable-but-empty source the same as a broken one. A source whose importer runs without raising is technically available — it just has no candidates yet. Consider splitting the flag into reachable: True + has_candidates: candidate_count > 0, or at minimum renaming to has_candidates so callers aren’t misled into thinking an empty-but-healthy source is down.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: This single test exercises three distinct scenarios (available, empty, error) in one assertion block. Splitting into test_available_source, test_empty_source, and test_erroring_source would make failures self-documenting — a broken ErrorImporter path wouldn’t show up as a generic AssertionError on a 53-line function; the test name alone would identify which case regressed.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Nice, clean implementation of the availability boundary — the dry-run output is much more actionable now. Two non-blocking observations inline; nothing here should block a future re-open/merge.

✅ Looks Good

  • Good exception guard around extract_messages() — errors surface cleanly as error=... in output.
  • Tests cover all three state classes (available, empty, erroring) and read clearly.
  • The scoping note in the PR description ("does not yet implement full canonical event/message schema") is exactly the right way to bound this slice.

💡 Suggestions (non-blocking)

See inline comments for two small suggestions.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available: candidate_count > 0 conflates two distinct states — a source that is reachable but has no messages yet (empty history) vs. one that is unreachable/erroring. If callers ever want to distinguish "can I reach this source?" from "does it have candidates?", you'll need a schema change later.

Consider a separate reachable flag:

report[source] = {
    "reachable": True,   # extraction succeeded
    "available": candidate_count > 0,
    "candidate_count": candidate_count,
    "error": None,
}
# and in the except block: "reachable": False

Totally fine to defer if the current two-value display is sufficient — just worth a note in the docstring if so.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: One mega-test covering all three cases means a single assertion failure doesn't immediately tell you which source class is broken. Consider splitting into three focused tests:

def test_available_source_is_reported_correctly():
    report = describe_source_availability(["src"], {"src": AvailableImporter})
    assert report["src"]["available"] is True
    assert report["src"]["candidate_count"] == 2
    assert report["src"]["error"] is None

def test_empty_source_is_reported_as_unavailable():
    ...

def test_erroring_source_is_reported_with_error_message():
    ...

Clearer failure attribution in CI at zero extra cost.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Verdict: Comment (no blocking issues — two targeted suggestions, one on output safety, one on test structure)

✅ What's solid

  • describe_source_availability is cleanly scoped: pure function, no side effects, no LLM calls, easy to unit-test.
  • Moving the availability probe before the for src in sources render loop is the right call — one pass to gather data, one pass to display it.
  • except Exception as exc is appropriate here (dry-run probe; BaseException subclasses like KeyboardInterrupt are not caught).
  • Good TDD discipline demonstrated in the test plan.

See inline comments for two suggestions.

console.print(f" {src}: {len(msgs)} messages")
info = availability[src]
status = "available" if info["available"] else "unavailable"
error = f" error={info['error']}" if info["error"] else ""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The error value is rendered unquoted, so multi-word or punctuated messages (e.g. FileNotFoundError: [Errno 2] No such file or directory) will be visually ambiguous — readers can't tell where the error string ends if it contains spaces or = signs.

Consider quoting the value for unambiguous, machine-parseable output:

error = f' error="{info["error"]}"' if info["error"] else ""

This keeps the format consistent with key=value convention while making boundaries explicit.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The single test bundles three distinct behaviours (available, empty, erroring importer). Splitting into focused tests improves failure isolation — when 'empty source' regresses, a dedicated test_empty_source_is_unavailable immediately surfaces the culprit rather than requiring you to read which assertion inside the monolith failed.

Also worth adding a fourth case: a source key that is present in sources but absent from importers. Right now that raises a bare KeyError which is swallowed by the broad except Exception and reported as error="'missing_key'" — probably fine, but the behaviour is implicit and untested.

def test_available_source():
    report = describe_source_availability(["a"], {"a": AvailableImporter})
    assert report["a"]["available"] is True
    assert report["a"]["candidate_count"] == 2
    assert report["a"]["error"] is None

def test_empty_source_is_unavailable():
    report = describe_source_availability(["e"], {"e": EmptyImporter})
    assert report["e"]["available"] is False

def test_erroring_source_captures_message():
    report = describe_source_availability(["x"], {"x": ErrorImporter})
    assert report["x"]["available"] is False
    assert "history unreadable" in report["x"]["error"]

def test_source_missing_from_importers_is_reported_as_error():
    report = describe_source_availability(["ghost"], {})
    assert report["ghost"]["available"] is False
    assert report["ghost"]["error"] is not None

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: COMMENT — 0 critical, 0 warnings, 2 suggestions

Clean, well-scoped addition. The security threat scan found no issues; the new function contains no secrets, no injection surfaces, no auth bypass, and no path traversal. The test suite (140 passing, 11 DSPy deprecation warnings) is unaffected. Two suggestions for the team's consideration — neither is a blocker.

try:
messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available: candidate_count > 0 conflates source reachability with data presence. A source whose directory exists but contains zero messages is reported as unavailable, which may mislead users who see copilot: unavailable, candidates=0 and assume Copilot is misconfigured when it is simply empty.

Consider one of:

  • Rename the field to has_candidates to make the semantics explicit.
  • Split into two fields: reachable: True/False (driven by whether extract_messages() raised) and candidate_count: int, dropping the derived available boolean entirely.

Current output is correct for the described use-case; the naming is the concern.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three scenarios (available, empty, error) are bundled into one test function with a single composite assertion. If the empty case regresses, pytest reports the whole function as failed, and the diff covers all three keys — harder to pinpoint at a glance.

Consider splitting into three focused tests:

def test_available_source_reports_candidate_count(): ...
def test_empty_source_is_marked_not_available(): ...
def test_erroring_source_captures_exception_message(): ...

Each test name becomes self-documenting failure output, and the assertions are smaller and faster to read in CI.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEST

def describe_source_availability(
sources: list[str],
importers: dict[str, type],
) -> dict[str, dict[str, Any]]:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test inline comment

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: COMMENT — 0 critical, 0 warnings, 2 suggestions

✅ Looks Good

  • Clean separation of concerns: describe_source_availability is a pure function with no side effects — easy to test and reuse.
  • Error handling: bare except Exception is appropriate here since the goal is precisely to surface importer failures rather than propagate them.
  • Security: Full threat scan applied — no hardcoded secrets, no injection surfaces, no auth bypass, no path traversal. Security posture is unchanged.
  • Test coverage: The new test exercises all three meaningful states (available, empty, erroring) and follows the project's RED-first TDD discipline.
  • CI: 140 tests passing; 11 warnings are pre-existing DSPy deprecations unrelated to this PR.

💡 Suggestions

See inline comments for two non-blocking suggestions.


Reviewed by Hermes Agent — full file context read, security threat scan applied

candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
"candidate_count": candidate_count,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates source reachability with data presence. A source whose directory exists but contains zero messages is reported as unavailable, which may mislead users who see copilot: unavailable, candidates=0 and assume Copilot is misconfigured when it is simply empty.

Consider one of:

  • Rename the field to has_candidates to make the semantics explicit.
  • Split into two fields: reachable: True/False (driven by whether extract_messages() raised) and candidate_count: int, dropping the derived boolean entirely.

Not a blocker — the current behaviour matches the PR description — but the naming may cause confusion in follow-on PRs that extend the availability report.



def test_describe_source_availability_reports_available_empty_and_errors():
report = describe_source_availability(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three scenarios (available, empty, erroring) are asserted inside a single test function. If the empty case regresses, pytest reports the whole function as failed and the diff covers all three keys, making it harder to pinpoint the breakage at a glance.

Consider splitting into three focused tests:

def test_available_source_reports_candidate_count(): ...
def test_empty_source_is_marked_not_available(): ...
def test_erroring_source_captures_exception_message(): ...

Each test name becomes self-documenting failure output in CI, and the assertions are smaller and faster to read.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions.

Full test suite: 140 passed, 11 DSPy deprecation warnings only. Security scan: clean (no credentials, SQL, XSS, path traversal, auth bypass, or secret-to-log issues found).

See inline comments for suggestions.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The flag equates to "has at least one message", which means a correctly-installed source with an empty history (e.g. a fresh Claude Code install with no prompts yet) is reported as . This conflates "source is reachable / configured" with "source has candidate data". Consider separating the two concepts: a source that returns without raising should be , while a source that raises should be . The current semantics may mislead users into thinking their tool is not installed when it simply has no history yet.

console.print(f" {src}: {len(msgs)} messages")
info = availability[src]
status = "available" if info["available"] else "unavailable"
error = f" error={info['error']}" if info["error"] else ""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: When an error message contains spaces (e.g. ) the output line becomes , which is ambiguous — a reader cannot tell where the key ends and the value begins. Consider quoting the error value: or wrapping it in brackets (), so the output is unambiguous regardless of what the exception message contains.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: COMMENT — 0 critical, 0 warnings, 2 suggestions

✅ Looks Good

  • Clean separation of concerns: describe_source_availability is a pure function with no side effects — easy to test and reuse
  • Error handling: bare except Exception is appropriate here; surfacing importer failures is the entire point
  • Security: full threat scan applied — no hardcoded secrets, no injection surfaces, no auth bypass, no path traversal; security posture unchanged
  • Test coverage: all three meaningful importer states covered (available, empty, erroring); RED-first TDD discipline followed
  • CI: 140 tests passing; 11 warnings are pre-existing DSPy deprecations unrelated to this PR

💡 Suggestions (non-blocking)

  • evolution/core/external_importers.pydescribe_source_availability: "available": candidate_count > 0 conflates source reachability with data presence. An accessible-but-empty source is reported as unavailable. Consider renaming the field to has_candidates or separating reachability from count.
  • tests/core/test_external_importer_availability.py: Three scenarios are bundled into one test function. Splitting into test_available_source_reports_candidate_count, test_empty_source_is_marked_not_available, and test_erroring_source_captures_exception_message gives clearer CI failure output and better test isolation.

Reviewed by Hermes Agent — full file context read, security threat scan applied

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

Test Results

  • Full suite: 140 passed, 11 DSPy deprecation warnings (pre-existing, not introduced by this PR)
  • New test: test_describe_source_availability_reports_available_empty_and_errors — PASSED

Security Scan

Clean — no hardcoded credentials, SQL injection, XSS, path traversal, auth bypass, or secrets-to-logs issues found.

Suggestions

  • evolution/core/external_importers.py:737available: candidate_count > 0 conflates "source is reachable/configured" with "source has candidate data". A correctly-installed source with zero history (e.g. fresh install, no prompts yet) reports as unavailable, which may mislead users into thinking their tool isn't set up. Consider available=True for empty-but-reachable, and reserve available=False for sources that raise exceptions.

  • evolution/core/external_importers.py:791 — The error output format (error=No such file or directory) is ambiguous when exception messages contain spaces. A reader cannot tell where the key ends and the value begins. Consider quoting the value (error="...") or using brackets (error=[...]) to make the output unambiguous for any exception message.

Looks Good

  • Clean separation of availability reporting from LLM calls — well-motivated by the PR's "no LLM in dry-run" goal
  • Bare except Exception catching in describe_source_availability is appropriate here (we want to surface errors as structured data, not crash the dry-run)
  • Good docstring on describe_source_availability that accurately describes intent
  • TDD approach documented in the test plan and borne out by the new test file
  • The new function is pure and composable — it takes injected importers rather than hard-coding the source map, making it trivially testable (as demonstrated)
  • No regression in existing tests

Reviewed by Hermes Agent — full file context read, security threat scan applied

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Summary

This PR cleanly implements the first step of issue #54: making --source all --dry-run report per-source availability with explicit status, candidate count, and error reason instead of silently showing zero counts. The scope is well-contained, the separation of concerns is good (pure function extracted from CLI), and the test coverage covers all three meaningful states (available, empty, erroring).

Findings

No security issues found. Static scan of added lines found no hardcoded credentials, injection surfaces, or secrets leaking to output.

Correctness — Minor

  • describe_source_availability calls importers[source].extract_messages() without a limit argument. For sources with large histories (especially Copilot, which streams 100 MB+ files), this will read the entire dataset on every dry run — including when the user just wants a quick "is it reachable?" check. The --dry-run flag implies low cost; the full read removes that property.
  • If source is not a key in importers, the function raises KeyError uncaught. The CLI always passes a consistent importers dict, so this is safe today, but the function's public signature offers no protection if called programmatically with a mismatched list.

Code Quality — Minor

  • The bare except Exception as exc swallows the traceback. str(exc) is often just the exception message without context (e.g., for FileNotFoundError it omits the filename from the OS-level message in some Python versions). Storing repr(exc) or at minimum the exception type name improves debuggability without changing the output contract.
  • The error format in the printed output (error=<reason>) is unquoted, so a multi-word error string like Connection refused: [Errno 111] produces ambiguous output. Consider quoting: error="<reason>".

Testing — Minor

  • The single test function covers all three states but is a monolithic assertion. Splitting into three focused tests (test_available_source, test_empty_source, test_erroring_source) would make failure messages more precise and make it easier to add parametrized cases later.
  • No test covers the KeyError path (source in list but not in importers dict), which is an implicit contract the caller must uphold.

Performance — Minor

  • As noted above: describe_source_availability performs a full extract_messages() call with no limit. A limit=1 probe would be sufficient to confirm availability and would be much cheaper for large Copilot/Hermes histories.

Documentation

  • The docstring says "without LLM calls" but says nothing about the side-effect of fully reading source data. Worth noting that this is not a cheap existence check.

Overall

Clean, minimal, well-scoped PR. The main actionable item before the next PR (canonical event schema) is the full-read performance issue — it undermines the low-cost expectation of --dry-run. Everything else is polish.

"error": None,
}
except Exception as exc:
report[source] = {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Consider replacing bare str(exc) with repr(exc) here to preserve the exception type in the stored error string. str(FileNotFoundError("foo")) gives just "foo", while repr() gives "FileNotFoundError(\"foo\")" — easier to triage without a traceback.

Also consider adding a limit=1 probe instead of a full extract_messages() call. A dry-run availability check does not need all messages — just one is enough to confirm the source is reachable and non-empty, avoiding a full 100 MB+ Copilot session read:

messages = importers[source].extract_messages(limit=1)
candidate_count = len(messages)  # 0 or 1 — sufficient for availability

If the actual count matters for display, a separate (lazily deferred) full count could be added in a follow-up.



def test_describe_source_availability_reports_available_empty_and_errors():
report = describe_source_availability(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Consider splitting this monolithic assertion into three focused test functions — one per source state. This way, a failure in the "empty" case does not hide the result of "error" or "available":

def test_available_source_is_reported_correctly():
    report = describe_source_availability(["src"], {"src": AvailableImporter})
    assert report["src"] == {"available": True, "candidate_count": 2, "error": None}

def test_empty_source_is_reported_as_unavailable():
    report = describe_source_availability(["src"], {"src": EmptyImporter})
    assert report["src"] == {"available": False, "candidate_count": 0, "error": None}

def test_erroring_source_captures_exception_message():
    report = describe_source_availability(["src"], {"src": ErrorImporter})
    assert report["src"]["available"] is False
    assert "history unreadable" in report["src"]["error"]

Also worth adding a test for a source key present in sources but absent from importers to document (and enforce) the expected behavior — currently a KeyError is raised uncaught.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Clean, well-scoped PR. The new describe_source_availability function is a clear improvement over the silent zero-count display — making availability explicit is exactly the right abstraction boundary. Two non-blocking suggestions below.

Verdict: COMMENT (no blocking issues; 2 suggestions)

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: importers[source] is inside the try block, so a KeyError from an unknown source key is silently swallowed and reported as available: False, error="source_name". This turns a programming error (caller passed a source not registered in the importers dict) into a quiet data result.

Consider moving the lookup outside the try, or using an explicit guard:

This way a missing key surfaces immediately at the call site rather than appearing as an unavailability signal.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The three cases (available, empty, erroring) are asserted in one monolithic test. If only the error case regresses, pytest reports the whole assertion as failed and you have to dig into the diff to see which source broke.

Splitting into three focused tests gives cleaner signal:

Each test then fails with a self-describing name, and the fixture setup is minimal per case.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

Full context read (both changed files + callers in build_dataset_from_external), security threat scan applied, all 140 tests passing, ruff clean. The feature works correctly and is well-scoped. Two suggestions attached as inline comments; see summary below.


✅ Looks Good

  • Correctness: describe_source_availability correctly separates extraction errors from empty-but-healthy sources at the output layer; the except Exception wrapper is appropriate here since any importer failure (KeyError, IOError, runtime) should surface as unavailable rather than crashing the dry-run.
  • Security: No hardcoded credentials, no path traversal, no injection surfaces introduced. The security scan hit on f-strings in console.print lines are confirmed false positives.
  • Testing: TDD discipline demonstrated (RED → GREEN noted in test plan). Three input classes exercise the three meaningful branches.
  • CI: 140 passed, 11 warnings (all pre-existing DSPy deprecation warnings); ruff all-clear.
  • Scope discipline: PR does exactly what it says on the tin — no scope creep into schema or serialization.

💡 Suggestions

  • tests/core/test_external_importer_availability.py:27 — Consider splitting the single mega-test into three focused functions (test_available_source, test_empty_source, test_erroring_source). Pytest failure output will name the exact failing scenario rather than reporting the whole bundle as broken.
  • evolution/core/external_importers.py:736 — The "available": candidate_count > 0 expression conflates reachable with non-empty. A freshly-installed tool with zero history will display as unavailable even though nothing is wrong. Consider using a separate boolean (e.g. "reachable": True) to distinguish "source found, just empty" from "source errored", or at minimum add a note to the docstring.

Reviewed by Hermes Agent — full file context read, security threat scan applied

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: This single test function exercises three distinct scenarios (available, empty, erroring) in one block. Consider splitting into three named tests — test_available_source_is_marked_available, test_empty_source_is_marked_unavailable, test_erroring_source_captures_error_string — so that pytest failure output immediately names the broken case. The current structure makes a future regression harder to triage at a glance.

try:
messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: candidate_count > 0 conflates reachable with non-empty. A source that is fully installed but has an empty history will report as available: false even though the tool is working correctly — which may mislead users into thinking their installation is broken. Consider tracking reachability separately, e.g. "reachable": True, "available": candidate_count > 0, or update the docstring to clarify that available means "has at least one candidate message, not just that the source is installed and readable".

@jarrettj
Copy link
Copy Markdown

Review Summary — feat: report session source availability

Verdict: COMMENT (no blocking issues; a few items worth addressing before or alongside the next PR in this series)

What the PR does

Partially implements issue #54 step 1. Extracts a new pure function describe_source_availability(sources, importers) that probes each configured import source and returns a per-source dict of {available, candidate_count, error}. The --dry-run CLI path is updated to use this function, producing output like:

claude-code: available, candidates=12
copilot: unavailable, candidates=0
hermes: unavailable, candidates=0 error=<reason>

One new test file covers the three distinct states (available, empty, erroring).

Test results

  • 140 tests passed, 11 warnings (all DSPy deprecation, pre-existing)
  • Ruff: no issues
  • No secrets or injection surfaces in the diff

Key findings

Severity Area Finding
Minor Performance describe_source_availability calls extract_messages() with no limit, triggering a full read of potentially 100 MB+ Copilot history on every dry run. A limit=1 probe is sufficient to confirm availability.
Minor Code quality str(exc) in the except block loses the exception type. repr(exc) preserves it for easier triage.
Minor UX Multi-word error strings in the error=<reason> output are unquoted and can look ambiguous — consider error="<reason>".
Minor Testing Single monolithic test function — splitting per state gives clearer failure messages. Missing test for source-not-in-importers (KeyError path).
Nit Docs Docstring says "without LLM calls" but omits that it does a full session-data read — not a cheap existence check.

Inline suggestions

Two inline suggestions were posted on the review:

  1. external_importers.py line ~741 (except clause) — use repr(exc) and consider limit=1 probe instead of full read
  2. tests/core/test_external_importer_availability.py line ~27 (test function) — split into focused per-state tests; add missing-key test

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

✅ Looks Good

  • Correctness: describe_source_availability correctly separates extraction errors from empty-but-healthy sources; except Exception is appropriate — any importer failure should surface as unavailable rather than crashing the dry-run.
  • Security: No hardcoded credentials, path traversal, or injection surfaces. Security scan f-string hits on console.print lines are confirmed false positives.
  • Testing: TDD discipline (RED → GREEN in test plan). Three stub importers cover all meaningful branches. 140 tests pass, 11 pre-existing DSPy deprecation warnings only.
  • Linting: ruff all-clear.
  • Scope: PR does exactly what the description says — no scope creep into schema or serialization changes.

💡 Suggestions

  • tests/core/test_external_importer_availability.py:27 — Split the single mega-test into three focused functions (test_available_source_is_marked_available, test_empty_source_is_marked_unavailable, test_erroring_source_captures_error_string). Pytest failure output will name the exact failing case rather than the whole bundle.
  • evolution/core/external_importers.py:736"available": candidate_count > 0 conflates reachable with non-empty. A freshly-installed source with zero history shows as unavailable even though nothing is broken. Consider a separate "reachable" boolean, or clarify in the docstring that available means "has at least one candidate message."

Reviewed by Hermes Agent — full file context read, security threat scan applied

@jarrettj
Copy link
Copy Markdown

Code Review Summary — PR #66

Verdict: COMMENT (no blocking issues — 2 suggestions only)

This is a clean, well-scoped PR. describe_source_availability is the right abstraction: it turns a silent zero-count display into an explicit availability boundary, the function is pure and testable, and the dry-run output is now meaningfully diagnostic. Test coverage is solid with the three fixture classes covering all three branches.


💡 Suggestions

  • evolution/core/external_importers.py:734importers[source] is inside the try block, so a KeyError from an unknown source key is swallowed and reported as available: False, error="source_name" rather than raising. Move the lookup outside the try (or add an explicit if source not in importers guard) so programming errors surface immediately rather than appearing as availability signals.

  • tests/core/test_external_importer_availability.py:27 — All three cases (available, empty, erroring) live in one monolithic assertion. A regression in just the error case makes the whole test fail with an opaque name. Splitting into three focused test functions (test_available_source_reports_candidate_count, test_empty_source_reports_unavailable_with_zero_count, test_erroring_source_reports_unavailable_with_error_message) gives self-describing failure signals and minimal per-case fixture setup.


✅ Looks Good

  • Clear, documented function signature with proper type annotations
  • except Exception is appropriate here (not BaseException; KeyboardInterrupt isn't caught)
  • Dry-run caller correctly builds the importers dict before passing to the function — no coupling leak
  • error=<reason> only appended to output when error is non-None — no noisy error=None in clean output
  • All 140 tests passing per PR description

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

This is a clean, well-scoped slice. The new describe_source_availability function is easy to follow, the dry-run path no longer silently zeroes erroring sources, and the test directly maps the three observable states. Two suggestions below worth considering before the follow-on PR builds on this interface.


Reviewed by Hermes Agent — full file context read, security threat scan applied

candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
"candidate_count": candidate_count,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates accessibility with non-emptiness. A source that responds without error but has zero history entries (e.g. a brand-new install) will show as unavailable, which is misleading — the importer is fully functional. Consider splitting the flag:

"accessible": True,
"available": candidate_count > 0,

Or at minimum add a note in the docstring that available=False without an error means "reachable but empty," so callers building on this in the next PR do not misread it as a connection failure.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The three scenarios (available, empty, erroring) are bundled into one omnibus assertion. When one case regresses, the failure message will not point directly at the broken scenario. Three focused tests would read more clearly and produce more actionable output:

def test_available_source_reports_available_and_count():
    ...

def test_empty_source_reports_unavailable_with_zero_count():
    ...

def test_erroring_source_reports_unavailable_with_error_string():
    ...

The shared stubs (AvailableImporter, EmptyImporter, ErrorImporter) can stay as module-level fixtures — nothing needs to change structurally, just split the single assert report == {...} into three.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

Clean, well-scoped PR. is a tidy boundary function and the test covers all three importer states. Full file context read and security threat scan applied — no findings. Two non-blocking suggestions inline.


Reviewed by Hermes Agent — full file context read, security threat scan applied

candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
"candidate_count": candidate_count,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available is set to candidate_count > 0, which conflates "empty but healthy" with "broken/unavailable." A source that is correctly configured but returns no messages will show unavailable in dry-run output, which could mislead users into thinking the integration itself is broken.

Consider keeping available: True whenever extract_messages() returns without raising — even for an empty list — and letting the candidate count speak for itself:

report[source] = {
    "available": True,
    "candidate_count": candidate_count,
    "error": None,
}

The CLI output could then read available, candidates=0 vs unavailable (error=...), making the availability boundary clearer.



def test_describe_source_availability_reports_available_empty_and_errors():
report = describe_source_availability(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The single test function covers three independent scenarios (available, empty, erroring) in one assertion block. If the available assertion fails, pytest will stop there and swallow the empty and error checks, making failure diagnosis harder.

Consider splitting into three focused functions:

def test_available_source_reports_candidates():
    ...

def test_empty_source_reports_zero_candidates():
    ...

def test_erroring_source_reports_unavailable_with_message():
    ...

This also makes each test name self-documenting in pytest output.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

✅ Looks Good

  • Correctness: describe_source_availability correctly separates the pre-scan from the display loop, eliminating the previous double-call to extract_messages() (called once in the loop before, now batched).
  • Security: No hardcoded secrets, no injection surfaces, no auth bypasses. Error messages from exceptions are only surfaced to the local CLI, not persisted or transmitted.
  • Scope: Well-contained slice — no dataset serialization touched, consistent with the PR's stated non-goals.
  • Tests: All three importer states (available, empty, erroring) are covered.

💡 Suggestions

  • evolution/core/external_importers.py:737available: candidate_count > 0 conflates "empty but healthy" with "broken." An empty-but-configured source shows unavailable, potentially misleading users. Consider setting available: True for any successful (no-exception) call and letting candidates=0 carry the emptiness signal.
  • tests/core/test_external_importer_availability.py:27 — The single monolithic test function tests three scenarios in one assertion block. Splitting into three focused functions improves failure isolation and makes pytest output self-documenting.

Reviewed by Hermes Agent — full file context read, security threat scan applied

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Found 0 critical, 0 blocking warnings, 2 suggestions. See inline comments.

All tests pass (1/1). Ruff clean. This is a clean, focused addition — the two notes below are observations worth considering before a follow-up or merge, not blockers.

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: extract_messages() is called here without a limit argument, so a dry-run eagerly reads the full session history for every source. The CopilotImporter docstring warns that session files can be 100 MB+, meaning --dry-run may be slower than a real run for large histories. Consider passing a small sentinel limit (e.g. limit=1) to probe availability cheaply — the candidate count from len(messages) would then be capped, but the available boolean would still be accurate.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The test suite covers available, empty, and error importers well. One edge case worth adding: a source key that is not present in the importers dict. Currently importers[source] raises KeyError, which is caught by except Exception and stored as error="'missing_source'" — technically correct, but the quoted-repr string may surprise callers who check info['error'] for diagnostics. A test documenting (or asserting on) this behaviour would lock in the contract and make it obvious to future readers that the function treats missing keys the same as runtime errors from extract_messages().

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — 0 critical, 0 blocking warnings, 2 suggestions

✅ Looks Good

  • Clean extraction-boundary separationdescribe_source_availability is a well-scoped pure function; no LLM calls, no side effects, easy to test in isolation.
  • Error handling is correctexcept Exception as exc wraps extract_messages() per source so one failing importer never aborts the whole dry-run.
  • Test coverage is solid for the happy pathavailable, empty, and error importer variants are all exercised in one readable assertion block.
  • str(exc) in user output is appropriate — exception messages are shown to the CLI user (the person running the dry run), not logged to a remote service; no secret-leakage risk.
  • Ruff clean, 1/1 tests pass. All 11 warnings are pre-existing DSPy deprecation notices unrelated to this PR.

💡 Suggestions

  • external_importers.py:734extract_messages() is called without a limit, so --dry-run triggers a full history read. Copilot sessions can be 100 MB+; a probe limit (e.g. limit=1) would keep dry-runs fast while still producing an accurate available boolean.
  • test_external_importer_availability.py:27 — Missing test for a source key absent from the importers dict. A KeyError is currently caught and stored as error="'missing_source'" (quoted repr), which may surprise callers inspecting info['error']. A test pinning this behaviour would make the contract explicit.

Reviewed by Hermes Agent — full file context read, security threat scan applied. PR is closed; comments are informational.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Verdict: Comment — 0 critical, 1 warning, 2 suggestions.

Tests pass (1 passed) and ruff reports clean on both changed files. The core addition is solid: the function correctly isolates the availability check from the dry-run display, handles importer errors gracefully, and ships with a targeted test. Two observations worth addressing before merge are noted as inline comments below.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available conflates reachable with has candidates.

A source that connects successfully but returns an empty list is marked available: False, even though the importer is operational. This can mask a working-but-idle source and mislead callers that filter on this flag.

Consider either:

  • Rename the field to has_candidates to make the semantics explicit, or
  • Set available: True whenever extract_messages() succeeds (regardless of count) and let candidate_count == 0 carry the empty-source signal.
# Option B — available means reachable
report[source] = {
    "available": True,
    "candidate_count": candidate_count,
    "error": None,
}

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Consider splitting this into three focused tests.

The single function asserts all three cases (available, empty, error) in one block. If the assertion fails, pytest will report which key mismatched but not which scenario broke it — you'd need to read the full diff to determine whether the available, empty, or error path regressed.

Splitting gives self-documenting failure names:

def test_available_source_reports_available_with_count(): ...
def test_empty_source_reports_unavailable_with_zero_count(): ...
def test_erroring_source_reports_unavailable_with_error_message(): ...

This is a style preference rather than a blocking issue — the current test does exercise all three paths correctly.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — 0 critical, 0 warnings, 2 suggestions

✅ Looks Good

  • Correct behaviour: describe_source_availability correctly isolates the availability check from the dry-run display. The three source states (available, empty, erroring) are all handled.
  • Error isolation: wrapping extract_messages() in a try/except means one broken importer doesn't abort the whole dry-run.
  • Test coverage: the new test exercises all three cases against the real function, and ships in the same PR as the feature — good discipline.
  • Linting: ruff check reports clean on both changed files; 1 passed on the new test suite.

💡 Suggestions

  • external_importers.py:737available: candidate_count > 0 conflates reachable with has candidates. A working-but-empty source is reported as unavailable, which is semantically misleading. Consider renaming the field to has_candidates, or setting available: True whenever extract_messages() succeeds regardless of count.
  • test_external_importer_availability.py:27 — The single test asserts all three scenarios in one block. Splitting into test_available_source_*, test_empty_source_*, and test_erroring_source_* makes failures self-documenting without requiring readers to diagnose which scenario regressed.

Reviewed by Hermes Agent — full file context read, security threat scan applied, tests executed (1 passed), ruff clean

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Verdict: Comment (not blocking merge) — 0 critical, 2 warnings, 3 suggestions.

The overall approach is clean and well-tested. Two warnings are worth the author's attention before this lands in a long-running environment. See inline comments for detail.

See the full summary comment for the complete checklist breakdown.

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: Full I/O scan on every dry run

extract_messages() is called with no limit, triggering a complete scan of all session files (Copilot session stores can reach 100 MB+). Since this function only needs to determine availability, a full scan is unnecessarily expensive.

Suggested fix:

# Pass limit=1 — a single message is sufficient to confirm the source is reachable
messages = importers[source].extract_messages(limit=1)
# Note: candidate_count will then reflect only 0 or 1; document this if an exact count is needed

If an exact candidate count is intentional (for reporting purposes rather than just availability), add a comment explaining the trade-off and consider making it configurable via a full_count: bool = False parameter.

try:
messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Warning: available conflates "inaccessible" with "no messages"

Setting available = candidate_count > 0 means a reachable source with zero messages (e.g. a brand-new Copilot install with no chat history yet) is reported as unavailable. That is misleading — the source exists and is healthy, it just has no candidates.

Suggested fix: Separate reachability from count:

"reachable": True,          # set here; set to False in the except block
"candidate_count": candidate_count,

In the except block:

report[source] = {"reachable": False, "candidate_count": 0, "error": str(exc)}

This way callers can distinguish "could not open the source" (reachable=False) from "source is healthy but empty" (reachable=True, candidate_count=0). The test stubs and assertions would need updating to match.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available is currently candidate_count > 0, which conflates "source is reachable" with "source has messages". An empty-but-healthy source (extraction succeeded, returned []) gets reported as unavailable, which may mislead operators into thinking the integration is broken.

Consider separating the two signals:

"available": True,          # extraction succeeded without raising
"candidate_count": candidate_count,

and reserving available: False only for the except branch. The calling code can then distinguish "nothing to import today" from "source is broken".

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: This single test function validates three independent behaviours (available, empty, erroring) in one assertion block. When any of the three cases regresses, the failure message just says the whole dict does not match, making it harder to pinpoint which scenario broke.

Consider splitting into focused tests:

def test_available_source_is_reported_correctly(): ...
def test_empty_source_is_marked_unavailable(): ...
def test_erroring_source_captures_exception_message(): ...

Each test becomes its own failure point with a self-documenting name.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Verdict: Comment (no blocking issues, 2 suggestions)

Overall this is clean, well-scoped work. The new describe_source_availability function correctly isolates the extraction boundary from the CLI output loop, the error path is handled gracefully, and the test covers all three branches. Two non-blocking suggestions below.

See inline comments for details.


Reviewed by Hermes Agent

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: importers[source] will raise KeyError if an unknown source name is passed. The surrounding except Exception block does catch it, but str(KeyError("claude-code")) produces the opaque string "'claude-code'" in the error field — no context that this was a missing-key problem.

Consider guarding explicitly for a clearer signal:

if source not in importers:
    report[source] = {
        "available": False,
        "candidate_count": 0,
        "error": f"unknown source '{source}'",
    }
    continue

Low priority since callers are currently the controlled CLI, but worth hardening before this becomes a public API surface.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The three scenarios (available, empty, erroring) are bundled into one assertion. If the available sub-case regresses, pytest reports the entire test as failed with a diff across all three keys — it's hard to tell at a glance which branch broke.

Consider splitting into focused tests:

def test_available_source_reports_available_and_count(): ...
def test_empty_source_reports_unavailable_zero_count(): ...
def test_erroring_source_reports_unavailable_with_error_string(): ...

Each test name then acts as living documentation for the contract, and failures pinpoint the broken case instantly.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — no blocking issues · 2 suggestions

🔴 Critical

None

⚠️ Warnings

None

💡 Suggestions

  • evolution/core/external_importers.py:734importers[source] lookup on an unknown source produces an opaque KeyError string in the error field. The except Exception block catches it, but the message "'claude-code'" gives no context. A pre-check with a clear "unknown source 'X'" message would improve debuggability, especially before this becomes a wider API surface.
  • tests/core/test_external_importer_availability.py:27 — Three independent scenarios (available, empty, erroring) are asserted in one monolithic test. Splitting into three focused tests would give instant failure diagnosis and make each test name serve as living documentation of the contract.

✅ Looks Good

  • Clean separation of concerns: describe_source_availability is a pure function that owns the extraction boundary — the CLI loop just formats its output.
  • Error handling is correct and consistent: all three paths (messages returned, empty list, exception) produce a well-typed dict with the same keys.
  • TDD discipline evident from the PR description (RED first, then GREEN).
  • Test coverage is thorough for the happy path and both failure modes.
  • No secrets, debug statements, or merge conflict markers found.

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #66: feat: report session source availability

Verdict: Comment — solid, focused change. The new describe_source_availability function is clean, well-typed, and the test coverage is good. Two non-blocking suggestions inline; nothing preventing merge.

💡 Suggestions

  • evolution/core/external_importers.py:737available semantics conflate "reachable" with "has candidates" (see inline)
  • tests/core/test_external_importer_availability.py:27 — monolithic assertion + missing KeyError edge case (see inline)

✅ Looks Good

  • Clean separation: describe_source_availability is pure and independently testable
  • Error path swallows exceptions correctly and surfaces them in the report
  • Dry-run output is now meaningfully explicit (available/unavailable + candidate count + optional error)
  • TDD approach documented in the PR description — RED first, then green
  • Type hints are thorough (dict[str, dict[str, Any]])
  • importers is called only once per dry run; results are reused in the display loop (no double extraction)

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: Comment — nice, focused slice. The function is correct, the test covers the three cases, and the dry-run output is now clearly bounded. Two suggestions below, nothing blocking.

💡 Suggestions

  • external_importers.py:737"available": candidate_count > 0 conflates source is reachable with source has messages. An importer that works perfectly but currently has zero candidates will print unavailable in the dry-run output, which is misleading. Consider splitting into reachable: True + has_candidates: candidate_count > 0, or at minimum renaming the key to has_candidates to match its actual semantics.
  • test_external_importer_availability.py:27 — The single combined test packs all three assertions into one function. If the empty case ever breaks, the failure message is buried. Consider splitting into test_available_source, test_empty_source, and test_erroring_source so each failure is self-describing.

✅ Looks Good

  • Clean exception boundary — except Exception as exc is appropriate here since we don't own the importer implementations and any error should be reported, not re-raised.
  • Good docstring: "without LLM calls" is exactly the right signal for a dry-run helper.
  • The eagerly-evaluated availability dict in main is a nice simplification over the previous inline loop.

Reviewed by Hermes Agent

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates reachable with has messages. A perfectly functional importer with an empty history will show as unavailable in the dry-run output, which is misleading. Consider renaming to has_candidates or splitting into two keys (reachable: True, has_candidates: candidate_count > 0) so callers can distinguish "broken source" from "source with no data yet".

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The three scenarios (available, empty, erroring) are all asserted in a single test function. If the empty case regresses, the failure message is hard to triage at a glance. Consider splitting into test_available_source, test_empty_source, and test_erroring_source — each with its own assert block — so pytest output is immediately self-describing on failure.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates two distinct concepts — the source is reachable vs the source has messages right now. A source that connects successfully but returns 0 messages will be printed as unavailable, which is misleading: the importer is available, it just has no candidates today.

Consider splitting the semantics:

report[source] = {
    "reachable": True,        # the importer ran without error
    "has_candidates": candidate_count > 0,
    "candidate_count": candidate_count,
    "error": None,
}

And update the display logic to show reachable (not has_candidates) as the availability status.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Two improvements worth considering here:

  1. Split into focused tests — packing available + empty + error into one assert report == {...} block means a single failure shows the whole dict mismatch instead of pointing directly at the broken case. Three small tests are easier to diagnose:
def test_available_source_reports_reachable(): ...
def test_empty_source_is_not_available(): ...
def test_erroring_source_captures_message(): ...
  1. Add a KeyError edge-case test — if a caller passes a source name that is absent from the importers dict, the broad except Exception in describe_source_availability will silently catch the KeyError and mark the source as unavailable with error="some_key". That may be intentional, but it should be an explicit, tested behaviour rather than an undocumented side effect.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Clean, well-scoped implementation. No critical or blocking issues. Two suggestions below (see inline comments).

Verdict: COMMENT — nothing preventing merge, but worth considering before the next follow-up PR.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates two distinct states — a source that is reachable but has no messages vs one that errors. Downstream code (and humans reading the report) can't tell whether unavailable means "broken source" or "working source, no data yet." Consider using "reachable": True alongside candidate_count, and let callers decide what "available" means for their context. Alternatively, rename the key to has_candidates to make the semantics explicit.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three cases (available, empty, error) live in a single assertion block. When one assertion fails, pytest stops and you lose visibility into whether the other cases also broke. Splitting into test_available_source, test_empty_source, and test_erroring_source makes each failure independent and the test names self-documenting.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: COMMENT — no blocking issues; two suggestions worth considering before the follow-up PRs.

✅ Looks Good

  • Clean, minimal scope — exactly what issue Implement all-agent session ingestion and promotion gates #54 step 1 called for
  • describe_source_availability is pure and side-effect-free; easy to test and reuse
  • Error handling correctly isolates per-source failures so one broken source doesn't abort the whole dry run
  • Test file covers the three meaningful cases (available, empty, erroring) with clear stub classes
  • No secrets, no SQL/XSS risk, no performance concerns for a CLI dry-run path

💡 Suggestions

  • external_importers.py:737"available": candidate_count > 0 conflates reachability with data presence. A working-but-empty source and a broken source both land as available: False, losing information callers may need. Consider "reachable": True / False + candidate_count, or rename the key to has_candidates.

  • test_external_importer_availability.py:27 — All three cases are asserted in one function. Splitting into test_available_source, test_empty_source, and test_erroring_source makes failures independent and the names self-documenting.


Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Verdict: Comment — implementation is solid and tests pass. Two non-blocking suggestions below.

  • ✅ Clean fail-closed design in describe_source_availability
  • ✅ Tests cover all three cases (available, empty, erroring)
  • ✅ No secrets, no debug code, no security concerns

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: extract_messages() is called here with no limit, loading the entire message history into memory just to count it. Since all three importers already support a limit parameter (extract_messages(limit: int = 0)), consider passing a small cap — e.g. extract_messages(limit=500) — to keep dry-run fast on large session stores (Copilot history files can be 100MB+).

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates "source is accessible" with "source has messages." A properly configured but empty source will show available: false, which looks like a misconfiguration rather than a healthy-but-empty importer. Consider renaming this key to has_candidates (mirroring candidate_count) and reserving available to mean "importer did not raise" — making the three states unambiguous: accessible+has_candidates, accessible+empty, and error.

@jarrettj
Copy link
Copy Markdown

Code Review Summary — PR #66: feat: report session source availability

Verdict: Comment (2 non-blocking suggestions — nothing blocking merge)

✅ Looks Good

  • Fail-closed design is correct: an erroring importer is never silently treated as zero messages
  • describe_source_availability is properly isolated and independently testable
  • Tests cover all three states: available, empty, and erroring importer
  • No secrets, no debug code, no security concerns
  • CLI output format matches the PR description examples exactly

💡 Suggestions

  • evolution/core/external_importers.py:734extract_messages() called without a limit, loading the full history into memory just to count it. Consider extract_messages(limit=500) — all three importers already support the limit parameter.

  • evolution/core/external_importers.py:737"available": candidate_count > 0 conflates "source is accessible" with "source has messages." An empty-but-healthy source reports available: false, which looks like a misconfiguration. Consider renaming to has_candidates, reserving available for "importer did not raise."


Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #66: feat: report session source availability

Verdict: Comment (2 non-blocking suggestions, nothing critical)

✅ Looks Good

  • Fail-closed design is correct: sources that raise are marked unavailable, not silently counted as zero
  • str(exc) is safe for all standard exceptions and will not swallow the root cause
  • New function is pure (no side-effects, no I/O of its own) — easy to unit-test in isolation
  • All three code paths (available, empty, error) are covered by tests
  • No secrets, no debug artifacts, no merge markers

💡 Suggestions (2)

See inline comments for details.


Reviewed by Hermes Agent

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: extract_messages() is called with no limit here, so even a --dry-run will fully materialise the entire message list before counting it. For large Copilot session stores this can be slow and memory-heavy. Consider passing a sentinel limit (e.g. extract_messages(limit=500)) and summing counts in batches, or adding a dedicated count_messages() classmethod that streams rather than collects.

"candidate_count": 0,
"error": "history unreadable",
},
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three scenarios (available, empty, error) are asserted inside a single test function. If the empty assertion fails, pytest marks the whole function as failed without running the error assertions, making triage harder. Consider splitting into three focused tests — test_available_source, test_empty_source, test_erroring_source — so each failure message pinpoints exactly which scenario broke.

@jarrettj
Copy link
Copy Markdown

Code Review Summary — PR #66

Verdict: Comment — 2 non-blocking suggestions, nothing critical or blocking.

💡 Suggestions

  • evolution/core/external_importers.py:734extract_messages() called without a limit inside describe_source_availability. For large session stores (e.g. Copilot), this fully materialises the entire message list just to count it, making even --dry-run slow and memory-heavy. Consider extract_messages(limit=500) or a dedicated count_messages() method.

  • tests/core/test_external_importer_availability.py:27 — All three scenarios (available, empty, error) are asserted in a single test function. A failure in one assertion suppresses the others, making triage harder. Suggest splitting into test_available_source, test_empty_source, test_erroring_source.

✅ Looks Good

  • Fail-closed design is correct — raising importers are marked unavailable, not silently zero
  • str(exc) is safe and preserves the root cause in the report
  • Function is pure (no I/O of its own) — straightforward to unit-test
  • All three code paths are covered by the new test
  • No secrets, debug artifacts, or merge conflict markers

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary — PR #66: feat: report session source availability

Verdict: Comment (2 suggestions, nothing blocking)

💡 Suggestions

  • evolution/core/external_importers.py:737"available": candidate_count > 0 conflates reachability with occupancy. A healthy-but-empty source (e.g. a fresh Claude Code install) is reported as unavailable, which could mislead operators into thinking the importer is broken. Consider two distinct fields — reachable: True (set in the success branch, False in the except) and keep candidate_count — or rename the existing field to has_candidates to be accurate about what is being tested.

  • tests/core/test_external_importer_availability.py:27 — All three scenarios (available, empty, error) are asserted in a single test_* function. When any one case fails, pytest prints the whole dict diff which mixes all three cases together, making it harder to isolate the failure. Splitting into three focused functions (test_available_source, test_empty_source, test_erroring_source) would give one clear failure message per scenario.

✅ Looks Good

  • Fail-closed design: exceptions surface as available: False with error message rather than silently swallowing them
  • Clean separation — describe_source_availability is a pure function with no side effects; easy to test and reuse
  • The CLI integration correctly avoids calling extract_messages() twice (once for availability, once for the loop) — single call via the pre-computed availability dict
  • Good docstring: "without LLM calls" is exactly the right signal for a dry-run helper
  • No secrets, no debug prints, no conflict markers

Reviewed by Hermes Agent

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates reachability with occupancy. A source that works fine but currently has zero messages (e.g. a fresh Claude Code install) will be reported as unavailable, which could mislead an operator into thinking the importer is broken rather than just empty.

Consider separating the two signals:

report[source] = {
    "reachable": True,          # source connected successfully
    "candidate_count": candidate_count,
    "error": None,
}

Or, at minimum, rename the field to has_candidates so the meaning is unambiguous.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three scenarios (available, empty, error) are collapsed into one assertion block. When any case fails, pytest prints the entire dict diff — mixing all three sources — which makes it harder to pinpoint which scenario regressed.

Split into three focused tests:

def test_available_source_reports_candidate_count(): ...
def test_empty_source_reports_unavailable(): ...
def test_erroring_source_reports_error_message(): ...

Each test then fails with a single, unambiguous message.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Two non-blocking suggestions — nothing that should prevent merge. See inline comments.

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: extract_messages() is called here with no limit, which loads the entire message corpus just to produce a count. Since all three real importers define extract_messages(limit: int = 0), a dry-run existence check only needs limit=1 (or a small sentinel like limit=100). For users with large Copilot histories this can make --dry-run noticeably slow.

# before
messages = importers[source].extract_messages()

# after — cheap existence check for dry-run
messages = importers[source].extract_messages(limit=100)

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three behaviours — available, empty, and error — are asserted in one function. When this test fails, pytest will report the whole block as failed without identifying which case broke. Splitting into three focused functions (test_available_source, test_empty_source, test_error_source) would make failures self-describing and make it easier to add future cases independently.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment (0 critical, 0 warnings, 2 suggestions)

💡 Suggestions

  • evolution/core/external_importers.py:734extract_messages() is called without a limit during dry-run, loading the full corpus just to count candidates. All importers accept limit: int = 0; passing limit=100 (or limit=1 for a pure existence check) would make dry-runs faster on large histories.

  • tests/core/test_external_importer_availability.py:27 — The three behaviours (available, empty, error) are packed into one assertion. Splitting into three focused test functions would give self-describing failures and make the suite easier to extend.

✅ Looks Good

  • Fail-open → fail-explicit boundary is well-designed; available: False with explicit error= is the right shape for a dry-run report
  • except Exception is appropriate here — importer errors are inherently unpredictable and should never crash the CLI
  • The describe_source_availability function is clean and pure (no side-effects, easily testable in isolation)
  • Test coverage hits all three meaningful branches; RED-first TDD workflow noted in the PR description
  • No secrets, no debug statements, no merge conflict markers

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Good, focused PR. The availability boundary is clear, dry-run output is explicit, and the test covers all three paths. Two non-blocking suggestions below — nothing prevents merge.

💡 Suggestions

  • external_importers.py:734extract_messages() loads the full message list just to count it; can be slow for large Copilot stores on dry-run.
  • test_external_importer_availability.py:27 — All three scenarios in one function; splitting makes failures self-documenting.

✅ Looks Good

  • Empty result sets correctly treated as unavailable (not silently "0 messages")
  • error: None sentinel is explicit and consistent
  • Bare except Exception is acceptable here — the goal is to surface any import failure as a string, not to swallow it

Reviewed by Hermes Agent

report: dict[str, dict[str, Any]] = {}
for source in sources:
try:
messages = importers[source].extract_messages()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: extract_messages() is called with no limit, loading the full list into memory just to len() it. For large Copilot history files this makes even --dry-run noticeably slow. Consider capping the load:

messages = importers[source].extract_messages(limit=500)

Or expose a lightweight count_messages() on the importer protocol so dry-run probing never materialises the full dataset.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: All three scenarios (available, empty, erroring) are asserted inside one function. If the available sub-assertion fails, pytest names the whole block, making it hard to tell at a glance which source broke. Consider splitting:

def test_available_source_reports_candidate_count(): ...
def test_empty_source_is_unavailable(): ...
def test_error_source_captures_error_string(): ...

This also keeps each test focused and makes adding new edge cases (e.g. importer raises OSError) straightforward.

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review

Solid PR — the dry-run availability boundary is a clear improvement over silent zero-counts. Two non-blocking suggestions below.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The flag conflates empty with unreachable. A source that returns 0 messages is reachable and healthy — it just has no candidates right now. Setting means the CLI will print for a working-but-empty source, which could mislead operators into thinking there's a connectivity problem.

Consider renaming this field to , or keeping for the success path and deriving the display label in the CLI loop instead:

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Bundling all three scenarios (available, empty, error) into one monolithic assertion makes failure output noisy — pytest will diff the whole dict rather than naming the broken case. Consider splitting into three focused tests:

Not blocking, but it pays off immediately when one edge case regresses.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment (0 blocking issues, 2 suggestions)

✅ Looks Good

  • Clean function decomposition — describe_source_availability is focused and pure (no side effects, testable in isolation)
  • Dry-run output is now explicit and unambiguous; the old silent zero-count behaviour was a real trap
  • Test fixtures (AvailableImporter, EmptyImporter, ErrorImporter) are minimal and well-named
  • Exception is caught at the right granularity for this use case

💡 Suggestions

  • evolution/core/external_importers.py:737available: candidate_count > 0 conflates empty with unreachable. A source that returns 0 messages is still reachable; printing unavailable for it could mislead operators. Consider renaming to has_candidates or keeping available: True for the success path and deriving the display label in the CLI.

  • tests/core/test_external_importer_availability.py:27 — All three scenarios are asserted in a single dict comparison. When one case regresses, pytest diffs the entire dict rather than naming the failing scenario. Splitting into three focused test functions would make failures self-documenting.


Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #66: Report session source availability

Verdict: Comment (no blockers; two suggestions worth considering before the next PR in this series builds on top)

💡 Suggestions

See inline comments for details on both. Nothing blocking — clean implementation overall.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Marking a source as available: False when extract_messages() returns [] conflates two distinct states: "importer is working but has no messages" vs "importer is broken/unconfigured." A healthy-but-empty source would appear the same as a crashing one in the output, which could mislead diagnostics.

Consider reserving available: False for the except path only:

report[source] = {
    "available": True,   # importer ran without error
    "candidate_count": candidate_count,
    "error": None,
}

The dry-run output already prints candidates=0, so empty-but-working sources are still clearly distinguishable.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Bundling all three cases (available, empty, error) into a single assertion makes failures hard to isolate — if the error branch regresses, the diff output will show the full nested dict mismatch rather than pointing directly at the broken case.

Consider splitting into three focused tests:

def test_available_source_reports_candidate_count(): ...
def test_empty_source_reports_unavailable(): ...
def test_erroring_source_captures_exception_message(): ...

Each test stays short and its name communicates the exact behavior under test.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment — no blockers, two suggestions worth considering

💡 Suggestions

  • evolution/core/external_importers.py:737available: False is set when extract_messages() returns [], conflating "no messages" with "importer broken." Consider available: True for the success path and reserve False for the except branch only.
  • tests/core/test_external_importer_availability.py:27 — All three cases (available, empty, error) are asserted in a single test. A regression in any one case produces a confusing full-dict mismatch. Consider three focused tests instead.

✅ Looks Good

  • Clean, minimal function signature — sources + importers dict is easy to call and test in isolation
  • Explicit error capture (str(exc)) surfaces actionable messages in dry-run output
  • TDD approach documented in the PR (RED first, then green) — good discipline
  • Test fixture classes are clear stubs with no unnecessary setup

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #66: feat: report session source availability

Overall this is a clean, well-scoped change. The dry-run output is now meaningfully more informative and the test file follows a good RED-first pattern. Two minor suggestions below — nothing blocking.

Reviewed by Claude Agent

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available: candidate_count > 0 conflates reachability with having messages. A source that successfully returns [] is healthy — it connected, read history, and found nothing — but it is reported as unavailable, which may mislead debugging.

Consider making available mean "the importer ran without error" and surfacing emptiness separately:

report[source] = {
    "available": True,
    "candidate_count": candidate_count,
    "empty": candidate_count == 0,
    "error": None,
}

This way the dry-run output can distinguish broken from empty, which is the boundary the PR description says it is trying to surface.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: Packing all three cases (available, empty, error) into one assertion block makes failure messages ambiguous — if the dict shape changes for just the error case, the whole test fails with a full-dict diff.

Consider splitting into three focused tests:

def test_available_source_reports_available_and_count():
    report = describe_source_availability(["s"], {"s": AvailableImporter})
    assert report["s"]["available"] is True
    assert report["s"]["candidate_count"] == 2

def test_empty_source_reports_unavailable():
    report = describe_source_availability(["s"], {"s": EmptyImporter})
    assert report["s"]["available"] is False
    assert report["s"]["error"] is None

def test_erroring_source_captures_message():
    report = describe_source_availability(["s"], {"s": ErrorImporter})
    assert report["s"]["available"] is False
    assert "unreadable" in report["s"]["error"]

Each test now has a single failure mode and a self-documenting name.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment (2 suggestions, nothing blocking)

✅ Looks Good

  • describe_source_availability is cleanly extracted and well-named — easy to test in isolation
  • Error capture via bare except Exception is appropriate here (importers are untrusted third-party integrations)
  • PR description is honest about scope and non-goals — stays focused on the availability boundary
  • RED-first test discipline documented and followed

💡 Suggestions

  • evolution/core/external_importers.py:737available: candidate_count > 0 conflates reachability with emptiness. A source that successfully returns [] is healthy but gets labelled unavailable. Consider available: True on success, plus an empty: bool field for zero-message sources.

  • tests/core/test_external_importer_availability.py:27 — A single test covering three cases (available, empty, error) produces an ambiguous failure diff. Split into three focused tests so failure messages are self-documenting.


Reviewed by Claude Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #66: feat: report session source availability

Overall this is a clean, well-scoped addition. Two small suggestions below (nothing blocking).

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: "available": candidate_count > 0 conflates reachability with having content. An empty-but-working importer gets available: False, making it look the same as an erroring importer (modulo the error field). Consider setting "available": True in the success branch to mean "source responded without throwing", and let callers use candidate_count == 0 to detect emptiness separately.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: This single function exercises all three cases in one assertion block. If one case regresses, pytest output will only point at the function name rather than the specific case. Consider splitting into three focused tests (test_available_source, test_empty_source, test_erroring_source) for faster failure diagnosis.

@jarrettj
Copy link
Copy Markdown

Code Review Summary

Verdict: Comment (0 critical, 0 warnings, 2 suggestions)

✅ Looks Good

  • Clean, minimal scope — exactly what step 1 of Implement all-agent session ingestion and promotion gates #54 asks for, no over-reach
  • describe_source_availability is pure (no side-effects, no LLM calls), easy to test
  • Dry-run output is now clearly structured with available/unavailable, candidates=N, and optional error=…
  • Test file uses RED-first discipline and covers the three expected states
  • No secrets, no debug leftovers, no merge-conflict markers

💡 Suggestions

  • evolution/core/external_importers.py:737"available": candidate_count > 0 conflates reachability with having content. An empty-but-working source ends up available: False, indistinguishable from an erroring one at the boolean level. Consider "available": True in the success branch (source responded without raising) and let callers decide what to do with candidate_count == 0.
  • tests/core/test_external_importer_availability.py:27 — one function covers all three cases in a single assertion block. Splitting into test_available_source, test_empty_source, and test_erroring_source would give faster, more precise failure attribution from pytest output.

Reviewed by Hermes Agent

Copy link
Copy Markdown

@jarrettj jarrettj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hermes Agent Review — PR #66

Overall this is a clean, well-scoped addition. The dry-run reporting is now meaningfully richer and the test coverage matches the three distinct cases. Two non-blocking suggestions below.

messages = importers[source].extract_messages()
candidate_count = len(messages)
report[source] = {
"available": candidate_count > 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: available: candidate_count > 0 conflates reachability with having data. A source that connects fine but returns zero messages will show as unavailable in the CLI output, which could mislead debugging. Consider separating the two signals — e.g., track reachable: True (no exception raised) independently from candidate_count, and derive the available/unavailable label from reachability rather than count.

raise RuntimeError("history unreadable")


def test_describe_source_availability_reports_available_empty_and_errors():
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Suggestion: The single omnibus test covers three cases in one assertion block. Splitting into test_available_source, test_empty_source, and test_erroring_source would make pytest output immediately pinpoint which case regresses, rather than showing the whole combined assertion failing. Nothing blocking — just easier to diagnose.

@jarrettj
Copy link
Copy Markdown

Code Review Summary — PR #66

Verdict: Comment (0 critical, 0 warnings, 2 suggestions — nothing blocking merge)

💡 Suggestions

  • evolution/core/external_importers.py:737available: candidate_count > 0 conflates reachability with having data. A source that connects but returns 0 messages is shown as unavailable, which could mislead debugging. Consider tracking reachable: True (no exception) separately from count, and deriving the CLI label from reachability.

  • tests/core/test_external_importer_availability.py:27 — Single omnibus assertion covers three cases at once. Splitting into test_available_source, test_empty_source, test_erroring_source would make failure output immediately pinpoint which case regressed.

✅ Looks Good

  • Clean, minimal function signature — sources + importers dict is all it needs
  • Exception boundary is correctly placed: catches any Exception from extract_messages() and surfaces the message as error
  • Dry-run CLI output is now meaningfully richer without touching the non-dry-run path
  • Test stubs are readable and cover the three distinct behaviors

Reviewed by Hermes Agent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants