feat: report session source availability#66
Conversation
|
Closing this split PR in favor of consolidated PR #67. Local integration found review/merge overhead across the stack (notably #61/#64 overlap in evolution/skills/evolve_skill.py), and #67 preserves the combined local test evidence: targeted stack tests 21 passed; full suite 160 passed; GitHub checks were absent on the split PRs. Review #67 instead. |
jarrettj
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Comment — Clean, well-scoped addition. Two non-blocking suggestions below.
💡 Suggestions
- evolution/core/external_importers.py:737 —
availableflag conflates reachability with data presence; see inline. - tests/core/test_external_importer_availability.py:27 — Three cases in one assertion; worth splitting for clearer pytest output.
✅ Looks Good
- Narrow, focused scope — does exactly what the PR description promises
- Bare-except is scoped to
Exception(notBaseException), which is appropriate here - Dry-run path now gives genuinely actionable output instead of a silent zero
- Test fixtures are minimal and self-explanatory
- RED-first test process documented in the PR description
Reviewed by Hermes Agent
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: The available flag conflates reachability with data presence. A source that succeeds but returns an empty list (e.g. the user simply has no history yet) gets marked available: False, which could mislead callers into thinking the source is broken.
Consider tracking reachability separately:
report[source] = {
"reachable": True, # True whenever no exception is raised
"has_candidates": candidate_count > 0,
"candidate_count": candidate_count,
"error": None,
}This way callers (and the dry-run display) can distinguish "the source works but is empty right now" from "the source errored".
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: Three distinct behaviours are asserted in a single test function. If only the error branch breaks, pytest still reports the whole function as failed with a large diff, making it harder to pinpoint the regression.
Consider splitting into focused tests:
def test_available_source_reports_count():
report = describe_source_availability(["a"], {"a": AvailableImporter})
assert report["a"]["available"] is True
assert report["a"]["candidate_count"] == 2
def test_empty_source_is_unavailable():
report = describe_source_availability(["e"], {"e": EmptyImporter})
assert report["e"]["available"] is False
assert report["e"]["error"] is None
def test_erroring_source_is_unavailable():
report = describe_source_availability(["x"], {"x": ErrorImporter})
assert report["x"]["available"] is False
assert "unreadable" in report["x"]["error"]The original single-function form is fine to keep as an integration-style smoke test; the split tests complement it rather than replace it.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Clean, focused slice — good scoping decision to keep this PR small. Two minor suggestions below (nothing blocking).
✅ Looks Good
- Well-isolated function with a clear single responsibility
- Dry-run output is now meaningfully differentiated between empty and erroring sources
- Test stubs cover all three branches (available, empty, error)
💡 Suggestions
- external_importers.py:737 —
available: candidate_count > 0conflates a reachable-but-empty source with a broken one - test_external_importer_availability.py:27 — one omnibus test covers three distinct scenarios; splitting would make failures self-documenting
Reviewed by Hermes Agent
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: available: candidate_count > 0 silently treats a reachable-but-empty source the same as a broken one. A source whose importer runs without raising is technically available — it just has no candidates yet. Consider splitting the flag into reachable: True + has_candidates: candidate_count > 0, or at minimum renaming to has_candidates so callers aren’t misled into thinking an empty-but-healthy source is down.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: This single test exercises three distinct scenarios (available, empty, error) in one assertion block. Splitting into test_available_source, test_empty_source, and test_erroring_source would make failures self-documenting — a broken ErrorImporter path wouldn’t show up as a generic AssertionError on a 53-line function; the test name alone would identify which case regressed.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Nice, clean implementation of the availability boundary — the dry-run output is much more actionable now. Two non-blocking observations inline; nothing here should block a future re-open/merge.
✅ Looks Good
- Good exception guard around
extract_messages()— errors surface cleanly aserror=...in output. - Tests cover all three state classes (available, empty, erroring) and read clearly.
- The scoping note in the PR description ("does not yet implement full canonical event/message schema") is exactly the right way to bound this slice.
💡 Suggestions (non-blocking)
See inline comments for two small suggestions.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: available: candidate_count > 0 conflates two distinct states — a source that is reachable but has no messages yet (empty history) vs. one that is unreachable/erroring. If callers ever want to distinguish "can I reach this source?" from "does it have candidates?", you'll need a schema change later.
Consider a separate reachable flag:
report[source] = {
"reachable": True, # extraction succeeded
"available": candidate_count > 0,
"candidate_count": candidate_count,
"error": None,
}
# and in the except block: "reachable": FalseTotally fine to defer if the current two-value display is sufficient — just worth a note in the docstring if so.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: One mega-test covering all three cases means a single assertion failure doesn't immediately tell you which source class is broken. Consider splitting into three focused tests:
def test_available_source_is_reported_correctly():
report = describe_source_availability(["src"], {"src": AvailableImporter})
assert report["src"]["available"] is True
assert report["src"]["candidate_count"] == 2
assert report["src"]["error"] is None
def test_empty_source_is_reported_as_unavailable():
...
def test_erroring_source_is_reported_with_error_message():
...Clearer failure attribution in CI at zero extra cost.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Verdict: Comment (no blocking issues — two targeted suggestions, one on output safety, one on test structure)
✅ What's solid
describe_source_availabilityis cleanly scoped: pure function, no side effects, no LLM calls, easy to unit-test.- Moving the availability probe before the
for src in sourcesrender loop is the right call — one pass to gather data, one pass to display it. except Exception as excis appropriate here (dry-run probe;BaseExceptionsubclasses likeKeyboardInterruptare not caught).- Good TDD discipline demonstrated in the test plan.
See inline comments for two suggestions.
| console.print(f" {src}: {len(msgs)} messages") | ||
| info = availability[src] | ||
| status = "available" if info["available"] else "unavailable" | ||
| error = f" error={info['error']}" if info["error"] else "" |
There was a problem hiding this comment.
💡 Suggestion: The error value is rendered unquoted, so multi-word or punctuated messages (e.g. FileNotFoundError: [Errno 2] No such file or directory) will be visually ambiguous — readers can't tell where the error string ends if it contains spaces or = signs.
Consider quoting the value for unambiguous, machine-parseable output:
error = f' error="{info["error"]}"' if info["error"] else ""This keeps the format consistent with key=value convention while making boundaries explicit.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The single test bundles three distinct behaviours (available, empty, erroring importer). Splitting into focused tests improves failure isolation — when 'empty source' regresses, a dedicated test_empty_source_is_unavailable immediately surfaces the culprit rather than requiring you to read which assertion inside the monolith failed.
Also worth adding a fourth case: a source key that is present in sources but absent from importers. Right now that raises a bare KeyError which is swallowed by the broad except Exception and reported as error="'missing_key'" — probably fine, but the behaviour is implicit and untested.
def test_available_source():
report = describe_source_availability(["a"], {"a": AvailableImporter})
assert report["a"]["available"] is True
assert report["a"]["candidate_count"] == 2
assert report["a"]["error"] is None
def test_empty_source_is_unavailable():
report = describe_source_availability(["e"], {"e": EmptyImporter})
assert report["e"]["available"] is False
def test_erroring_source_captures_message():
report = describe_source_availability(["x"], {"x": ErrorImporter})
assert report["x"]["available"] is False
assert "history unreadable" in report["x"]["error"]
def test_source_missing_from_importers_is_reported_as_error():
report = describe_source_availability(["ghost"], {})
assert report["ghost"]["available"] is False
assert report["ghost"]["error"] is not None
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: COMMENT — 0 critical, 0 warnings, 2 suggestions
Clean, well-scoped addition. The security threat scan found no issues; the new function contains no secrets, no injection surfaces, no auth bypass, and no path traversal. The test suite (140 passing, 11 DSPy deprecation warnings) is unaffected. Two suggestions for the team's consideration — neither is a blocker.
| try: | ||
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { |
There was a problem hiding this comment.
💡 Suggestion: available: candidate_count > 0 conflates source reachability with data presence. A source whose directory exists but contains zero messages is reported as unavailable, which may mislead users who see copilot: unavailable, candidates=0 and assume Copilot is misconfigured when it is simply empty.
Consider one of:
- Rename the field to
has_candidatesto make the semantics explicit. - Split into two fields:
reachable: True/False(driven by whetherextract_messages()raised) andcandidate_count: int, dropping the derivedavailableboolean entirely.
Current output is correct for the described use-case; the naming is the concern.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: All three scenarios (available, empty, error) are bundled into one test function with a single composite assertion. If the empty case regresses, pytest reports the whole function as failed, and the diff covers all three keys — harder to pinpoint at a glance.
Consider splitting into three focused tests:
def test_available_source_reports_candidate_count(): ...
def test_empty_source_is_marked_not_available(): ...
def test_erroring_source_captures_exception_message(): ...Each test name becomes self-documenting failure output, and the assertions are smaller and faster to read in CI.
| def describe_source_availability( | ||
| sources: list[str], | ||
| importers: dict[str, type], | ||
| ) -> dict[str, dict[str, Any]]: |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: COMMENT — 0 critical, 0 warnings, 2 suggestions
✅ Looks Good
- Clean separation of concerns:
describe_source_availabilityis a pure function with no side effects — easy to test and reuse. - Error handling: bare
except Exceptionis appropriate here since the goal is precisely to surface importer failures rather than propagate them. - Security: Full threat scan applied — no hardcoded secrets, no injection surfaces, no auth bypass, no path traversal. Security posture is unchanged.
- Test coverage: The new test exercises all three meaningful states (available, empty, erroring) and follows the project's RED-first TDD discipline.
- CI: 140 tests passing; 11 warnings are pre-existing DSPy deprecations unrelated to this PR.
💡 Suggestions
See inline comments for two non-blocking suggestions.
Reviewed by Hermes Agent — full file context read, security threat scan applied
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, | ||
| "candidate_count": candidate_count, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates source reachability with data presence. A source whose directory exists but contains zero messages is reported as unavailable, which may mislead users who see copilot: unavailable, candidates=0 and assume Copilot is misconfigured when it is simply empty.
Consider one of:
- Rename the field to
has_candidatesto make the semantics explicit. - Split into two fields:
reachable: True/False(driven by whetherextract_messages()raised) andcandidate_count: int, dropping the derived boolean entirely.
Not a blocker — the current behaviour matches the PR description — but the naming may cause confusion in follow-on PRs that extend the availability report.
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): | ||
| report = describe_source_availability( |
There was a problem hiding this comment.
💡 Suggestion: All three scenarios (available, empty, erroring) are asserted inside a single test function. If the empty case regresses, pytest reports the whole function as failed and the diff covers all three keys, making it harder to pinpoint the breakage at a glance.
Consider splitting into three focused tests:
def test_available_source_reports_candidate_count(): ...
def test_empty_source_is_marked_not_available(): ...
def test_erroring_source_captures_exception_message(): ...Each test name becomes self-documenting failure output in CI, and the assertions are smaller and faster to read.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: Comment — 0 critical, 0 warnings, 2 suggestions.
Full test suite: 140 passed, 11 DSPy deprecation warnings only. Security scan: clean (no credentials, SQL, XSS, path traversal, auth bypass, or secret-to-log issues found).
See inline comments for suggestions.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
Suggestion: The flag equates to "has at least one message", which means a correctly-installed source with an empty history (e.g. a fresh Claude Code install with no prompts yet) is reported as . This conflates "source is reachable / configured" with "source has candidate data". Consider separating the two concepts: a source that returns without raising should be , while a source that raises should be . The current semantics may mislead users into thinking their tool is not installed when it simply has no history yet.
| console.print(f" {src}: {len(msgs)} messages") | ||
| info = availability[src] | ||
| status = "available" if info["available"] else "unavailable" | ||
| error = f" error={info['error']}" if info["error"] else "" |
There was a problem hiding this comment.
Suggestion: When an error message contains spaces (e.g. ) the output line becomes , which is ambiguous — a reader cannot tell where the key ends and the value begins. Consider quoting the error value: or wrapping it in brackets (), so the output is unambiguous regardless of what the exception message contains.
Code Review SummaryVerdict: COMMENT — 0 critical, 0 warnings, 2 suggestions ✅ Looks Good
💡 Suggestions (non-blocking)
Reviewed by Hermes Agent — full file context read, security threat scan applied |
Code Review SummaryVerdict: Comment — 0 critical, 0 warnings, 2 suggestions Test Results
Security ScanClean — no hardcoded credentials, SQL injection, XSS, path traversal, auth bypass, or secrets-to-logs issues found. Suggestions
Looks Good
Reviewed by Hermes Agent — full file context read, security threat scan applied |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Summary
This PR cleanly implements the first step of issue #54: making --source all --dry-run report per-source availability with explicit status, candidate count, and error reason instead of silently showing zero counts. The scope is well-contained, the separation of concerns is good (pure function extracted from CLI), and the test coverage covers all three meaningful states (available, empty, erroring).
Findings
No security issues found. Static scan of added lines found no hardcoded credentials, injection surfaces, or secrets leaking to output.
Correctness — Minor
describe_source_availabilitycallsimporters[source].extract_messages()without alimitargument. For sources with large histories (especially Copilot, which streams 100 MB+ files), this will read the entire dataset on every dry run — including when the user just wants a quick "is it reachable?" check. The--dry-runflag implies low cost; the full read removes that property.- If
sourceis not a key inimporters, the function raisesKeyErroruncaught. The CLI always passes a consistentimportersdict, so this is safe today, but the function's public signature offers no protection if called programmatically with a mismatched list.
Code Quality — Minor
- The bare
except Exception as excswallows the traceback.str(exc)is often just the exception message without context (e.g., forFileNotFoundErrorit omits the filename from the OS-level message in some Python versions). Storingrepr(exc)or at minimum the exception type name improves debuggability without changing the output contract. - The error format in the printed output (
error=<reason>) is unquoted, so a multi-word error string likeConnection refused: [Errno 111]produces ambiguous output. Consider quoting:error="<reason>".
Testing — Minor
- The single test function covers all three states but is a monolithic assertion. Splitting into three focused tests (
test_available_source,test_empty_source,test_erroring_source) would make failure messages more precise and make it easier to add parametrized cases later. - No test covers the
KeyErrorpath (source in list but not in importers dict), which is an implicit contract the caller must uphold.
Performance — Minor
- As noted above:
describe_source_availabilityperforms a fullextract_messages()call with no limit. Alimit=1probe would be sufficient to confirm availability and would be much cheaper for large Copilot/Hermes histories.
Documentation
- The docstring says "without LLM calls" but says nothing about the side-effect of fully reading source data. Worth noting that this is not a cheap existence check.
Overall
Clean, minimal, well-scoped PR. The main actionable item before the next PR (canonical event schema) is the full-read performance issue — it undermines the low-cost expectation of --dry-run. Everything else is polish.
| "error": None, | ||
| } | ||
| except Exception as exc: | ||
| report[source] = { |
There was a problem hiding this comment.
Suggestion: Consider replacing bare str(exc) with repr(exc) here to preserve the exception type in the stored error string. str(FileNotFoundError("foo")) gives just "foo", while repr() gives "FileNotFoundError(\"foo\")" — easier to triage without a traceback.
Also consider adding a limit=1 probe instead of a full extract_messages() call. A dry-run availability check does not need all messages — just one is enough to confirm the source is reachable and non-empty, avoiding a full 100 MB+ Copilot session read:
messages = importers[source].extract_messages(limit=1)
candidate_count = len(messages) # 0 or 1 — sufficient for availabilityIf the actual count matters for display, a separate (lazily deferred) full count could be added in a follow-up.
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): | ||
| report = describe_source_availability( |
There was a problem hiding this comment.
Suggestion: Consider splitting this monolithic assertion into three focused test functions — one per source state. This way, a failure in the "empty" case does not hide the result of "error" or "available":
def test_available_source_is_reported_correctly():
report = describe_source_availability(["src"], {"src": AvailableImporter})
assert report["src"] == {"available": True, "candidate_count": 2, "error": None}
def test_empty_source_is_reported_as_unavailable():
report = describe_source_availability(["src"], {"src": EmptyImporter})
assert report["src"] == {"available": False, "candidate_count": 0, "error": None}
def test_erroring_source_captures_exception_message():
report = describe_source_availability(["src"], {"src": ErrorImporter})
assert report["src"]["available"] is False
assert "history unreadable" in report["src"]["error"]Also worth adding a test for a source key present in sources but absent from importers to document (and enforce) the expected behavior — currently a KeyError is raised uncaught.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Clean, well-scoped PR. The new describe_source_availability function is a clear improvement over the silent zero-count display — making availability explicit is exactly the right abstraction boundary. Two non-blocking suggestions below.
Verdict: COMMENT (no blocking issues; 2 suggestions)
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: importers[source] is inside the try block, so a KeyError from an unknown source key is silently swallowed and reported as available: False, error="source_name". This turns a programming error (caller passed a source not registered in the importers dict) into a quiet data result.
Consider moving the lookup outside the try, or using an explicit guard:
This way a missing key surfaces immediately at the call site rather than appearing as an unavailability signal.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The three cases (available, empty, erroring) are asserted in one monolithic test. If only the error case regresses, pytest reports the whole assertion as failed and you have to dig into the diff to see which source broke.
Splitting into three focused tests gives cleaner signal:
Each test then fails with a self-describing name, and the fixture setup is minimal per case.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: Comment — 0 critical, 0 warnings, 2 suggestions
Full context read (both changed files + callers in build_dataset_from_external), security threat scan applied, all 140 tests passing, ruff clean. The feature works correctly and is well-scoped. Two suggestions attached as inline comments; see summary below.
✅ Looks Good
- Correctness:
describe_source_availabilitycorrectly separates extraction errors from empty-but-healthy sources at the output layer; theexcept Exceptionwrapper is appropriate here since any importer failure (KeyError, IOError, runtime) should surface asunavailablerather than crashing the dry-run. - Security: No hardcoded credentials, no path traversal, no injection surfaces introduced. The security scan hit on f-strings in console.print lines are confirmed false positives.
- Testing: TDD discipline demonstrated (RED → GREEN noted in test plan). Three input classes exercise the three meaningful branches.
- CI: 140 passed, 11 warnings (all pre-existing DSPy deprecation warnings); ruff all-clear.
- Scope discipline: PR does exactly what it says on the tin — no scope creep into schema or serialization.
💡 Suggestions
- tests/core/test_external_importer_availability.py:27 — Consider splitting the single mega-test into three focused functions (
test_available_source,test_empty_source,test_erroring_source). Pytest failure output will name the exact failing scenario rather than reporting the whole bundle as broken. - evolution/core/external_importers.py:736 — The
"available": candidate_count > 0expression conflates reachable with non-empty. A freshly-installed tool with zero history will display asunavailableeven though nothing is wrong. Consider using a separate boolean (e.g."reachable": True) to distinguish "source found, just empty" from "source errored", or at minimum add a note to the docstring.
Reviewed by Hermes Agent — full file context read, security threat scan applied
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: This single test function exercises three distinct scenarios (available, empty, erroring) in one block. Consider splitting into three named tests — test_available_source_is_marked_available, test_empty_source_is_marked_unavailable, test_erroring_source_captures_error_string — so that pytest failure output immediately names the broken case. The current structure makes a future regression harder to triage at a glance.
| try: | ||
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { |
There was a problem hiding this comment.
💡 Suggestion: candidate_count > 0 conflates reachable with non-empty. A source that is fully installed but has an empty history will report as available: false even though the tool is working correctly — which may mislead users into thinking their installation is broken. Consider tracking reachability separately, e.g. "reachable": True, "available": candidate_count > 0, or update the docstring to clarify that available means "has at least one candidate message, not just that the source is installed and readable".
Review Summary — feat: report session source availabilityVerdict: COMMENT (no blocking issues; a few items worth addressing before or alongside the next PR in this series) What the PR doesPartially implements issue #54 step 1. Extracts a new pure function One new test file covers the three distinct states (available, empty, erroring). Test results
Key findings
Inline suggestionsTwo inline suggestions were posted on the review:
|
Code Review SummaryVerdict: Comment — 0 critical, 0 warnings, 2 suggestions ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent — full file context read, security threat scan applied |
Code Review Summary — PR #66Verdict: COMMENT (no blocking issues — 2 suggestions only) This is a clean, well-scoped PR. 💡 Suggestions
✅ Looks Good
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: Comment — 0 critical, 0 warnings, 2 suggestions
This is a clean, well-scoped slice. The new describe_source_availability function is easy to follow, the dry-run path no longer silently zeroes erroring sources, and the test directly maps the three observable states. Two suggestions below worth considering before the follow-on PR builds on this interface.
Reviewed by Hermes Agent — full file context read, security threat scan applied
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, | ||
| "candidate_count": candidate_count, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates accessibility with non-emptiness. A source that responds without error but has zero history entries (e.g. a brand-new install) will show as unavailable, which is misleading — the importer is fully functional. Consider splitting the flag:
"accessible": True,
"available": candidate_count > 0,Or at minimum add a note in the docstring that available=False without an error means "reachable but empty," so callers building on this in the next PR do not misread it as a connection failure.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The three scenarios (available, empty, erroring) are bundled into one omnibus assertion. When one case regresses, the failure message will not point directly at the broken scenario. Three focused tests would read more clearly and produce more actionable output:
def test_available_source_reports_available_and_count():
...
def test_empty_source_reports_unavailable_with_zero_count():
...
def test_erroring_source_reports_unavailable_with_error_string():
...The shared stubs (AvailableImporter, EmptyImporter, ErrorImporter) can stay as module-level fixtures — nothing needs to change structurally, just split the single assert report == {...} into three.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: Comment — 0 critical, 0 warnings, 2 suggestions
Clean, well-scoped PR. is a tidy boundary function and the test covers all three importer states. Full file context read and security threat scan applied — no findings. Two non-blocking suggestions inline.
Reviewed by Hermes Agent — full file context read, security threat scan applied
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, | ||
| "candidate_count": candidate_count, |
There was a problem hiding this comment.
💡 Suggestion: available is set to candidate_count > 0, which conflates "empty but healthy" with "broken/unavailable." A source that is correctly configured but returns no messages will show unavailable in dry-run output, which could mislead users into thinking the integration itself is broken.
Consider keeping available: True whenever extract_messages() returns without raising — even for an empty list — and letting the candidate count speak for itself:
report[source] = {
"available": True,
"candidate_count": candidate_count,
"error": None,
}The CLI output could then read available, candidates=0 vs unavailable (error=...), making the availability boundary clearer.
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): | ||
| report = describe_source_availability( |
There was a problem hiding this comment.
💡 Suggestion: The single test function covers three independent scenarios (available, empty, erroring) in one assertion block. If the available assertion fails, pytest will stop there and swallow the empty and error checks, making failure diagnosis harder.
Consider splitting into three focused functions:
def test_available_source_reports_candidates():
...
def test_empty_source_reports_zero_candidates():
...
def test_erroring_source_reports_unavailable_with_message():
...This also makes each test name self-documenting in pytest output.
Code Review SummaryVerdict: Comment — 0 critical, 0 warnings, 2 suggestions ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent — full file context read, security threat scan applied |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Found 0 critical, 0 blocking warnings, 2 suggestions. See inline comments.
All tests pass (1/1). Ruff clean. This is a clean, focused addition — the two notes below are observations worth considering before a follow-up or merge, not blockers.
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: extract_messages() is called here without a limit argument, so a dry-run eagerly reads the full session history for every source. The CopilotImporter docstring warns that session files can be 100 MB+, meaning --dry-run may be slower than a real run for large histories. Consider passing a small sentinel limit (e.g. limit=1) to probe availability cheaply — the candidate count from len(messages) would then be capped, but the available boolean would still be accurate.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The test suite covers available, empty, and error importers well. One edge case worth adding: a source key that is not present in the importers dict. Currently importers[source] raises KeyError, which is caught by except Exception and stored as error="'missing_source'" — technically correct, but the quoted-repr string may surprise callers who check info['error'] for diagnostics. A test documenting (or asserting on) this behaviour would lock in the contract and make it obvious to future readers that the function treats missing keys the same as runtime errors from extract_messages().
Code Review SummaryVerdict: Comment — 0 critical, 0 blocking warnings, 2 suggestions ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent — full file context read, security threat scan applied. PR is closed; comments are informational. |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Verdict: Comment — 0 critical, 1 warning, 2 suggestions.
Tests pass (1 passed) and ruff reports clean on both changed files. The core addition is solid: the function correctly isolates the availability check from the dry-run display, handles importer errors gracefully, and ships with a targeted test. Two observations worth addressing before merge are noted as inline comments below.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: available conflates reachable with has candidates.
A source that connects successfully but returns an empty list is marked available: False, even though the importer is operational. This can mask a working-but-idle source and mislead callers that filter on this flag.
Consider either:
- Rename the field to
has_candidatesto make the semantics explicit, or - Set
available: Truewheneverextract_messages()succeeds (regardless of count) and letcandidate_count == 0carry the empty-source signal.
# Option B — available means reachable
report[source] = {
"available": True,
"candidate_count": candidate_count,
"error": None,
}| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: Consider splitting this into three focused tests.
The single function asserts all three cases (available, empty, error) in one block. If the assertion fails, pytest will report which key mismatched but not which scenario broke it — you'd need to read the full diff to determine whether the available, empty, or error path regressed.
Splitting gives self-documenting failure names:
def test_available_source_reports_available_with_count(): ...
def test_empty_source_reports_unavailable_with_zero_count(): ...
def test_erroring_source_reports_unavailable_with_error_message(): ...This is a style preference rather than a blocking issue — the current test does exercise all three paths correctly.
Code Review SummaryVerdict: Comment — 0 critical, 0 warnings, 2 suggestions ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent — full file context read, security threat scan applied, tests executed (1 passed), ruff clean |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Verdict: Comment (not blocking merge) — 0 critical, 2 warnings, 3 suggestions.
The overall approach is clean and well-tested. Two warnings are worth the author's attention before this lands in a long-running environment. See inline comments for detail.
See the full summary comment for the complete checklist breakdown.
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
extract_messages() is called with no limit, triggering a complete scan of all session files (Copilot session stores can reach 100 MB+). Since this function only needs to determine availability, a full scan is unnecessarily expensive.
Suggested fix:
# Pass limit=1 — a single message is sufficient to confirm the source is reachable
messages = importers[source].extract_messages(limit=1)
# Note: candidate_count will then reflect only 0 or 1; document this if an exact count is neededIf an exact candidate count is intentional (for reporting purposes rather than just availability), add a comment explaining the trade-off and consider making it configurable via a full_count: bool = False parameter.
| try: | ||
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { |
There was a problem hiding this comment.
available conflates "inaccessible" with "no messages"
Setting available = candidate_count > 0 means a reachable source with zero messages (e.g. a brand-new Copilot install with no chat history yet) is reported as unavailable. That is misleading — the source exists and is healthy, it just has no candidates.
Suggested fix: Separate reachability from count:
"reachable": True, # set here; set to False in the except block
"candidate_count": candidate_count,In the except block:
report[source] = {"reachable": False, "candidate_count": 0, "error": str(exc)}This way callers can distinguish "could not open the source" (reachable=False) from "source is healthy but empty" (reachable=True, candidate_count=0). The test stubs and assertions would need updating to match.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: available is currently candidate_count > 0, which conflates "source is reachable" with "source has messages". An empty-but-healthy source (extraction succeeded, returned []) gets reported as unavailable, which may mislead operators into thinking the integration is broken.
Consider separating the two signals:
"available": True, # extraction succeeded without raising
"candidate_count": candidate_count,and reserving available: False only for the except branch. The calling code can then distinguish "nothing to import today" from "source is broken".
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: This single test function validates three independent behaviours (available, empty, erroring) in one assertion block. When any of the three cases regresses, the failure message just says the whole dict does not match, making it harder to pinpoint which scenario broke.
Consider splitting into focused tests:
def test_available_source_is_reported_correctly(): ...
def test_empty_source_is_marked_unavailable(): ...
def test_erroring_source_captures_exception_message(): ...Each test becomes its own failure point with a self-documenting name.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Verdict: Comment (no blocking issues, 2 suggestions)
Overall this is clean, well-scoped work. The new describe_source_availability function correctly isolates the extraction boundary from the CLI output loop, the error path is handled gracefully, and the test covers all three branches. Two non-blocking suggestions below.
See inline comments for details.
Reviewed by Hermes Agent
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: importers[source] will raise KeyError if an unknown source name is passed. The surrounding except Exception block does catch it, but str(KeyError("claude-code")) produces the opaque string "'claude-code'" in the error field — no context that this was a missing-key problem.
Consider guarding explicitly for a clearer signal:
if source not in importers:
report[source] = {
"available": False,
"candidate_count": 0,
"error": f"unknown source '{source}'",
}
continueLow priority since callers are currently the controlled CLI, but worth hardening before this becomes a public API surface.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The three scenarios (available, empty, erroring) are bundled into one assertion. If the available sub-case regresses, pytest reports the entire test as failed with a diff across all three keys — it's hard to tell at a glance which branch broke.
Consider splitting into focused tests:
def test_available_source_reports_available_and_count(): ...
def test_empty_source_reports_unavailable_zero_count(): ...
def test_erroring_source_reports_unavailable_with_error_string(): ...Each test name then acts as living documentation for the contract, and failures pinpoint the broken case instantly.
Code Review SummaryVerdict: Comment — no blocking issues · 2 suggestions 🔴 CriticalNone
|
jarrettj
left a comment
There was a problem hiding this comment.
Code Review — PR #66: feat: report session source availability
Verdict: Comment — solid, focused change. The new describe_source_availability function is clean, well-typed, and the test coverage is good. Two non-blocking suggestions inline; nothing preventing merge.
💡 Suggestions
evolution/core/external_importers.py:737—availablesemantics conflate "reachable" with "has candidates" (see inline)tests/core/test_external_importer_availability.py:27— monolithic assertion + missingKeyErroredge case (see inline)
✅ Looks Good
- Clean separation:
describe_source_availabilityis pure and independently testable - Error path swallows exceptions correctly and surfaces them in the report
- Dry-run output is now meaningfully explicit (
available/unavailable+ candidate count + optional error) - TDD approach documented in the PR description — RED first, then green
- Type hints are thorough (
dict[str, dict[str, Any]]) importersis called only once per dry run; results are reused in the display loop (no double extraction)
Reviewed by Hermes Agent
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: Comment — nice, focused slice. The function is correct, the test covers the three cases, and the dry-run output is now clearly bounded. Two suggestions below, nothing blocking.
💡 Suggestions
- external_importers.py:737 —
"available": candidate_count > 0conflates source is reachable with source has messages. An importer that works perfectly but currently has zero candidates will printunavailablein the dry-run output, which is misleading. Consider splitting intoreachable: True+has_candidates: candidate_count > 0, or at minimum renaming the key tohas_candidatesto match its actual semantics. - test_external_importer_availability.py:27 — The single combined test packs all three assertions into one function. If the
emptycase ever breaks, the failure message is buried. Consider splitting intotest_available_source,test_empty_source, andtest_erroring_sourceso each failure is self-describing.
✅ Looks Good
- Clean exception boundary —
except Exception as excis appropriate here since we don't own the importer implementations and any error should be reported, not re-raised. - Good docstring: "without LLM calls" is exactly the right signal for a dry-run helper.
- The eagerly-evaluated
availabilitydict inmainis a nice simplification over the previous inline loop.
Reviewed by Hermes Agent
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates reachable with has messages. A perfectly functional importer with an empty history will show as unavailable in the dry-run output, which is misleading. Consider renaming to has_candidates or splitting into two keys (reachable: True, has_candidates: candidate_count > 0) so callers can distinguish "broken source" from "source with no data yet".
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The three scenarios (available, empty, erroring) are all asserted in a single test function. If the empty case regresses, the failure message is hard to triage at a glance. Consider splitting into test_available_source, test_empty_source, and test_erroring_source — each with its own assert block — so pytest output is immediately self-describing on failure.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates two distinct concepts — the source is reachable vs the source has messages right now. A source that connects successfully but returns 0 messages will be printed as unavailable, which is misleading: the importer is available, it just has no candidates today.
Consider splitting the semantics:
report[source] = {
"reachable": True, # the importer ran without error
"has_candidates": candidate_count > 0,
"candidate_count": candidate_count,
"error": None,
}And update the display logic to show reachable (not has_candidates) as the availability status.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: Two improvements worth considering here:
- Split into focused tests — packing available + empty + error into one
assert report == {...}block means a single failure shows the whole dict mismatch instead of pointing directly at the broken case. Three small tests are easier to diagnose:
def test_available_source_reports_reachable(): ...
def test_empty_source_is_not_available(): ...
def test_erroring_source_captures_message(): ...- Add a
KeyErroredge-case test — if a caller passes a source name that is absent from theimportersdict, the broadexcept Exceptionindescribe_source_availabilitywill silently catch theKeyErrorand mark the source asunavailablewitherror="some_key". That may be intentional, but it should be an explicit, tested behaviour rather than an undocumented side effect.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Clean, well-scoped implementation. No critical or blocking issues. Two suggestions below (see inline comments).
Verdict: COMMENT — nothing preventing merge, but worth considering before the next follow-up PR.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates two distinct states — a source that is reachable but has no messages vs one that errors. Downstream code (and humans reading the report) can't tell whether unavailable means "broken source" or "working source, no data yet." Consider using "reachable": True alongside candidate_count, and let callers decide what "available" means for their context. Alternatively, rename the key to has_candidates to make the semantics explicit.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: All three cases (available, empty, error) live in a single assertion block. When one assertion fails, pytest stops and you lose visibility into whether the other cases also broke. Splitting into test_available_source, test_empty_source, and test_erroring_source makes each failure independent and the test names self-documenting.
Code Review SummaryVerdict: COMMENT — no blocking issues; two suggestions worth considering before the follow-up PRs. ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Verdict: Comment — implementation is solid and tests pass. Two non-blocking suggestions below.
- ✅ Clean fail-closed design in
describe_source_availability - ✅ Tests cover all three cases (available, empty, erroring)
- ✅ No secrets, no debug code, no security concerns
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: extract_messages() is called here with no limit, loading the entire message history into memory just to count it. Since all three importers already support a limit parameter (extract_messages(limit: int = 0)), consider passing a small cap — e.g. extract_messages(limit=500) — to keep dry-run fast on large session stores (Copilot history files can be 100MB+).
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates "source is accessible" with "source has messages." A properly configured but empty source will show available: false, which looks like a misconfiguration rather than a healthy-but-empty importer. Consider renaming this key to has_candidates (mirroring candidate_count) and reserving available to mean "importer did not raise" — making the three states unambiguous: accessible+has_candidates, accessible+empty, and error.
Code Review Summary — PR #66: feat: report session source availabilityVerdict: Comment (2 non-blocking suggestions — nothing blocking merge) ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Code Review — PR #66: feat: report session source availability
Verdict: Comment (2 non-blocking suggestions, nothing critical)
✅ Looks Good
- Fail-closed design is correct: sources that raise are marked
unavailable, not silently counted as zero str(exc)is safe for all standard exceptions and will not swallow the root cause- New function is pure (no side-effects, no I/O of its own) — easy to unit-test in isolation
- All three code paths (available, empty, error) are covered by tests
- No secrets, no debug artifacts, no merge markers
💡 Suggestions (2)
See inline comments for details.
Reviewed by Hermes Agent
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: extract_messages() is called with no limit here, so even a --dry-run will fully materialise the entire message list before counting it. For large Copilot session stores this can be slow and memory-heavy. Consider passing a sentinel limit (e.g. extract_messages(limit=500)) and summing counts in batches, or adding a dedicated count_messages() classmethod that streams rather than collects.
| "candidate_count": 0, | ||
| "error": "history unreadable", | ||
| }, | ||
| } |
There was a problem hiding this comment.
💡 Suggestion: All three scenarios (available, empty, error) are asserted inside a single test function. If the empty assertion fails, pytest marks the whole function as failed without running the error assertions, making triage harder. Consider splitting into three focused tests — test_available_source, test_empty_source, test_erroring_source — so each failure message pinpoints exactly which scenario broke.
Code Review Summary — PR #66Verdict: Comment — 2 non-blocking suggestions, nothing critical or blocking. 💡 Suggestions
✅ Looks Good
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Code Review Summary — PR #66: feat: report session source availability
Verdict: Comment (2 suggestions, nothing blocking)
💡 Suggestions
-
evolution/core/external_importers.py:737 —
"available": candidate_count > 0conflates reachability with occupancy. A healthy-but-empty source (e.g. a fresh Claude Code install) is reported asunavailable, which could mislead operators into thinking the importer is broken. Consider two distinct fields —reachable: True(set in the success branch,Falsein the except) and keepcandidate_count— or rename the existing field tohas_candidatesto be accurate about what is being tested. -
tests/core/test_external_importer_availability.py:27 — All three scenarios (available, empty, error) are asserted in a single
test_*function. When any one case fails, pytest prints the whole dict diff which mixes all three cases together, making it harder to isolate the failure. Splitting into three focused functions (test_available_source,test_empty_source,test_erroring_source) would give one clear failure message per scenario.
✅ Looks Good
- Fail-closed design: exceptions surface as
available: Falsewith error message rather than silently swallowing them - Clean separation —
describe_source_availabilityis a pure function with no side effects; easy to test and reuse - The CLI integration correctly avoids calling
extract_messages()twice (once for availability, once for the loop) — single call via the pre-computedavailabilitydict - Good docstring: "without LLM calls" is exactly the right signal for a dry-run helper
- No secrets, no debug prints, no conflict markers
Reviewed by Hermes Agent
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates reachability with occupancy. A source that works fine but currently has zero messages (e.g. a fresh Claude Code install) will be reported as unavailable, which could mislead an operator into thinking the importer is broken rather than just empty.
Consider separating the two signals:
report[source] = {
"reachable": True, # source connected successfully
"candidate_count": candidate_count,
"error": None,
}Or, at minimum, rename the field to has_candidates so the meaning is unambiguous.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: All three scenarios (available, empty, error) are collapsed into one assertion block. When any case fails, pytest prints the entire dict diff — mixing all three sources — which makes it harder to pinpoint which scenario regressed.
Split into three focused tests:
def test_available_source_reports_candidate_count(): ...
def test_empty_source_reports_unavailable(): ...
def test_erroring_source_reports_error_message(): ...Each test then fails with a single, unambiguous message.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Two non-blocking suggestions — nothing that should prevent merge. See inline comments.
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: extract_messages() is called here with no limit, which loads the entire message corpus just to produce a count. Since all three real importers define extract_messages(limit: int = 0), a dry-run existence check only needs limit=1 (or a small sentinel like limit=100). For users with large Copilot histories this can make --dry-run noticeably slow.
# before
messages = importers[source].extract_messages()
# after — cheap existence check for dry-run
messages = importers[source].extract_messages(limit=100)| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: All three behaviours — available, empty, and error — are asserted in one function. When this test fails, pytest will report the whole block as failed without identifying which case broke. Splitting into three focused functions (test_available_source, test_empty_source, test_error_source) would make failures self-describing and make it easier to add future cases independently.
Code Review SummaryVerdict: Comment (0 critical, 0 warnings, 2 suggestions) 💡 Suggestions
✅ Looks Good
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #66
Good, focused PR. The availability boundary is clear, dry-run output is explicit, and the test covers all three paths. Two non-blocking suggestions below — nothing prevents merge.
💡 Suggestions
- external_importers.py:734 —
extract_messages()loads the full message list just to count it; can be slow for large Copilot stores on dry-run. - test_external_importer_availability.py:27 — All three scenarios in one function; splitting makes failures self-documenting.
✅ Looks Good
- Empty result sets correctly treated as unavailable (not silently "0 messages")
error: Nonesentinel is explicit and consistent- Bare
except Exceptionis acceptable here — the goal is to surface any import failure as a string, not to swallow it
Reviewed by Hermes Agent
| report: dict[str, dict[str, Any]] = {} | ||
| for source in sources: | ||
| try: | ||
| messages = importers[source].extract_messages() |
There was a problem hiding this comment.
💡 Suggestion: extract_messages() is called with no limit, loading the full list into memory just to len() it. For large Copilot history files this makes even --dry-run noticeably slow. Consider capping the load:
messages = importers[source].extract_messages(limit=500)Or expose a lightweight count_messages() on the importer protocol so dry-run probing never materialises the full dataset.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: All three scenarios (available, empty, erroring) are asserted inside one function. If the available sub-assertion fails, pytest names the whole block, making it hard to tell at a glance which source broke. Consider splitting:
def test_available_source_reports_candidate_count(): ...
def test_empty_source_is_unavailable(): ...
def test_error_source_captures_error_string(): ...This also keeps each test focused and makes adding new edge cases (e.g. importer raises OSError) straightforward.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Solid PR — the dry-run availability boundary is a clear improvement over silent zero-counts. Two non-blocking suggestions below.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: The flag conflates empty with unreachable. A source that returns 0 messages is reachable and healthy — it just has no candidates right now. Setting means the CLI will print for a working-but-empty source, which could mislead operators into thinking there's a connectivity problem.
Consider renaming this field to , or keeping for the success path and deriving the display label in the CLI loop instead:
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: Bundling all three scenarios (available, empty, error) into one monolithic assertion makes failure output noisy — pytest will diff the whole dict rather than naming the broken case. Consider splitting into three focused tests:
Not blocking, but it pays off immediately when one edge case regresses.
Code Review SummaryVerdict: Comment (0 blocking issues, 2 suggestions) ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Code Review — PR #66: Report session source availability
Verdict: Comment (no blockers; two suggestions worth considering before the next PR in this series builds on top)
💡 Suggestions
See inline comments for details on both. Nothing blocking — clean implementation overall.
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: Marking a source as available: False when extract_messages() returns [] conflates two distinct states: "importer is working but has no messages" vs "importer is broken/unconfigured." A healthy-but-empty source would appear the same as a crashing one in the output, which could mislead diagnostics.
Consider reserving available: False for the except path only:
report[source] = {
"available": True, # importer ran without error
"candidate_count": candidate_count,
"error": None,
}The dry-run output already prints candidates=0, so empty-but-working sources are still clearly distinguishable.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: Bundling all three cases (available, empty, error) into a single assertion makes failures hard to isolate — if the error branch regresses, the diff output will show the full nested dict mismatch rather than pointing directly at the broken case.
Consider splitting into three focused tests:
def test_available_source_reports_candidate_count(): ...
def test_empty_source_reports_unavailable(): ...
def test_erroring_source_captures_exception_message(): ...Each test stays short and its name communicates the exact behavior under test.
Code Review SummaryVerdict: Comment — no blockers, two suggestions worth considering 💡 Suggestions
✅ Looks Good
Reviewed by Hermes Agent |
jarrettj
left a comment
There was a problem hiding this comment.
Code Review — PR #66: feat: report session source availability
Overall this is a clean, well-scoped change. The dry-run output is now meaningfully more informative and the test file follows a good RED-first pattern. Two minor suggestions below — nothing blocking.
Reviewed by Claude Agent
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: available: candidate_count > 0 conflates reachability with having messages. A source that successfully returns [] is healthy — it connected, read history, and found nothing — but it is reported as unavailable, which may mislead debugging.
Consider making available mean "the importer ran without error" and surfacing emptiness separately:
report[source] = {
"available": True,
"candidate_count": candidate_count,
"empty": candidate_count == 0,
"error": None,
}This way the dry-run output can distinguish broken from empty, which is the boundary the PR description says it is trying to surface.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: Packing all three cases (available, empty, error) into one assertion block makes failure messages ambiguous — if the dict shape changes for just the error case, the whole test fails with a full-dict diff.
Consider splitting into three focused tests:
def test_available_source_reports_available_and_count():
report = describe_source_availability(["s"], {"s": AvailableImporter})
assert report["s"]["available"] is True
assert report["s"]["candidate_count"] == 2
def test_empty_source_reports_unavailable():
report = describe_source_availability(["s"], {"s": EmptyImporter})
assert report["s"]["available"] is False
assert report["s"]["error"] is None
def test_erroring_source_captures_message():
report = describe_source_availability(["s"], {"s": ErrorImporter})
assert report["s"]["available"] is False
assert "unreadable" in report["s"]["error"]Each test now has a single failure mode and a self-documenting name.
Code Review SummaryVerdict: Comment (2 suggestions, nothing blocking) ✅ Looks Good
💡 Suggestions
Reviewed by Claude Agent |
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: "available": candidate_count > 0 conflates reachability with having content. An empty-but-working importer gets available: False, making it look the same as an erroring importer (modulo the error field). Consider setting "available": True in the success branch to mean "source responded without throwing", and let callers use candidate_count == 0 to detect emptiness separately.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: This single function exercises all three cases in one assertion block. If one case regresses, pytest output will only point at the function name rather than the specific case. Consider splitting into three focused tests (test_available_source, test_empty_source, test_erroring_source) for faster failure diagnosis.
Code Review SummaryVerdict: Comment (0 critical, 0 warnings, 2 suggestions) ✅ Looks Good
💡 Suggestions
Reviewed by Hermes Agent |
| messages = importers[source].extract_messages() | ||
| candidate_count = len(messages) | ||
| report[source] = { | ||
| "available": candidate_count > 0, |
There was a problem hiding this comment.
💡 Suggestion: available: candidate_count > 0 conflates reachability with having data. A source that connects fine but returns zero messages will show as unavailable in the CLI output, which could mislead debugging. Consider separating the two signals — e.g., track reachable: True (no exception raised) independently from candidate_count, and derive the available/unavailable label from reachability rather than count.
| raise RuntimeError("history unreadable") | ||
|
|
||
|
|
||
| def test_describe_source_availability_reports_available_empty_and_errors(): |
There was a problem hiding this comment.
💡 Suggestion: The single omnibus test covers three cases in one assertion block. Splitting into test_available_source, test_empty_source, and test_erroring_source would make pytest output immediately pinpoint which case regresses, rather than showing the whole combined assertion failing. Nothing blocking — just easier to diagnose.
Code Review Summary — PR #66Verdict: Comment (0 critical, 0 warnings, 2 suggestions — nothing blocking merge) 💡 Suggestions
✅ Looks Good
Reviewed by Hermes Agent |
Summary
Partially implements issue #54 step 1 by making all-source dry runs report explicit per-source availability and candidate counts.
Adds:
describe_source_availability(sources, importers)available/unavailable, candidate count, and extraction error if anyThis turns the existing dry-run count display into an explicit availability boundary, so
--source all --dry-runno longer silently treats empty/erroring sources as just zero messages.Example behavior
Now prints each source as:
Scope / non-goals
This is the smallest safe ingestion-boundary slice. It does not yet implement the full canonical event/message schema or persist all source metadata into
EvalExample; those should follow as separate PRs because they touch dataset serialization and deduplication behavior.Test Plan
pytest tests/core/test_external_importer_availability.py -qfailed becausedescribe_source_availabilitydid not existpytest tests/core/test_external_importer_availability.py -qpytest -qgit diff --checkResult: 140 passed, 11 warnings (DSPy deprecation warnings only).
Partially addresses #54.