
feat(engine): traces search API — cog_search_traces + GET /v1/traces #17

Merged
chazmaniandinkle merged 2 commits into cogos-dev:main from chazmaniandinkle:feat/traces-search
Apr 21, 2026


@chazmaniandinkle

Summary

Unified traces search surface over the kernel's .cog/run/*.jsonl streams — closes Agent F's #7 critical MCP blindspot, reframed per Agent Q's design from "logs with search" to trace observability. (The .cog/run/ streams are client-originated semantic metabolites per the kernel's metabolic-cycle framing, not diagnostic text.)

What landed

  • cog_search_traces MCP tool — unified query over multiple JSONL sources with filters for source, session_id, since/until (duration or RFC3339), level, substring, limit, order.
  • GET /v1/traces HTTP route — 1:1 with the MCP input via query params; returns {results, count, truncated}.
  • Multi-source normalization across attention.jsonl, turn_metrics.jsonl, internal-requests.jsonl, proprioceptive.jsonl (where present) into unified {source, timestamp, session_id?, level?, line} shape.

Real drift surfaced + handled

Agent Q's normalization spec listed internal-requests as RFC3339 strings. On the live workspace the file's timestamps are Python-style Unix float seconds (e.g. "timestamp": 1769281958.2577438). Without detection this would have produced zero-timestamps for that entire source. parseTimestamp() now branches on the first byte of the raw JSON value (" → RFC3339, else → float64 unix seconds). No escape to callers; drift commented explicitly in sourceSpecs.

/v1/proprioceptive preserved byte-for-byte

dashboard.html:1265 and canvas.html:1706 both consume the exact shape. TestProprioceptiveEndpointUnchanged asserts the top-level keys are exactly {entries, light_cone} with no extras. Any future reshape will break CI.
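An exact-key-set check of the kind this regression test performs might be sketched as follows. The helper names topLevelKeys and hasExactKeys are hypothetical; the real TestProprioceptiveEndpointUnchanged presumably exercises the live handler rather than a literal body.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// topLevelKeys returns the sorted top-level keys of a JSON object.
func topLevelKeys(body []byte) ([]string, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return nil, err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

// hasExactKeys reports whether body's top-level keys are exactly want,
// with no missing and no extra keys.
func hasExactKeys(body []byte, want ...string) bool {
	got, err := topLevelKeys(body)
	if err != nil || len(got) != len(want) {
		return false
	}
	sort.Strings(want)
	for i := range got {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	ok := hasExactKeys([]byte(`{"entries":[],"light_cone":{}}`), "entries", "light_cone")
	drift := hasExactKeys([]byte(`{"entries":[],"light_cone":{},"extra":1}`), "entries", "light_cone")
	fmt.Println(ok, drift) // prints "true false"
}
```

Comparing the full key set, rather than checking that the two expected keys exist, is what makes an added field fail the guard.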

Design judgments beyond spec

  • Per-source pre-limit = limit+1 so single-source queries can report truncation correctly (exact limit hides the overflow).
  • Unknown-source validation at parse-time (400, not 500) via resolveSources in both HTTP and MCP parsers.
  • internal.log plain-text excluded from v1 per Agent Q §8 Q3 — JSONL-only keeps the normalized shape clean.
  • Defensive byte-copy of scanner lines: bufio.Scanner reuses its internal buffer between Scan() calls, so retaining TraceResult.Line without a copy would silently corrupt earlier entries.
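Two of the judgments above, the limit+1 pre-limit and the defensive copy, can be illustrated together. This is a simplified sketch with hypothetical helpers (scanLines, applyLimit) working on an in-memory string; the real QueryTraces also filters, normalizes, and merges across sources.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanLines retains at most max lines. sc.Bytes() aliases the scanner's
// internal buffer, which later Scan() calls may overwrite, so each line is
// copied before it is kept.
func scanLines(input string, max int) [][]byte {
	var out [][]byte
	sc := bufio.NewScanner(strings.NewReader(input))
	for len(out) < max && sc.Scan() {
		out = append(out, append([]byte(nil), sc.Bytes()...))
	}
	return out
}

// applyLimit trims to limit and reports whether anything was cut off.
func applyLimit(lines [][]byte, limit int) ([][]byte, bool) {
	if len(lines) > limit {
		return lines[:limit], true
	}
	return lines, false
}

func main() {
	input := "a\nb\nc\n"
	limit := 2
	// Ask the source for limit+1 lines so the overflow is observable.
	kept, truncated := applyLimit(scanLines(input, limit+1), limit)
	fmt.Println(len(kept), truncated) // prints "2 true"
}
```

Had scanLines been called with exactly limit, applyLimit would see two lines and report truncated=false even though a third line exists.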

Test plan

  • 14 unit tests on QueryTraces — empty workspace, single-source, multi-source ordering, filters (session_id, since/until, level, source, substring), limit+truncation, malformed-line skip, order=asc, bad source reject, normalization correctness, substring-too-long reject
  • 2 integration tests — TestHandleTracesHTTP (shape + 400 bad source), TestToolSearchTracesMCP (filter roundtrip)
  • 1 regression test — TestProprioceptiveEndpointUnchanged
  • 1 utility test — TestParseTraceDurationOrTime
  • go test ./internal/engine/... -short -count=1 -race (ok, 1.674s)
  • go build ./... and go vet ./... (both silent)

Not in scope

  • Kernel slog / stderr log API — separate follow-up gap coupled to Agent K's Windows stderr-capture work. Agent Q's §8 surfaces this as "gap #7b."
  • Live tail — belongs to Agent N's event bus (cog_tail_events / GET /v1/events/stream), not this surface.
  • Plain-text internal.log (non-JSONL).

Design reference

Agent Q's design CogDoc: ~/cog-workspace/.cog/mem/semantic/surveys/2026-04-21-consolidation/agent-Q-logs-search-design.cog.md

Pre-existing flakes flagged (not this PR)

With -count>=2, the ledger package leaks package-level state (appendMu + lastEventCache) between iterations in TestCrossSessionChain, TestAppendEventConcurrent, and TestAppendEventChainIntegrity. Separately, TestSyncWatcherMarksAlreadyPresentBlob races the 10 ms poll against fsnotify. Both are confirmed present on clean upstream/main and are unrelated to this commit; they will be filed as a follow-up issue.

… + HTTP

Closes Agent F's gap #7 — reframed per Agent Q's Survey Q design
(2026-04-21) as trace observability rather than diagnostic logs.
The content in `.cog/run/*.jsonl` is client-originated semantic
metabolite material (turn_metrics, attention signals, TRM
prediction-vs-reality, router traces), not kernel slog text.
Agent Q's naming reflects that: traces, not logs.

Additive only:
- New `QueryTraces(root, TraceQuery) (*TraceQueryResult, error)` in
  internal/engine/traces_query.go. Per-source scan + normalize,
  filter, merge with order-by-timestamp desc/asc, limit + truncated
  flag. Mirrors Agent L's QueryLedger algorithmic shape.
- New MCP tool `cog_search_traces` registered in registerTools().
  Handler delegates to QueryTraces; shares semantics with the HTTP
  surface via buildTraceQueryFromInput.
- New HTTP route `GET /v1/traces` + handleTraces. `/v1/proprioceptive`
  stays byte-for-byte identical — web/dashboard.html:1265 and
  web/canvas.html:1706 still consume `{entries, light_cone}` unchanged.

Unified result shape: `{source, timestamp, session_id?, level?, line}`
hides per-source schema drift. The normalization table owns translation:

  turn_metrics    → timestamp  + session_id
  attention       → occurred_at (no session)
  proprioceptive  → timestamp  + event-as-level
  internal_requests → timestamp (float unix seconds — drifts from spec's
                    'observed RFC3339'; parseTimestamp accepts both forms)

Filters: source, level, session_id, case-insensitive substring (byte
check before JSON parse), since/until (RFC3339 or Go duration), limit
(default 100, max 1000), order (desc|asc).

Diagnostics: sources_checked[] reports per-file {scanned, matched,
file_exists}. Missing file is not an error; callers can distinguish
"file absent" from "empty match".

Out of scope (follow-ups):
- Kernel slog stderr (cli.go:117). Couples to Windows service-manager
  stderr capture (Agent K territory) — gets its own PR.
- Live tail of trace files. Belongs on Agent N's ledger-backed event bus,
  not in a file-watcher bolted to this surface.
- Rotation semantics for turn_metrics.jsonl when it crosses 100 MB.
  Writer-side concern; spec is stable across that boundary.

Tests (18 total):
- 14 unit tests on QueryTraces: empty workspace, single-source, merged
  multi-source, session_id filter (incl. graceful 0 from no-session source),
  since duration + RFC3339, until bound, substring (case-insensitive),
  level on proprioceptive, truncation, malformed-line skipped, order=asc,
  bad source rejected, normalization correctness across
  turn_metrics/attention/internal_requests (including float unix drift),
  substring-too-long rejected.
- HTTP roundtrip + 400-on-unknown-source.
- MCP roundtrip with session_id filter.
- Regression guard: /v1/proprioceptive top-level shape stays exactly
  {entries, light_cone} — guards dashboard.html:1265 and canvas.html:1706
  from silent breakage.
- ParseTraceDurationOrTime: RFC3339, duration, rejection of garbage.
# Conflicts:
#	internal/engine/mcp_server.go
#	internal/engine/serve.go
@chazmaniandinkle chazmaniandinkle merged commit 789c17c into cogos-dev:main Apr 21, 2026
5 checks passed