feat(observability): planner CoT capture (gzipped ledger + 8KB Langfuse) [W2.D.3]#46
Conversation
…ED [W2.D.3] Failing tests per BUILD_PLAN W2.D.3.a: - test_gzipped_cot_in_ledger: payload["cot_gzip"] base64+gzip roundtrip; payload["cot_gzip_sha256"] SHA-256 over compressed bytes - test_8kb_attached_to_langfuse_span: CotCaptureResult.langfuse_span_payload <= 8192 bytes, first 8KB of original uncompressed CoT - test_extract_cloud_cot_returns_full_text: cloud mode = no stripping - test_extract_airgap_cot_strips_think_tags: Qwen3 <think>...</think> unwrap - test_extract_airgap_cot_passthrough_if_no_think_tags: no-op for plain text - test_cot_sha256_is_over_compressed_bytes: hash discipline check - test_cot_capture_result_has_entry_id: ledger_entry_id round-trip Currently RED: verdict.planning.cot_capture missing.
…se) [W2.D.3] Adds verdict/planning/cot_capture.py: extract_cot(response_text, mode): - mode="cloud": passthrough (Claude SDK responses have no <think> tags) - mode="airgap": strips Qwen3 <think>...</think> wrapper; nested tags are removed from inner content via re.sub; no-tag passthrough for GLM-4.5-Air non-thinking mode capture_planner_cot(cot_text, case_id, ledger, mode) -> CotCaptureResult: - gzip-compresses CoT at level 9 - SHA-256 over compressed bytes (hash what you store, not what you discard) - base64-encodes for JSON-safe ledger storage - writes planner_cot LedgerEntry with cot_gzip + cot_gzip_sha256 + metadata - returns CotCaptureResult(ledger_entry_id, langfuse_span_payload, sha256) where langfuse_span_payload = first 8192 bytes of original text ARCHITECTURE.md §9: "planner_cot" is the 13th canonical event type. ARCHITECTURE.md §9: "planner CoT capture goes to ledger AND first 8KB to Langfuse span." All 28 planning tests GREEN (11 new); ruff clean.
W2.D.3 — Planner CoT capture (gzipped ledger + 8KB Langfuse)CI gate: 28/28 GREEN (11 new + 17 from W2.D.1+D.2). Ruff: clean. PASS — hard rules
BUG —
|
Summary
verdict/planning/cot_capture.pywith two entry points:extract_cot(response_text, mode)— strips Qwen3<think>…</think>tags for air-gap mode; passthrough for cloud (Claude SDK). Handles nested/malformed tags viare.subcleanup.capture_planner_cot(cot_text, case_id, ledger, mode) -> CotCaptureResult— gzip-compresses, SHA-256 hashes (over compressed bytes, not original), base64-encodes, writesplanner_cotLedgerEntry, returnsCotCaptureResultwith first 8192 bytes aslangfuse_span_payload.CotCaptureResult(ledger_entry_id, langfuse_span_payload, cot_gzip_sha256)— the graph node attacheslangfuse_span_payloadto the Langfuse span attribute (live Langfuse API call not wired here per §3.10).Test plan
pytest tests/planning/test_cot_capture.py— 11 tests GREENruff check verdict/ tests/planning/— clean