Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,15 @@

## [Unreleased]

### Phase 7 — admin "view transcript" UI for crew/agent runs (AI-045) (2026-06-15)

Makes the multi-agent reasoning chain inspectable. Every crew run (`crew.autopublish`/`crew.seo`) and single-agent run (`studybuddy`) already persists an `agent_run` row with a nested-step transcript (researcher→drafter→critic→editor); AI-045 surfaces it in the admin app. **No schema change, no new table** — read-only over the existing Phase 6 `agent_run`.

- **Backend** (`Api/Endpoints/AdminAiQualityEndpoints.cs`, mirrors the existing `GetTraces`/`GetTrace`): `GET /admin/ai-quality/agent-runs?agent=&limit=&offset=` (list, newest-first, `agent` filter = exact-or-prefix so `crew.` narrows to all crew runs; clamp 1–100; **list projection omits the heavy `StepsJson`+`Output`** and truncates `Goal` to 120 chars — guarded `Length > 120 ? Goal[..120] : Goal`) and `GET /admin/ai-quality/agent-runs/{id}` (detail, full **raw** `StepsJson` + `Output`, 404 on unknown). DTOs in `Contracts/Admin/AiQualityDtos.cs` (`AgentRunListItemDto`/`AgentRunsPageDto`/`AgentRunDetailDto`). `StepsJson` is passed through RAW and parsed client-side — same pattern as `TraceDetailDto`'s `MessagesJson`/`ToolCallsJson`; no brittle second server-side schema.
- **Frontend** (`apps/admin/src/pages/AiQualityPage.tsx`, new `Transcripts` tab modeled on `TracesTab`/`TraceModal`): filterable list (All / `crew.autopublish` / `crew.seo` / `studybuddy`) + pager → click row → modal. The modal parses `stepsJson` into the step tree: each `sub_agent` step is a collapsible `{stage} · {agentName}` panel with its per-step usage; the **critic** panel is special-cased + default-expanded — its inner `llm_response` JSON is rendered as score chips (factual_accuracy/tone/length/banned_phrases) + a severity-colored issue list (blocker red / major amber / minor gray). Single-agent steps (`llm_response`/`tool_result`) render via `pretty()`. Every layer is defensive — a malformed/empty/non-JSON step falls back to raw and never throws.
- **The JSON casing contract is the load-bearing risk** (a mismatch = blank transcript). The writer `DbAgentRunWriter` serializes `run.Steps` with **default** STJ options (unaffected by the HTTP pipeline's web-defaults), so the shape is mixed: top-level `AgentStep` records → PascalCase (`Index`/`Kind`/`Payload`/`At`), the `sub_agent` payload anon-object → camelCase (`stage`/`agentName`/`status`/`usage`/`steps`), the nested `AgentUsage` record → PascalCase again, inner `llm_response` payload `new { text }` → camelCase. The frontend reads each exactly. **`CrewTranscriptJsonContractTests`** (pure, no DB) serializes a real 4-stage `crew.autopublish` run through the same factory+options and asserts every key the UI depends on, **with negative asserts** so the test fails loudly if anyone ever puts camelCase/web options on the writer.
- Entry points from the AutoPublish/SEO pages (deep-link by the runIds those `crew-generate` endpoints already return) are a deliberate fast-follow; this PR is the self-contained AI-quality tab. Admin-only (`/admin/*` auth) — the `Goal` can carry user passages / book source, acceptable for the owner's own audit surface.

### Phase 7 — synthetic-defect critic harness (AI-044) (2026-06-15)

A calibration gate for the AI-041 `CriticAgent`: instead of trusting that the critic *would* catch a bad draft, we inject KNOWN defects into clean drafts and measure whether it actually does. ~23 fixtures over a single edition-description `ContentBrief` (`AutoPublishBriefs.Description("en")` — real 800–1600 char bounds + the shared `CrewBannedPhrases` blocklist): factual_hallucination ×6, banned_phrase ×4, length over/under ×2+2, tone_break ×4, plus 5 clean controls. The harness mirrors `ToolCallEvalRunner` exactly — real nano per case, **pure deterministic scoring, no judge**, `JudgeModelId="n/a"`, persists a reused `EvalRun` (**no schema change**), Score = catch-rate.
Expand Down
49 changes: 49 additions & 0 deletions apps/admin/src/api/client.ts
Original file line number Diff line number Diff line change
Expand Up @@ -485,6 +485,42 @@ export interface TraceDetail {
userId: string | null
createdAt: string
}
export interface AgentRunListItem {
id: string
agent: string
userId?: string | null
editionId?: string | null
status: string
goal: string
iterations: number
tokensIn: number
tokensOut: number
costUsd: number
latencyMs: number
hasError: boolean
createdAt: string
}
export interface AgentRunsPage {
total: number
items: AgentRunListItem[]
}
export interface AgentRunDetail {
id: string
agent: string
userId?: string | null
editionId?: string | null
status: string
goal: string
output?: string | null
stepsJson: string
iterations: number
tokensIn: number
tokensOut: number
costUsd: number
latencyMs: number
error?: string | null
createdAt: string
}
export interface EvalRun {
id: string
feature: string
Expand Down Expand Up @@ -1107,6 +1143,19 @@ export const adminApi = {
return fetchJson<TraceDetail>(`/admin/ai-quality/traces/${id}`)
},

getAgentRuns: async (params?: { agent?: string; limit?: number; offset?: number }): Promise<AgentRunsPage> => {
const query = new URLSearchParams()
if (params?.agent) query.set('agent', params.agent)
if (params?.limit) query.set('limit', String(params.limit))
if (params?.offset) query.set('offset', String(params.offset))
const qs = query.toString()
return fetchJson<AgentRunsPage>(`/admin/ai-quality/agent-runs${qs ? `?${qs}` : ''}`)
},

getAgentRun: async (id: string): Promise<AgentRunDetail> => {
return fetchJson<AgentRunDetail>(`/admin/ai-quality/agent-runs/${id}`)
},

getAiEvals: async (params?: { feature?: string; limit?: number }): Promise<EvalRun[]> => {
const query = new URLSearchParams()
if (params?.feature) query.set('feature', params.feature)
Expand Down
Loading
Loading