diff --git a/CHANGELOG.md b/CHANGELOG.md
index 92e81d7f..0e0eda7f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,18 @@
 
 ## [Unreleased]
 
+### Phase 7 — crew specialist sub-agents + prompts (AI-041) (2026-06-15)
+
+The four generic, **single-call** crew specialists the content crews (AI-042/043) compose via `CrewTasks.Of` + `CrewOrchestrator` (AI-040). Each is exactly ONE `ILlmService` gateway call — no tools, no `AgentLoop`, no iteration — and is domain-agnostic (no SEO/AutoPublish specifics): they operate on a shared `ContentBrief` (length in CHARACTERS, banned phrases, target language, optional style guide). **Why these four, in this order**: a researcher condenses the source into grounded bullet FACTS; a drafter writes the field strictly from those notes; a critic scores the draft 1-5 **against the research notes** (not its own knowledge) — every claim not supported by the notes is a factual-accuracy `blocker` — and an editor rewrites fixing each issue blockers-first. Grounding the critic on the research notes is the crux: it turns "does this sound plausible?" into "is this actually in the source?", which is what catches hallucinations the drafter slipped in.
+
+- **`Application/Agents/CrewAgentContracts.cs`** (new) — the records threaded through a crew: `ContentBrief`, `ResearchInput`/`ResearchNotes`, `DraftInput`/`Draft`, `CritiqueInput`/`EditInput`, `CritiqueResult` (four 1-5 scores + `Issues` + `ParseFailed`), `CritiqueIssue` (severity `blocker|major|minor`). All read the same brief so "write to N chars" and "score length against N" can never drift.
+- **`Application/Agents/SingleCallAgent.cs`** (new, abstract) — base for a one-gateway-call `IAgent`: subclasses own only `FeatureTag` (routing), `MaxOutputTokens`, `BuildPrompt`, `Parse`; the base does the request/timing/step/usage plumbing so each specialist is ~15 lines. Produces the same shape the orchestrator schedules — one `"llm_response"` `AgentStep` + `AgentUsage(Iterations: 1, …)` mapped from the gateway's `LlmUsage`.
+- **`ResearcherAgent` / `DrafterAgent` / `CriticAgent` / `EditorAgent`** (new) — feature tags `crew.researcher` / `crew.drafter` / `crew.critic` / `crew.editor`; token budgets 600/500/700/500. Researcher/drafter/editor parse = trimmed text; critic parses via `CriticOutputParser`.
+- **`Application/Agents/CriticOutputParser.cs`** (new, pure) — strips ```` ```json ````/```` ``` ```` fences, extracts the first top-level balanced `{...}` object via a string-aware brace scan, deserializes case-insensitively into a private DTO matching the prompt's schema, clamps each score to [1,5], maps a blank severity → `minor` and an unrecognized non-blank severity → `major`, drops issues with no fix. **Fail-closed**: any exception / empty / no-brace / unbalanced → `CritiqueResult(1,1,1,1, [blocker "unparseable"], ParseFailed: true)` and NEVER throws — an unreadable critic must read as "reject", never as a silent clean pass.
+- **`Application/Agents/Prompts/`** (new dir) — `ResearcherPrompt`/`DrafterPrompt`/`CriticPrompt`/`EditorPrompt` (pure `BuildSystemPrompt`/`BuildUserPrompt`, mirroring `ExplainPrompt`) + an internal `BriefConstraints` so drafter/critic/editor render length + banned phrases IDENTICALLY. `CriticPrompt` inlines the literal JSON schema it shares with the parser and demands a bare JSON object only.
+- **DI** — the four registered as singletons next to `StudyBuddyAgent` in `Api/Program.cs` (stateless, take the singleton `ILlmService`).
+- Tests: `CrewSpecialistsTests` (23, fake `ILlmService`, no key/network) — each agent maps its canned response to the typed output with one `llm_response` step + `Iterations==1` + usage from `LlmUsage`; the `CriticOutputParser` battery (well-formed, fenced, trailing prose, score clamp 0→1 / 9→5, blank severity → minor, unknown severity → major, missing-fix dropped, garbage/empty/no-brace fail-closed, never-throws sweep); prompt builders surface length range / banned phrase / language / the critic schema tokens (`factual_accuracy`, `severity`, `blocker`); and a crew integration smoke test wiring all four via `CrewTasks.Of` into a 4-stage `CrewPlan` run through the real `CrewOrchestrator` — asserts state threads research→draft→critique→edit and the transcript has 4 `CrewStepEntry`s in declaration order (proves the AI-040 contract; no stray `ITool` added).
+
 ### Phase 7 — CrewOrchestrator primitive (AI-040) (2026-06-15)
 
 Phase 7 opens with the **generic multi-agent orchestration engine** — the crew-level analogue of `AgentLoop` (AI-034). Engine-only: no concrete crews, no specialist agents, no SEO/AutoPublish wiring, no endpoint (those are AI-041+). Like `AgentLoop` shipped before `StudyBuddyAgent`, the primitive lands first and migrates callers later. **Reuses Phase 6 seams**: no new tables, no new persistence interface — a crew run persists through the same `IAgentRunWriter`/`agent_run` path as a single agent.
diff --git a/backend/src/Api/Program.cs b/backend/src/Api/Program.cs
index 4042f84d..55b952c9 100644
--- a/backend/src/Api/Program.cs
+++ b/backend/src/Api/Program.cs
@@ -89,6 +89,12 @@
 // Agent loop engine (Phase 6, AI-034). Concrete agents (StudyBuddy, AI-035) build on it.
 TextStack.Ai.Agents.ServiceCollectionExtensions.AddAiAgents(builder.Services);
 builder.Services.AddScoped();
+// Crew specialists (Phase 7, AI-041): single-call IAgent sub-agents the content crews
+// (AI-042/043) compose via CrewTasks.Of. Stateless + ILlmService is a singleton, so singleton is fine.
+builder.Services.AddSingleton<ResearcherAgent>();
+builder.Services.AddSingleton<DrafterAgent>();
+builder.Services.AddSingleton<CriticAgent>();
+builder.Services.AddSingleton<EditorAgent>();
 builder.Services.AddAuthSettings(builder.Configuration);
 
 var connectionString = builder.Configuration.GetConnectionString("Default")
diff --git a/backend/src/Application/Agents/CrewAgentContracts.cs b/backend/src/Application/Agents/CrewAgentContracts.cs
new file mode 100644
index 00000000..6b264885
--- /dev/null
+++ b/backend/src/Application/Agents/CrewAgentContracts.cs
@@ -0,0 +1,50 @@
+namespace Application.Agents;
+
+/// <summary>
+/// The shared writing assignment threaded through a content crew (Phase 7, AI-041). Every specialist reads
+/// the SAME brief so the constraints (length, banned phrases, language, style) are enforced identically by
+/// drafter, critic and editor — there is no drift between "write to N chars" and "score length against N".
+/// Lengths are in CHARACTERS (SEO fields are character-bounded, not token-bounded).
+/// </summary>
+public record ContentBrief(
+    string EntityType,
+    string FieldName,
+    int MinLength,
+    int MaxLength,
+    IReadOnlyList<string> BannedPhrases,
+    string TargetLanguage,
+    string? StyleGuide);
+
+/// <summary>Input to the researcher: the brief plus the raw source material to condense into neutral facts.</summary>
+public record ResearchInput(ContentBrief Brief, string SourceMaterial);
+
+/// <summary>The researcher's output: bullet FACTS grounded entirely in the source, ready for the drafter.</summary>
+public record ResearchNotes(string Notes);
+
+/// <summary>Input to the drafter: the brief plus the research notes it must write strictly from.</summary>
+public record DraftInput(ContentBrief Brief, ResearchNotes Research);
+
+/// <summary>A produced draft of the requested field.</summary>
+public record Draft(string Text);
+
+/// <summary>Input to the critic: the brief, the draft to score, and the research notes to ground it against.</summary>
+public record CritiqueInput(ContentBrief Brief, Draft Draft, ResearchNotes Research);
+
+/// <summary>Input to the editor: the brief, the draft to revise, and the critique listing what to fix.</summary>
+public record EditInput(ContentBrief Brief, Draft Draft, CritiqueResult Critique);
+
+/// <summary>
+/// The critic's structured verdict: 1-5 scores on each axis plus an issue list. <see cref="ParseFailed"/> is
+/// true when the critic's output could not be parsed (fail-closed all-1s) — callers treat that as "reject,
+/// needs work", never as a clean pass.
+/// </summary>
+public record CritiqueResult(
+    int FactualAccuracy,
+    int Tone,
+    int Length,
+    int BannedPhrases,
+    IReadOnlyList<CritiqueIssue> Issues,
+    bool ParseFailed);
+
+/// <summary>One actionable defect the critic found. Severity is one of "blocker", "major", "minor".</summary>
+public record CritiqueIssue(string Location, string Severity, string Fix);
diff --git a/backend/src/Application/Agents/CriticAgent.cs b/backend/src/Application/Agents/CriticAgent.cs
new file mode 100644
index 00000000..8ad1868e
--- /dev/null
+++ b/backend/src/Application/Agents/CriticAgent.cs
@@ -0,0 +1,21 @@
+using Application.Agents.Prompts;
+using TextStack.Ai.Core;
+
+namespace Application.Agents;
+
+/// <summary>
+/// Crew critic (AI-041): one gateway call that scores the draft 1-5 on four axes against the research notes
+/// and lists actionable issues. Grounds factual accuracy on the notes (unsupported claims = blockers) and
+/// fails closed if its JSON can't be parsed — see <see cref="CriticOutputParser"/>.
+/// </summary>
+public sealed class CriticAgent(ILlmService llm) : SingleCallAgent<CritiqueInput, CritiqueResult>(llm)
+{
+    protected override string FeatureTag => "crew.critic";
+    protected override int MaxOutputTokens => 700;
+
+    protected override (string system, string user) BuildPrompt(CritiqueInput input) =>
+        (CriticPrompt.BuildSystemPrompt(input.Brief),
+         CriticPrompt.BuildUserPrompt(input.Draft, input.Research));
+
+    protected override CritiqueResult Parse(string text, CritiqueInput input) => CriticOutputParser.Parse(text);
+}
diff --git a/backend/src/Application/Agents/CriticOutputParser.cs b/backend/src/Application/Agents/CriticOutputParser.cs
new file mode 100644
index 00000000..be857b5b
--- /dev/null
+++ b/backend/src/Application/Agents/CriticOutputParser.cs
@@ -0,0 +1,153 @@
+using System.Text.Json;
+using System.Text.Json.Serialization;
+
+namespace Application.Agents;
+
+/// <summary>
+/// Parses the critic's JSON verdict into a typed <see cref="CritiqueResult"/> (AI-041). LLMs leak fences and
+/// prose around JSON, so we strip code fences and extract the first top-level balanced {...} object via a
+/// string-aware brace scan before deserializing leniently. Scores are clamped to [1,5]; a blank severity stays
+/// "minor" (absent = not asserted) while a non-empty-but-unrecognized severity coerces to "major" (fail-closed
+/// direction — never silently downgrade an unknown severity below the blocker threshold); issues without a fix
+/// are dropped. Critically it is FAIL-CLOSED: any failure (empty, no brace, malformed, exception) yields the
+/// worst-possible verdict with ParseFailed: true, never an exception and never a silent clean pass — an
+/// unparseable critic must read as "reject", so a hallucinating drafter can't sneak past on a critic that
+/// merely failed to format its output.
+/// </summary>
+public static class CriticOutputParser
+{
+    private static readonly JsonSerializerOptions Options = new()
+    {
+        PropertyNameCaseInsensitive = true,
+        NumberHandling = JsonNumberHandling.AllowReadingFromString,
+    };
+
+    private static readonly CritiqueResult FailClosed = new(
+        1, 1, 1, 1,
+        [new CritiqueIssue("output", "blocker", "Critic output was unparseable.")],
+        ParseFailed: true);
+
+    public static CritiqueResult Parse(string llmText)
+    {
+        try
+        {
+            var json = ExtractJson(llmText);
+            if (json is null)
+                return FailClosed;
+
+            var dto = JsonSerializer.Deserialize<CriticDto>(json, Options);
+            if (dto?.Scores is null)
+                return FailClosed;
+
+            var issues = (dto.Issues ?? [])
+                .Where(i => !string.IsNullOrWhiteSpace(i.Fix))
+                .Select(i => new CritiqueIssue(
+                    i.Location ?? "draft",
+                    NormalizeSeverity(i.Severity),
+                    i.Fix!.Trim()))
+                .ToList();
+
+            return new CritiqueResult(
+                Clamp(dto.Scores.FactualAccuracy),
+                Clamp(dto.Scores.Tone),
+                Clamp(dto.Scores.Length),
+                Clamp(dto.Scores.BannedPhrases),
+                issues,
+                ParseFailed: false);
+        }
+        catch
+        {
+            // Never throw: a critic we cannot read must fail closed, not crash the crew.
+            return FailClosed;
+        }
+    }
+
+    /// <summary>
+    /// Strips ```` ``` ```` / ```` ```json ```` fences and returns the first top-level balanced {...}.
+    /// A string-aware brace scan from the first {: braces inside JSON string literals are ignored (an
+    /// in-string flag toggles on each unescaped ", honoring \ so \" doesn't toggle), so the
+    /// scan stops at the object's own closing brace. This makes "valid object then arbitrary prose" parse
+    /// cleanly and extracts object-0 from [{...},{...}] without splicing across elements. Returns null if
+    /// there is no { or the object never balances — caller fail-closes.
+    /// </summary>
+    private static string? ExtractJson(string text)
+    {
+        if (string.IsNullOrWhiteSpace(text))
+            return null;
+
+        var cleaned = text.Replace("```json", string.Empty, StringComparison.OrdinalIgnoreCase)
+            .Replace("```", string.Empty);
+
+        var start = cleaned.IndexOf('{');
+        if (start < 0)
+            return null;
+
+        var depth = 0;
+        var inString = false;
+        var escaped = false;
+
+        for (var i = start; i < cleaned.Length; i++)
+        {
+            var c = cleaned[i];
+
+            if (inString)
+            {
+                if (escaped)
+                    escaped = false;
+                else if (c == '\\')
+                    escaped = true;
+                else if (c == '"')
+                    inString = false;
+                continue;
+            }
+
+            switch (c)
+            {
+                case '"':
+                    inString = true;
+                    break;
+                case '{':
+                    depth++;
+                    break;
+                case '}':
+                    depth--;
+                    if (depth == 0)
+                        return cleaned[start..(i + 1)];
+                    break;
+            }
+        }
+
+        return null; // never balanced → fail closed
+    }
+
+    private static int Clamp(int score) => Math.Clamp(score, 1, 5);
+
+    private static string NormalizeSeverity(string? severity)
+    {
+        var s = severity?.Trim().ToLowerInvariant();
+        if (string.IsNullOrEmpty(s))
+            return "minor"; // absent = not asserted, don't manufacture severity
+        return s is "blocker" or "major" or "minor" ? s : "major"; // unknown-but-stated → fail-closed direction
+    }
+
+    private sealed class CriticDto
+    {
+        [JsonPropertyName("scores")] public ScoresDto? Scores { get; set; }
+        [JsonPropertyName("issues")] public List<IssueDto>? Issues { get; set; }
+    }
+
+    private sealed class ScoresDto
+    {
+        [JsonPropertyName("factual_accuracy")] public int FactualAccuracy { get; set; }
+        [JsonPropertyName("tone")] public int Tone { get; set; }
+        [JsonPropertyName("length")] public int Length { get; set; }
+        [JsonPropertyName("banned_phrases")] public int BannedPhrases { get; set; }
+    }
+
+    private sealed class IssueDto
+    {
+        [JsonPropertyName("location")] public string? Location { get; set; }
+        [JsonPropertyName("severity")] public string? Severity { get; set; }
+        [JsonPropertyName("fix")] public string? Fix { get; set; }
+    }
+}
diff --git a/backend/src/Application/Agents/DrafterAgent.cs b/backend/src/Application/Agents/DrafterAgent.cs
new file mode 100644
index 00000000..c505a253
--- /dev/null
+++ b/backend/src/Application/Agents/DrafterAgent.cs
@@ -0,0 +1,20 @@
+using Application.Agents.Prompts;
+using TextStack.Ai.Core;
+
+namespace Application.Agents;
+
+/// <summary>
+/// Crew drafter (AI-041): one gateway call that writes the requested field strictly from the research notes,
+/// within the brief's character bounds and language. Stays groundable — the critic scores it against the notes.
+/// </summary>
+public sealed class DrafterAgent(ILlmService llm) : SingleCallAgent<DraftInput, Draft>(llm)
+{
+    protected override string FeatureTag => "crew.drafter";
+    protected override int MaxOutputTokens => 500;
+
+    protected override (string system, string user) BuildPrompt(DraftInput input) =>
+        (DrafterPrompt.BuildSystemPrompt(input.Brief),
+         DrafterPrompt.BuildUserPrompt(input.Research));
+
+    protected override Draft Parse(string text, DraftInput input) => new(text.Trim());
+}
diff --git a/backend/src/Application/Agents/EditorAgent.cs b/backend/src/Application/Agents/EditorAgent.cs
new file mode 100644
index 00000000..a5be23e0
--- /dev/null
+++ b/backend/src/Application/Agents/EditorAgent.cs
@@ -0,0 +1,20 @@
+using Application.Agents.Prompts;
+using TextStack.Ai.Core;
+
+namespace Application.Agents;
+
+/// <summary>
+/// Crew editor (AI-041): one gateway call that rewrites the draft fixing each critique issue (blockers first),
+/// keeping the supported facts and staying within the brief's character bounds. Last link in the content crew.
+/// </summary>
+public sealed class EditorAgent(ILlmService llm) : SingleCallAgent<EditInput, Draft>(llm)
+{
+    protected override string FeatureTag => "crew.editor";
+    protected override int MaxOutputTokens => 500;
+
+    protected override (string system, string user) BuildPrompt(EditInput input) =>
+        (EditorPrompt.BuildSystemPrompt(input.Brief),
+         EditorPrompt.BuildUserPrompt(input.Draft, input.Critique));
+
+    protected override Draft Parse(string text, EditInput input) => new(text.Trim());
+}
diff --git a/backend/src/Application/Agents/Prompts/BriefConstraints.cs b/backend/src/Application/Agents/Prompts/BriefConstraints.cs
new file mode 100644
index 00000000..b1cbc28b
--- /dev/null
+++ b/backend/src/Application/Agents/Prompts/BriefConstraints.cs
@@ -0,0 +1,18 @@
+namespace Application.Agents.Prompts;
+
+/// <summary>
+/// Shared, single-source rendering of the brief's length + banned-phrase constraints so the drafter, critic
+/// and editor read them IDENTICALLY (AI-041). If the drafter is told "120-300 characters" and the critic
+/// scores length against a different phrasing, length critique drifts — so all three call through here.
+/// </summary>
+internal static class BriefConstraints
+{
+    public static string Length(ContentBrief brief) =>
+        $"{brief.MinLength}-{brief.MaxLength} characters";
+
+    /// <summary>Comma-separated banned phrases, or null when the brief bans none.</summary>
+    public static string? BannedPhrases(ContentBrief brief) =>
+        brief.BannedPhrases.Count == 0
+            ? null
+            : string.Join(", ", brief.BannedPhrases.Select(p => $"\"{p}\""));
+}
diff --git a/backend/src/Application/Agents/Prompts/CriticPrompt.cs b/backend/src/Application/Agents/Prompts/CriticPrompt.cs
new file mode 100644
index 00000000..6627b146
--- /dev/null
+++ b/backend/src/Application/Agents/Prompts/CriticPrompt.cs
@@ -0,0 +1,44 @@
+namespace Application.Agents.Prompts;
+
+/// <summary>
+/// The crew critic's prompt (AI-041) — the most load-bearing prompt in the crew. It scores the draft 1-5 on
+/// four axes AGAINST THE RESEARCH NOTES (not the model's own knowledge): any draft claim not supported by the
+/// notes is a factual-accuracy blocker. That research-grounding is the whole point — it turns "does
+/// this sound plausible?" into "is this actually in the source?", which is what makes the critic catch
+/// hallucinations the drafter slipped in. The output MUST be a bare JSON object matching the exact schema
+/// <see cref="CriticOutputParser"/> deserializes; the parser fails closed on anything else. Pure strings.
+/// </summary>
+public static class CriticPrompt
+{
+    /// <summary>The literal schema the critic must emit and <see cref="CriticOutputParser"/> parses — kept in one place.</summary>
+    public const string Schema =
+        "{\"scores\":{\"factual_accuracy\":1-5,\"tone\":1-5,\"length\":1-5,\"banned_phrases\":1-5}," +
+        "\"issues\":[{\"location\":\"…\",\"severity\":\"blocker|major|minor\",\"fix\":\"…\"}]}";
+
+    public static string BuildSystemPrompt(ContentBrief brief)
+    {
+        var prompt =
+            $"You are an editor reviewing a draft {brief.FieldName} for a {brief.EntityType}. " +
+            "Score the draft 1-5 (1 worst, 5 best) on each of four axes:\n" +
+            "- factual_accuracy: every claim in the draft must be supported BY THE RESEARCH NOTES. " +
+            "Flag EVERY claim that is not supported by the notes as a \"blocker\" issue and lower this score.\n" +
+            $"- tone: does it match the requested style{(string.IsNullOrWhiteSpace(brief.StyleGuide) ? string.Empty : $" ({brief.StyleGuide!.Trim()})")} " +
+            $"and read naturally in {brief.TargetLanguage}?\n" +
+            $"- length: is the draft within {BriefConstraints.Length(brief)}?\n" +
+            "- banned_phrases: does the draft avoid the banned phrases?";
+
+        if (BriefConstraints.BannedPhrases(brief) is { } banned)
+            prompt += $" Banned phrases: {banned}.";
+
+        prompt +=
+            "\nList every concrete problem as an issue with a location, a severity " +
+            "(\"blocker\", \"major\", or \"minor\"), and a specific fix.\n" +
+            "Output a bare JSON object ONLY — no markdown, no code fences, no commentary before or after it — " +
+            "with EXACTLY this schema:\n" + Schema;
+
+        return prompt;
+    }
+
+    public static string BuildUserPrompt(Draft draft, ResearchNotes research) =>
+        $"Research notes:\n{research.Notes}\n\nDraft to review:\n{draft.Text}";
+}
diff --git a/backend/src/Application/Agents/Prompts/DrafterPrompt.cs b/backend/src/Application/Agents/Prompts/DrafterPrompt.cs
new file mode 100644
index 00000000..ed32e067
--- /dev/null
+++ b/backend/src/Application/Agents/Prompts/DrafterPrompt.cs
@@ -0,0 +1,31 @@
+namespace Application.Agents.Prompts;
+
+/// <summary>
+/// The crew drafter's prompt (AI-041): write the requested field using ONLY the researcher's notes, within
+/// the brief's character bounds, in the target language, avoiding banned phrases. Writing strictly from the
+/// notes is what keeps the draft groundable — the critic later scores it against those same notes, so any
+/// claim the drafter invents shows up as an unsupported blocker. Pure string building.
+/// </summary>
+public static class DrafterPrompt
+{
+    public static string BuildSystemPrompt(ContentBrief brief)
+    {
+        var prompt =
+            $"You are a copywriter writing the {brief.FieldName} of a {brief.EntityType}. " +
+            "Use ONLY the facts in the research notes provided — do not add any information that is not in the notes. " +
+            $"Write the text in {brief.TargetLanguage}. " +
+            $"The text must be {BriefConstraints.Length(brief)} long.";
+
+        if (BriefConstraints.BannedPhrases(brief) is { } banned)
+            prompt += $" Do not use these phrases: {banned}.";
+
+        if (!string.IsNullOrWhiteSpace(brief.StyleGuide))
+            prompt += $" Style guide: {brief.StyleGuide.Trim()}.";
+
+        prompt += " Output only the finished text — no markdown, no preface, no quotes around it.";
+        return prompt;
+    }
+
+    public static string BuildUserPrompt(ResearchNotes research) =>
+        $"Research notes:\n{research.Notes}";
+}
diff --git a/backend/src/Application/Agents/Prompts/EditorPrompt.cs b/backend/src/Application/Agents/Prompts/EditorPrompt.cs
new file mode 100644
index 00000000..3d50824a
--- /dev/null
+++ b/backend/src/Application/Agents/Prompts/EditorPrompt.cs
@@ -0,0 +1,36 @@
+namespace Application.Agents.Prompts;
+
+/// <summary>
+/// The crew editor's prompt (AI-041): rewrite the draft fixing each issue the critic raised, blockers first,
+/// while keeping the facts the draft got right and staying within the brief's character bounds. Reads length
+/// and banned phrases from the SAME <see cref="BriefConstraints"/> the drafter and critic use, so a rewrite
+/// can't satisfy the editor yet fail the critic on length. Pure string building.
+/// </summary>
+public static class EditorPrompt
+{
+    public static string BuildSystemPrompt(ContentBrief brief)
+    {
+        var prompt =
+            $"You are an editor revising a draft {brief.FieldName} for a {brief.EntityType}. " +
+            "Rewrite the draft to fix every issue listed in the critique, addressing the \"blocker\" issues first, " +
+            "then \"major\", then \"minor\". " +
+            "Keep the facts the draft already got right; remove or correct any claim flagged as unsupported. " +
+            $"Write in {brief.TargetLanguage}. " +
+            $"The revised text must be {BriefConstraints.Length(brief)} long.";
+
+        if (BriefConstraints.BannedPhrases(brief) is { } banned)
+            prompt += $" Do not use these phrases: {banned}.";
+
+        prompt += " Output only the revised text — no markdown, no preface, no commentary.";
+        return prompt;
+    }
+
+    public static string BuildUserPrompt(Draft draft, CritiqueResult critique)
+    {
+        var issues = critique.Issues.Count == 0
+            ? "(no specific issues listed)"
+            : string.Join("\n", critique.Issues.Select(i => $"- [{i.Severity}] {i.Location}: {i.Fix}"));
+
+        return $"Draft to revise:\n{draft.Text}\n\nIssues to fix:\n{issues}";
+    }
+}
diff --git a/backend/src/Application/Agents/Prompts/ResearcherPrompt.cs b/backend/src/Application/Agents/Prompts/ResearcherPrompt.cs
new file mode 100644
index 00000000..b465350f
--- /dev/null
+++ b/backend/src/Application/Agents/Prompts/ResearcherPrompt.cs
@@ -0,0 +1,21 @@
+namespace Application.Agents.Prompts;
+
+/// <summary>
+/// The crew researcher's prompt (AI-041): condense the raw source material into neutral, grounded bullet
+/// FACTS that a later drafter will write from. Drops anything not present in the source — the whole crew's
+/// factual grounding starts here, so hallucinated "facts" at this stage poison every downstream step.
+/// Pure string building, no dependencies (mirrors ExplainPrompt).
+/// </summary>
+public static class ResearcherPrompt
+{
+    public static string BuildSystemPrompt(ContentBrief brief) =>
+        $"You are a research assistant preparing notes for writing the {brief.FieldName} of a {brief.EntityType}. " +
+        "Read the source material and extract only the concrete, verifiable facts relevant to that field. " +
+        "Write them as short neutral bullet points, one fact per line, prefixed with \"- \". " +
+        "Use ONLY information present in the source material. " +
+        "If a detail is not in the source, do not include it — never guess, infer, or add outside knowledge. " +
+        "No prose, no headings, no preface, no commentary — just the bullet list of facts.";
+
+    public static string BuildUserPrompt(string sourceMaterial) =>
+        $"Source material:\n{sourceMaterial}";
+}
diff --git a/backend/src/Application/Agents/ResearcherAgent.cs b/backend/src/Application/Agents/ResearcherAgent.cs
new file mode 100644
index 00000000..cdd2d8b8
--- /dev/null
+++ b/backend/src/Application/Agents/ResearcherAgent.cs
@@ -0,0 +1,20 @@
+using Application.Agents.Prompts;
+using TextStack.Ai.Core;
+
+namespace Application.Agents;
+
+/// <summary>
+/// Crew researcher (AI-041): one gateway call that condenses the source material into grounded bullet facts.
+/// First link in the content crew — its notes are what every later specialist writes from and is scored against.
+/// </summary>
+public sealed class ResearcherAgent(ILlmService llm) : SingleCallAgent<ResearchInput, ResearchNotes>(llm)
+{
+    protected override string FeatureTag => "crew.researcher";
+    protected override int MaxOutputTokens => 600;
+
+    protected override (string system, string user) BuildPrompt(ResearchInput input) =>
+        (ResearcherPrompt.BuildSystemPrompt(input.Brief),
+         ResearcherPrompt.BuildUserPrompt(input.SourceMaterial));
+
+    protected override ResearchNotes Parse(string text, ResearchInput input) => new(text.Trim());
+}
diff --git a/backend/src/Application/Agents/SingleCallAgent.cs b/backend/src/Application/Agents/SingleCallAgent.cs
new file mode 100644
index 00000000..50e8bcea
--- /dev/null
+++ b/backend/src/Application/Agents/SingleCallAgent.cs
@@ -0,0 +1,54 @@
+using System.Diagnostics;
+using System.Text.Json;
+using TextStack.Ai.Core;
+
+namespace Application.Agents;
+
+/// <summary>
+/// Base for a crew specialist that is exactly ONE <see cref="ILlmService"/> gateway call (Phase 7, AI-041)
+/// — no tools, no AgentLoop, no iteration. The crew (AI-042/043) composes these via CrewTasks.Of, so
+/// each must satisfy the same contract the orchestrator schedules: produce a
+/// step transcript (one "llm_response" step) and a usage row (Iterations = 1) mapped from the gateway's
+/// <see cref="LlmUsage"/>. Subclasses own only their feature tag (routing), token budget, prompt and parse —
+/// the gateway plumbing (request, timing, step, usage) lives here so the four specialists stay ~15 lines each.
+/// </summary>
+public abstract class SingleCallAgent<TIn, TOut>(ILlmService llm) : IAgent<TIn, TOut>
+{
+    /// <summary>Routes the call (model selection, tracing, cost caps) — one tag per specialist.</summary>
+    protected abstract string FeatureTag { get; }
+
+    /// <summary>Output-token budget for this specialist's single completion.</summary>
+    protected abstract int MaxOutputTokens { get; }
+
+    /// <summary>Builds the system + user prompt for this input. Pure string production (delegates to a Prompts builder).</summary>
+    protected abstract (string system, string user) BuildPrompt(TIn input);
+
+    /// <summary>Turns the raw completion text into the typed output (trim / parse). Must never throw — fail-closed instead.</summary>
+    protected abstract TOut Parse(string text, TIn input);
+
+    public async Task<AgentResult<TOut>> RunAsync(TIn input, AgentContext ctx, CancellationToken ct)
+    {
+        var sw = Stopwatch.StartNew();
+
+        var (system, user) = BuildPrompt(input);
+        var resp = await llm.CompleteAsync(
+            new LlmRequest(system, [new LlmMessage("user", user)], MaxOutputTokens, FeatureTag: FeatureTag),
+            ct);
+        sw.Stop();
+
+        var step = new AgentStep(
+            0,
+            "llm_response",
+            JsonSerializer.SerializeToElement(new { text = resp.Text }),
+            DateTimeOffset.UtcNow);
+
+        var usage = new AgentUsage(
+            1,
+            resp.Usage.InputTokens,
+            resp.Usage.OutputTokens,
+            resp.Usage.CostUsd,
+            (int)sw.ElapsedMilliseconds);
+
+        return new AgentResult<TOut>(Parse(resp.Text, input), [step], usage);
+    }
+}
diff --git a/tests/TextStack.UnitTests/CrewSpecialistsTests.cs b/tests/TextStack.UnitTests/CrewSpecialistsTests.cs
new file mode 100644
index 00000000..2d88a56b
--- /dev/null
+++ b/tests/TextStack.UnitTests/CrewSpecialistsTests.cs
@@ -0,0 +1,632 @@
+using Application.Agents;
+using Microsoft.Extensions.DependencyInjection;
+using TextStack.Ai.Agents;
+using TextStack.Ai.Core;
+
+namespace TextStack.UnitTests;
+
+/// <summary>
+/// AI-041 — the four single-call crew specialists (Researcher/Drafter/Critic/Editor) and the critic JSON
+/// parser, driven by a deterministic fake <see cref="ILlmService"/> (no key, no network). Verifies each agent
+/// maps the gateway response into its typed output + a one-step transcript + usage (Iterations=1), the
+/// critic parser is lenient and fail-closed, the prompt builders surface every brief constraint, and the four
+/// compose into a real plan via <see cref="CrewOrchestrator"/> (the AI-040 contract).
+/// </summary>
+public class CrewSpecialistsTests
+{
+    // ---- Fakes -------------------------------------------------------------------------------------
+
+    /// <summary>Returns a single canned completion. StreamAsync is not used by single-call agents.</summary>
+    private sealed class FakeLlm(string text, LlmUsage? usage = null) : ILlmService
+    {
+        public string? LastSystem { get; private set; }
+        public string? LastUser { get; private set; }
+        public string? LastFeatureTag { get; private set; }
+        public int? LastMaxOutputTokens { get; private set; }
+
+        public Task<LlmResponse> CompleteAsync(LlmRequest request, CancellationToken ct)
+        {
+            LastSystem = request.SystemPrompt;
+            LastUser = request.Messages[0].Content;
+            LastFeatureTag = request.FeatureTag;
+            LastMaxOutputTokens = request.MaxOutputTokens;
+            return Task.FromResult(new LlmResponse(
+                text, [], usage ?? new LlmUsage(40, 20, 0.002m), "m", Guid.NewGuid()));
+        }
+
+        public IAsyncEnumerable StreamAsync(LlmRequest request, CancellationToken ct) =>
+            throw new NotSupportedException();
+    }
+
+    private static AgentContext Ctx() =>
+        new(null, null, Guid.NewGuid(), new ServiceCollection().BuildServiceProvider());
+
+    private static ContentBrief Brief(
+        IReadOnlyList<string>? banned = null, string? style = null,
+        int min = 120, int max = 300, string lang = "English") =>
+        new("Author", "biography", min, max, banned ?? ["a leading authority", "needs no introduction"], lang, style);
+
+    private static CancellationToken Ct => TestContext.Current.CancellationToken;
+
+    // ---- 1. Per-agent: output mapping + transcript + usage -----------------------------------------
+
+    [Fact]
+    public async Task ResearcherAgent_FakeReturnsText_ParsesTrimmedNotesWithOneStepAndUsage()
+    {
+        var llm = new FakeLlm(" - born 1900\n- wrote books ");
+        var agent = new ResearcherAgent(llm);
+
+        var result = await agent.RunAsync(new ResearchInput(Brief(), "source"), Ctx(), Ct);
+
+        Assert.Equal("- born 1900\n- wrote books", result.Output.Notes);
+        Assert.Equal("crew.researcher", llm.LastFeatureTag);
+        Assert.Equal(600, llm.LastMaxOutputTokens);
+        AssertSingleLlmStepAndUsage(result);
+    }
+
+    [Fact]
+    public async Task DrafterAgent_FakeReturnsText_ParsesTrimmedDraft()
+    {
+        var llm = new FakeLlm(" A short bio. ");
+        var agent = new DrafterAgent(llm);
+
+        var result = await agent.RunAsync(
+            new DraftInput(Brief(), new ResearchNotes("- fact")), Ctx(), Ct);
+
+        Assert.Equal("A short bio.", result.Output.Text);
+        Assert.Equal("crew.drafter", llm.LastFeatureTag);
+        Assert.Equal(500, llm.LastMaxOutputTokens);
+        AssertSingleLlmStepAndUsage(result);
+    }
+
+    [Fact]
+    public async Task CriticAgent_FakeReturnsJson_ParsesTypedCritiqueResult()
+    {
+        const string json = """{"scores":{"factual_accuracy":4,"tone":5,"length":3,"banned_phrases":5},"issues":[{"location":"line 1","severity":"major","fix":"tighten"}]}""";
+        var llm = new FakeLlm(json);
+        var agent = new CriticAgent(llm);
+
+        var result = await agent.RunAsync(
+            new CritiqueInput(Brief(), new Draft("d"), new ResearchNotes("n")), Ctx(), Ct);
+
+        Assert.False(result.Output.ParseFailed);
+        Assert.Equal(4, result.Output.FactualAccuracy);
+        Assert.Equal(5, result.Output.Tone);
+        var issue = Assert.Single(result.Output.Issues);
+        Assert.Equal("major", issue.Severity);
+        Assert.Equal("crew.critic", llm.LastFeatureTag);
+        Assert.Equal(700, llm.LastMaxOutputTokens);
+        AssertSingleLlmStepAndUsage(result);
+    }
+
+    [Fact]
+    public async Task EditorAgent_FakeReturnsText_ParsesTrimmedDraft()
+    {
+        var llm = new FakeLlm(" Revised bio. ");
+        var agent = new EditorAgent(llm);
+
+        var critique = new CritiqueResult(2, 3, 4, 5,
+            [new CritiqueIssue("line 1", "blocker", "remove unsupported claim")], false);
+        var result = await agent.RunAsync(
+            new EditInput(Brief(), new Draft("d"), critique), Ctx(), Ct);
+
+        Assert.Equal("Revised bio.", result.Output.Text);
+        Assert.Equal("crew.editor", llm.LastFeatureTag);
+        Assert.Equal(500, llm.LastMaxOutputTokens);
+        AssertSingleLlmStepAndUsage(result);
+    }
+
+    [Fact]
+    public async Task CriticAgent_FakeReturnsEmptyText_FailsClosedNotThrows()
+    {
+        var agent = new CriticAgent(new FakeLlm(""));
+        var result = await agent.RunAsync(
+            new CritiqueInput(Brief(), new Draft("d"), new ResearchNotes("n")), Ctx(), Ct);
+
+        Assert.True(result.Output.ParseFailed);
+        Assert.Equal(1, result.Output.FactualAccuracy);
+        AssertSingleLlmStepAndUsage(result);
+    }
+
+    [Fact]
+    public async Task DrafterAgent_FakeReturnsEmptyText_ProducesEmptyDraftNotThrows()
+    {
+        var agent = new DrafterAgent(new FakeLlm(" "));
+        var result = await agent.RunAsync(
+            new DraftInput(Brief(), new ResearchNotes("- fact")), Ctx(), Ct);
+
+        Assert.Equal(string.Empty, result.Output.Text);
+    }
+
+    [Fact]
+    public async Task ResearcherAgent_FakeReturnsEmptyText_ProducesEmptyNotesNotThrows()
+    {
+        var agent = new ResearcherAgent(new FakeLlm(""));
+        var result = await agent.RunAsync(new ResearchInput(Brief(), "source"), Ctx(), Ct);
+        Assert.Equal(string.Empty, result.Output.Notes);
+    }
+
+    [Fact]
+    public async Task SingleCallAgent_UsageWithZeroCost_PreservesNonZeroTokens()
+    {
+        // Guard against zeros being lost / hardcoded: map LlmUsage through verbatim, only Iterations is fixed at 1.
+        var agent = new DrafterAgent(new FakeLlm("ok", new LlmUsage(123, 45, 0m)));
+        var result = await agent.RunAsync(
+            new DraftInput(Brief(), new ResearchNotes("- fact")), Ctx(), Ct);
+
+        Assert.Equal(1, result.Usage.Iterations);
+        Assert.Equal(123, result.Usage.InputTokensTotal);
+        Assert.Equal(45, result.Usage.OutputTokensTotal);
+        Assert.Equal(0m, result.Usage.CostUsdTotal);
+    }
+
+    [Fact]
+    public async Task SingleCallAgent_CancelledToken_PropagatesToGateway()
+    {
+        // The agent must forward the caller's CancellationToken to the gateway; a pre-cancelled token
+        // surfaces as OperationCanceledException, not a completed run on a cancelled request.
+        var agent = new DrafterAgent(new CancelObservingLlm());
+        using var cts = new CancellationTokenSource();
+        await cts.CancelAsync();
+
+        await Assert.ThrowsAnyAsync<OperationCanceledException>(() =>
+            agent.RunAsync(new DraftInput(Brief(), new ResearchNotes("- fact")), Ctx(), cts.Token));
+    }
+
+    /// <summary>
+    /// Throws OperationCanceledException when the token it receives is already cancelled, so the test
+    /// passes only if the agent forwarded the caller's token to the gateway.
+    /// </summary>
+    private sealed class CancelObservingLlm : ILlmService
+    {
+        public Task<LlmResponse> CompleteAsync(LlmRequest request, CancellationToken ct)
+        {
+            ct.ThrowIfCancellationRequested();
+            return Task.FromResult(new LlmResponse("x", [], new LlmUsage(1, 1, 0m), "m", Guid.NewGuid()));
+        }
+
+        public IAsyncEnumerable StreamAsync(LlmRequest request, CancellationToken ct) =>
+            throw new NotSupportedException();
+    }
+
+    private static void AssertSingleLlmStepAndUsage(AgentResult result)
+    {
+        Assert.Equal(1, result.Usage.Iterations);
+        Assert.Equal(40, result.Usage.InputTokensTotal);
+        Assert.Equal(20, result.Usage.OutputTokensTotal);
+        Assert.Equal(0.002m, result.Usage.CostUsdTotal);
+        var step = Assert.Single(result.Steps);
+        Assert.Equal(0, step.Index);
+        Assert.Equal("llm_response", step.Kind);
+        Assert.True(step.Payload.TryGetProperty("text", out _));
+    }
+
+    // ---- 2. 
CriticOutputParser ---------------------------------------------------------------------
+
+    [Fact]
+    public void Parse_WellFormedJson_ReturnsTypedResult()
+    {
+        const string json = """{"scores":{"factual_accuracy":5,"tone":4,"length":3,"banned_phrases":2},"issues":[{"location":"intro","severity":"blocker","fix":"cite source"}]}""";
+        var r = CriticOutputParser.Parse(json);
+
+        Assert.False(r.ParseFailed);
+        Assert.Equal(5, r.FactualAccuracy);
+        Assert.Equal(4, r.Tone);
+        Assert.Equal(3, r.Length);
+        Assert.Equal(2, r.BannedPhrases);
+        var issue = Assert.Single(r.Issues);
+        Assert.Equal("intro", issue.Location);
+        Assert.Equal("blocker", issue.Severity);
+        Assert.Equal("cite source", issue.Fix);
+    }
+
+    [Fact]
+    public void Parse_JsonFenced_StripsFencesAndParses()
+    {
+        const string text = "```json\n{\"scores\":{\"factual_accuracy\":3,\"tone\":3,\"length\":3,\"banned_phrases\":3},\"issues\":[]}\n```";
+        var r = CriticOutputParser.Parse(text);
+
+        Assert.False(r.ParseFailed);
+        Assert.Equal(3, r.FactualAccuracy);
+        Assert.Empty(r.Issues);
+    }
+
+    [Fact]
+    public void Parse_TrailingProseAfterBrace_StillParses()
+    {
+        const string text = "{\"scores\":{\"factual_accuracy\":4,\"tone\":4,\"length\":4,\"banned_phrases\":4},\"issues\":[]} Hope that helps!";
+        var r = CriticOutputParser.Parse(text);
+
+        Assert.False(r.ParseFailed);
+        Assert.Equal(4, r.Tone);
+    }
+
+    [Fact]
+    public void Parse_ScoreAboveRange_ClampedToFive()
+    {
+        const string text = """{"scores":{"factual_accuracy":9,"tone":3,"length":3,"banned_phrases":3},"issues":[]}""";
+        Assert.Equal(5, CriticOutputParser.Parse(text).FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_ScoreBelowRange_ClampedToOne()
+    {
+        const string text = """{"scores":{"factual_accuracy":0,"tone":3,"length":3,"banned_phrases":3},"issues":[]}""";
+        Assert.Equal(1, CriticOutputParser.Parse(text).FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_UnrecognizedNonEmptySeverity_CoercedToMajor()
+    {
+        // Fail-closed direction: a critic 
that writes something we don't recognize ("critical"/"catastrophic")
+        // must NOT be silently downgraded below the blocker threshold; coerce up to "major", not down to "minor".
+        const string text = """{"scores":{"factual_accuracy":3,"tone":3,"length":3,"banned_phrases":3},"issues":[{"location":"x","severity":"critical","fix":"fix it"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.Equal("major", Assert.Single(r.Issues).Severity);
+    }
+
+    [Fact]
+    public void Parse_EmptySeverity_StaysMinor()
+    {
+        // Absent severity (whitespace) is "not asserted": don't manufacture a severity; stays minor.
+        const string text = """{"scores":{"factual_accuracy":3,"tone":3,"length":3,"banned_phrases":3},"issues":[{"location":"x","severity":" ","fix":"fix it"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.Equal("minor", Assert.Single(r.Issues).Severity);
+    }
+
+    [Fact]
+    public void Parse_IssueMissingFix_Dropped()
+    {
+        const string text = """{"scores":{"factual_accuracy":3,"tone":3,"length":3,"banned_phrases":3},"issues":[{"location":"x","severity":"major","fix":""},{"location":"y","severity":"minor","fix":"keep"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        var issue = Assert.Single(r.Issues);
+        Assert.Equal("keep", issue.Fix);
+    }
+
+    [Theory]
+    [InlineData("not json at all")]
+    [InlineData("")]
+    [InlineData(" ")]
+    [InlineData("{ this is broken ][")]
+    public void Parse_Garbage_FailsClosed(string text)
+    {
+        var r = CriticOutputParser.Parse(text);
+
+        Assert.True(r.ParseFailed);
+        Assert.Equal(1, r.FactualAccuracy);
+        Assert.Equal(1, r.Tone);
+        Assert.Equal(1, r.Length);
+        Assert.Equal(1, r.BannedPhrases);
+        var issue = Assert.Single(r.Issues);
+        Assert.Equal("blocker", issue.Severity);
+    }
+
+    [Fact]
+    public void Parse_MissingScores_FailsClosed()
+    {
+        var r = CriticOutputParser.Parse("""{"issues":[]}""");
+        Assert.True(r.ParseFailed);
+    }
+
+    [Fact]
+    public void Parse_NeverThrows_OnAnyInput()
+    {
+        // Confirm fail-closed on a sample of degenerate inputs: no input 
shape throws.
+        foreach (var s in new[] { null, "", "{", "}", "{}", "[]", "{\"scores\":null}", "garbage {nested {brace}}" })
+        {
+            var ex = Record.Exception(() => CriticOutputParser.Parse(s!));
+            Assert.Null(ex);
+        }
+    }
+
+    // ---- 2b. CriticOutputParser: adversarial real-LLM failure shapes -------------------------------
+
+    /// <summary>
+    /// The keystone safety property: garbage must NEVER deserialize to a clean, high-scored pass
+    /// (false-clean is the worst failure: a hallucinating drafter sneaks past a critic that merely
+    /// failed to format). A missing score field defaults to 0 → clamps to the WORST score (1), not the best.
+    /// </summary>
+    [Fact]
+    public void Parse_ScoresObjectPresentButFieldMissing_ClampsToWorstNotBest()
+    {
+        const string text = """{"scores":{"tone":5,"length":5,"banned_phrases":5},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+
+        // It parsed (scores object present) but the missing axis must read as 1, never 5.
+        Assert.False(r.ParseFailed);
+        Assert.Equal(1, r.FactualAccuracy);
+        Assert.Equal(5, r.Tone);
+    }
+
+    [Fact]
+    public void Parse_NullScoreField_FailsClosed()
+    {
+        // null into a non-nullable int throws in System.Text.Json → fail closed (worst), never false-clean.
+        const string text = """{"scores":{"factual_accuracy":null,"tone":5,"length":5,"banned_phrases":5},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.True(r.ParseFailed);
+        Assert.Equal(1, r.FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_ScoreAsQuotedString_ReadsNumber()
+    {
+        // AllowReadingFromString: "4" must parse, not fail closed. 
+        const string text = """{"scores":{"factual_accuracy":"4","tone":"3","length":"3","banned_phrases":"3"},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(4, r.FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_ScoreAsFloat_FailsClosed_NeverFalseClean()
+    {
+        // 4.5 into an int property: System.Text.Json throws → fail-closed (worst), never a silent high pass.
+        const string text = """{"scores":{"factual_accuracy":4.5,"tone":4.5,"length":4.5,"banned_phrases":4.5},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.True(r.ParseFailed);
+        Assert.Equal(1, r.FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_NonNumericScore_FailsClosed()
+    {
+        const string text = """{"scores":{"factual_accuracy":"good","tone":3,"length":3,"banned_phrases":3},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.True(r.ParseFailed);
+    }
+
+    [Fact]
+    public void Parse_TopLevelJsonArrayWrappingTheObject_RecoversInnerObject()
+    {
+        // A critic that wraps its verdict in a single-element array still parses CLEAN: the balanced-brace scan
+        // starts at the first '{' (object-0) and stops at its matching '}', ignoring the array brackets.
+        const string text = """[{"scores":{"factual_accuracy":5,"tone":5,"length":5,"banned_phrases":5},"issues":[]}]""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(5, r.FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_MultiElementTopLevelArray_RecoversFirstObjectScores()
+    {
+        // The balanced-brace scan extracts object-0 cleanly (stops at its own '}'), so a multi-element array
+        // [{good},{bad}] no longer splices across elements, and object-0's real scores are recovered. 
+        const string text = """[{"scores":{"factual_accuracy":5,"tone":4,"length":3,"banned_phrases":2},"issues":[]},{"x":1}]""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(5, r.FactualAccuracy);
+        Assert.Equal(4, r.Tone);
+        Assert.Equal(3, r.Length);
+        Assert.Equal(2, r.BannedPhrases);
+    }
+
+    [Fact]
+    public void Parse_NestedBracesInsideStringValue_ParsesCorrectly()
+    {
+        // A '}' inside a string value must not confuse the brace scan: in-string braces are ignored.
+        const string text = """{"scores":{"factual_accuracy":3,"tone":3,"length":3,"banned_phrases":3},"issues":[{"location":"intro","severity":"minor","fix":"replace the {placeholder} token"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal("replace the {placeholder} token", Assert.Single(r.Issues).Fix);
+    }
+
+    [Fact]
+    public void Parse_TrailingProseBraceAfterValidJson_ParsesCleanRecoversScores()
+    {
+        // A perfectly valid critique followed by prose containing a stray brace (emoticon ":}", "${var}") now
+        // PARSES CLEAN: the balanced-brace scan stops at the object's own closing '}' and ignores the prose.
+        const string text = """{"scores":{"factual_accuracy":5,"tone":4,"length":3,"banned_phrases":2},"issues":[]} Looks good :} and use ${var}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(5, r.FactualAccuracy);
+        Assert.Equal(4, r.Tone);
+        Assert.Equal(3, r.Length);
+        Assert.Equal(2, r.BannedPhrases);
+    }
+
+    [Fact]
+    public void Parse_InStringBraceInIssueLocation_ParsesClean()
+    {
+        // A brace inside a string value ("${x}") must not break the scan: in-string braces are ignored. 
+        const string text = """{"scores":{"factual_accuracy":4,"tone":4,"length":4,"banned_phrases":4},"issues":[{"location":"use ${x}","severity":"minor","fix":"a"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(4, r.FactualAccuracy);
+        var issue = Assert.Single(r.Issues);
+        Assert.Equal("use ${x}", issue.Location);
+    }
+
+    [Fact]
+    public void Parse_EscapedQuoteInStringValue_ParsesClean()
+    {
+        // An escaped quote inside a string ('\"') must not prematurely close the string and expose its braces.
+        const string text = """{"scores":{"factual_accuracy":4,"tone":4,"length":4,"banned_phrases":4},"issues":[{"location":"x","severity":"minor","fix":"say \"hi\" {now}"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal("say \"hi\" {now}", Assert.Single(r.Issues).Fix);
+    }
+
+    [Fact]
+    public void Parse_LeadingBomAndJunkBeforeJson_StillParses()
+    {
+        const string text = "Here is my review:\n{\"scores\":{\"factual_accuracy\":4,\"tone\":4,\"length\":4,\"banned_phrases\":4},\"issues\":[]}";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(4, r.Tone);
+    }
+
+    [Fact]
+    public void Parse_DuplicateScoreKeys_LastWins_NeverThrows()
+    {
+        // System.Text.Json takes the last duplicate; assert it doesn't throw and stays in range.
+        const string text = """{"scores":{"factual_accuracy":1,"factual_accuracy":5,"tone":3,"length":3,"banned_phrases":3},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.False(r.ParseFailed);
+        Assert.Equal(5, r.FactualAccuracy);
+    }
+
+    [Fact]
+    public void Parse_IssuesNotAnArray_FailsClosed()
+    {
+        // "issues" as an object (not array) → deserialize throws on the list → fail closed. 
+        const string text = """{"scores":{"factual_accuracy":4,"tone":4,"length":4,"banned_phrases":4},"issues":{"location":"x"}}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.True(r.ParseFailed);
+    }
+
+    [Fact]
+    public void Parse_SeverityUppercaseAndPadded_NormalizedToLowercase()
+    {
+        const string text = """{"scores":{"factual_accuracy":3,"tone":3,"length":3,"banned_phrases":3},"issues":[{"location":"x","severity":" BLOCKER ","fix":"do it"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.Equal("blocker", Assert.Single(r.Issues).Severity);
+    }
+
+    [Fact]
+    public void Parse_IssueNullSeverityAndNullLocation_DefaultsApplied()
+    {
+        const string text = """{"scores":{"factual_accuracy":3,"tone":3,"length":3,"banned_phrases":3},"issues":[{"fix":"do it"}]}""";
+        var r = CriticOutputParser.Parse(text);
+        var issue = Assert.Single(r.Issues);
+        Assert.Equal("draft", issue.Location);
+        Assert.Equal("minor", issue.Severity);
+    }
+
+    [Fact]
+    public void Parse_NaNScore_FailsClosed()
+    {
+        // NaN is not a valid JSON number → deserialize throws → fail closed.
+        const string text = """{"scores":{"factual_accuracy":NaN,"tone":3,"length":3,"banned_phrases":3},"issues":[]}""";
+        var r = CriticOutputParser.Parse(text);
+        Assert.True(r.ParseFailed);
+    }
+
+    // ---- 3. 
Prompt builders surface brief constraints ----------------------------------------------
+
+    [Fact]
+    public void DrafterPrompt_SurfacesLengthBannedPhraseAndLanguage()
+    {
+        var brief = Brief(banned: ["a leading authority"], style: "neutral encyclopedic", lang: "Ukrainian", min: 150, max: 280);
+        var system = Application.Agents.Prompts.DrafterPrompt.BuildSystemPrompt(brief);
+
+        Assert.Contains("150-280 characters", system);
+        Assert.Contains("a leading authority", system);
+        Assert.Contains("Ukrainian", system);
+        Assert.Contains("neutral encyclopedic", system);
+    }
+
+    [Fact]
+    public void CriticPrompt_SurfacesSchemaTokensLengthAndBannedPhrase()
+    {
+        var brief = Brief(banned: ["needs no introduction"], min: 100, max: 200);
+        var system = Application.Agents.Prompts.CriticPrompt.BuildSystemPrompt(brief);
+
+        Assert.Contains("factual_accuracy", system);
+        Assert.Contains("severity", system);
+        Assert.Contains("100-200 characters", system);
+        Assert.Contains("needs no introduction", system);
+        Assert.Contains("blocker", system);
+    }
+
+    [Fact]
+    public void EditorPrompt_SurfacesLengthAndLanguage()
+    {
+        var brief = Brief(lang: "Spanish", min: 90, max: 160);
+        var system = Application.Agents.Prompts.EditorPrompt.BuildSystemPrompt(brief);
+
+        Assert.Contains("90-160 characters", system);
+        Assert.Contains("Spanish", system);
+        Assert.Contains("blocker", system); // blockers-first ordering
+    }
+
+    [Fact]
+    public void ResearcherPrompt_NamesFieldAndEntityAndGroundsOnSource()
+    {
+        var brief = Brief();
+        var system = Application.Agents.Prompts.ResearcherPrompt.BuildSystemPrompt(brief);
+
+        Assert.Contains("biography", system);
+        Assert.Contains("Author", system);
+        Assert.Contains("ONLY", system); // grounded on source only
+    }
+
+    [Fact]
+    public void DrafterPrompt_NoBannedPhrases_OmitsBannedClause()
+    {
+        var brief = Brief(banned: []);
+        var system = Application.Agents.Prompts.DrafterPrompt.BuildSystemPrompt(brief);
+        Assert.DoesNotContain("Do not use these phrases", 
system);
+    }
+
+    // ---- 4. Crew integration smoke -----------------------------------------------------------------
+
+    private sealed class CrewState
+    {
+        public ResearchNotes? Research { get; set; }
+        public Draft? Draft { get; set; }
+        public CritiqueResult? Critique { get; set; }
+        public Draft? Final { get; set; }
+    }
+
+    [Fact]
+    public async Task FourSpecialists_ComposedIntoCrewPlan_ThreadStateAndTranscriptInOrder()
+    {
+        var brief = Brief();
+        var source = "Born 1900. Wrote three novels.";
+
+        // Distinct canned outputs per specialist so we can prove the state threaded through each stage.
+        var researcher = new ResearcherAgent(new FakeLlm("- born 1900\n- wrote three novels"));
+        var drafter = new DrafterAgent(new FakeLlm("Born in 1900, the author wrote three novels."));
+        var critic = new CriticAgent(new FakeLlm(
+            """{"scores":{"factual_accuracy":5,"tone":4,"length":4,"banned_phrases":5},"issues":[{"location":"end","severity":"minor","fix":"add a period"}]}"""));
+        var editor = new EditorAgent(new FakeLlm("Born in 1900, the author wrote three acclaimed novels."));
+
+        var plan = new CrewPlan("content",
+        [
+            new CrewStage("research",
+            [
+                CrewTasks.Of(
+                    "researcher", researcher,
+                    _ => new ResearchInput(brief, source),
+                    (s, o) => s.Research = o),
+            ]),
+            new CrewStage("draft",
+            [
+                CrewTasks.Of(
+                    "drafter", drafter,
+                    s => new DraftInput(brief, s.Research!),
+                    (s, o) => s.Draft = o),
+            ]),
+            new CrewStage("critique",
+            [
+                CrewTasks.Of(
+                    "critic", critic,
+                    s => new CritiqueInput(brief, s.Draft!, s.Research!),
+                    (s, o) => s.Critique = o),
+            ]),
+            new CrewStage("edit",
+            [
+                CrewTasks.Of(
+                    "editor", editor,
+                    s => new EditInput(brief, s.Draft!, s.Critique!),
+                    (s, o) => s.Final = o),
+            ]),
+        ], new CrewOptions());
+
+        var result = await new CrewOrchestrator().RunAsync(plan, new CrewState(), Ctx(), Ct);
+
+        Assert.Equal(CrewRunRecordFactory.StatusCompleted, result.Status);
+
+        // State threaded research -> draft -> critique -> edit. 
+        Assert.Equal("- born 1900\n- wrote three novels", result.State.Research!.Notes);
+        Assert.Equal("Born in 1900, the author wrote three novels.", result.State.Draft!.Text);
+        Assert.False(result.State.Critique!.ParseFailed);
+        Assert.Equal(5, result.State.Critique.FactualAccuracy);
+        Assert.Equal("Born in 1900, the author wrote three acclaimed novels.", result.State.Final!.Text);
+
+        // Transcript: exactly 4 CrewStepEntry in declaration order.
+        Assert.Equal(4, result.Steps.Count);
+        Assert.Equal(["research", "draft", "critique", "edit"], result.Steps.Select(e => e.Stage));
+        Assert.Equal(["researcher", "drafter", "critic", "editor"], result.Steps.Select(e => e.AgentName));
+        Assert.Equal([0, 1, 2, 3], result.Steps.Select(e => e.Index));
+        Assert.Equal(4, result.Usage.Iterations);
+    }
+}