Merged
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -2,6 +2,18 @@

## [Unreleased]

### Phase 7 — crew specialist sub-agents + prompts (AI-041) (2026-06-15)

The four generic, **single-call** crew specialists the content crews (AI-042/043) compose via `CrewTasks.Of` + `CrewOrchestrator` (AI-040). Each is exactly ONE `ILlmService` gateway call — no tools, no `AgentLoop`, no iteration — and is domain-agnostic (no SEO/AutoPublish specifics): they operate on a shared `ContentBrief` (length in CHARACTERS, banned phrases, target language, optional style guide). **Why these four, in this order**: a researcher condenses the source into grounded bullet FACTS; a drafter writes the field strictly from those notes; a critic scores the draft 1-5 **against the research notes** (not its own knowledge) — every claim not supported by the notes is a factual-accuracy `blocker` — and an editor rewrites fixing each issue blockers-first. Grounding the critic on the research notes is the crux: it turns "does this sound plausible?" into "is this actually in the source?", which is what catches hallucinations the drafter slipped in.

- **`Application/Agents/CrewAgentContracts.cs`** (new) — the records threaded through a crew: `ContentBrief`, `ResearchInput`/`ResearchNotes`, `DraftInput`/`Draft`, `CritiqueInput`/`EditInput`, `CritiqueResult` (four 1-5 scores + `Issues` + `ParseFailed`), `CritiqueIssue` (severity `blocker|major|minor`). All read the same brief so "write to N chars" and "score length against N" can never drift.
- **`Application/Agents/SingleCallAgent.cs`** (new, abstract) — base for a one-gateway-call `IAgent<TIn,TOut>`: subclasses own only `FeatureTag` (routing), `MaxOutputTokens`, `BuildPrompt`, `Parse`; the base does the request/timing/step/usage plumbing so each specialist is ~15 lines. Produces the same shape the orchestrator schedules — one `"llm_response"` `AgentStep` + `AgentUsage(Iterations: 1, …)` mapped from the gateway's `LlmUsage`.
- **`ResearcherAgent` / `DrafterAgent` / `CriticAgent` / `EditorAgent`** (new) — feature tags `crew.researcher` / `crew.drafter` / `crew.critic` / `crew.editor`; token budgets 600/500/700/500. Researcher/drafter/editor parse = trimmed text; critic parses via `CriticOutputParser`.
- **`Application/Agents/CriticOutputParser.cs`** (new, pure) — strips ```` ```json ````/```` ``` ```` fences, extracts the first top-level balanced `{...}` object with a string-aware brace scan (braces inside JSON string literals are ignored), deserializes case-insensitively into a private DTO matching the prompt's schema, clamps each score to [1,5], maps a blank severity to `minor` and an unrecognized non-empty severity to `major`, drops issues with no fix. **Fail-closed**: any exception / empty / no-brace / never-balanced object → `CritiqueResult(1,1,1,1, [blocker "unparseable"], ParseFailed: true)` and NEVER throws — an unreadable critic must read as "reject", never as a silent clean pass.
- **`Application/Agents/Prompts/`** (new dir) — `ResearcherPrompt`/`DrafterPrompt`/`CriticPrompt`/`EditorPrompt` (pure `BuildSystemPrompt`/`BuildUserPrompt`, mirroring `ExplainPrompt`) + an internal `BriefConstraints` so drafter/critic/editor render length + banned phrases IDENTICALLY. `CriticPrompt` inlines the literal JSON schema it shares with the parser and demands a bare JSON object only.
- **DI** — the four registered as singletons next to `StudyBuddyAgent` in `Api/Program.cs` (stateless, take the singleton `ILlmService`).
- Tests: `CrewSpecialistsTests` (23, fake `ILlmService`, no key/network) — each agent maps its canned response to the typed output with one `llm_response` step + `Iterations==1` + usage from `LlmUsage`; the `CriticOutputParser` battery (well-formed, fenced, trailing prose, score clamp 0→1 / 9→5, unknown severity → major, missing-fix dropped, garbage/empty/no-brace fail-closed, never-throws sweep); prompt builders surface length range / banned phrase / language / the critic schema tokens (`factual_accuracy`, `severity`, `blocker`); and a crew integration smoke test wiring all four via `CrewTasks.Of` into a 4-stage `CrewPlan` run through the real `CrewOrchestrator` — asserts state threads research→draft→critique→edit and the transcript has 4 `CrewStepEntry`s in declaration order (proves the AI-040 contract; no stray `ITool` added).
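
A minimal sketch of how a caller might act on the fail-closed verdict, for illustration only. `CritiqueGate`, `ShouldReject`, and the `minScore` threshold of 3 are hypothetical and are not part of this change. The two records are redeclared from `CrewAgentContracts.cs` so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The verdict the parser returns when the critic's output cannot be read.
var failClosed = new CritiqueResult(
    1, 1, 1, 1,
    new[] { new CritiqueIssue("output", "blocker", "Critic output was unparseable.") },
    ParseFailed: true);

// A well-formed verdict with no issues.
var clean = new CritiqueResult(5, 4, 5, 5, Array.Empty<CritiqueIssue>(), ParseFailed: false);

Console.WriteLine(CritiqueGate.ShouldReject(failClosed)); // True
Console.WriteLine(CritiqueGate.ShouldReject(clean));      // False

// Mirrors of the contract records, redeclared here only to keep the sketch self-contained.
public record CritiqueIssue(string Location, string Severity, string Fix);

public record CritiqueResult(
    int FactualAccuracy,
    int Tone,
    int Length,
    int BannedPhrases,
    IReadOnlyList<CritiqueIssue> Issues,
    bool ParseFailed);

// HYPOTHETICAL gate (an assumption, not in the PR): rejects on a parse failure,
// on any blocker issue, or on any axis scoring below minScore.
public static class CritiqueGate
{
    public static bool ShouldReject(CritiqueResult critique, int minScore = 3) =>
        critique.ParseFailed
        || critique.Issues.Any(i => i.Severity == "blocker")
        || new[] { critique.FactualAccuracy, critique.Tone, critique.Length, critique.BannedPhrases }
            .Any(score => score < minScore);
}
```

Checking `ParseFailed` first keeps the gate aligned with the parser's contract: an unreadable critic is a rejection even if a caller later relaxes the score threshold.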

### Phase 7 — CrewOrchestrator primitive (AI-040) (2026-06-15)

Phase 7 opens with the **generic multi-agent orchestration engine** — the crew-level analogue of `AgentLoop` (AI-034). Engine-only: no concrete crews, no specialist agents, no SEO/AutoPublish wiring, no endpoint (those are AI-041+). Like `AgentLoop` shipped before `StudyBuddyAgent`, the primitive lands first and migrates callers later. **Reuses Phase 6 seams**: no new tables, no new persistence interface — a crew run persists through the same `IAgentRunWriter`/`agent_run` path as a single agent.
6 changes: 6 additions & 0 deletions backend/src/Api/Program.cs
@@ -89,6 +89,12 @@
// Agent loop engine (Phase 6, AI-034). Concrete agents (StudyBuddy, AI-035) build on it.
TextStack.Ai.Agents.ServiceCollectionExtensions.AddAiAgents(builder.Services);
builder.Services.AddScoped<Application.Agents.StudyBuddyAgent>();
// Crew specialists (Phase 7, AI-041): single-call IAgent<TIn,TOut> sub-agents the content crews
// (AI-042/043) compose via CrewTasks.Of. Stateless + ILlmService is a singleton, so singleton is fine.
builder.Services.AddSingleton<Application.Agents.ResearcherAgent>();
builder.Services.AddSingleton<Application.Agents.DrafterAgent>();
builder.Services.AddSingleton<Application.Agents.CriticAgent>();
builder.Services.AddSingleton<Application.Agents.EditorAgent>();
builder.Services.AddAuthSettings(builder.Configuration);

var connectionString = builder.Configuration.GetConnectionString("Default")
50 changes: 50 additions & 0 deletions backend/src/Application/Agents/CrewAgentContracts.cs
@@ -0,0 +1,50 @@
namespace Application.Agents;

/// <summary>
/// The shared writing assignment threaded through a content crew (Phase 7, AI-041). Every specialist reads
/// the SAME brief so the constraints (length, banned phrases, language, style) are enforced identically by
/// drafter, critic and editor — there is no drift between "write to N chars" and "score length against N".
/// Lengths are in CHARACTERS (SEO fields are character-bounded, not token-bounded).
/// </summary>
public record ContentBrief(
    string EntityType,
    string FieldName,
    int MinLength,
    int MaxLength,
    IReadOnlyList<string> BannedPhrases,
    string TargetLanguage,
    string? StyleGuide);

/// <summary>Input to the researcher: the brief plus the raw source material to condense into neutral facts.</summary>
public record ResearchInput(ContentBrief Brief, string SourceMaterial);

/// <summary>The researcher's output: bullet FACTS grounded entirely in the source, ready for the drafter.</summary>
public record ResearchNotes(string Notes);

/// <summary>Input to the drafter: the brief plus the research notes it must write strictly from.</summary>
public record DraftInput(ContentBrief Brief, ResearchNotes Research);

/// <summary>A produced draft of the requested field.</summary>
public record Draft(string Text);

/// <summary>Input to the critic: the brief, the draft to score, and the research notes to ground it against.</summary>
public record CritiqueInput(ContentBrief Brief, Draft Draft, ResearchNotes Research);

/// <summary>Input to the editor: the brief, the draft to revise, and the critique listing what to fix.</summary>
public record EditInput(ContentBrief Brief, Draft Draft, CritiqueResult Critique);

/// <summary>
/// The critic's structured verdict: 1-5 scores on each axis plus an issue list. <see cref="ParseFailed"/> is
/// true when the critic's output could not be parsed (fail-closed all-1s) — callers treat that as "reject,
/// needs work", never as a clean pass.
/// </summary>
public record CritiqueResult(
    int FactualAccuracy,
    int Tone,
    int Length,
    int BannedPhrases,
    IReadOnlyList<CritiqueIssue> Issues,
    bool ParseFailed);

/// <summary>One actionable defect the critic found. <c>Severity</c> is one of "blocker", "major", "minor".</summary>
public record CritiqueIssue(string Location, string Severity, string Fix);
21 changes: 21 additions & 0 deletions backend/src/Application/Agents/CriticAgent.cs
@@ -0,0 +1,21 @@
using Application.Agents.Prompts;
using TextStack.Ai.Core;

namespace Application.Agents;

/// <summary>
/// Crew critic (AI-041): one gateway call that scores the draft 1-5 on four axes against the research notes
/// and lists actionable issues. Grounds factual accuracy on the notes (unsupported claims = blockers) and
/// fails closed if its JSON can't be parsed — see <see cref="CriticOutputParser"/>.
/// </summary>
public sealed class CriticAgent(ILlmService llm) : SingleCallAgent<CritiqueInput, CritiqueResult>(llm)
{
    protected override string FeatureTag => "crew.critic";
    protected override int MaxOutputTokens => 700;

    protected override (string system, string user) BuildPrompt(CritiqueInput input) =>
        (CriticPrompt.BuildSystemPrompt(input.Brief),
         CriticPrompt.BuildUserPrompt(input.Draft, input.Research));

    protected override CritiqueResult Parse(string text, CritiqueInput input) => CriticOutputParser.Parse(text);
}
153 changes: 153 additions & 0 deletions backend/src/Application/Agents/CriticOutputParser.cs
@@ -0,0 +1,153 @@
using System.Text.Json;
using System.Text.Json.Serialization;

namespace Application.Agents;

/// <summary>
/// Parses the critic's JSON verdict into a typed <see cref="CritiqueResult"/> (AI-041). LLMs leak fences and
/// prose around JSON, so we strip code fences and extract the first top-level balanced <c>{...}</c> object via a
/// string-aware brace scan before deserializing leniently. Scores are clamped to [1,5]; a blank severity stays
/// "minor" (absent = not asserted) while a non-empty-but-unrecognized severity coerces to "major" (fail-closed
/// direction — never silently downgrade an unknown severity below the blocker threshold); issues without a fix
/// are dropped. Critically it is FAIL-CLOSED: any failure (empty, no brace, malformed, exception) yields the
/// worst-possible verdict with <c>ParseFailed: true</c>, never an exception and never a silent clean pass — an
/// unparseable critic must read as "reject", so a hallucinating drafter can't sneak past on a critic that
/// merely failed to format its output.
/// </summary>
public static class CriticOutputParser
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true,
        NumberHandling = JsonNumberHandling.AllowReadingFromString,
    };

    private static readonly CritiqueResult FailClosed = new(
        1, 1, 1, 1,
        [new CritiqueIssue("output", "blocker", "Critic output was unparseable.")],
        ParseFailed: true);

    public static CritiqueResult Parse(string llmText)
    {
        try
        {
            var json = ExtractJson(llmText);
            if (json is null)
                return FailClosed;

            var dto = JsonSerializer.Deserialize<CriticDto>(json, Options);
            if (dto?.Scores is null)
                return FailClosed;

            var issues = (dto.Issues ?? [])
                .Where(i => !string.IsNullOrWhiteSpace(i.Fix))
                .Select(i => new CritiqueIssue(
                    i.Location ?? "draft",
                    NormalizeSeverity(i.Severity),
                    i.Fix!.Trim()))
                .ToList();

            return new CritiqueResult(
                Clamp(dto.Scores.FactualAccuracy),
                Clamp(dto.Scores.Tone),
                Clamp(dto.Scores.Length),
                Clamp(dto.Scores.BannedPhrases),
                issues,
                ParseFailed: false);
        }
        catch
        {
            // Never throw: a critic we cannot read must fail closed, not crash the crew.
            return FailClosed;
        }
    }

    /// <summary>
    /// Strips ```` ``` ```` / ```` ```json ```` fences and returns the first top-level balanced <c>{...}</c>.
    /// A string-aware brace scan from the first <c>{</c>: braces inside JSON string literals are ignored (an
    /// in-string flag toggles on each unescaped <c>"</c>, honoring <c>\</c> so <c>\"</c> doesn't toggle), so the
    /// scan stops at the object's own closing brace. This makes "valid object then arbitrary prose" parse
    /// cleanly and extracts object-0 from <c>[{...},{...}]</c> without splicing across elements. Returns null if
    /// there is no <c>{</c> or the object never balances — caller fail-closes.
    /// </summary>
    private static string? ExtractJson(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return null;

        var cleaned = text.Replace("```json", string.Empty, StringComparison.OrdinalIgnoreCase)
                          .Replace("```", string.Empty);

        var start = cleaned.IndexOf('{');
        if (start < 0)
            return null;

        var depth = 0;
        var inString = false;
        var escaped = false;

        for (var i = start; i < cleaned.Length; i++)
        {
            var c = cleaned[i];

            if (inString)
            {
                if (escaped)
                    escaped = false;
                else if (c == '\\')
                    escaped = true;
                else if (c == '"')
                    inString = false;
                continue;
            }

            switch (c)
            {
                case '"':
                    inString = true;
                    break;
                case '{':
                    depth++;
                    break;
                case '}':
                    depth--;
                    if (depth == 0)
                        return cleaned[start..(i + 1)];
                    break;
            }
        }

        return null; // never balanced → fail closed
    }

    private static int Clamp(int score) => Math.Clamp(score, 1, 5);

    private static string NormalizeSeverity(string? severity)
    {
        var s = severity?.Trim().ToLowerInvariant();
        if (string.IsNullOrEmpty(s))
            return "minor"; // absent = not asserted, don't manufacture severity
        return s is "blocker" or "major" or "minor" ? s : "major"; // unknown-but-stated → fail-closed direction
    }

    private sealed class CriticDto
    {
        [JsonPropertyName("scores")] public ScoresDto? Scores { get; set; }
        [JsonPropertyName("issues")] public List<IssueDto>? Issues { get; set; }
    }

    private sealed class ScoresDto
    {
        [JsonPropertyName("factual_accuracy")] public int FactualAccuracy { get; set; }
        [JsonPropertyName("tone")] public int Tone { get; set; }
        [JsonPropertyName("length")] public int Length { get; set; }
        [JsonPropertyName("banned_phrases")] public int BannedPhrases { get; set; }
    }

    private sealed class IssueDto
    {
        [JsonPropertyName("location")] public string? Location { get; set; }
        [JsonPropertyName("severity")] public string? Severity { get; set; }
        [JsonPropertyName("fix")] public string? Fix { get; set; }
    }
}
20 changes: 20 additions & 0 deletions backend/src/Application/Agents/DrafterAgent.cs
@@ -0,0 +1,20 @@
using Application.Agents.Prompts;
using TextStack.Ai.Core;

namespace Application.Agents;

/// <summary>
/// Crew drafter (AI-041): one gateway call that writes the requested field strictly from the research notes,
/// within the brief's character bounds and language. Stays groundable — the critic scores it against the notes.
/// </summary>
public sealed class DrafterAgent(ILlmService llm) : SingleCallAgent<DraftInput, Draft>(llm)
{
    protected override string FeatureTag => "crew.drafter";
    protected override int MaxOutputTokens => 500;

    protected override (string system, string user) BuildPrompt(DraftInput input) =>
        (DrafterPrompt.BuildSystemPrompt(input.Brief),
         DrafterPrompt.BuildUserPrompt(input.Research));

    protected override Draft Parse(string text, DraftInput input) => new(text.Trim());
}
20 changes: 20 additions & 0 deletions backend/src/Application/Agents/EditorAgent.cs
@@ -0,0 +1,20 @@
using Application.Agents.Prompts;
using TextStack.Ai.Core;

namespace Application.Agents;

/// <summary>
/// Crew editor (AI-041): one gateway call that rewrites the draft fixing each critique issue (blockers first),
/// keeping the supported facts and staying within the brief's character bounds. Last link in the content crew.
/// </summary>
public sealed class EditorAgent(ILlmService llm) : SingleCallAgent<EditInput, Draft>(llm)
{
    protected override string FeatureTag => "crew.editor";
    protected override int MaxOutputTokens => 500;

    protected override (string system, string user) BuildPrompt(EditInput input) =>
        (EditorPrompt.BuildSystemPrompt(input.Brief),
         EditorPrompt.BuildUserPrompt(input.Draft, input.Critique));

    protected override Draft Parse(string text, EditInput input) => new(text.Trim());
}
18 changes: 18 additions & 0 deletions backend/src/Application/Agents/Prompts/BriefConstraints.cs
@@ -0,0 +1,18 @@
namespace Application.Agents.Prompts;

/// <summary>
/// Shared, single-source rendering of the brief's length + banned-phrase constraints so the drafter, critic
/// and editor read them IDENTICALLY (AI-041). If the drafter is told "120-300 characters" and the critic
/// scores length against a different phrasing, length critique drifts — so all three call through here.
/// </summary>
internal static class BriefConstraints
{
    public static string Length(ContentBrief brief) =>
        $"{brief.MinLength}-{brief.MaxLength} characters";

    /// <summary>Comma-separated banned phrases, or null when the brief bans none.</summary>
    public static string? BannedPhrases(ContentBrief brief) =>
        brief.BannedPhrases.Count == 0
            ? null
            : string.Join(", ", brief.BannedPhrases.Select(p => $"\"{p}\""));
}
44 changes: 44 additions & 0 deletions backend/src/Application/Agents/Prompts/CriticPrompt.cs
@@ -0,0 +1,44 @@
namespace Application.Agents.Prompts;

/// <summary>
/// The crew critic's prompt (AI-041) — the most load-bearing prompt in the crew. It scores the draft 1-5 on
/// four axes AGAINST THE RESEARCH NOTES (not the model's own knowledge): any draft claim not supported by the
/// notes is a factual-accuracy <c>blocker</c>. That research-grounding is the whole point — it turns "does
/// this sound plausible?" into "is this actually in the source?", which is what makes the critic catch
/// hallucinations the drafter slipped in. The output MUST be a bare JSON object matching the exact schema
/// <see cref="CriticOutputParser"/> deserializes; the parser fails closed on anything else. Pure strings.
/// </summary>
public static class CriticPrompt
{
    /// <summary>The literal schema the critic must emit and <see cref="CriticOutputParser"/> parses — kept in one place.</summary>
    public const string Schema =
        "{\"scores\":{\"factual_accuracy\":1-5,\"tone\":1-5,\"length\":1-5,\"banned_phrases\":1-5}," +
        "\"issues\":[{\"location\":\"…\",\"severity\":\"blocker|major|minor\",\"fix\":\"…\"}]}";

    public static string BuildSystemPrompt(ContentBrief brief)
    {
        var prompt =
            $"You are an editor reviewing a draft {brief.FieldName} for a {brief.EntityType}. " +
            "Score the draft 1-5 (1 worst, 5 best) on each of four axes:\n" +
            "- factual_accuracy: every claim in the draft must be supported BY THE RESEARCH NOTES. " +
            "Flag EVERY claim that is not supported by the notes as a \"blocker\" issue and lower this score.\n" +
            $"- tone: does it match the requested style{(string.IsNullOrWhiteSpace(brief.StyleGuide) ? string.Empty : $" ({brief.StyleGuide!.Trim()})")} " +
            $"and read naturally in {brief.TargetLanguage}?\n" +
            $"- length: is the draft within {BriefConstraints.Length(brief)}?\n" +
            "- banned_phrases: does the draft avoid the banned phrases?";

        if (BriefConstraints.BannedPhrases(brief) is { } banned)
            prompt += $" Banned phrases: {banned}.";

        prompt +=
            "\nList every concrete problem as an issue with a location, a severity " +
            "(\"blocker\", \"major\", or \"minor\"), and a specific fix.\n" +
            "Output a bare JSON object ONLY — no markdown, no code fences, no commentary before or after it — " +
            "with EXACTLY this schema:\n" + Schema;

        return prompt;
    }

    public static string BuildUserPrompt(Draft draft, ResearchNotes research) =>
        $"Research notes:\n{research.Notes}\n\nDraft to review:\n{draft.Text}";
}