From f21c6383f23f7aa438a91a46eb359d018ae2879f Mon Sep 17 00:00:00 2001
From: Vasyl Vdovychenko
Date: Mon, 15 Jun 2026 18:35:35 -0400
Subject: [PATCH] =?UTF-8?q?feat(ai):=20AutoPublishCrew=20=E2=80=94=20in-pr?=
 =?UTF-8?q?ocess=20SEO=20generation=20crew=20(AI-042)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Admin-triggered crew path that runs the AI-041 specialists (researcher→
drafter→critic→editor) over ILlmService to generate Edition SEO prose
(Description + SeoRelevanceText). Per-field 4-stage CrewPlan, persists
each as crew.autopublish agent_run, fail-closed review gate, writes the
Edition only when BOTH fields pass clean. Edition stays Draft (no
auto-publish). Legacy bash+Claude-CLI poller untouched and default.

New in-process path rather than swapping the poller: legacy gen is
out-of-process (Claude CLI, Max sub, $0) while the crew is metered nano —
keep both, A/B in AI-046 before any cutover.

Data-loss guards (per QA):
- gate flags NeedsReview on empty/whitespace OR below-MinLength editor
  output — never blanks a real field
- honors SeoSource.Manual: a hand-written Description/Relevance is never
  overwritten (returns manualProtected), matching legacy backfill contract

CostCapUsd 0.02/field; autopublish.crew rate limit (4/min). 22 unit tests
(fake ILlmService + recording IAgentRunWriter), no DB/network.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CHANGELOG.md                                  |  10 +
 backend/src/Api/Api.csproj                    |   1 +
 .../Endpoints/AdminAutoPublishEndpoints.cs    | 138 ++++++
 backend/src/Api/Program.cs                    |  16 +
 .../Application/Agents/AutoPublishBriefs.cs   |  46 ++
 .../src/Application/Agents/AutoPublishCrew.cs | 139 ++++++
 .../AutoPublishCrewTests.cs                   | 408 ++++++++++++++++++
 7 files changed, 758 insertions(+)
 create mode 100644 backend/src/Application/Agents/AutoPublishBriefs.cs
 create mode 100644 backend/src/Application/Agents/AutoPublishCrew.cs
 create mode 100644 tests/TextStack.UnitTests/AutoPublishCrewTests.cs

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0e0eda7f..e26ce02c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,16 @@

 ## [Unreleased]

+### Phase 7 — in-process AutoPublishCrew admin path (AI-042) (2026-06-15)
+
+An admin-triggered, **in-process** path that runs the AI-041 specialists over `ILlmService` + the AI-040 `CrewOrchestrator` to generate SEO **prose** for an Edition (`Description` + `SeoRelevanceText`). **Why a new path, not a swap**: the legacy bash + Claude-CLI systemd poller (`seo-publish-poll.sh`/`seo-generate.sh`) is the production SEO pipeline — it works, it's the default, and it stays **fully intact and untouched**. This is the *observable* alternative: every call routes through the traced gateway (`llm_traces`), persists as an `agent_run` with the full sub-agent transcript (AI-045 replay), and is gated by a fail-closed critic — none of which the opaque CLI poller offers. Shipping it parallel (not as a replacement) lets us prove the crew on real editions with zero risk to the live poller, exactly as `AgentLoop` shipped before any caller migrated.
+
+- **`Application/Agents/AutoPublishBriefs.cs`** (new) — static factory for the two `ContentBrief`s: `Description("en")` (800-1600 chars) + `Relevance("en")` (500-1000). Shared hardcoded `BannedPhrases` ("masterpiece", "must-read", "timeless classic", "page-turner", "tour de force", "magnum opus") and `StyleGuide` ("Factual, encyclopedic tone; no subjective superlatives; third person.") — encodes the legacy "no subjective superlatives" rule as a list the critic can actually score against. Admin-editable later.
+- **`Application/Agents/AutoPublishCrew.cs`** (new, scoped) — DB-free service: builds a 4-stage sequential `CrewPlan` (researcher → drafter → critic → editor via `CrewTasks.Of`), runs it under `CrewOptions(CostCapUsd: 0.02m, MaxParallelism: 1)`, persists via `CrewRunRecordFactory` → `IAgentRunWriter` (agent `crew.autopublish`, goal `edition.{field}`), and returns an `AutoPublishFieldResult` (edited text + critique + fail-closed `NeedsReview` + status + runId). **Writes nothing to the Edition and never publishes** — the caller owns apply/publish. The `NeedsReview` gate is a pure, unit-tested static: `true` unless the crew COMPLETED **and** the editor's text clears the brief's `MinLength` floor **and** the critic produced a parseable verdict **and** that verdict raised no `blocker` (so an error/budget halt, **empty/whitespace/below-floor edited text**, a missing/unparseable critic, or any blocker all fail closed). The **empty-output floor** (`NeedsReview(status, critique, editedText, minLength)`) closes the P1 data-loss hole where an editor returning `""` with a clean critic would otherwise read as a clean pass and let the endpoint overwrite a real field with an empty string. Per-field cost cap is a `const decimal CostCapUsd = 0.02m`.
+- **`Api/Endpoints/AdminAutoPublishEndpoints.cs`** — new `POST /admin/autopublish/editions/{editionId}/crew-generate` (same `/admin/*` auth group, new `autopublish.crew` rate-limit policy). Loads the same source material `seo-generate.sh` feeds Claude (title, author(s), language, `LEFT(plain_text, 1000)` of the first chapter), runs the crew TWICE (Description + Relevance, separate runIds). **Gate**: if EITHER field needs review → writes nothing, returns `{ needsReview: true, runIds, fields }` with per-field critique-score summaries; if BOTH clean → writes `Description` + `SeoRelevanceText`, sets `SeoSource = Auto`, saves. **Manual-source protection (P2)**: before the write block, `IsManualProtected(edition.SeoSource, Description, SeoRelevanceText)` (pure helper) blocks the write entirely when the edition is `SeoSource.Manual` **and** either targeted field already holds hand-written content — the response carries `manualProtected: true` so the admin sees why, the Edition (and its `SeoSource`) stays untouched, and only the crew transcripts persist (audit). Honors the same "Manual flag protects filled content from overwrite" contract as the legacy `SeoCoverageAnalyzer`. Empty Manual fields are still fair game for first-time generation. Edition stays `Draft` regardless. Response carries both runIds for the AI-045 transcript UI.
+- **DI / config** — `AutoPublishCrew` registered scoped (it persists via the scoped `IAgentRunWriter`) next to the specialists in `Program.cs`; `autopublish.crew` rate-limit policy (4/min per IP — a generate is 8 LLM calls) mirrors the `studybuddy` policy shape. **No DB migration** — reuses the Phase 6 `agent_run` table and the existing Edition columns.
+- Tests: `AutoPublishCrewTests` + `AutoPublishManualProtectionTests` (fake `ILlmService` routed per `FeatureTag` + recording `IAgentRunWriter`, no network/DB) — clean critic → not flagged + edited text; critic `blocker` → flagged; garbage critic → parser fail-closed → flagged; persists exactly once as `crew.autopublish` with editionId + 4 nested sub-agent steps; per-field cost cap → `budget_exhausted` + flagged + partial run persisted (only the research stage ran); plus the `NeedsReview` gate exercised directly across completed/halted/null/parse-failed/blocker/minor-major cases. **P1 floor**: empty / whitespace-only / below-`MinLength` editor output → flagged (was a pinned `_BUG` regression, now flipped); at/above-floor + clean critic → not flagged. **P2 manual-protect**: `IsManualProtected` pure helper — `Manual` + filled Description or Relevance → blocked; `Manual` + empty fields → allowed; `Auto`/`Hybrid` with content → allowed. No `ITool` introduced — the StudyBuddy tool set-equality test is unaffected.
+
 ### Phase 7 — crew specialist sub-agents + prompts (AI-041) (2026-06-15)

 The four generic, **single-call** crew specialists the content crews (AI-042/043) compose via `CrewTasks.Of` + `CrewOrchestrator` (AI-040). Each is exactly ONE `ILlmService` gateway call — no tools, no `AgentLoop`, no iteration — and is domain-agnostic (no SEO/AutoPublish specifics): they operate on a shared `ContentBrief` (length in CHARACTERS, banned phrases, target language, optional style guide). **Why these four, in this order**: a researcher condenses the source into grounded bullet FACTS; a drafter writes the field strictly from those notes; a critic scores the draft 1-5 **against the research notes** (not its own knowledge) — every claim not supported by the notes is a factual-accuracy `blocker` — and an editor rewrites fixing each issue blockers-first. Grounding the critic on the research notes is the crux: it turns "does this sound plausible?" into "is this actually in the source?", which is what catches hallucinations the drafter slipped in.
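
Reviewer note: the AI-042 changelog entry above describes the endpoint's write decision as two checks in a fixed order, manual protection first and then the fail-closed review gate. The ordering is easier to see as one pure function. The block below is a minimal sketch under stated assumptions, not code from this patch: `WriteDecision`, `CrewWriteGate` and `Decide` are hypothetical names, and the `SeoSource` enum is redeclared locally (its member order is assumed) so the example compiles on its own. Only the `IsManualProtected` predicate mirrors the helper added in `AdminAutoPublishEndpoints.cs`.

```csharp
#nullable enable
using System;

// Manual edition with hand-written prose: blocked even though both fields passed the gate.
Console.WriteLine(CrewWriteGate.Decide(SeoSource.Manual, "Hand-written text.", null, false, false)); // prints ManualProtected
// Manual edition whose fields are blank is not protected, so the review gate decides.
Console.WriteLine(CrewWriteGate.Decide(SeoSource.Manual, "   ", null, true, false));                 // prints NeedsReview
// Auto edition with existing content and two clean fields: the prose is applied.
Console.WriteLine(CrewWriteGate.Decide(SeoSource.Auto, "Old text.", "Old text.", false, false));     // prints Apply

// Local stand-in for the domain enum; the member order here is an assumption.
public enum SeoSource { Auto, Manual, Hybrid }

// Hypothetical result type: the patch returns a CrewGenerateResponse instead.
public enum WriteDecision { Apply, NeedsReview, ManualProtected }

public static class CrewWriteGate
{
    // Same predicate as IsManualProtected in the patch: Manual source AND at least one filled field.
    public static bool IsManualProtected(SeoSource source, string? description, string? relevanceText) =>
        source == SeoSource.Manual &&
        (!string.IsNullOrWhiteSpace(description) || !string.IsNullOrWhiteSpace(relevanceText));

    // Hypothetical helper: in the patch this ordering is written inline in CrewGenerate.
    public static WriteDecision Decide(
        SeoSource source, string? description, string? relevanceText,
        bool descriptionNeedsReview, bool relevanceNeedsReview)
    {
        // 1. Hand-written content always wins, regardless of what the crew produced.
        if (IsManualProtected(source, description, relevanceText))
            return WriteDecision.ManualProtected;
        // 2. Fail closed: one flagged field blocks the write for both fields.
        if (descriptionNeedsReview || relevanceNeedsReview)
            return WriteDecision.NeedsReview;
        // 3. Both fields clean: the caller may write them and set SeoSource = Auto.
        return WriteDecision.Apply;
    }
}
```

In the patch the first two outcomes both return `needsReview: true` and write nothing; the manual case additionally sets `manualProtected: true` so the admin can tell the two apart.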
diff --git a/backend/src/Api/Api.csproj b/backend/src/Api/Api.csproj
index d9c6b7a8..fe03bd14 100644
--- a/backend/src/Api/Api.csproj
+++ b/backend/src/Api/Api.csproj
@@ -1,6 +1,7 @@
+
diff --git a/backend/src/Api/Endpoints/AdminAutoPublishEndpoints.cs b/backend/src/Api/Endpoints/AdminAutoPublishEndpoints.cs
index d289ba25..5b9d1705 100644
--- a/backend/src/Api/Endpoints/AdminAutoPublishEndpoints.cs
+++ b/backend/src/Api/Endpoints/AdminAutoPublishEndpoints.cs
@@ -1,9 +1,12 @@
+using Api.Middleware;
 using Application.AdminSettings;
+using Application.Agents;
 using Application.Common.Interfaces;
 using Domain.Entities;
 using Domain.Enums;
 using Microsoft.AspNetCore.Mvc;
 using Microsoft.EntityFrameworkCore;
+using TextStack.Ai.Core;

 namespace Api.Endpoints;

@@ -25,6 +28,11 @@ public static void MapAdminAutoPublishEndpoints(this WebApplication app)
         group.MapPost("/trigger", Trigger);
         group.MapPost("/queue/{editionId:guid}", QueueEdition);
         group.MapGet("/candidates", GetCandidates);
+
+        // AI-042: in-process crew path — runs the AI-041 specialists over ILlmService to generate SEO prose.
+        // Behind the same /admin/* auth group; rate-limited (each call = two 4-stage crews = 8 LLM calls).
+        group.MapPost("/editions/{editionId:guid}/crew-generate", CrewGenerate)
+            .RequireRateLimiting("autopublish.crew");
     }

     private static async Task GetSettings(AdminSettingsService settings, CancellationToken ct)
@@ -233,8 +241,138 @@ private static async Task GetCandidates(

         return Results.Ok(candidates);
     }
+
+    /// <summary>
+    /// AI-042 — runs the in-process content crew over an Edition to generate its two SEO prose fields
+    /// (Description + SeoRelevanceText). Loads the same source material the legacy
+    /// seo-generate.sh uses (title, author(s), language, first-chapter excerpt), runs the crew once per
+    /// field (own runId, own cost cap, own persisted agent_run), and applies a fail-closed review gate: only if
+    /// BOTH fields pass cleanly does it write them and set SeoSource = Auto. The Edition stays Draft
+    /// either way — publishing remains the existing separate flow. No auto-publish, ever.
+    /// </summary>
+    private static async Task<IResult> CrewGenerate(
+        Guid editionId,
+        HttpContext httpContext,
+        IAppDbContext db,
+        AutoPublishCrew crew,
+        CancellationToken ct)
+    {
+        var adminId = httpContext.GetAdminUserId();
+
+        var edition = await db.Editions
+            .Include(e => e.EditionAuthors)
+                .ThenInclude(ea => ea.Author)
+            .FirstOrDefaultAsync(e => e.Id == editionId, ct);
+        if (edition == null) return Results.NotFound();
+
+        // First-chapter excerpt — mirror seo-generate.sh: LEFT(plain_text, 1000) of the lowest chapter_number.
+        var excerpt = await db.Chapters
+            .Where(c => c.EditionId == editionId)
+            .OrderBy(c => c.ChapterNumber)
+            .Select(c => c.PlainText)
+            .FirstOrDefaultAsync(ct);
+
+        var authors = string.Join(", ", edition.EditionAuthors
+            .Select(ea => ea.Author?.Name)
+            .Where(n => !string.IsNullOrWhiteSpace(n)));
+        var lang = edition.Language;
+
+        var sourceMaterial = BuildSourceMaterial(edition.Title, authors, lang, excerpt);
+
+        // One crew per field — separate runIds so AI-045's transcript UI can fetch each independently.
+        var descCtx = new AgentContext(adminId, editionId, Guid.NewGuid(), httpContext.RequestServices);
+        var descResult = await crew.RunFieldAsync(AutoPublishBriefs.Description(lang), sourceMaterial, descCtx, ct);
+
+        var relCtx = new AgentContext(adminId, editionId, Guid.NewGuid(), httpContext.RequestServices);
+        var relResult = await crew.RunFieldAsync(AutoPublishBriefs.Relevance(lang), sourceMaterial, relCtx, ct);
+
+        var runIds = new[] { descResult.RunId, relResult.RunId };
+        var fields = new[]
+        {
+            FieldSummary("description", descResult),
+            FieldSummary("relevance", relResult),
+        };
+
+        // Manual-source protection (AI-042 P2): a Manual edition with hand-written prose in either targeted field
+        // is never overwritten by the crew — same contract legacy SEO backfill honors (SeoCoverageAnalyzer). The
+        // crew transcripts are still persisted (audit), only the write-back is blocked.
+        if (IsManualProtected(edition.SeoSource, edition.Description, edition.SeoRelevanceText))
+            return Results.Ok(new CrewGenerateResponse(true, runIds, fields, ManualProtected: true));
+
+        // Fail-closed gate: if EITHER field needs review, write NOTHING — the admin inspects both transcripts.
+        var needsReview = descResult.NeedsReview || relResult.NeedsReview;
+        if (needsReview)
+            return Results.Ok(new CrewGenerateResponse(true, runIds, fields));
+
+        // Both clean → apply the prose and mark provenance Auto. Edition stays Draft regardless.
+        edition.Description = descResult.EditedText;
+        edition.SeoRelevanceText = relResult.EditedText;
+        edition.SeoSource = SeoSource.Auto;
+        edition.UpdatedAt = DateTimeOffset.UtcNow;
+        await db.SaveChangesAsync(ct);
+
+        return Results.Ok(new CrewGenerateResponse(false, runIds, fields));
+    }
+
+    /// <summary>
+    /// The manual-source write-block decision (AI-042 P2), pure for unit testing. Returns true when the edition is
+    /// <see cref="SeoSource.Manual"/> AND either targeted prose field already holds hand-written content — in which
+    /// case the crew must NOT overwrite it. Empty Manual fields are still fair game for first-time generation.
+    /// </summary>
+    internal static bool IsManualProtected(SeoSource source, string? description, string? relevanceText) =>
+        source == SeoSource.Manual &&
+        (!string.IsNullOrWhiteSpace(description) || !string.IsNullOrWhiteSpace(relevanceText));
+
+    /// <summary>The source block the crew reasons from — same fields seo-generate.sh feeds Claude.</summary>
+    private static string BuildSourceMaterial(string title, string author, string lang, string? excerpt) =>
+        $"""
+        Book: {title}
+        Author: {(string.IsNullOrWhiteSpace(author) ? "Unknown" : author)}
+        Language: {lang}
+        First chapter excerpt: {Excerpt(excerpt)}
+        """;
+
+    private static string Excerpt(string? plainText)
+    {
+        if (string.IsNullOrWhiteSpace(plainText)) return "(none)";
+        return plainText.Length <= 1000 ? plainText : plainText[..1000];
+    }
+
+    private static CrewFieldSummary FieldSummary(string field, AutoPublishFieldResult r) =>
+        new(
+            field,
+            r.RunId,
+            r.Status,
+            r.NeedsReview,
+            r.EditedText?.Length ?? 0,
+            r.Critique is { } c
+                ? new CrewCritiqueSummary(c.FactualAccuracy, c.Tone, c.Length, c.BannedPhrases, c.ParseFailed,
+                    c.Issues.Count(i => i.Severity == "blocker"))
+                : null);
 }

+public record CrewGenerateResponse(
+    bool NeedsReview,
+    IReadOnlyList<Guid> RunIds,
+    IReadOnlyList<CrewFieldSummary> Fields,
+    bool ManualProtected = false);
+
+public record CrewFieldSummary(
+    string Field,
+    Guid RunId,
+    string Status,
+    bool NeedsReview,
+    int CharLength,
+    CrewCritiqueSummary? Critique);
+
+public record CrewCritiqueSummary(
+    int FactualAccuracy,
+    int Tone,
+    int Length,
+    int BannedPhrases,
+    bool ParseFailed,
+    int BlockerCount);
+
 public record AutoPublishSettingsDto(
     bool Enabled, int BooksPerDay, int HourUtc, bool RequireReview, string Language);

diff --git a/backend/src/Api/Program.cs b/backend/src/Api/Program.cs
index 55b952c9..a3d2a3fb 100644
--- a/backend/src/Api/Program.cs
+++ b/backend/src/Api/Program.cs
@@ -95,6 +95,10 @@
 builder.Services.AddSingleton();
 builder.Services.AddSingleton();
 builder.Services.AddSingleton();
+// AutoPublish crew (Phase 7, AI-042): in-process admin path that runs the specialists over ILlmService to
+// generate SEO prose for an Edition. Scoped because it persists via the scoped IAgentRunWriter (per-request
+// DbContext). The legacy bash + Claude-CLI poller stays the default; this is the observable, traced alternative.
+builder.Services.AddScoped<AutoPublishCrew>();
 builder.Services.AddAuthSettings(builder.Configuration);

 var connectionString = builder.Configuration.GetConnectionString("Default")
@@ -320,6 +324,18 @@
             QueueLimit = 0,
         });
     });
+    // AutoPublish crew (AI-042): an admin generate is TWO 4-stage crews = 8 LLM calls, so a tight per-IP cap.
+    // Mirrors the studybuddy policy shape; it sits behind admin auth too, this is just runaway protection.
+    options.AddPolicy("autopublish.crew", httpContext =>
+    {
+        var ip = httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown";
+        return RateLimitPartition.GetFixedWindowLimiter(ip, _ => new FixedWindowRateLimiterOptions
+        {
+            Window = TimeSpan.FromMinutes(1),
+            PermitLimit = 4,
+            QueueLimit = 0,
+        });
+    });
     options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
     // Emit Retry-After so clients can back off intelligently instead of
     // hammering in a tight retry loop. RateLimiter exposes the metadata
diff --git a/backend/src/Application/Agents/AutoPublishBriefs.cs b/backend/src/Application/Agents/AutoPublishBriefs.cs
new file mode 100644
index 00000000..b2296b57
--- /dev/null
+++ b/backend/src/Application/Agents/AutoPublishBriefs.cs
@@ -0,0 +1,46 @@
+namespace Application.Agents;
+
+/// <summary>
+/// The hardcoded writing assignments for the in-process AutoPublish crew (Phase 7, AI-042). Mirrors the
+/// "factual, encyclopedic tone; no subjective superlatives" rules the legacy seo-generate.sh hands
+/// Claude, but as typed <see cref="ContentBrief"/>s the AI-041 specialists all read identically. Prose-only:
+/// the crew owns just Edition.Description + Edition.SeoRelevanceText; themes/FAQs stay legacy.
+///
+/// Banned phrases and the style guide are constants for now (admin-editable later, AI-04x) — keep them in one
+/// place so drafter, critic and editor enforce the exact same list. Bounds are in CHARACTERS.
+/// </summary>
+public static class AutoPublishBriefs
+{
+    private const string Entity = "edition";
+
+    /// <summary>Description: 800-1600 chars (~150-250 words), the plot summary + literary significance block.</summary>
+    public const int DescriptionMin = 800;
+    public const int DescriptionMax = 1600;
+
+    /// <summary>Relevance: 500-1000 chars (~100-150 words), modern connections + why readers today should care.</summary>
+    public const int RelevanceMin = 500;
+    public const int RelevanceMax = 1000;
+
+    /// <summary>
+    /// Subjective superlatives the SEO prose must avoid — encodes the legacy "no subjective superlatives" rule
+    /// as an explicit list the critic can score against. The critic blocks a draft that uses any of these.
+    /// </summary>
+    public static readonly IReadOnlyList<string> BannedPhrases =
+    [
+        "masterpiece",
+        "must-read",
+        "timeless classic",
+        "page-turner",
+        "tour de force",
+        "magnum opus",
+    ];
+
+    /// <summary>The shared tone contract — same string every specialist reads, so there is no style drift.</summary>
+    public const string StyleGuide = "Factual, encyclopedic tone; no subjective superlatives; third person.";
+
+    public static ContentBrief Description(string lang) =>
+        new(Entity, "description", DescriptionMin, DescriptionMax, BannedPhrases, lang, StyleGuide);
+
+    public static ContentBrief Relevance(string lang) =>
+        new(Entity, "relevance", RelevanceMin, RelevanceMax, BannedPhrases, lang, StyleGuide);
+}
diff --git a/backend/src/Application/Agents/AutoPublishCrew.cs b/backend/src/Application/Agents/AutoPublishCrew.cs
new file mode 100644
index 00000000..efaa6863
--- /dev/null
+++ b/backend/src/Application/Agents/AutoPublishCrew.cs
@@ -0,0 +1,139 @@
+using TextStack.Ai.Agents;
+using TextStack.Ai.Core;
+
+namespace Application.Agents;
+
+/// <summary>
+/// The in-process AutoPublish content crew (Phase 7, AI-042). Runs the AI-041 specialists
+/// (researcher → drafter → critic → editor) over <see cref="ILlmService"/> to generate ONE SEO prose field
+/// for an Edition, via the generic AI-040 <see cref="CrewOrchestrator"/>. The endpoint runs it TWICE — once
+/// per field (Description, Relevance) — so each field gets its own 4-stage plan, its own cost cap and its own
+/// persisted agent_run.
+///
+/// DB-FREE by design: the source material is passed in (loaded by the endpoint), so an eval harness can drive
+/// this with a fake <see cref="ILlmService"/> and a recording <see cref="IAgentRunWriter"/> — no network, no DB.
+/// It NEVER writes the Edition and NEVER publishes: it returns the edited text + a fail-closed review verdict,
+/// and the caller decides whether to apply it. The legacy bash + Claude-CLI poller stays the default path.
+/// </summary>
+public sealed class AutoPublishCrew(
+    CrewOrchestrator orchestrator,
+    ResearcherAgent researcher,
+    DrafterAgent drafter,
+    CriticAgent critic,
+    EditorAgent editor,
+    IAgentRunWriter runWriter)
+{
+    /// <summary>Per-field cost ceiling. 4 nano calls cost well under this; the cap is a hard runaway guard.</summary>
+    public const decimal CostCapUsd = 0.02m;
+
+    private const string CrewName = "autopublish";
+
+    /// <summary>Shared mutable state threaded through the 4 stages by the orchestrator's deterministic folds.</summary>
+    private sealed class FieldCrewState
+    {
+        public required ContentBrief Brief { get; init; }
+        public required string SourceMaterial { get; init; }
+        public ResearchNotes? Notes { get; set; }
+        public Draft? Draft { get; set; }
+        public CritiqueResult? Critique { get; set; }
+        public Draft? Edited { get; set; }
+    }
+
+    /// <summary>
+    /// Generates a single field. Runs the 4-stage crew (cost-capped, sequential), persists the run through the
+    /// SAME agent_run path as a single agent (agent crew.autopublish), and returns the edited text
+    /// plus the fail-closed review verdict. Writes nothing to the Edition; the caller owns the apply/publish call.
+    /// </summary>
+    public async Task<AutoPublishFieldResult> RunFieldAsync(
+        ContentBrief brief, string sourceMaterial, AgentContext ctx, CancellationToken ct)
+    {
+        var state = new FieldCrewState { Brief = brief, SourceMaterial = sourceMaterial };
+        var plan = BuildPlan();
+
+        var result = await orchestrator.RunAsync(plan, state, ctx, ct);
+
+        var editedText = state.Edited?.Text;
+        var needsReview = NeedsReview(result.Status, state.Critique, editedText, brief.MinLength);
+
+        // Persist through the existing AgentRunRecord / IAgentRunWriter path — no new table. The edited text is
+        // the human-readable summary; the full sub-agent transcript is nested for replay (AI-045 UI).
+        var record = CrewRunRecordFactory.From(
+            ctx.AgentRunId,
+            CrewName,
+            ctx.UserId,
+            ctx.EditionId,
+            $"edition.{brief.FieldName}",
+            editedText,
+            result);
+        await runWriter.WriteAsync(record, ct);
+
+        return new AutoPublishFieldResult(editedText, state.Critique, needsReview, result.Status, ctx.AgentRunId);
+    }
+
+    /// <summary>
+    /// The review gate, isolated so it is unit-testable as a pure decision. Fail-closed: a draft is only "clean"
+    /// when the crew COMPLETED, the editor produced text that clears the brief's <see cref="ContentBrief.MinLength"/> floor,
+    /// the critic produced a parseable verdict, and that verdict raised no blockers. Anything else (error/budget
+    /// halt, empty/whitespace or below-floor edited text, missing critique, unparseable critic, any blocker issue)
+    /// → needs review, and the caller writes nothing. The length floor stops an empty/short editor output from
+    /// silently overwriting a real Description / SeoRelevanceText (AI-042 P1).
+    /// </summary>
+    public static bool NeedsReview(string crewStatus, CritiqueResult? critique, string? editedText, int minLength)
+    {
+        if (crewStatus != CrewRunRecordFactory.StatusCompleted)
+            return true;
+        // Empty/whitespace or below the brief's hard floor: nothing safe to apply.
+        if (string.IsNullOrWhiteSpace(editedText) || editedText.Trim().Length < minLength)
+            return true;
+        if (critique is null || critique.ParseFailed)
+            return true;
+        return critique.Issues.Any(i => i.Severity == "blocker");
+    }
+
+    /// <summary>The 4-stage sequential plan: each specialist is injected directly (no per-stage DI resolution).</summary>
+    private CrewPlan BuildPlan() =>
+        new(CrewName,
+        [
+            new CrewStage("research",
+            [
+                CrewTasks.Of(
+                    "researcher", researcher,
+                    s => new ResearchInput(s.Brief, s.SourceMaterial),
+                    (s, o) => s.Notes = o),
+            ]),
+            new CrewStage("draft",
+            [
+                CrewTasks.Of(
+                    "drafter", drafter,
+                    s => new DraftInput(s.Brief, s.Notes!),
+                    (s, o) => s.Draft = o),
+            ]),
+            new CrewStage("critique",
+            [
+                CrewTasks.Of(
+                    "critic", critic,
+                    s => new CritiqueInput(s.Brief, s.Draft!, s.Notes!),
+                    (s, o) => s.Critique = o),
+            ]),
+            new CrewStage("edit",
+            [
+                CrewTasks.Of(
+                    "editor", editor,
+                    s => new EditInput(s.Brief, s.Draft!, s.Critique!),
+                    (s, o) => s.Edited = o),
+            ]),
+        ], new CrewOptions(CostCapUsd, MaxParallelism: 1));
+}
+
+/// <summary>
+/// Outcome of one AutoPublish field run (AI-042). <paramref name="EditedText"/> is the editor's final prose (null if
+/// the crew never reached the edit stage). <paramref name="Critique"/> is the critic's verdict (null on halt before
+/// critique). <paramref name="NeedsReview"/> is the fail-closed gate — true means "do not auto-apply". The DB write
+/// and publish decision live in the caller, never here.
+/// </summary>
+public record AutoPublishFieldResult(
+    string? EditedText,
+    CritiqueResult? Critique,
+    bool NeedsReview,
+    string Status,
+    Guid RunId);
diff --git a/tests/TextStack.UnitTests/AutoPublishCrewTests.cs b/tests/TextStack.UnitTests/AutoPublishCrewTests.cs
new file mode 100644
index 00000000..7e0a6af5
--- /dev/null
+++ b/tests/TextStack.UnitTests/AutoPublishCrewTests.cs
@@ -0,0 +1,408 @@
+using System.Collections.Concurrent;
+using Application.Agents;
+using Microsoft.Extensions.DependencyInjection;
+using TextStack.Ai.Agents;
+using TextStack.Ai.Core;
+
+namespace TextStack.UnitTests;
+
+/// <summary>
+/// AI-042 — the in-process <see cref="AutoPublishCrew"/>, driven end-to-end by a deterministic fake
+/// <see cref="ILlmService"/> (routed per FeatureTag) and a recording <see cref="IAgentRunWriter"/>. No network,
+/// no DB. Verifies: a clean critic → not flagged + edited text returned; a critic blocker or an unparseable
+/// critic → fail-closed review; the run persists once through the agent_run path as crew.autopublish with
+/// the sub-agent transcript; and the per-field cost cap halts the crew (budget_exhausted → flagged). The
+/// needsReview gate is also unit-tested directly as a pure decision.
+/// </summary>
+public class AutoPublishCrewTests
+{
+    // ---- Fakes -------------------------------------------------------------------------------------
+
+    /// <summary>
+    /// Routes a canned completion per crew FeatureTag (crew.researcher/drafter/critic/editor), so one fake drives
+    /// all four specialists deterministically. An optional cost is attached to every response (for the cap test).
+    /// </summary>
+    private sealed class FakeLlmService(
+        string research, string draft, string critic, string editor, decimal costEach = 0.001m) : ILlmService
+    {
+        public Task<LlmResponse> CompleteAsync(LlmRequest request, CancellationToken ct)
+        {
+            var text = request.FeatureTag switch
+            {
+                "crew.researcher" => research,
+                "crew.drafter" => draft,
+                "crew.critic" => critic,
+                "crew.editor" => editor,
+                _ => throw new InvalidOperationException($"Unexpected feature tag: {request.FeatureTag}"),
+            };
+            return Task.FromResult(new LlmResponse(
+                text, [], new LlmUsage(40, 20, costEach), "fake", Guid.NewGuid()));
+        }
+
+        public IAsyncEnumerable StreamAsync(LlmRequest request, CancellationToken ct) =>
+            throw new NotSupportedException();
+    }
+
+    private sealed class RecordingAgentRunWriter : IAgentRunWriter
+    {
+        public ConcurrentBag<AgentRunRecord> Records { get; } = [];
+
+        public Task WriteAsync(AgentRunRecord run, CancellationToken ct)
+        {
+            Records.Add(run);
+            return Task.CompletedTask;
+        }
+    }
+
+    // ---- Canned critic verdicts --------------------------------------------------------------------
+
+    private const string CleanCritic =
+        """{"scores":{"factual_accuracy":5,"tone":5,"length":5,"banned_phrases":5},"issues":[]}""";
+
+    private const string BlockerCritic =
+        """{"scores":{"factual_accuracy":2,"tone":4,"length":4,"banned_phrases":5},"issues":[{"location":"intro","severity":"blocker","fix":"remove the unsupported claim"}]}""";
+
+    private const string GarbageCritic = "this is not json at all, sorry";
+
+    // ---- Helpers -----------------------------------------------------------------------------------
+
+    private static AutoPublishCrew Build(ILlmService llm, IAgentRunWriter writer) =>
+        new(
+            new CrewOrchestrator(),
+            new ResearcherAgent(llm),
+            new DrafterAgent(llm),
+            new CriticAgent(llm),
+            new EditorAgent(llm),
+            writer);
+
+    // No tools registered — keep this assembly free of ITool so the StudyBuddy set-equality test is unaffected.
+    private static AgentContext Ctx(Guid? editionId = null, Guid? userId = null) =>
+        new(userId, editionId, Guid.NewGuid(), new ServiceCollection().BuildServiceProvider());
+
+    private static ContentBrief DescBrief => AutoPublishBriefs.Description("English");
+
+    private static CancellationToken Ct => TestContext.Current.CancellationToken;
+
+    // ---- 1. Clean draft → not flagged --------------------------------------------------------------
+
+    // Editor output that clears the Description MinLength floor (800 chars) so a clean run is not gated on length.
+    private static readonly string CleanEditedText =
+        "An edited factual encyclopedic summary of the work. " + new string('x', AutoPublishBriefs.DescriptionMin);
+
+    [Fact]
+    public async Task RunField_CleanDraft_NotFlagged()
+    {
+        var llm = new FakeLlmService(
+            research: "- born 1900\n- wrote three novels",
+            draft: "A factual encyclopedic summary of the work.",
+            critic: CleanCritic,
+            editor: CleanEditedText);
+        var writer = new RecordingAgentRunWriter();
+        var crew = Build(llm, writer);
+
+        var result = await crew.RunFieldAsync(DescBrief, "Born 1900. Wrote novels.", Ctx(), Ct);
+
+        Assert.False(result.NeedsReview);
+        Assert.Equal(CleanEditedText, result.EditedText);
+        Assert.Equal(CrewRunRecordFactory.StatusCompleted, result.Status);
+        Assert.NotNull(result.Critique);
+        Assert.False(result.Critique!.ParseFailed);
+    }
+
+    // ---- 2. Critic blocker → flagged ---------------------------------------------------------------
+
+    [Fact]
+    public async Task RunField_CriticBlocker_FlagsForReview()
+    {
+        var llm = new FakeLlmService(
+            research: "- notes",
+            draft: "A draft that overclaims.",
+            critic: BlockerCritic,
+            editor: "A revised draft.");
+        var crew = Build(llm, new RecordingAgentRunWriter());
+
+        var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(), Ct);
+
+        Assert.True(result.NeedsReview);
+        Assert.Equal(CrewRunRecordFactory.StatusCompleted, result.Status); // crew completed; gate flagged it
+        Assert.NotNull(result.Critique);
+        Assert.Contains(result.Critique!.Issues, i => i.Severity == "blocker");
+    }
+
+    // ---- 3. Critic parse failure → flagged ---------------------------------------------------------
+
+    [Fact]
+    public async Task RunField_CriticParseFailed_FlagsForReview()
+    {
+        var llm = new FakeLlmService(
+            research: "- notes",
+            draft: "A draft.",
+            critic: GarbageCritic,
+            editor: "An edited draft.");
+        var crew = Build(llm, new RecordingAgentRunWriter());
+
+        var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(), Ct);
+
+        Assert.True(result.NeedsReview);
+        Assert.NotNull(result.Critique);
+        Assert.True(result.Critique!.ParseFailed); // fail-closed: unparseable critic never reads as a clean pass
+    }
+
+    // ---- 4. Persistence ----------------------------------------------------------------------------
+
+    [Fact]
+    public async Task RunField_PersistsAgentRun()
+    {
+        var editionId = Guid.NewGuid();
+        var llm = new FakeLlmService("- notes", "A draft.", CleanCritic, "An edited draft.");
+        var writer = new RecordingAgentRunWriter();
+        var crew = Build(llm, writer);
+
+        var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(editionId), Ct);
+
+        var record = Assert.Single(writer.Records);
+        Assert.Equal("crew.autopublish", record.Agent);
+        Assert.Equal(editionId, record.EditionId);
+        Assert.Equal(result.RunId, record.Id);
+        Assert.Equal("edition.description", record.Goal);
+        Assert.Equal("An edited draft.", record.Output);
+        Assert.Equal(CrewRunRecordFactory.StatusCompleted, record.Status);
+
+        // The four sub-agent invocations are present as nested steps for replay.
+        Assert.Equal(4, record.Steps.Count);
+        Assert.All(record.Steps, s => Assert.Equal(CrewRunRecordFactory.SubAgentStepKind, s.Kind));
+        Assert.Equal(["research", "draft", "critique", "edit"],
+            record.Steps.Select(s => s.Payload.GetProperty("stage").GetString()));
+    }
+
+    // ---- 5. Cost cap halts -------------------------------------------------------------------------
+
+    [Fact]
+    public async Task RunField_CostCapExceeded_Halts()
+    {
+        // Each call reports cost above the per-field cap, so the orchestrator halts after the first stage.
+ var llm = new FakeLlmService( + "- notes", "A draft.", CleanCritic, "An edited draft.", + costEach: AutoPublishCrew.CostCapUsd + 0.01m); + var writer = new RecordingAgentRunWriter(); + var crew = Build(llm, writer); + + var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(), Ct); + + Assert.Equal(CrewRunRecordFactory.StatusBudgetExhausted, result.Status); + Assert.True(result.NeedsReview); // not completed → flagged + Assert.Null(result.EditedText); // never reached the edit stage + Assert.Null(result.Critique); // halted after research, before critique + + // The partial run is still persisted (the budget-exhausted run is the one worth inspecting). + var record = Assert.Single(writer.Records); + Assert.Equal(CrewRunRecordFactory.StatusBudgetExhausted, record.Status); + Assert.Single(record.Steps); // only the research stage ran before the cap tripped + } + + // ---- 6. NeedsReview gate as a pure decision ---------------------------------------------------- + + // A body of text comfortably above any brief's MinLength, so the length floor never falsely gates these cases. 
+ private static string LongText => new('x', AutoPublishBriefs.DescriptionMin + 50); + + [Fact] + public void NeedsReview_CompletedWithCleanCritique_False() + { + var clean = new CritiqueResult(5, 5, 5, 5, [], ParseFailed: false); + Assert.False(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, clean, LongText, AutoPublishBriefs.DescriptionMin)); + } + + [Fact] + public void NeedsReview_NonCompletedStatus_True() + { + var clean = new CritiqueResult(5, 5, 5, 5, [], ParseFailed: false); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusBudgetExhausted, clean, LongText, AutoPublishBriefs.DescriptionMin)); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusError, clean, LongText, AutoPublishBriefs.DescriptionMin)); + } + + [Fact] + public void NeedsReview_NullOrParseFailedCritique_True() + { + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, null, LongText, AutoPublishBriefs.DescriptionMin)); + var failed = new CritiqueResult(1, 1, 1, 1, + [new CritiqueIssue("output", "blocker", "unparseable")], ParseFailed: true); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, failed, LongText, AutoPublishBriefs.DescriptionMin)); + } + + [Fact] + public void NeedsReview_BlockerIssue_True() + { + var blocked = new CritiqueResult(4, 4, 4, 4, + [new CritiqueIssue("intro", "blocker", "fix")], ParseFailed: false); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, blocked, LongText, AutoPublishBriefs.DescriptionMin)); + } + + [Fact] + public void NeedsReview_OnlyMinorAndMajorIssues_False() + { + // Non-blocker issues alone do not gate — the editor already addressed them; only blockers fail closed. 
+ var minor = new CritiqueResult(4, 4, 4, 4, + [new CritiqueIssue("a", "minor", "x"), new CritiqueIssue("b", "major", "y")], ParseFailed: false); + Assert.False(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, minor, LongText, AutoPublishBriefs.DescriptionMin)); + } + + // ---- 6b. NeedsReview length floor (AI-042 P1 gate cases) --------------------------------------- + + [Fact] + public void NeedsReview_NullOrWhitespaceEditedText_True() + { + var clean = new CritiqueResult(5, 5, 5, 5, [], ParseFailed: false); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, clean, null, AutoPublishBriefs.DescriptionMin)); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, clean, "", AutoPublishBriefs.DescriptionMin)); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, clean, " \n\t ", AutoPublishBriefs.DescriptionMin)); + } + + [Fact] + public void NeedsReview_EditedTextBelowMinLength_True() + { + // Present, clean critic, but a few chars short of the floor → still flagged. + var clean = new CritiqueResult(5, 5, 5, 5, [], ParseFailed: false); + var tooShort = new string('x', AutoPublishBriefs.DescriptionMin - 1); + Assert.True(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, clean, tooShort, AutoPublishBriefs.DescriptionMin)); + } + + [Fact] + public void NeedsReview_EditedTextAtMinLengthAndCleanCritic_False() + { + // Exactly at the floor + clean critic → not flagged (boundary is inclusive: length >= MinLength passes). + var clean = new CritiqueResult(5, 5, 5, 5, [], ParseFailed: false); + var atFloor = new string('x', AutoPublishBriefs.DescriptionMin); + Assert.False(AutoPublishCrew.NeedsReview( + CrewRunRecordFactory.StatusCompleted, clean, atFloor, AutoPublishBriefs.DescriptionMin)); + } + + // ---- 7. 
P1 fix: empty/whitespace/short editor output IS gated -------------------------------------- + // The editor (EditorAgent.Parse → new(text.Trim())) does no length/empty validation. The gate now enforces + // the brief's MinLength floor, so an LLM that returns empty/whitespace/short text yields NeedsReview == true + // and the endpoint writes NOTHING — a real Description / SeoRelevanceText can never be clobbered. + + [Fact] + public async Task RunField_EmptyEditorOutput_FlagsForReview() + { + var llm = new FakeLlmService( + research: "- born 1900\n- wrote three novels", + draft: "A factual encyclopedic summary of the work.", + critic: CleanCritic, + editor: ""); // editor produced nothing — must NOT be auto-applied + var crew = Build(llm, new RecordingAgentRunWriter()); + + var result = await crew.RunFieldAsync(DescBrief, "Born 1900.", Ctx(), Ct); + + Assert.Equal(string.Empty, result.EditedText); + Assert.True(result.NeedsReview); // empty edited field → flagged, nothing applied + } + + [Fact] + public async Task RunField_WhitespaceEditorOutput_FlagsForReview() + { + var llm = new FakeLlmService( + research: "- notes", + draft: "A draft.", + critic: CleanCritic, + editor: " \n\t "); // whitespace only — Trim() collapses it to "" + var crew = Build(llm, new RecordingAgentRunWriter()); + + var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(), Ct); + + Assert.Equal(string.Empty, result.EditedText); + Assert.True(result.NeedsReview); // flagged — nothing to apply + } + + [Fact] + public async Task RunField_EditorOutputBelowMinLength_FlagsForReview() + { + // Present, clean critic, but a handful of chars — far under the 800-char Description floor → flagged. 
+ var llm = new FakeLlmService( + research: "- notes", + draft: "A draft.", + critic: CleanCritic, + editor: "Too short to publish."); + var crew = Build(llm, new RecordingAgentRunWriter()); + + var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(), Ct); + + Assert.Equal("Too short to publish.", result.EditedText); + Assert.True(result.NeedsReview); // below MinLength → flagged + } + + // ---- 8. Cost cap binds only at STAGE boundaries (single runaway stage overshoots) ------------------ + // Documents that CrewOptions(CostCapUsd, 1) is checked AFTER each stage, not mid-call. A single sub-agent + // whose one call already exceeds the cap still completes that call before the crew halts. With 4 + // single-call stages the max overshoot is one call — acceptable, but pinned so the granularity is explicit. + + [Fact] + public async Task RunField_SingleStageExceedsCap_StillRunsThatStageThenHalts() + { + var llm = new FakeLlmService( + "- notes", "A draft.", CleanCritic, "An edited draft.", + costEach: AutoPublishCrew.CostCapUsd + 0.01m); // first call alone busts the cap + var writer = new RecordingAgentRunWriter(); + var crew = Build(llm, writer); + + var result = await crew.RunFieldAsync(DescBrief, "source", Ctx(), Ct); + + // The research call ran (and overshot) before the cap was checked at the stage boundary. + Assert.Equal(CrewRunRecordFactory.StatusBudgetExhausted, result.Status); + Assert.True(result.NeedsReview); + var record = Assert.Single(writer.Records); + Assert.Single(record.Steps); // exactly one stage executed despite the overshoot + } +} + +/// <summary> +/// AI-042 P2 — the manual-source write-block decision (AdminAutoPublishEndpoints.IsManualProtected), +/// extracted as a pure helper so the "don't clobber hand-written Manual content" contract is unit-reachable +/// without spinning up HTTP. Mirrors the legacy SeoCoverageAnalyzer "Manual flag protects filled content". 
+/// </summary> +public class AutoPublishManualProtectionTests +{ + [Fact] + public void IsManualProtected_ManualWithDescription_True() + { + Assert.True(Api.Endpoints.AdminAutoPublishEndpoints.IsManualProtected( + Domain.Enums.SeoSource.Manual, "hand-written description", null)); + } + + [Fact] + public void IsManualProtected_ManualWithRelevance_True() + { + Assert.True(Api.Endpoints.AdminAutoPublishEndpoints.IsManualProtected( + Domain.Enums.SeoSource.Manual, null, "hand-written relevance")); + } + + [Fact] + public void IsManualProtected_ManualButBothFieldsEmpty_False() + { + // Empty Manual fields are fair game for first-time generation — only filled content is protected. + Assert.False(Api.Endpoints.AdminAutoPublishEndpoints.IsManualProtected( + Domain.Enums.SeoSource.Manual, null, null)); + Assert.False(Api.Endpoints.AdminAutoPublishEndpoints.IsManualProtected( + Domain.Enums.SeoSource.Manual, "", " ")); + } + + [Fact] + public void IsManualProtected_AutoOrHybridWithContent_False() + { + // Non-Manual provenance is always overwritable, even with existing content. + Assert.False(Api.Endpoints.AdminAutoPublishEndpoints.IsManualProtected( + Domain.Enums.SeoSource.Auto, "auto description", "auto relevance")); + Assert.False(Api.Endpoints.AdminAutoPublishEndpoints.IsManualProtected( + Domain.Enums.SeoSource.Hybrid, "hybrid description", null)); + } +}