Kind3rin/TopCoder

TC Requirement Analyzer — Guardrails

Guardrails for the requirement-analyzer agent, implemented with the Mastra Processor API. They make the agent's requirement-coverage analyses reliable, consistent and safe by preventing the two production failures documented in test-data/:

  • False negatives: declaring a requirement missing when it is actually implemented (e.g. concluding REQ_15 MISSING without ever reading the /docs/agents.md file that the requirement names).
  • False positives: declaring a requirement implemented on the strength of documentation prose or search snippets, without reading any code (e.g. REQ_06 Implemented @ 0.95 with zero submission_read calls).

All guardrails are deterministic: they call no additional LLM, add < 1 ms per step, and are fully unit-tested without Ollama.
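
To illustrate what a deterministic check of this kind can look like, here is a minimal sketch of the false-positive case described above (an "Implemented" verdict with zero submission_read calls). The `ToolCall` and `Conclusion` shapes, the verdict labels, and the `search` tool name are assumptions made for the example; the actual types in guardrails/ may differ.

```typescript
// Hypothetical shapes: the real report and tool-call types in the repo may differ.
interface ToolCall {
  toolName: string; // e.g. "submission_read" (the tool named in this README)
}

interface Conclusion {
  requirementId: string;
  verdict: "Implemented" | "Missing"; // illustrative verdict labels
  score: number; // confidence in [0, 1]
}

/** Counts calls to the file-reading tool in the thread's tool-call history. */
function countReads(history: ToolCall[]): number {
  return history.filter((call) => call.toolName === "submission_read").length;
}

/**
 * Flags an "Implemented" verdict that is not backed by any file read.
 * Pure and deterministic: the same inputs always give the same answer,
 * and no model is consulted.
 */
function isUnsupportedPositive(conclusion: Conclusion, history: ToolCall[]): boolean {
  return conclusion.verdict === "Implemented" && countReads(history) === 0;
}

const flagged = isUnsupportedPositive(
  { requirementId: "REQ_06", verdict: "Implemented", score: 0.95 },
  [{ toolName: "search" }], // hypothetical search tool; no submission_read call
);
console.log(flagged); // true
```

Because the check is a plain function over the report and the tool-call history, it can be unit-tested with hand-built inputs and needs no running model.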

Architecture details: architecture.md · Validation steps: Validation.md


What was implemented

| # | Guardrail | Stage | Where |
|---|-----------|-------|-------|
| 1 | False-Negative Minimization | output | guardrails/false-negative-processor.ts |
| 2 | False-Positive Prevention | output | guardrails/false-positive-processor.ts |
| 3 | Output Quality Verification | output | guardrails/output-quality-processor.ts |
| 4 | Result Consistency | model + post | providers/ollama.ts, guardrails/consistency.ts |
| - | Context-Window Management | input | guardrails/context-window-processor.ts |
| - | Smart tools | tools | tools/suggest-searches-tool.ts, tools/verify-evidence-tool.ts |

The pure decision logic lives in src/mastra/agents/requirement-analyzer/guardrails/ and is consumed by thin Mastra Processor adapters and by the tests.


Requirements

  • Node.js ≥ 20 (.nvmrc pins v24; CI tested on v22).
  • pnpm 10 (packageManager is pinned).
  • Ollama with qwen3:4b-instruct for live runs.

Install

pnpm install

Configuration (.env)

| Variable | Default | Purpose |
|----------|---------|---------|
| OLLAMA_HOST | http://localhost:11434 | Ollama endpoint |
| LLM_PROVIDER_NAME | TC-Ollama | Provider |
| LLM_MODEL_NAME | qwen3:4b-instruct | Model (agent and any LLM step share it) |
| MAX_CONTEXT_SIZE | 43960 | Context window; drives num_ctx and the tool-result budget |
| LLM_SEED | 42 | Fixed decoding seed (Result Consistency) |
| LOCAL_DEV | true | Enables LibSQL storage + observability + memory |
| WORKSPACE_PATH | (none) | Absolute path to <repo>/workspace (required) |

Guardrail tunables (all optional, env-overridable — see guardrails/config.ts): FN_MIN_SEARCH_ATTEMPTS, FN_MIN_READ_ATTEMPTS, FN_NEGATIVE_SCORE_THRESHOLD, FN_MAX_RETRIES, FN_EMPTY_WORKSPACE_FILES, FP_REQUIRE_CODE_EVIDENCE, FP_MIN_EVIDENCE_LENGTH, FP_MAX_UNVERIFIED_PATHS, FP_MAX_RETRIES, OQ_MIN_EVIDENCE_ITEMS, OQ_MIN_QUALITY_SCORE, OQ_MAX_STRUCTURE_RETRIES, CONSISTENCY_SCORE_QUANTUM, CONSISTENCY_SCORE_TOLERANCE, CONTEXT_RESERVE_TOKENS.
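
As a rough illustration of what "optional, env-overridable" can mean in practice, the sketch below reads a numeric tunable from an environment map and falls back to a default when the variable is unset or not a number. The helper name `readNumberTunable` and the fallback values are hypothetical; the real defaults and parsing rules are those in guardrails/config.ts.

```typescript
// A plain map stands in for process.env so the sketch has no Node typings dependency.
type Env = Record<string, string | undefined>;

/**
 * Hypothetical helper: returns the numeric value of env[name],
 * or `fallback` when the variable is unset, empty, or not a finite number.
 */
function readNumberTunable(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Example values only; the fallbacks below are placeholders, not the project's defaults.
const exampleEnv: Env = { FN_MAX_RETRIES: "3", FP_MAX_RETRIES: "abc" };

const fnMaxRetries = readNumberTunable(exampleEnv, "FN_MAX_RETRIES", 2); // 3 (override applied)
const fpMaxRetries = readNumberTunable(exampleEnv, "FP_MAX_RETRIES", 2); // 2 (invalid value, fallback used)
const oqMinEvidence = readNumberTunable(exampleEnv, "OQ_MIN_EVIDENCE_ITEMS", 1); // 1 (unset, fallback used)

console.log(fnMaxRetries, fpMaxRetries, oqMinEvidence);
```

In a Node process the same helper could be called with `process.env` as the first argument.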

Test-data setup

# challenge-context.json already lives in ./workspace
# The false-positive submission is preconfigured under workspace/submission.
# To use the false-negative submission instead:
rm -rf workspace/submission && mkdir -p workspace/submission
unzip test-data/false-NEGATIVE-submission.zip -d workspace/submission

Set WORKSPACE_PATH in .env to the absolute path of <repo>/workspace.

Run

pnpm dev        # Mastra Studio at http://localhost:3000/studio
pnpm start      # CLI quality gate (src/cli/run-quality-gate.ts)
  • Single requirement: studio/agents/requirement-analyzer-agent/chat/new — paste a requirement JSON.
  • Full review: studio/workflows/requirementsAnalyzerWorkflow/graph.

Test, lint, format

pnpm test              # all tests (guardrail + pre-existing)
pnpm test:guardrails   # guardrail suite only (43 tests)
pnpm lint
pnpm format:check

Generating LibSQL trace artifacts

Two kinds of artifacts live under src/mastra/public/:

  1. Offline verification trace (committed):

    pnpm traces        # writes src/mastra/public/guardrail-traces.db

    Replays the real test-data/ reports through the guardrails and records each verdict (FN retry? FP block? quality score?) into a LibSQL DB — reproducible without Ollama. Inspect with any SQLite/LibSQL client:

    SELECT dataset, requirement_id, parsed_verdict, read_count,
           fn_should_retry, fp_unsupported, oq_quality_score
    FROM guardrail_traces ORDER BY dataset, requirement_id;
  2. Live agent-run traces (optional): with LOCAL_DEV=true, running the workflow on Ollama persists Mastra memory/observability to LibSQL (ai-review-libsql-storage.db, requirement-analyzer-memory.db). Copy the freshest of these into src/mastra/public/ to ship real run traces:

    cp ai-review-libsql-storage.db src/mastra/public/
    cp requirement-analyzer-memory.db src/mastra/public/

How a guardrail acts (retry behaviour)

On the agent's conclusion step, a guardrail inspects the report and the thread's tool-call history. If the conclusion is unsafe, it calls args.abort(feedback, { retry: true }) with specific feedback (which files to read, which patterns to search). The agent then performs the requested tool calls and re-concludes. A hard maxRetries cap and an empty-workspace check guarantee termination, so the agent cannot loop indefinitely.
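
The termination argument can be sketched as a pure decision function. This is an illustration under stated assumptions, not the repository's code: the `GuardrailState` and `Decision` types and the `decide` function are hypothetical, and letting the conclusion through once the cap is reached is an assumption about the fallback behaviour. In a real adapter, the "retry" branch would correspond to the args.abort(feedback, { retry: true }) call described above.

```typescript
// Hypothetical state a guardrail might look at on the conclusion step.
interface GuardrailState {
  conclusionUnsafe: boolean; // e.g. MISSING concluded without reading the named file
  retriesSoFar: number;
  maxRetries: number; // hard cap
  workspaceFileCount: number; // 0 means there is nothing left to read
}

type Decision =
  | { action: "accept" }
  | { action: "retry"; feedback: string };

/** Decides whether to accept the conclusion or ask the agent to retry. */
function decide(state: GuardrailState, feedback: string): Decision {
  if (!state.conclusionUnsafe) return { action: "accept" };
  // Empty workspace: a retry could not gather new evidence, so stop here.
  if (state.workspaceFileCount === 0) return { action: "accept" };
  // Cap reached: stop retrying (assumed fallback: let the conclusion through).
  if (state.retriesSoFar >= state.maxRetries) return { action: "accept" };
  return { action: "retry", feedback };
}

// Worst case: the agent never fixes its conclusion. The loop still terminates.
const feedback = "Read /docs/agents.md before concluding.";
let retries = 0;
let decision = decide(
  { conclusionUnsafe: true, retriesSoFar: retries, maxRetries: 2, workspaceFileCount: 5 },
  feedback,
);
while (decision.action === "retry") {
  retries += 1;
  decision = decide(
    { conclusionUnsafe: true, retriesSoFar: retries, maxRetries: 2, workspaceFileCount: 5 },
    feedback,
  );
}
console.log(retries); // 2
```

Since `retriesSoFar` grows by one on every retry and the cap is fixed, the number of retries is bounded by `maxRetries` regardless of how the agent behaves.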

Project layout

See architecture.md §11 for the full file map.

Third-party libraries

Only libraries already present in the starter package.json are used (Mastra, ai-sdk-ollama, zod, tokenx, @mastra/libsql). @libsql/client (already in the dependency tree via @mastra/libsql, MIT-licensed) is declared as a devDependency solely for the offline trace-generation script. No other third-party code was added.
