Clean up agent system: fix orchestrator.yaml schema compliance, unify validation paths #28

Merged
peternicholls merged 4 commits into master from copilot/cleanup-agent-system
Jan 10, 2026

Copilot AI (Contributor) commented Jan 6, 2026

Summary

Clean up PaperKit's agent system to establish the canonical two-file split:

  • YAML files (.paperkit/_cfg/agents/*.yaml) = schema-validated metadata ONLY
  • MD files (.paperkit/{core,specialist}/agents/*.md) = behavioural instructions ONLY

Changes

orchestrator.yaml

  • Contains ONLY schema-compliant metadata
  • All fields match agent-schema.json requirements
  • path points to .paperkit/core/agents/orchestrator.md

orchestrator.md

  • Removed YAML frontmatter (metadata belongs in YAML file only)
  • Contains behavioural instructions for the orchestrator
  • Includes explicit tie-break rules (capability → keyword → module → example prompt)
  • Specifies machine-parseable JSON output format
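
The tie-break order and the JSON output described above can be sketched as follows. This is a minimal illustration, not the contents of orchestrator.md: the `Candidate` scores, the agent names `drafter` and `reviewer`, and the JSON keys `action`, `agent`, and `question` are all assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-agent match scores; higher means a stronger match."""
    name: str
    capability: int
    keyword: int
    module: int
    example_prompt: int


def route(candidates):
    """Return a JSON routing decision: exactly one agent, or a clarifying question."""
    def key(c):
        # Tuple comparison applies the tie-break order from left to right:
        # capability, then keyword, then module, then example prompt.
        return (c.capability, c.keyword, c.module, c.example_prompt)

    ranked = sorted(candidates, key=key, reverse=True)
    if not ranked or key(ranked[0]) == (0, 0, 0, 0):
        return json.dumps({"action": "clarify",
                           "question": "Which task do you need help with?"})
    if len(ranked) > 1 and key(ranked[0]) == key(ranked[1]):
        # A full tie cannot be broken, so ask instead of guessing.
        return json.dumps({"action": "clarify",
                           "question": f"Did you mean {ranked[0].name} or {ranked[1].name}?"})
    return json.dumps({"action": "route", "agent": ranked[0].name})


# Both agents match on capability; the keyword score breaks the tie.
decision = route([
    Candidate("drafter", capability=1, keyword=0, module=1, example_prompt=0),
    Candidate("reviewer", capability=1, keyword=1, module=0, example_prompt=0),
])
print(decision)
```

Sorting on a tuple keeps the priority order in one place, so changing the order means reordering one line.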

Documentation

  • Updated docs/dev/agent-audit/decision.md with correct canonical design
  • Updated docs/dev/agent-audit/how-agents-are-structured.md to clarify the split

Code Quality

  • Removed unused imports (os, Set) from check-agents.py

Validation Results

✓ 11/11 YAML files valid
✓ No duplicate agent names
✓ All path references valid
✓ Manifest matches YAML directory
✓ All MD files exist
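
A rough sketch of the non-schema checks in this list (duplicate names, path references, manifest consistency, MD files present) is given here. It is not the code of check-agents.py: it assumes the YAML files have already been parsed into dicts, it skips schema validation entirely, and the function name `check_agents` and the agent name `ghost` are hypothetical.

```python
import tempfile
from collections import Counter
from pathlib import Path


def check_agents(agents, manifest_names, root):
    """Return a list of problems; an empty list means every check passed.

    agents: dicts with at least 'name' and 'path' (assumed already parsed from YAML).
    manifest_names: agent names listed in the manifest.
    root: repository root used to resolve each 'path'.
    """
    problems = []
    counts = Counter(a["name"] for a in agents)
    for name, n in sorted(counts.items()):
        if n > 1:
            problems.append(f"duplicate agent name: {name}")
    for a in agents:
        if not (Path(root) / a["path"]).is_file():
            problems.append(f"missing MD file for {a['name']}: {a['path']}")
    # Names present on only one side indicate manifest drift.
    for name in sorted(set(counts) ^ set(manifest_names)):
        problems.append(f"manifest/YAML mismatch: {name}")
    return problems


# Build a throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    md = Path(root) / ".paperkit/core/agents/orchestrator.md"
    md.parent.mkdir(parents=True)
    md.write_text("# Orchestrator\n", encoding="utf-8")
    agents = [{"name": "orchestrator", "path": ".paperkit/core/agents/orchestrator.md"}]
    ok = check_agents(agents, ["orchestrator"], root)
    bad = check_agents(agents, ["orchestrator", "ghost"], root)

print(ok)
print(bad)
```
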
Original prompt

Copilot Agent Mission Brief: PaperKit .paperkit/ cleanup and de-duplication

Goal

Clean up PaperKit’s agent system so there is one canonical agent definition system that:
• matches the agent-schema.json rules
• is what the runtime actually uses to load/run agents
• has no duplicated “second system” lingering in parallel (or, if backward-compat is required, duplicates are replaced with a deliberate compatibility layer)

Non-negotiables
• Do not break existing PaperKit workflows/CLI behaviour.
• Do not change the schema unless you can prove the schema is wrong (it looks good).
• Preserve academic integrity constraints in agent instructions (don’t accidentally weaken them during migration).
• Every change must be backed by repo evidence (grep + file inspection), not assumptions.

Phase 0 — Safety + Baseline
1. Create a branch: chore/agent-system-unification.
2. Record a baseline:
• Run existing validation command(s) and save output to docs/dev/agent-audit/baseline.txt.
• Run whatever tests/lints exist (or make test / pytest / etc.), save output too.
3. Make a “do no harm” inventory snapshot:
• Save tree -a .paperkit > docs/dev/agent-audit/paperkit-tree.txt (or equivalent on Windows).

Phase 1 — Forensic Discovery (find the two systems and who uses what)

1.1 Locate the schema + validator
• Find the schema file in repo:
• rg -n "agent-schema.json|PaperKit Agent Definition Schema" -S .paperkit
• Find the validator script referenced by the manifest comment:
• rg -n "validate-agent-schema|agent-schema" -S .paperkit open-agent-system tools

1.2 Enumerate “agent definition” locations

Collect counts and lists:
• .paperkit/core/agents/*
• .paperkit/specialist/agents/*
• any _cfg agents: .paperkit/_cfg/agents/* (or .paperkit/_cfg/...)
Commands:
• find .paperkit -maxdepth 4 -type f | rg "/agents/"
• find .paperkit -type f -name "*.yaml" | rg "agents"
• find .paperkit -type f -name "*.md" | rg "agents"

Write a short report to docs/dev/agent-audit/inventory.md listing:
• all agent definition files (path + extension)
• which look canonical (match schema path pattern vs not)
• any “manifest” index files (like your YAML manifest)

1.3 Determine runtime loading path (the key question)

Search code for references to:
• .paperkit/_cfg/agents
• .paperkit/core/agents
• .paperkit/specialist/agents
• “manifest”, “agent registry”, “agent loader”, “open-agent-system”
Commands:
• rg -n "_cfg/agents|core/agents|specialist/agents|agent manifest|agent loader|load.*agent" -S .

Identify:
• The code path(s) that load agents at runtime
• The expected file format (MD? YAML? front matter?)
• Whether the runtime uses the schema validator or bypasses it

Document findings in docs/dev/agent-audit/runtime-paths.md with:
• file + function names
• what directories are scanned
• how agent metadata is parsed

Phase 2 — Decide Canonical System (based on evidence)

Decision rule

Prefer the system that:
• matches agent-schema.json without hacks
• is already aligned with the repo structure (.paperkit/core and .paperkit/specialist)
• is easiest to keep stable long-term

Likely outcome (verify in repo):
• Canonical agent definitions = .paperkit/{core|specialist}/agents/*.md
• _cfg/agents/*.yaml is legacy OR internal config that got mistaken for canonical

But do not assume — prove by runtime loader.

Write the decision and rationale to docs/dev/agent-audit/decision.md.

Phase 3 — Unification Plan (choose one of these and implement)

Option A (preferred): Make MD agents canonical, retire YAML agents

Do this if runtime already supports MD agents or can easily be made to.

Steps:
1. Ensure every active agent in the manifest exists as a .paperkit/{core|specialist}/agents/.md
2. If YAML-only agents exist, migrate them to .md:
• Convert YAML fields → schema fields:
• name, displayName, title, icon, module, path
• identity.role, identity.description, identity.communicationStyle
• capabilities, constraints, principles, examplePrompts
• Put the operational prompt/instructions into the Markdown body.
3. Update the runtime loader to scan only:
• .paperkit/core/agents/*.md
• .paperkit/specialist/agents/*.md
4. Add a compatibility shim ONLY if needed:
• If parts of PaperKit still expect _cfg/agents/*.yaml, replace them with either:
• symlinks (if supported), or
• a generated file step, or
• a tiny adapter that reads YAML and emits in-memory schema objects
• BUT: the adapter should be explicitly marked “legacy” and have a deprecation path.
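
The "tiny adapter" bullet under Option A could look roughly like the sketch below. It assumes the YAML has already been parsed into a mapping, and the names `AgentMeta` and `load_legacy_agent`, the choice of fields, and the module value `core` are illustrative only. The one fixed idea from the brief is that the adapter announces itself as legacy, done here with a `DeprecationWarning`.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class AgentMeta:
    """In-memory agent metadata; the field selection here is illustrative."""
    name: str
    module: str
    path: str
    capabilities: list = field(default_factory=list)


def load_legacy_agent(raw):
    """LEGACY: adapt an already-parsed _cfg/agents/*.yaml mapping to AgentMeta."""
    warnings.warn(
        "legacy YAML agent definitions are deprecated; migrate to the MD agents",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return AgentMeta(
        name=raw["name"],
        module=raw["module"],
        path=raw["path"],
        capabilities=list(raw.get("capabilities", [])),
    )


# Capture the warning so the example shows that it is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    meta = load_legacy_agent({
        "name": "orchestrator",
        "module": "core",
        "path": ".paperkit/core/agents/orchestrator.md",
    })

print(meta.name, [str(w.message) for w in caught])
```
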

Option B: Make YAML canonical and update schema

Only do this if runtime is deeply YAML-based and MD agents don’t actually exist / aren’t used.
If you choose this, you must:
• update the schema’s path regex to match reality
• ensure schema validation still blocks drift
• update repo structure to remove misleading core/agents expectations

Given your screenshot and schema, Option A is probably right — but again: confirm.

Phase 4 — Remove duplica...



…cripts, add unified check-agents.py

Co-authored-by: peternicholls <519525+peternicholls@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Clean up PaperKit agent system for de-duplication" to "Clean up agent system: fix orchestrator.yaml schema compliance, unify validation paths" on Jan 6, 2026
Copilot AI requested a review from peternicholls on January 6, 2026 04:54
peternicholls marked this pull request as ready for review on January 6, 2026 04:59
Copilot AI review requested due to automatic review settings January 6, 2026 04:59

peternicholls (Owner) commented

@copilot
You are working on PR #28 in PaperKit (“cleanup agent system / orchestrator schema compliance”).

Objective:

  • Make the orchestrator fully consistent with PaperKit’s agent-schema.json and the intended split:
    • _cfg/agents/*.yaml = schema-validated metadata ONLY
    • core/agents/*.md and specialist/agents/*.md = the actual prompt/instructions content referenced by path

Hard requirements:

  1. Open and read .paperkit/_cfg/schemas/agent-schema.json and treat it as the source of truth.
  2. Open and read .paperkit/_cfg/agents/orchestrator.yaml, .paperkit/_cfg/agent-manifest.yaml, and any loader/runner code that consumes these.
  3. Rewrite orchestrator.yaml to contain ONLY keys permitted by agent-schema.json.
    • Remove any non-schema keys such as: schemaVersion, status, instructions, decisionSchema (unless the schema explicitly allows them).
    • Ensure required keys exist and are correct: name, displayName, title, icon, identity{role,description}, module, path.
    • Ensure there are NO duplicate keys in YAML.
  4. Ensure path points to a real markdown file: .paperkit/core/agents/orchestrator.md
  5. Create/Update .paperkit/core/agents/orchestrator.md to contain the orchestrator’s behaviour:
    • The orchestrator MUST NOT do the user’s task.
    • It MUST output a machine-parseable routing decision.
    • It MUST choose exactly one agent from the registry OR ask a clarifying question.
    • Include explicit tie-break rules (capability match > module match > example prompt match).
  6. Fix icon fields that are empty (manifest + orchestrator) to valid emojis.
  7. Run and record:
    • python .paperkit/tools/validate-agent-schema.py --ci --schema .paperkit/_cfg/schemas/agent-schema.json --agents-dir .paperkit/_cfg/agents
    • python .paperkit/tools/check-agents.py --ci
      Commit any needed changes so those pass.
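
Requirement 3 can be illustrated with a small key check. This is a sketch under assumptions, not the validator in the repo. The `REQUIRED` and `ALLOWED` sets are copied from the key lists quoted in this thread, and the authoritative set is whatever agent-schema.json declares. The duplicate-key scan is a deliberately naive text scan of unindented keys, included because a YAML loader may silently keep only the last value of a repeated key.

```python
import re

# Assumption: key names taken from the lists in this thread, not from the schema file.
REQUIRED = {"name", "displayName", "title", "icon", "identity", "module", "path"}
ALLOWED = REQUIRED | {"capabilities", "constraints", "principles", "examplePrompts"}


def check_keys(meta):
    """Report missing required keys and keys outside the permitted set."""
    keys = set(meta)
    problems = [f"missing required key: {k}" for k in sorted(REQUIRED - keys)]
    problems += [f"key not permitted: {k}" for k in sorted(keys - ALLOWED)]
    identity = meta.get("identity")
    if isinstance(identity, dict):
        for k in ("role", "description"):
            if k not in identity:
                problems.append(f"missing required key: identity.{k}")
    return problems


TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:", re.MULTILINE)


def duplicate_top_level_keys(yaml_text):
    """Naive scan: list unindented keys that appear more than once."""
    seen, dupes = set(), []
    for key in TOP_LEVEL_KEY.findall(yaml_text):
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes


# Illustrative metadata: 'status' is a non-schema key, identity lacks 'description'.
meta = {
    "name": "orchestrator", "displayName": "Orchestrator", "title": "Router",
    "icon": "🧭", "identity": {"role": "router"}, "module": "core",
    "path": ".paperkit/core/agents/orchestrator.md", "status": "active",
}
problems = check_keys(meta)
dupes = duplicate_top_level_keys("name: a\nicon: x\nname: b\n")
print(problems)
print(dupes)
```
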

Deliverables:

  • Updated orchestrator.yaml (schema-clean)
  • Updated/created orchestrator.md (prompt content)
  • Any minimal adjustments to check-agents.py / validate-agent-schema.py ONLY if necessary to enforce the schema correctly (do not weaken validation).
  • Update docs/dev/agent-audit/decision.md if the canonical design changed.

Copilot AI (Contributor) left a comment

Pull request overview

This PR resolves validation inconsistencies in the PaperKit agent system by fixing schema compliance issues, correcting path references, and adding unified validation tooling.

Key changes:

  • Fixed orchestrator.yaml to comply with agent schema (added required identity and path fields, removed non-standard fields)
  • Corrected hardcoded .paper/ paths to .paperkit/ across validation scripts
  • Added unified check-agents.py tool for comprehensive agent system validation
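
The `.paper/` to `.paperkit/` correction is the kind of change a short scan can guard against regressing. The sketch below is an assumption-laden illustration, not code from the PR: the helper names and the sample variable names `SCHEMA` and `AGENTS` are hypothetical.

```python
import re

# Matches ".paper/" but never ".paperkit/", because the slash must follow "paper"
# directly. The lookbehind skips names such as "draft.paper/".
STALE = re.compile(r"(?<![\w.])\.paper/")


def find_stale_paths(text):
    """Return (line_number, line) pairs that still use the old .paper/ prefix."""
    return [(n, line)
            for n, line in enumerate(text.splitlines(), start=1)
            if STALE.search(line)]


def fix_stale_paths(text):
    """Rewrite the old prefix to .paperkit/ and leave everything else alone."""
    return STALE.sub(".paperkit/", text)


sample = (
    'SCHEMA = ".paper/_cfg/schemas/agent-schema.json"\n'
    'AGENTS = ".paperkit/_cfg/agents"\n'
)
hits = find_stale_paths(sample)
fixed = fix_stale_paths(sample)
print(hits)
print(fixed)
```
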

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| .paperkit/_cfg/agents/orchestrator.yaml | Completely restructured to match schema requirements with proper identity, capabilities, constraints, and input/output schemas |
| .paperkit/_cfg/agent-manifest.yaml | Added orchestrator entry and updated statistics (11 total agents, 7 core) |
| .paperkit/tools/validate.py | Fixed all hardcoded paths from .paper/ to .paperkit/ for schemas, agents, workflows, tools, and manifests |
| .paperkit/tools/validate-agent-schema.py | Updated default path documentation and hardcoded defaults from .paper/ to .paperkit/ |
| .paperkit/tools/check-agents.py | New unified validation tool that checks YAML schema compliance, duplicate names, path references, manifest consistency, and MD file coverage |
| .github/workflows/validate-agent-metadata.yml | Added unified check step and path triggers for core/specialist agent directories |
| docs/dev/agent-audit/baseline.txt | Documents pre-fix validation state showing orchestrator as invalid |
| docs/dev/agent-audit/inventory.md | Catalogs dual agent system (YAML metadata + MD definitions) |
| docs/dev/agent-audit/runtime-paths.md | Analyzes runtime loading paths and identifies path inconsistencies |
| docs/dev/agent-audit/decision.md | Documents rationale for keeping dual system and implementation plan |
| docs/dev/agent-audit/how-agents-are-structured.md | Comprehensive guide explaining agent structure and validation |
| docs/dev/agent-audit/paperkit-tree.txt | Directory tree snapshot for reference |


Comment on lines 49 to 68
### Phase 1: Fix Schema Compliance
- [ ] Update `orchestrator.yaml` to be schema-compliant
- [ ] Verify all YAML files pass validation

### Phase 2: Fix Validation Scripts
- [ ] Update `validate.py` to use `.paperkit/` paths

### Phase 3: Update Manifest
- [ ] Add orchestrator to `agent-manifest.yaml`

### Phase 4: Add Unified Check
- [ ] Create `tools/check-agents.sh` that:
- Validates all YAML files against schema
- Validates all MD frontmatter against schema
- Checks for name mismatches
- Ensures path references exist
- Detects duplicate agent names

### Phase 5: Documentation
- [ ] Create "How agents are structured" guide

Copilot AI Jan 6, 2026

The implementation plan shows all tasks as unchecked, but based on the PR changes, these tasks appear to have been completed. The checkboxes should be marked as complete to accurately reflect the work done in this PR.

"""

import sys
import os

Copilot AI Jan 6, 2026

Import of 'os' is not used.

Suggested change
- import os

import yaml
import argparse
from pathlib import Path
from typing import Dict, List, Tuple, Optional, Set

Copilot AI Jan 6, 2026

Import of 'Set' is not used.

Suggested change
- from typing import Dict, List, Tuple, Optional, Set
+ from typing import Dict, List, Tuple, Optional

peternicholls (Owner) commented

@copilot apply changes based on the comments in this thread

…ter), update decision.md with correct canonical design

Co-authored-by: peternicholls <519525+peternicholls@users.noreply.github.com>

Copilot AI (Contributor, Author) commented Jan 6, 2026

> @copilot
> You are working on PR #28 in PaperKit (“cleanup agent system / orchestrator schema compliance”).
>
> ...

Addressed in commit ead9be9:

  1. orchestrator.yaml - Contains only schema-compliant metadata (no instructions, decisionSchema, schemaVersion, status)
  2. orchestrator.md - Now contains only behavioural instructions (removed YAML frontmatter)
  3. Tie-break rules - Added explicit priority order: capability match → keyword-based routing → module match → example prompt match
  4. All icons - Already valid emojis (🧭 for orchestrator)
  5. Validation passes:
    • validate-agent-schema.py --ci: ✓ 11/11 valid
    • check-agents.py --ci: ✓ All checks passed
  6. decision.md - Updated to reflect canonical two-file split design
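
Item 2 (removing YAML frontmatter so the MD file holds only instructions) amounts to dropping a leading `---` block. The function below is a minimal sketch of that step, with the hypothetical name `strip_frontmatter`. It is not taken from the PR, and it only recognises a frontmatter block that starts on the very first line.

```python
def strip_frontmatter(markdown):
    """Drop a leading '---' ... '---' block; return any other text unchanged."""
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return markdown
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            # Keep everything after the closing delimiter, minus leading blank lines.
            return "".join(lines[i + 1:]).lstrip("\n")
    return markdown  # no closing delimiter found: leave the file alone


doc = "---\nname: orchestrator\n---\n\n# Orchestrator\nRoute requests.\n"
body = strip_frontmatter(doc)
print(body)
```

Returning the input untouched when the closing delimiter is missing avoids deleting a whole file because of one malformed header.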

Co-authored-by: peternicholls <519525+peternicholls@users.noreply.github.com>

Copilot AI (Contributor, Author) commented Jan 6, 2026

> @copilot apply changes based on the comments in this thread

Fixed in commit d308552:

  • Removed unused os import
  • Removed unused Set import from typing

Note: The decision.md checkboxes were already marked as complete in a previous commit (ead9be9).
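
Unused imports like the two removed here can be spotted with the standard-library `ast` module. The sketch below is a simplified illustration and not the analysis the reviewer ran: it ignores `__all__`, star imports, and names used only inside string annotations, and the sample source is invented for the example.

```python
import ast


def unused_imports(source):
    """Return imported names that are never referenced as a Name in the module."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds "a"; "import a.b as c" binds "c".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    # Attribute access such as sys.exit still contains a Name node for "sys".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


sample = (
    "import sys\n"
    "import os\n"
    "from typing import Dict, Set\n"
    "\n"
    "def main(argv: Dict[str, str]) -> None:\n"
    "    sys.exit(0)\n"
)
result = unused_imports(sample)
print(result)
```
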

peternicholls (Owner) commented

@copilot apply changes based on the comments in this thread

Copilot AI (Contributor, Author) commented Jan 10, 2026

> @copilot apply changes based on the comments in this thread

These changes were already applied in commit d308552:

  • Removed unused os import from check-agents.py
  • Removed unused Set import from typing
  • Checkboxes in decision.md were marked complete in commit ead9be9

All review comments have been addressed.

peternicholls merged commit 1537798 into master on Jan 10, 2026
2 checks passed
peternicholls deleted the copilot/cleanup-agent-system branch on January 10, 2026 21:41