richfrem/agent-plugins-skills


Universal Agent Plugins & Skills Ecosystem

**Current Scale:** 11 Plugins · 142 Skills · 43 Sub-Agents. A self-improving, cross-platform library of reusable AI agent capabilities for Claude Code, GitHub Copilot, Gemini CLI, and any compliant agent framework.

Recent milestones: v1.3 — Hardened SQLite control plane (May 2026) · v1.4 — MAF synthesis & hybrid runtime strategy (May 31, 2026)


Architecture Evolution

v1.3 — Hardened Control Plane (May 2026)

Replaced fragile markdown-based state with a transactional SQLite control plane (state_engine.py), added strong process sandboxing (sandbox_runner.py), HMAC-signed envelopes, approval gating, and WAL concurrency safety. Implementation is stdlib-only (sqlite3, hmac, hashlib, subprocess, os, secrets) — no framework dependencies. This made the custom Python kernel production-grade and laid the foundation for the v1.4 hybrid strategy.
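As a rough illustration of how two of these primitives can be built from the standard library alone, the sketch below signs a JSON envelope with HMAC-SHA256 and stores it in a WAL-mode SQLite database inside a transaction. It is not the code in state_engine.py; the function names, table schema, and envelope layout are assumptions made for this example.

```python
import hashlib
import hmac
import json
import os
import secrets
import sqlite3
import tempfile


def sign_envelope(payload: dict, key: bytes) -> dict:
    """Serialize the payload canonically and attach an HMAC-SHA256 signature."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])


def open_state_db(path: str) -> sqlite3.Connection:
    """Open a file-backed database in WAL mode (hypothetical single-table schema)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def record_envelope(conn: sqlite3.Connection, name: str, envelope: dict) -> None:
    """Write the envelope atomically: commit on success, roll back on error."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)",
            (name, json.dumps(envelope)),
        )


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    envelope = sign_envelope({"phase": "plan", "approved": False}, key)
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_state_db(os.path.join(tmp, "state.db"))
        record_envelope(conn, "handoff-1", envelope)
        conn.close()
    print("signature valid:", verify_envelope(envelope, key))
```

Serializing with sort_keys before signing keeps the signature stable regardless of dictionary ordering, and compare_digest avoids leaking timing information during verification.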

v1.4 — MAF Synthesis & Hybrid Strategy (May 31, 2026)

After extensive research into MAF (Microsoft Agent Framework) and 12 hands-on C# experiments (including loading the real exploration-cycle-plugin manifests in full), we pivoted from "do not adopt MAF" to a hybrid architecture:

Manifest-first. Multiple certified runtime adapters second.

Key outcomes:

  • Kept the hardened Python control plane as the authoritative kernel
  • Adopted AGT (Agent Governance Toolkit) for deterministic policy enforcement
  • Ported 4 high-value patterns from MAF: alias resolution, standardized handoff envelopes, per-agent skill scoping, per-phase premium call budgets
  • MAF is now a certified optional runtime adapter alongside Claude Code, Copilot CLI, and Gemini CLI (ADR-007)
  • All .md agent manifests and SKILL.md files remain fully portable
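One of the ported patterns, per-phase premium call budgets, can be pictured with a small counter that refuses a call once a phase has spent its allowance. This is only a sketch of the idea: the class name, the exception, and the phase names are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(RuntimeError):
    """Raised when a phase tries to spend more premium calls than it was given."""


@dataclass
class PhaseBudget:
    # Maximum number of premium model calls allowed per phase (assumed shape).
    limits: Dict[str, int]
    used: Dict[str, int] = field(default_factory=dict)

    def charge(self, phase: str, calls: int = 1) -> int:
        """Record `calls` against `phase` and return the remaining allowance."""
        limit = self.limits.get(phase, 0)  # unknown phases get no budget
        spent = self.used.get(phase, 0)
        if spent + calls > limit:
            raise BudgetExceeded(
                f"phase {phase!r}: {spent} used, {calls} requested, limit {limit}"
            )
        self.used[phase] = spent + calls
        return limit - self.used[phase]


if __name__ == "__main__":
    budget = PhaseBudget(limits={"discovery": 2, "prototype": 5})
    print(budget.charge("discovery"))
    print(budget.charge("discovery"))
    try:
        budget.charge("discovery")
    except BudgetExceeded as exc:
        print("blocked:", exc)
```

Checking the limit before recording the spend means a rejected call never consumes budget, so a caller can fall back to a cheaper model and retry.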

This hybrid approach combines battle-tested custom safety primitives with selective reuse of Microsoft's well-engineered patterns.

References: ADR-001 · ADR-002 · ADR-007


Platforms

A strictly cross-platform (Windows, Mac, Ubuntu) library — the universal upstream source for reusable AI agent plugins and skills across multiple IDEs and agent frameworks: Claude Code, GitHub Copilot, Gemini CLI, Antigravity, Roo Code, Windsurf, Cursor, and other compliant integrations.

All plugins deploy to the single .agents/ folder standard — no duplicate copies needed for .github, .gemini, .agent, etc.


Installation

Important

Start here — fresh clone or first-time setup. The single .agents/ environment directory is not committed to your repo. It will be empty by default.

All installation methods (uvx, bootstrap.py, npx skills, and Marketplace / Extension CLI) are now consolidated in a single authoritative guide.

Quick install (all plugins):

uvx --from git+https://github.com/richfrem/agent-plugins-skills plugin-add richfrem/agent-plugins-skills

v1.4 note: If upgrading from v1.3, run uv sync (or pip install -r requirements.txt) after pulling latest — the per-phase budget enforcement and AGT governance patterns add new dependencies to exploration-cycle-plugin.


Core Philosophy: Transitional Architectures & Decoupled Skills

This repository is built on a pragmatic acceptance of the current AI engineering landscape: the ecosystem changes weekly, and workflows that were revolutionary six months ago are obsolete today.

Frameworks like agent-agentic-os and spec-kitty are treated as Transitional Architectures — bridges between what agents need to do today and what native SDKs will eventually handle. When Anthropic, Google, and GitHub harden native memory persistence, execution safety, and multi-agent orchestration, large swaths of this tooling will be happily discarded.

The MAF research (May 2026) reinforced this view: instead of choosing between a custom kernel and a framework, we now deliberately pursue a hybrid model:

  • Portable .md manifests and SKILL.md files remain the source of truth across all runtimes
  • Multiple runtime adapters (Claude Code, Copilot CLI, Gemini CLI, MAF) are supported side-by-side
  • Strong custom control plane for safety and governance that no hosted framework currently matches
  • Selective adoption of excellent patterns from frontier frameworks (e.g. MAF's typed handoffs and AGT governance)

Skills are Applications; the SDK is the OS. Individual skills must function in complete isolation — no hard dependencies on sibling plugins, no assumptions about which framework is running.


Architecture

Pillar 1: The Improvement OS (agent-agentic-os)

The OS implements an eval-gated improvement pipeline for autonomous skill evolution:

os-architect           ← intent classifier + ecosystem router
    ↓
os-improvement-loop    ← learning engine: orchestrates multi-iteration improvement
    ↓
os-eval-runner         ← inner gate: KEEP/DISCARD per iteration (evaluate.py)
    ↓
os-eval-backport       ← human gate: review before lab winner → production
    ↓
os-experiment-log      ← scientific backbone: longitudinal tracking + synthesis

Entry point: /os-architect — describe what you want in plain language. The agent classifies intent, audits the ecosystem, proposes Path A/B/C, and dispatches via your available CLI tools. os-evolution-planner writes the task plan + delegation prompt. os-architect-tester validates after any changes.

Karpathy Autoresearch Loop

Skills that score HIGH on the autoresearch viability rubric (objectivity + speed + frequency + utility) can run fully autonomous self-improvement loops:

mutate SKILL.md → evaluate.py → exit 0 (KEEP) or exit 1 (DISCARD) → repeat

Not all skills are good candidates — use eval-autoresearch-fit to score a skill before running a loop.
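The mutate → evaluate → KEEP/DISCARD cycle can be sketched as a loop that snapshots the skill file, applies a mutation, and reverts whenever the evaluator exits non-zero. This is a minimal sketch, not the actual os-eval-runner: the run_loop function, its mutate callback, and the way the evaluator command receives the file path are assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence


def run_loop(
    skill_path: Path,
    mutate: Callable[[str, int], str],
    eval_cmd: Sequence[str],
    iterations: int,
) -> List[str]:
    """Mutate the skill file repeatedly, keeping only changes the evaluator accepts.

    `eval_cmd` is any command that exits 0 for KEEP and non-zero for DISCARD;
    the path of the skill file is appended as its last argument.
    """
    decisions: List[str] = []
    for i in range(iterations):
        baseline = skill_path.read_text(encoding="utf-8")
        skill_path.write_text(mutate(baseline, i), encoding="utf-8")

        result = subprocess.run(list(eval_cmd) + [str(skill_path)])
        if result.returncode == 0:
            decisions.append("KEEP")  # the mutation becomes the new baseline
        else:
            skill_path.write_text(baseline, encoding="utf-8")  # roll back
            decisions.append("DISCARD")
    return decisions
```

Because the baseline is re-read at the start of every iteration, a kept mutation automatically becomes the starting point for the next one, while a discarded mutation leaves no trace in the file.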

Live example — convert-mermaid skill, 26 iterations across 2 rounds: 0.61 → 1.00

Figure: convert-mermaid eval progress

Each blue diamond is a baseline anchor (one per session). Green = new best score. Amber = kept but not a record. The two-segment shape shows a fresh re-baseline for round 2.

Monitor a live run: python plugins/agent-agentic-os/scripts/plot_eval_progress.py --tsv <lab>/evals/ --live

Flywheel layers:

  • OUTER flywheel (os-improvement-loop): improves OS-level protocols and session ledgers between sessions
  • INNER flywheel (os-eval-runner): evaluate.py KEEP/DISCARD gate per iteration within a session

Pillar 2: Execution Patterns (agent-loops)

Five composable primitives serve as the execution substrate for the Improvement OS and can also be used standalone in any agent workflow:

learning-loop · dual-loop · agent-swarm · red-team-review · triple-loop-learning

Pillar 3: Super-RAG 3-Tier Retrieval

O(1) RLM keyword → O(log N) vector semantic → wiki concept nodes.

Super-RAG stack: rlm-factory (O(1) keyword) + vector-db (O(log N) semantic) + obsidian-wiki-engine (full concept nodes)

Each plugin works standalone (Mode A) or combined for full Super-RAG power. Init agents detect what is installed in .agents/skills/ and configure only the available layers.
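To make the layer detection and tiered lookup concrete, here is a small sketch. It assumes that the presence of one representative skill directory per layer signals that the layer is installed, and that a query falls through from the cheapest tier to the richest until one returns a hit. The marker mapping, the function names, and the fall-through policy are assumptions; the real init agents may decide differently.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical mapping: one marker skill per retrieval layer, cheapest tier first.
LAYER_MARKERS: Dict[str, str] = {
    "keyword": "rlm-search",          # O(1) RLM keyword lookup
    "vector": "vector-db-search",     # O(log N) semantic search
    "wiki": "obsidian-query-agent",   # full concept nodes
}


def detect_layers(skills_dir: Path) -> List[str]:
    """Return the layers whose marker skill directory exists, in tier order."""
    return [
        layer
        for layer, skill in LAYER_MARKERS.items()
        if (skills_dir / skill).is_dir()
    ]


def tiered_search(
    query: str,
    layers: List[str],
    backends: Dict[str, Callable[[str], Optional[str]]],
) -> Optional[str]:
    """Try each available layer in order and return the first non-empty hit."""
    for layer in layers:
        hit = backends[layer](query)
        if hit is not None:
            return hit
    return None
```

With only the keyword and wiki layers installed, detect_layers returns those two names and tiered_search simply skips the vector tier.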

Hub-and-Spoke ADR

All shared scripts live once at plugins/<plugin>/scripts/. Skills reference them via file-level symlinks (skills/<skill>/scripts/script.py → ../../../scripts/script.py). Directory-level symlinks are forbidden — npx drops them on install.
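The file-level symlink layout can be reproduced with a few lines of Python. The helper below is a hypothetical sketch, not the repository's symlink-manager skill: it computes the relative target from the skill's scripts/ directory and links a single file. On Windows, creating symlinks may require elevated privileges or Developer Mode.

```python
import os
from pathlib import Path


def link_shared_script(plugin_root: Path, skill: str, script_name: str) -> Path:
    """Create skills/<skill>/scripts/<script_name> as a relative file-level symlink
    to the canonical copy in <plugin_root>/scripts/."""
    source = plugin_root / "scripts" / script_name
    link_dir = plugin_root / "skills" / skill / "scripts"
    link_dir.mkdir(parents=True, exist_ok=True)  # a real directory, never a symlink

    link = link_dir / script_name
    # Relative targets keep the link valid when the plugin folder is moved or copied.
    target = os.path.relpath(source, start=link_dir)
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink(target, link)
    return link
```

For a skill named demo, the computed target is ../../../scripts/script.py on POSIX systems, matching the pattern described above.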


Plugin Ecosystem (11 plugins · 137 skills)

Group 1: The Improvement OS

agent-agentic-os — Continuous Self-Improvement

The flagship operational framework. Eval-gated improvement loops, memory management, session lifecycle, and ecosystem evolution orchestration.

Skills (17): os-architect · os-evolution-planner · os-guide · os-improvement-loop · os-eval-lab-setup · os-eval-runner · os-eval-backport · os-environment-probe · os-evolution-verifier · os-experiment-log · os-memory-manager · os-improvement-report · os-init · os-clean-locks · todo-check · optimize-agent-instructions · self-evolution

Agents (5): os-architect-agent · os-architect-tester-agent · improvement-intake-agent · os-health-check · agentic-os-setup


Group 2: Engineering Workflows

spec-kitty-plugin — Spec-Driven Development

Enterprise-grade Spec → Plan → Tasks → Implement → Review → Merge pipeline.

Skills (19): spec-kitty-specify · spec-kitty-plan · spec-kitty-tasks · spec-kitty-implement · spec-kitty-review · spec-kitty-merge · spec-kitty-analyze · spec-kitty-accept · spec-kitty-clarify · spec-kitty-research · spec-kitty-dashboard · spec-kitty-status · spec-kitty-checklist · spec-kitty-constitution · spec-kitty-tasks-outline · spec-kitty-tasks-finalize · spec-kitty-tasks-packages · spec-kitty-workflow · spec-kitty-sync-plugin

Agents: spec-kitty-agent · spec-kitty-setup

exploration-cycle-plugin — Discovery & Requirements

Autonomous discovery loop: idea framing → business requirements → user stories → prototype → handoff into formal engineering specs.

Skills (19): exploration-workflow · exploration-session-brief · discovery-planning · business-requirements-capture · business-workflow-doc · user-story-capture · exploration-handoff · exploration-optimizer · prototype-builder · visual-companion · subagent-driven-prototyping · vibe-browser-audit · vibe-behavioral-test-capture · vibe-domain-extractor · vibe-slice-migrator · vibe-reengineer · vibe-spec-packager · vibe-togaf-architect · vibe-to-speckit-superpowers

Agents (17): business-rule-audit-agent · certification-verifier · discovery-planning-agent · domain-purity-auditor · exploration-cycle-orchestrator-agent · handoff-preparer-agent · intake-agent · planning-doc-agent · problem-framing-agent · prototype-builder-agent · prototype-companion-agent · requirements-doc-agent · requirements-scribe-agent · runtime-observer-agent · semantic-drift-auditor · vibe-orchestrator-agent · subagent-driven-prototyping-agent


Group 3: Execution Patterns

agent-loops — Composable Loop Primitives

5 execution primitives used as the substrate for the Improvement OS and standalone agent workflows.

Skills (6): orchestrator · learning-loop · dual-loop · agent-swarm · red-team-review · triple-loop-learning

Agents: orchestrator


Group 4: Code Quality & Safety

agent-scaffolders — Boilerplate & Audit (30 skills)

Interactive creators for exact file hierarchies + structured audit framework for plugin architectural maturity.

Scaffolding skills: create-plugin · create-skill · create-sub-agent · create-command · create-hook · create-github-action · create-agentic-workflow · create-azure-agent · create-docker-skill · create-mcp-integration · create-stateful-skill

Audit & analysis skills: audit-plugin · audit-plugin-l5 · l5-red-team-auditor · analyze-plugin · self-audit · mine-skill · mine-plugins · path-reference-auditor · fix-plugin-paths · synthesize-learnings · eval-autoresearch-fit · manage-marketplace · ecosystem-standards · ecosystem-authoritative-sources


Group 5: CLI Sub-Agents

Dispatch tasks and persona-based analysis to isolated model contexts via four CLI tools.

cli-agents — Unified CLI Dispatcher (v1.1.0)

All CLI sub-agent tooling consolidated into one plugin. Each tool has its own run_agent.py, its own agent personas (with tool-appropriate default model), and its own SKILL.md.

Skills (6):

Agents (12): claude/architect-review · claude/refactor-expert · claude/security-auditor · copilot/architect-review · copilot/refactor-expert · copilot/security-auditor · gemini/architect-review · gemini/refactor-expert · gemini/security-auditor · agy/architect-review · agy/refactor-expert · agy/security-auditor

June 2026 billing note: GitHub Copilot moves to AI Credits (per-token) on June 1. gpt-5-mini, gpt-4.1, gpt-4o remain included (no credit cost). All other models consume credits — see copilot-cli-agent SKILL.md for updated model table.

Execution Disciplines — Safety & Quality

Behavioural guardrails enforcing best practices on every coding session. These skills come from obra/superpowers — install that plugin to get them.

Install: uvx --from git+https://github.com/richfrem/agent-plugins-skills plugin-add obra/superpowers

Skills available via superpowers: verification-before-completion · test-driven-development · using-git-worktrees · systematic-debugging · finishing-a-development-branch · requesting-code-review


Group 6: Knowledge & Memory

agent-memory — Unified Cognitive Memory Suite (v1.0.0)

Three standalone plugins consolidated: rlm-factory (O(1) keyword search) + vector-db (semantic search) + memory-management (session tiering). Works standalone per layer or combined as a full Super-RAG stack.

RLM skills (6): rlm-init · rlm-curator · rlm-search · rlm-distill-agent · rlm-cleanup-agent · rlm-audit

Vector DB skills (6): vector-db-init · vector-db-launch · vector-db-ingest · vector-db-search · vector-db-cleanup · vector-db-audit

Session memory (1): memory-management — multi-tiered cognition and context caching

Agents (9): rlm-cleanup-agent · rlm-curator · rlm-distill-agent · rlm-factory-init-agent · rlm-init · rlm-search · vector-db-cleanup · vector-db-ingest · vector-db-init-agent

obsidian-wiki-engine — Karpathy LLM Wiki + Super-RAG (v3.1.0)

Karpathy-style LLM wiki with cross-source concept synthesis. Transforms raw markdown into structured, queryable concept nodes. Full Obsidian vault CRUD, canvas, and graph traversal. Pairs with agent-memory as Phase 3 of the Super-RAG stack.

Wiki skills: obsidian-wiki-builder · obsidian-rlm-distiller · obsidian-query-agent · obsidian-wiki-linter

Vault skills: obsidian-init · obsidian-vault-crud · obsidian-canvas-architect · obsidian-graph-traversal · obsidian-markdown-mastery · obsidian-bases-manager

Setup agents: wiki-init-agent · wiki-build-agent · wiki-distill-agent · wiki-lint-agent · wiki-query-agent · super-rag-setup-agent


Group 7: Infrastructure & Utilities

dev-utils — Developer Utilities Suite (v1.1.0)

Nine standalone plugins consolidated into one. All tools are stateless and self-contained.

Skills (12): adr-management · coding-conventions-agent · context-bundler · convert-mermaid · hf-init · hf-upload · humanize · link-checker-agent · optimize-context · red-team-bundler · symlink-manager · task-agent

Agents (3): coding-conventions-agent · link-checker-agent · rsvp-comprehension-agent

plugin-manager — Ecosystem Sync

Skills (3): plugin-installer · plugin-remover · plugin-syncer

dependency-management — pip-compile Workflows

Cross-platform pip-compile with strict .in.txt lockfile discipline.

Skills (1): dependency-management


Completed Experiments

Ecosystem Fitness Sweep v1 — COMPLETE (temp/ecosystem-fitness-sweep-v1/)

Scored 116 of the 120 production skills for Karpathy autoresearch loop viability using GPT-5 mini via Copilot CLI. Each skill was scored on four criteria: objectivity (can a shell command measure it?), execution speed, frequency of use, and potential utility, for a maximum total of 40.

Top HIGH candidates:

| Rank | Skill | Score | Loop |
|------|-------|-------|------|
| 1 | superpowers/verification-before-completion | 35/40 | LLM_IN_LOOP |
| 2 | superpowers/test-driven-development | 35/40 | LLM_IN_LOOP |
| 3 | coding-conventions/coding-conventions-agent | 34/40 | HYBRID |
| 4 | superpowers/using-git-worktrees | 33/40 | DETERMINISTIC |
| 5 | spec-kitty-plugin/spec-kitty-status | 33/40 | DETERMINISTIC |
| 6 | agent-agentic-os/os-eval-runner | 32/40 | DETERMINISTIC |

Full ranked results: summary-ranked-skills.json

Top 20 opportunities with metrics + blockers: autoresearch-opportunities-report.md

Regenerate report:

python plugin-research/experiments/analyze-candidates-for-auto-reseaarch/skills/eval-autoresearch-fit/scripts/update_ranked_skills.py \
  --json-path plugin-research/experiments/analyze-candidates-for-auto-reseaarch/skills/eval-autoresearch-fit/assets/resources/summary-ranked-skills.json \
  --morning-report

Repository Structure

plugins/                    ← upstream source (11 plugins, 137 skills)
  <plugin>/
    plugin.yaml             ← plugin manifest
    .claude-plugin/plugin.json
    skills/<skill>/
      SKILL.md              ← skill definition (mutation target for autoresearch loops)
      evals/evals.json      ← routing evaluation suite (should_trigger boolean schema)
      evals/results.tsv     ← per-experiment score history
      scripts/              ← file-level symlinks → ../../../scripts/
    scripts/                ← canonical scripts (shared via symlinks, never duplicated)
    agents/                 ← sub-agent .md definitions
    commands/               ← slash commands
    assets/diagrams/        ← architecture diagrams

.agents/                    ← deployed skill copies (bridge installer output)
  skills/
  agents/

plugin-research/            ← experiments and autoresearch infrastructure
  experiments/
    analyze-candidates-for-auto-reseaarch/

temp/                       ← local scratch (gitignored except scripts)
  ecosystem-fitness-sweep-v1/

137 skills · 11 plugins · Improvement OS (os-architect) · Karpathy autoresearch loops · Super-RAG 3-tier retrieval