comses · alee · May 7, 2026 · Apr 30, 2026 · May 3, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,6 +1,6 @@
-# Contributing Skills to COMSES
+# Contributing Skills to CoMSES
 
-Thank you for contributing to this skills repository! This guide walks you through the process of creating, testing, and submitting skills for computational modelers.
+Thank you for contributing to this skills repository! This guide walks you through the process of creating, testing, and submitting skills for our community.
 
 ## Table of Contents
 
@@ -15,16 +15,15 @@ Thank you for contributing to this skills repository! This guide walks you throu
 ## Before You Start
 
 - Familiarize yourself with the [Agent Skills specification](https://agentskills.io)
-- Review existing skills in `skills/` to understand the pattern
-- Copy [docs/SKILL-TEMPLATE.md](docs/SKILL-TEMPLATE.md) as your starting point
-- Ensure your skill addresses a concrete pain point for computational modelers
-- Confirm your skill does NOT substantially overlap with existing skills
+- Read [docs/agent-skills-creation-reference.md](docs/agent-skills-creation-reference.md). This is the canonical authoring guide for this repository.
+- Review existing skills in `skills/` to check for overlap and to assess whether your idea fits the repository
+- Use `/create-skill` if your coding agent provides it, or manually copy [docs/SKILL-TEMPLATE.md](docs/SKILL-TEMPLATE.md) into a new skill directory
 
 ## Skill Creation Workflow
 
 ### 1. Plan Your Skill
 
-Answer these questions before writing:
+Answer these questions:
 
 - **What problem does it solve?** (e.g., "Modelers struggle to document ODD+2 protocols manually")
 - **When should the coding agent use it?** (e.g., "When user has model code and needs narrative documentation")
@@ -34,21 +33,24 @@ Answer these questions before writing:
 
 ### 2. Create Your Skill Folder
 
-Run `/create-skill <name> — <one-sentence description>` in your coding agent. This scaffolds `skills/<name>/SKILL.md` from [docs/SKILL-TEMPLATE.md](docs/SKILL-TEMPLATE.md) with placeholders filled in, and generates a starter `evals.json`.
+Run `/create-skill <name> — <one-sentence description>` in your coding agent if that command is available. It should scaffold `skills/<name>/SKILL.md` from [docs/SKILL-TEMPLATE.md](docs/SKILL-TEMPLATE.md) and create a starter `skills/<name>/evals.json`.
 
 Alternatively, copy manually:
 ```bash
 mkdir -p skills/your-skill-name
 cp docs/SKILL-TEMPLATE.md skills/your-skill-name/SKILL.md
+cp skills/document/evals.json skills/your-skill-name/evals.json
 ```
 
+Then immediately update `skill_name` in the copied `evals.json`, replace the copied prompts with your own, and make sure the frontmatter `name:` matches the folder name exactly.
+
 ### 3. Write SKILL.md
 
 See [Frontmatter Specification](#frontmatter-specification) and [Writing Guidelines](#writing-guidelines) below.
 
 ### 4. Add Optional Resources
 
-As your skill grows, add supporting files:
+As your skill grows, you might find supporting files useful:
 
 ```
 your-skill-name/
@@ -71,6 +73,14 @@ your-skill-name/
 
 See [Testing Your Skill](#testing-your-skill).
 
+Before opening a PR, also run the repository validators:
+
+```bash
+python scripts/validate_individual_skills.py
+python scripts/validate_evals_schema.py
+python scripts/validate_cross_skills.py evals/cross-skills.json
+```
+
 ### 6. Submit a Pull Request
 
 Include:
@@ -124,21 +134,21 @@ A typical SKILL.md body includes:
 
 ## Key Inputs
 
-- Model source files (Python/R/C++)
+- Model source code files
 - Parameter descriptions or config files
 - Optional: docstrings with metadata
 
 ## Step-by-Step Instructions
 
 1. Read the model code
-2. Extract metadata using scripts/extract.py
+2. Extract metadata (e.g., with scicodes/somef-core or google/langextract)
 3. Generate narrative following references/TEMPLATE.md
 4. Validate against references/CHECKLIST.md
 
 ## ⚠️ Gotchas
 
 - **Stochastic models:** If your model uses randomness, document any fixed random seeds
-- **Large codebases:** Summarize into entity/subsystem abstractions first
+- **Large codebases:** Summarize into entity/subsystem/component abstractions first
 - **Missing documentation:** Skill will ask clarifying questions rather than guess
 
 ## Templates & Resources
@@ -173,10 +183,11 @@ A typical SKILL.md body includes:
 name: your-skill-name
 description: |
   A complete description of what this skill does.
-  
-  Use when: you have model code and need...
-  When to trigger: mention [keywords like ODD, documentation, publication]
+
+  Use this skill when you have model code and need...
+  Triggers: "odd", "documentation", "publication"
   Expected output: [specific deliverables]
+license: MIT
 ---
 ```
 
@@ -186,23 +197,25 @@ description: |
 ---
 name: your-skill-name
 description: ...
-license: MIT (default) | Apache-2.0 | Proprietary
+license: MIT | Apache-2.0 | Proprietary
 compatibility: Python 3.10+, git, Docker (optional)
 metadata:
   domain: computational-modeling | documentation | publication | execution
   maturity: alpha | beta | stable
-  audience: modelers | researchers | data scientists
+  audience: modelers | researchers | data-scientists
+  category: documentation | quality-assurance | execution | publication
 ---
 ```
 
-### Guidancefor `description`
+### Guidance for `description`
 
 The description is your **primary triggering mechanism**. Make it:
 
 - **Task-specific:** "ODD+2 narrative for agent-based models" not just "model documentation"
 - **Keyword-rich:** Include trigger phrases users would naturally type
 - **Outcome-focused:** Mention specific deliverables (e.g., "checklist", "narrative sections", "validation report")
-- **Slightly pushy:** Coding agents tend to under-trigger skills. Emphasize when to use: "Use whenever you mention ODD, ABM documentation, or model publication preparation"
+- **Use the repository-preferred trigger phrase:** Start with `Use this skill when ...` so your description aligns with the validator heuristics and the existing skills.
+- **Slightly pushy:** Coding agents tend to under-trigger skills. Emphasize when to use: "Use this skill when you mention ODD, ABM documentation, or model publication preparation"
 
 ## Testing Your Skill
 
@@ -226,37 +239,45 @@ The description is your **primary triggering mechanism**. Make it:
 
 ### Creating an Evaluation Strategy
 
-For each skill, document 3–5 concrete test cases in a file `evals/evals.json`:
+For each skill, include concrete test cases in `skills/<name>/evals.json`:
 
 ```json
 {
   "skill_name": "document",
+  "description": "Evaluation cases for ODD+2 narrative documentation skill",
   "evals": [
     {
       "id": 1,
+      "type": "core",
       "prompt": "I have a Python ABM with Agent and Environment classes. Generate an ODD narrative.",
       "should_trigger": true,
-      "expected_output": "ODD sections covering entities, state variables, and processes",
-      "files": ["evals/files/minimal_abm.py"]
+      "expected_output": "ODD sections covering entities, state variables, and processes"
     }
   ]
 }
 ```
 
+Notes:
+
+- Individual skill evals live next to the skill, for example `skills/document/evals.json`.
+- The repository schema accepts fields such as `type`, `should_trigger`, `expected_output`, `expected_behavior`, `success_criteria`, `skills_expected`, `failure_modes`, and `notes`.
+- Do not add ad hoc fields unless you also update the schema in `evals/schema/schema.json`.
+
 ## Submission Checklist
 
 Before submitting, verify:
 
 - [ ] Skill folder name matches `name:` field in frontmatter
-- [ ] Frontmatter includes `name` and `description` (and optionally `license`, `compatibility`, `metadata`)
-- [ ] Description includes triggers ("Use when you...") and expected outputs
+- [ ] Frontmatter includes `name`, `description`, and `license` (plus optional `compatibility` and `metadata`)
+- [ ] Description includes triggers (`Use this skill when ...`) and expected outputs
 - [ ] All script references use relative paths: `scripts/name.py` (not `./scripts/name.py`)
 - [ ] README/CONTRIBUTING sections are consistent with repository guidelines
+- [ ] `skills/<name>/evals.json` exists and validates against `evals/schema/schema.json`
 - [ ] Tested skill against ≥5 should-trigger and ≥3 should-not-trigger prompts
 - [ ] No hardcoded paths or user-specific settings
 - [ ] Scripts have clear usage documentation (docstrings, help text, or references/SCRIPT.md)
 - [ ] No credentials, API keys, or personal data in examples
-- [ ] License field in frontmatter (defaults to MIT if omitted)
+- [ ] License field is present in frontmatter
 
 ## Questions?
 

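For authors who want a quick local pre-check of `skills/<name>/evals.json` before running `scripts/validate_evals_schema.py`, a minimal sketch could look like the following. It is hypothetical and is not the repository's validator: the allowed field sets are assembled from the CONTRIBUTING.md notes above, treating `id` and `prompt` as required is an assumption, and applying the checklist's ≥5 should-trigger / ≥3 should-not-trigger counts to the stored evals is also an assumption. `evals/schema/schema.json` remains the authoritative definition.

```python
"""Hypothetical pre-flight check for skills/<name>/evals.json (sketch only)."""
import json
import sys
from pathlib import Path

# ASSUMPTION: field sets assembled from the CONTRIBUTING.md notes; the
# authoritative definition is evals/schema/schema.json.
TOP_LEVEL_FIELDS = {"skill_name", "description", "evals"}
EVAL_FIELDS = {
    "id", "prompt", "type", "should_trigger", "expected_output",
    "expected_behavior", "success_criteria", "skills_expected",
    "failure_modes", "notes",
}
REQUIRED_EVAL_FIELDS = {"id", "prompt"}  # ASSUMPTION: not confirmed by the schema


def check_evals(data, folder_name, min_trigger=5, min_no_trigger=3):
    """Return a list of problems found in parsed evals.json data."""
    problems = []
    extra_top = set(data) - TOP_LEVEL_FIELDS
    if extra_top:
        problems.append(f"unexpected top-level fields: {sorted(extra_top)}")
    if data.get("skill_name") != folder_name:
        problems.append(
            f"skill_name {data.get('skill_name')!r} does not match folder {folder_name!r}"
        )
    evals = data.get("evals", [])
    for case in evals:
        case_id = case.get("id", "?")
        missing = REQUIRED_EVAL_FIELDS - set(case)
        if missing:
            problems.append(f"eval {case_id}: missing fields {sorted(missing)}")
        extra = set(case) - EVAL_FIELDS
        if extra:
            problems.append(f"eval {case_id}: ad hoc fields {sorted(extra)}")
    # ASSUMPTION: apply the checklist's >=5 / >=3 prompt counts to stored evals.
    n_yes = sum(1 for case in evals if case.get("should_trigger") is True)
    n_no = sum(1 for case in evals if case.get("should_trigger") is False)
    if n_yes < min_trigger:
        problems.append(f"{n_yes} should-trigger evals (want >= {min_trigger})")
    if n_no < min_no_trigger:
        problems.append(f"{n_no} should-not-trigger evals (want >= {min_no_trigger})")
    return problems


def check_evals_file(path):
    """Load skills/<name>/evals.json and check it against its folder name."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    return check_evals(data, path.parent.name)


if __name__ == "__main__":
    found = False
    for arg in sys.argv[1:]:
        for problem in check_evals_file(arg):
            print(f"{arg}: {problem}")
            found = True
    if found:
        raise SystemExit(1)
```

Run against the `document` example from CONTRIBUTING.md, this sketch reports only the prompt counts, since that example contains a single should-trigger case and no ad hoc fields such as the removed `files` key.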
diff --git a/Makefile b/Makefile
@@ -0,0 +1,56 @@
+# ---- config ----
+PYTHON ?= python
+SCRIPTS := scripts
+EVALS := evals
+
+CROSS_EVAL := $(EVALS)/cross-skills.json
+
+# ---- default ----
+.PHONY: all
+all: validate-skills validate-evals cross
+
+# ---- individual skill validation ----
+.PHONY: validate-skills
+validate-skills:
+	$(PYTHON) $(SCRIPTS)/validate_individual_skills.py
+
+# ---- schema validation ----
+.PHONY: validate-evals
+validate-evals:
+	$(PYTHON) $(SCRIPTS)/validate_evals_schema.py
+
+# ---- cross-skill evals ----
+.PHONY: cross
+cross:
+	$(PYTHON) $(SCRIPTS)/validate_cross_skills.py $(CROSS_EVAL)
+
+# ---- per-skill evals (placeholder) ----
+# assumes a future runner invoked as: run_skill_evals.py <skill>
+SKILLS := document fair4rs hpc ospool peer-review
+
+.PHONY: skills
+skills: $(SKILLS)
+
+.PHONY: $(SKILLS)
+$(SKILLS):
+	@echo "Running evals for $@"
+	$(PYTHON) $(SCRIPTS)/run_skill_evals.py $@
+
+# ---- aggregate report ----
+.PHONY: report
+report:
+	$(PYTHON) $(SCRIPTS)/aggregate_failures.py
+
+# ---- full pipeline ----
+.PHONY: full
+full: validate-skills validate-evals cross report
+
+# ---- CI pipeline ----
+.PHONY: ci
+ci:
+	@echo "=== Running CI pipeline ==="
+	$(MAKE) validate-skills
+	$(MAKE) validate-evals
+	$(MAKE) cross
+	$(MAKE) report
+	@echo "=== CI completed ==="
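The Makefile targets call the repository's validator scripts. For the frontmatter rules in the submission checklist (folder name matches `name:`, the description contains the `Use this skill when` trigger phrase, and a `license` field is present), a stdlib-only sketch could look like this. It is hypothetical and is not `scripts/validate_individual_skills.py`; it reads only top-level `key: value` lines and `|` block scalars, which is enough for the frontmatter layout shown in CONTRIBUTING.md but is not a general YAML parser.

```python
"""Hypothetical SKILL.md frontmatter check (sketch, not the repo validator)."""
import sys
from pathlib import Path

TRIGGER_PHRASE = "Use this skill when"  # repository-preferred trigger wording


def parse_frontmatter(text):
    """Return top-level frontmatter keys as stripped strings.

    Handles only `key: value` lines and `key: |` block scalars; nested
    mappings such as `metadata:` are recorded as empty strings. This is
    NOT a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter is not closed by a '---' line") from None
    fields, block_key = {}, None
    for line in lines[1:end]:
        if line and not line[0].isspace():
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            fields[key] = "" if value == "|" else value
            block_key = key if value == "|" else None
        elif block_key is not None:
            # Indented continuation line of a `|` block scalar.
            fields[block_key] += line.strip() + "\n"
    return {key: value.strip() for key, value in fields.items()}


def check_frontmatter(fields, folder_name):
    """Return a list of checklist problems for parsed frontmatter."""
    problems = []
    if fields.get("name") != folder_name:
        problems.append(
            f"name {fields.get('name')!r} does not match folder {folder_name!r}"
        )
    description = fields.get("description", "")
    if not description:
        problems.append("missing description")
    elif TRIGGER_PHRASE not in description:
        problems.append(f"description lacks {TRIGGER_PHRASE!r}")
    if not fields.get("license"):
        problems.append("missing license")
    return problems


def check_skill(skill_md):
    """Check skills/<name>/SKILL.md against its parent folder name."""
    path = Path(skill_md)
    fields = parse_frontmatter(path.read_text(encoding="utf-8"))
    return check_frontmatter(fields, path.parent.name)


if __name__ == "__main__":
    found = False
    for arg in sys.argv[1:]:
        for problem in check_skill(arg):
            print(f"{arg}: {problem}")
            found = True
    if found:
        raise SystemExit(1)
```

Saved under a hypothetical name such as `check_skill_frontmatter.py`, it could be run as `python check_skill_frontmatter.py skills/*/SKILL.md`. It checks that the trigger phrase appears anywhere in the description rather than requiring it first, so the CONTRIBUTING.md frontmatter example (which opens with a summary sentence) still passes.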
diff --git a/README.md b/README.md
@@ -147,7 +147,7 @@ Use cases:
 ## Repository Structure
 
 ```
-skills/
+.
 ├── .github/
 │   └── skills/
 │       └── update-skill/        (repository-local maintainer skill)
@@ -156,43 +156,66 @@ skills/
 │           │   └── REFRESH-WORKFLOW.md
 │           └── assets/
 │               └── REFRESH-PR-NOTE-TEMPLATE.md
+├── AGENTS.md                    (repository-specific agent instructions)
 ├── README.md                    (this file)
 ├── CONTRIBUTING.md              (contribution guidelines)
 ├── LICENSE                      (MIT)
 ├── .gitignore
+├── Makefile                     (validation shortcuts)
 ├── docs/                        (repository-level documentation)
+│   ├── agent-skills-creation-reference.md
 │   ├── roadmap.md
 │   └── SKILL-TEMPLATE.md        (copy/fill template for new skills)
+├── evals/                       (cross-skill evals and schema)
+├── scripts/                     (validation and reporting helpers)
 └── skills/                      (all skill folders)
     ├── document/
-    │   └── SKILL.md
+    │   ├── SKILL.md
+    │   └── evals.json
     ├── fair4rs/
-    │   └── SKILL.md
+    │   ├── SKILL.md
+    │   └── evals.json
     ├── ospool/
-    │   └── SKILL.md
+    │   ├── SKILL.md
+    │   └── evals.json
     ├── hpc/
-    │   └── SKILL.md
+    │   ├── SKILL.md
+    │   └── evals.json
     └── peer-review/
-        └── SKILL.md
+        ├── SKILL.md
+        └── evals.json
 ```
 
 ## For Skill Authors
 
 ### Adding a New Skill
 
-1. **Read** [CONTRIBUTING.md](CONTRIBUTING.md) for submission guidelines and naming conventions.
+1. **Read** [AGENTS.md](AGENTS.md), [CONTRIBUTING.md](CONTRIBUTING.md), and [docs/agent-skills-creation-reference.md](docs/agent-skills-creation-reference.md) for repository conventions and authoring guidance.
 2. **Review** [Agent Skills best practices](https://agentskills.io/skill-creation/best-practices) before drafting.
-3. **Ground from real expertise**: start from real task runs, corrections, and project artifacts (not generic advice).
-4. **Scope coherently**: define one composable unit of work; avoid overly broad or ultra-narrow skills.
-5. **Design for context efficiency**: keep `SKILL.md` concise, move deep details to `references/`, and load references only when needed.
-6. **Prefer defaults over menus**: choose one default tool/approach and list alternatives only as fallbacks.
-7. **Include reusable control patterns**: gotchas, output templates, and validation loops/checklists where relevant.
-8. **Refine with real execution**: test should-trigger and should-not-trigger prompts, review execution traces, then iterate.
-9. **Copy** an existing skill folder as a starting point: `cp -r skills/hpc skills/your-skill-name`.
-10. **Fill in** the YAML frontmatter (`name`, `description`) and markdown instructions following the progressive disclosure pattern.
-11. **Include optional resources** (scripts, references, assets) as your skill grows.
-12. **Test** against should-trigger and should-not-trigger prompts before submitting a PR.
-13. **Submit** a pull request with your skill and evaluation strategy (see CONTRIBUTING.md).
+3. **Ground from real expertise**: start from real task runs, corrections, and project artifacts, not generic advice.
+4. **Scope coherently**: define one composable unit of work and keep the boundary clear.
+5. **Design for context efficiency**: keep `SKILL.md` concise, move deep detail into `references/`, and add explicit load conditions.
+6. **Prefer defaults over menus**: choose one default tool or approach and use alternatives only as fallbacks.
+7. **Create the skill folder** with `/create-skill` if your agent supports it, or scaffold manually:
+
+   ```bash
+   mkdir -p skills/your-skill-name
+   cp docs/SKILL-TEMPLATE.md skills/your-skill-name/SKILL.md
+   cp skills/document/evals.json skills/your-skill-name/evals.json
+   ```
+
+8. **Fill in** the YAML frontmatter and markdown instructions, then immediately update `skill_name` in the copied `evals.json`, replace the copied prompts, and make sure `name:` matches the folder name exactly.
+9. **Include optional resources** (`assets/`, `references/`, `scripts/`) as the workflow needs them.
+10. **Refine with real execution**: test should-trigger and should-not-trigger prompts, review execution traces, and iterate.
+11. **Run the repository validators** before opening a PR:
+
+    ```bash
+    python scripts/validate_individual_skills.py
+    python scripts/validate_evals_schema.py
+    python scripts/validate_cross_skills.py evals/cross-skills.json
+    ```
+
+12. **Submit** a pull request with the skill folder, its `evals.json`, and the prompts or checks you used to validate it.
 
 ### Skill Anatomy
 
@@ -223,22 +246,26 @@ Authoring guidance:
 ```yaml
 ---
 name: your-skill-name
-description: Brief description of when and why to use this skill
+description: |
+  Use this skill when...
+  Triggers: "phrase 1", "phrase 2"
+  Expected output: ...
+license: MIT
 ---
 ```
 
 **Optional fields:**
 ```yaml
-license: MIT (default) | Apache-2.0 | GPL-3.0-or-later
 compatibility: Tool/version requirements
 metadata:
   domain: computational-modeling | documentation | publication | execution
   maturity: alpha | beta | stable
-  audience: modelers | researchers | data scientists
+  audience: modelers | researchers | data-scientists
+  category: documentation | quality-assurance | execution | publication
 ---
 ```
 
-See [CONTRIBUTING.md](CONTRIBUTING.md) and [AGENTS.md](AGENTS.md) for full guidance.
+See [CONTRIBUTING.md](CONTRIBUTING.md), [AGENTS.md](AGENTS.md), and [docs/VALIDATION.md](docs/VALIDATION.md) for full guidance.
 
 ## Roadmap