The universal quality standard for AI agent skills.
Evaluate any SKILL.md, whether from skills.sh, ClawHub, GitHub, or your local machine.
- 🎯 Comprehensive Evaluation: 7 Anthropic-aligned scoring categories with weighted importance
- 🎨 Multiple Output Formats: Terminal (colorful), JSON, and Markdown reports
- 🔍 Deterministic Analysis: Reliable, reproducible scoring without requiring API keys
- 📋 Detailed Feedback: Specific findings and actionable recommendations
- ⚡ Fast & Reliable: Built with TypeScript
- 🌍 Cross-Platform: Works on Windows, macOS, and Linux
- 🐙 GitHub Integration: Score skills directly from GitHub repositories
- 📊 Batch Mode: Compare multiple skills with a summary table
- 🗣️ Verbose Mode: See all findings, not just truncated summaries
Install globally from npm:

```bash
npm install -g skillscore
```

Or add it to a project:

```bash
npm install skillscore
```

Or run it without installing:

```bash
npx skillscore ./my-skill/
```

To build from source:

```bash
git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link
```

Evaluate a skill directory:
```bash
# Evaluate a skill
skillscore ./skills/my-skill/

# Evaluate with verbose output (shows all findings)
skillscore ./skills/my-skill/ --verbose

# Full GitHub URL (always recognized)
skillscore https://github.com/FrancyJGLisboa/agent-skill-creator

# GitHub shorthand (requires -g/--github flag)
skillscore -g FrancyJGLisboa/agent-skill-creator

# JSON output
skillscore ./skills/my-skill/ --json

# Markdown report
skillscore ./skills/my-skill/ --markdown

# Save to file
skillscore ./skills/my-skill/ --output report.md
skillscore ./skills/my-skill/ --json --output score.json

# Compare multiple skills (auto-enters batch mode)
skillscore ./skill1 ./skill2 ./skill3

# Explicit batch mode flag
skillscore ./skill1 ./skill2 --batch

# Compare GitHub skills
skillscore -g user/repo1/skill1 user/repo2/skill2 --json

# Show version
skillscore --version

# Get help
skillscore --help
```

Sample report output:

```
📊 SKILLSCORE EVALUATION REPORT
============================================================

📁 Skill: weather-fetcher
Fetches current weather data for any city when the user asks for forecasts or conditions.
Path: ./weather-skill

🎯 OVERALL SCORE
A- - 92.0% (9.2/10.0 points)

📊 CATEGORY BREAKDOWN
------------------------------------------------------------
Identity & Metadata      ████████████████████ 100.0%
  YAML frontmatter with valid name/description, proper format, not vague
  Score: 10/10 (weight: 20%)
  ✓ Frontmatter name: "weather-fetcher" (+2)
  ✓ Name format valid (lowercase-hyphen, ≤64 chars) (+2)
  ✓ Frontmatter description present (+2)
  ... 3 more findings

Clarity & Instructions   ██████████████████░░ 90.0%
  Workflow steps, consistent terminology, templates/examples, degrees of freedom
  Score: 9/10 (weight: 15%)
  ✓ Has structured workflow steps (numbered lists or checklists) (+3)
  ✓ Consistent terminology throughout (+2)
  ✓ 4 code blocks with templates/examples (+2)
  ... 2 more findings (use --verbose to see all)

Safety & Security        ██████████████░░░░░░ 70.0%
  No destructive commands without confirmation, no secret exfil, no privilege escalation
  Score: 7/10 (weight: 15%)
  ✓ No dangerous destructive commands found (+3)
  ✓ No secret exfiltration risk detected (+2)
  ⚠ Privilege escalation with justification: sudo (+1)

📋 SUMMARY
------------------------------------------------------------
✅ Strengths: Identity & Metadata, Conciseness, Clarity & Instructions, Routing & Scope
⚠️ Areas for improvement: Safety & Security

Generated: 3/13/2026, 1:37:51 AM
```
Sample batch output:

```
📊 BATCH SKILL EVALUATION
Evaluating 3 skill(s)...

[1/3] Processing: ./weather-skill
✅ Completed
[2/3] Processing: ./file-backup
✅ Completed
[3/3] Processing: user/repo/skill
✅ Completed

📊 COMPARISON SUMMARY
Skill              Grade   Score   Identity   Routing   Safety   Status
weather-fetcher    A-      92.0%   100%       100%      70%      OK
file-backup        B+      87.0%   90%        80%       90%      OK
data-processor     A       94.0%   100%       100%      85%      OK

📊 BATCH SUMMARY
✅ Successful: 3
📊 Average Score: 91.0%
```
SkillScore evaluates skills across 7 weighted categories aligned with Anthropic's official skill documentation:
| Category | Weight | Description |
|---|---|---|
| Identity & Metadata | 20% | YAML frontmatter name/description, lowercase-hyphen format, not vague |
| Conciseness | 15% | Body ≤500 lines, progressive disclosure, no over-explaining basics |
| Clarity & Instructions | 15% | Workflow steps, consistent terminology, templates/examples, degrees of freedom |
| Routing & Scope | 15% | WHAT+WHEN description, negative routing, domain vocabulary, third-person voice |
| Robustness | 10% | Error handling in code blocks, validation steps, dependency verification |
| Safety & Security | 15% | No destructive commands, proximity-based secret exfil detection, no privilege escalation |
| Portability & Standards | 10% | No platform-specific paths, MCP tool format, no time-sensitive info, relative paths |
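Conceptually, the overall score is just the weight-normalized sum of the category scores in the table above. A minimal sketch of that arithmetic (the interface and function names here are illustrative, not SkillScore's actual internals):

```typescript
// Illustrative roll-up of weighted category scores into an overall percentage.
interface CategoryScore {
  name: string;
  points: number; // 0-10 per category
  weight: number; // fraction; all weights sum to 1.0
}

function overallPercent(categories: CategoryScore[]): number {
  const weighted = categories.reduce(
    (sum, c) => sum + (c.points / 10) * c.weight,
    0,
  );
  return Math.round(weighted * 1000) / 10; // one decimal place
}

// Weights taken from the table above; points are an example skill.
const example: CategoryScore[] = [
  { name: "Identity & Metadata", points: 10, weight: 0.2 },
  { name: "Conciseness", points: 9, weight: 0.15 },
  { name: "Clarity & Instructions", points: 9, weight: 0.15 },
  { name: "Routing & Scope", points: 10, weight: 0.15 },
  { name: "Robustness", points: 8, weight: 0.1 },
  { name: "Safety & Security", points: 7, weight: 0.15 },
  { name: "Portability & Standards", points: 9, weight: 0.1 },
];

console.log(overallPercent(example)); // 89.5
```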
Each category is scored from 0-10 points based on specific criteria:

- Identity & Metadata: Validates YAML frontmatter `name` (lowercase-hyphen, ≤64 chars, no reserved words) and `description` (≤1024 chars, third person, no XML tags); rejects vague names/descriptions
- Conciseness: Enforces the 500-line body limit, checks for progressive disclosure via file references, flags over-explaining basics Claude already knows
- Clarity & Instructions: Checks for numbered steps or checklists, consistent terminology (no synonym pairs used interchangeably), code block examples, and a mix of imperative ("must") and flexible ("consider") guidance
- Routing & Scope: Validates that the description has action verbs plus trigger conditions, negative routing examples ("don't use when..."), domain-specific vocabulary, and third-person voice
- Robustness: Scans code blocks for error handling (`try/catch`, `||`, `set -e`), validates dependency verification commands (`--version`, `command -v`), flags magic constants
- Safety & Security: Proximity-based secret exfil detection (secrets + network within 5 lines), destructive command scanning with confirmation check, privilege escalation detection, unbounded loop detection
- Portability & Standards: Flags Windows-style paths and hardcoded absolute paths, validates MCP tool `ServerName:tool_name` format, detects time-sensitive info (dates, pinned versions)
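The proximity-based exfil check is the most distinctive rule: a secret-looking token and a network call must appear within 5 lines of each other to be flagged. A hypothetical sketch of that idea (the regexes and function name are illustrative, not SkillScore's actual detection patterns):

```typescript
// Flag lines where a secret-like pattern sits within `window` lines of a
// network-capable command. Returns 1-based line numbers of flagged secrets.
const SECRET_RE = /(api[_-]?key|secret|token|password)/i;
const NETWORK_RE = /\b(curl|wget|fetch|https?:\/\/)/i;

function findExfilRisks(code: string, window = 5): number[] {
  const lines = code.split("\n");
  const risks: number[] = [];
  lines.forEach((line, i) => {
    if (!SECRET_RE.test(line)) return;
    const lo = Math.max(0, i - window);
    const hi = Math.min(lines.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (NETWORK_RE.test(lines[j])) {
        risks.push(i + 1); // secret and network call are in proximity
        break;
      }
    }
  });
  return risks;
}
```

A secret far from any network call is not flagged, which is what keeps this check's false-positive rate down compared to flagging every credential mention.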
Complete scoring redesign replacing the original 8 generic categories with 7 categories aligned to Anthropic's official skill documentation:
| Change | Details |
|---|---|
| Frontmatter validation | Skills must have YAML frontmatter with name and description fields |
| Name format checks | Names must be lowercase-hyphen (^[a-z0-9][a-z0-9-]*$), β€64 chars, no reserved words |
| Conciseness scoring | New category enforcing 500-line limit, progressive disclosure, no over-explaining |
| Third-person detection | Descriptions should use third-person voice, not "I/We/My" |
| Proximity-based exfil | Secret + network pattern detection within 5-line proximity windows |
| MCP format validation | MCP tool references must use ServerName:tool_name format |
| Time-sensitive detection | Flags specific dates, "as of", and pinned version references |
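The name-format rule in the table is fully specified by its regex and length cap, so the whole check fits in a few lines. A sketch (the helper name is hypothetical; the regex is the one quoted above):

```typescript
// Name must be lowercase-hyphen, start alphanumeric, and be at most 64 chars.
const NAME_RE = /^[a-z0-9][a-z0-9-]*$/;

function isValidSkillName(name: string): boolean {
  return name.length <= 64 && NAME_RE.test(name);
}
```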
| Grade | Score Range | Description |
|---|---|---|
| A+ | 97-100% | Exceptional quality |
| A | 93-96% | Excellent |
| A- | 90-92% | Very good |
| B+ | 87-89% | Good |
| B | 83-86% | Above average |
| B- | 80-82% | Satisfactory |
| C+ | 77-79% | Acceptable |
| C | 73-76% | Fair |
| C- | 70-72% | Needs improvement |
| D+ | 67-69% | Poor |
| D | 65-66% | Very poor |
| D- | 60-64% | Failing |
| F | 0-59% | Unacceptable |
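The grade boundaries above can be expressed as a descending lookup table; the first threshold the score meets wins. A sketch (illustrative code, not SkillScore's source):

```typescript
// Map an overall percentage to a letter grade per the boundary table above.
const GRADE_BOUNDARIES: Array<[number, string]> = [
  [97, "A+"], [93, "A"], [90, "A-"],
  [87, "B+"], [83, "B"], [80, "B-"],
  [77, "C+"], [73, "C"], [70, "C-"],
  [67, "D+"], [65, "D"], [60, "D-"],
];

function gradeFor(percent: number): string {
  for (const [min, grade] of GRADE_BOUNDARIES) {
    if (percent >= min) return grade;
  }
  return "F"; // anything below 60%
}
```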
```
my-skill/
├── SKILL.md          # Main skill definition (REQUIRED)
├── README.md         # Documentation (recommended)
├── package.json      # Dependencies (if applicable)
├── scripts/          # Executable scripts
│   ├── setup.sh
│   └── main.py
└── examples/         # Usage examples
    └── example.md
```
A well-structured SKILL.md template:

````markdown
---
name: my-awesome-skill
description: Performs [specific task] when the user needs to [trigger condition].
---

# My Awesome Skill

Performs [specific task] using [specific tools/inputs].

## When to Use

Use this skill when you need to [specific task] with [specific tools/inputs].

## When NOT to Use

Don't use this skill when:

- The task is [alternative scenario] (use [other skill] instead)
- You need [different capability]

## Dependencies

- Tool 1: Installation instructions (`tool --version` to verify)
- API Key: How to obtain and configure
- Environment: OS requirements

## Workflow

1. Step-by-step instructions
2. Specific commands to run
3. Expected outputs

- [ ] Verify dependencies
- [ ] Confirm configuration

## Output

Results are written to `./output/` as JSON files.

## Error Handling

You must always validate output. Consider retrying on transient failures.

```bash
if ! result=$(./scripts/main.py --input "data"); then
  echo "Error: processing failed"
  exit 1
fi
```

Example output:

```json
{
  "status": "success",
  "result": "Example of what the skill produces"
}
```

## Limitations

- Known constraints
- Platform-specific notes
- Edge cases

See docs/advanced.md for more details.
````
## 🔧 API Usage
Use SkillScore programmatically in your Node.js projects:
```typescript
import { SkillParser, SkillScorer, TerminalReporter } from 'skillscore';
import type { Reporter, SkillScore } from 'skillscore';
const parser = new SkillParser();
const scorer = new SkillScorer();
const reporter: Reporter = new TerminalReporter();
async function evaluateSkill(skillPath: string): Promise<SkillScore> {
const skill = await parser.parseSkill(skillPath);
const score = await scorer.scoreSkill(skill);
const report = reporter.generateReport(score);
console.log(report);
return score;
}
```
All three reporters (TerminalReporter, JsonReporter, MarkdownReporter) implement the Reporter interface.
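Because reporters share one interface, you can also plug in your own. A self-contained sketch, assuming the `Reporter` shape implied by the example above (a single `generateReport` method); the `SkillScore` type here is a stand-in, not the package's real definition:

```typescript
// Stand-in types mirroring the shape used in the example above.
interface SkillScore {
  skillName: string;
  overallPercent: number;
  grade: string;
}

interface Reporter {
  generateReport(score: SkillScore): string;
}

// A custom reporter that emits a single summary line per skill.
class OneLineReporter implements Reporter {
  generateReport(score: SkillScore): string {
    return `${score.skillName}: ${score.grade} (${score.overallPercent}%)`;
  }
}

const line = new OneLineReporter().generateReport({
  skillName: "weather-fetcher",
  overallPercent: 92,
  grade: "A-",
});
console.log(line); // weather-fetcher: A- (92%)
```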
The parser now extracts additional metadata used by the new scoring rubric:
```typescript
interface ParsedSkill {
  // Existing fields
  skillPath: string;
  skillMdExists: boolean;
  skillMdContent: string;
  name: string;
  description: string;
  files: string[];
  metadata: Record<string, unknown>;
  structure: FileStructure;

  // New in v2.0
  frontmatter: Record<string, unknown>; // YAML frontmatter (same ref as metadata)
  bodyContent: string;                  // SKILL.md after stripping frontmatter
  bodyLineCount: number;                // Line count of body
  nameSource: 'frontmatter' | 'heading' | 'fallback';
  descriptionSource: 'frontmatter' | 'inline' | 'inferred' | 'none';
  referencedFiles: string[];            // Markdown links extracted from SKILL.md
}
```

```
Usage: skillscore [options] <path...>

Arguments:
  path                 Path(s) to skill directory, GitHub URL, or shorthand

Options:
  -V, --version        Output the version number
  -j, --json           Output in JSON format
  -m, --markdown       Output in Markdown format
  -o, --output <file>  Write output to file
  -v, --verbose        Show ALL findings (not just truncated)
  -b, --batch          Batch mode for comparing multiple skills
  -g, --github         Treat shorthand paths as GitHub repos (user/repo/path)
  -h, --help           Display help for command
```
```bash
# Run tests in watch mode
npm test

# Run tests once
npm run test:run

# Lint code
npm run lint

# Build project
npm run build
```

We welcome contributions! Here's how to get started:
```bash
git clone https://github.com/joeynyc/skillscore.git
cd skillscore
npm install
npm run build
npm link

# Run in development mode
npm run dev ./test-skill/

# Build for production
npm run build

# Run tests
npm test
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Make your changes
- Add tests for new functionality
- Ensure all tests pass (`npm test`)
- Lint your code (`npm run lint`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Use TypeScript for all new code
- Follow existing code style (enforced by ESLint)
- Add tests for new features
- Update documentation for API changes
- Keep commits focused and descriptive
**Error: "Path does not exist"**

- Check for typos in the path
- Ensure you have permission to read the directory
- Verify the path points to a directory, not a file

**Error: "No SKILL.md file found"**

- Skills must contain a SKILL.md file
- Check that you're pointing to the right directory
- The file must be named exactly "SKILL.md"

**Error: "Git is not available"**

- Install Git to clone GitHub repositories
- macOS: `xcode-select --install`
- Ubuntu: `sudo apt-get install git`
- Windows: Download from git-scm.com

**Scores seem too high/low**

- Scoring is calibrated against real-world skills
- See the scoring methodology above
- Consider the specific criteria for each category
- 🐛 Report Issues
- 💬 Discussions
- 📖 Documentation
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by the need for quality assessment in AI agent skills
- Built for the OpenClaw and Claude Code communities
- Thanks to all contributors and skill creators
- Scoring methodology aligned with Anthropic's official skill documentation
Real-world skills scored with SkillScore v2.0:
- FrancyJGLisboa/agent-skill-creator: 83.5% (B) - Perfect identity & robustness, needs negative routing and trimming (617 lines)
- gapmiss/obsidian-plugin-skill: 52% (F) - No frontmatter, weak routing signals, missing structured workflow
- skill-creator (local): 86% (B) - Strong identity & conciseness (353 lines, 6 file refs), needs error handling in code blocks
Help us improve AI agent skills, one evaluation at a time