From 2769a629f2c46ef1117f1b95fc3213be0f90e183 Mon Sep 17 00:00:00 2001
From: Cody Hart
Date: Sun, 18 Jan 2026 09:19:53 -0500
Subject: [PATCH 1/2] improve eval devx

---
 EVALS_GUIDE.md                             | 118 +++++++
 README.md                                  |   9 +
 example/README.md                          |   5 +
 example/team-repo/evals/team-git.yaml      |  41 +++
 example/team-repo/evals/team-quality.yaml  |  50 +++
 example/team-repo/evals/team-security.yaml |  44 +++
 internal/cli/eval.go                       | 362 ++++++++++++++++++++
 internal/cli/root.go                       |   1 +
 internal/eval/eval.go                      | 262 ++++++++++++++
 internal/eval/eval_test.go                 | 378 +++++++++++++++++++++
 internal/eval/templates.go                 | 193 +++++++++++
 internal/eval/templates_test.go            | 204 +++++++++++
 12 files changed, 1667 insertions(+)
 create mode 100644 example/team-repo/evals/team-git.yaml
 create mode 100644 example/team-repo/evals/team-quality.yaml
 create mode 100644 example/team-repo/evals/team-security.yaml
 create mode 100644 internal/eval/eval_test.go
 create mode 100644 internal/eval/templates.go
 create mode 100644 internal/eval/templates_test.go

diff --git a/EVALS_GUIDE.md b/EVALS_GUIDE.md
index 4c51bc2..76e6474 100644
--- a/EVALS_GUIDE.md
+++ b/EVALS_GUIDE.md
@@ -12,6 +12,76 @@ Evals verify that your CLAUDE.md configuration produces the behavior you expect
 
 Evals are **behavioral tests**, not unit tests. They test the emergent behavior that results from your system prompt, not specific code paths.
 
+## Creating Evals
+
+### Quick Start with Templates
+
+The fastest way to create a new eval is with the `create` command:
+
+```bash
+# Interactive wizard
+stag eval create
+
+# Use a specific template
+stag eval create --template security
+stag eval create --template quality
+stag eval create --template language
+stag eval create --template blank
+
+# Copy from an existing eval
+stag eval create --from security-secrets --name my-security
+
+# Save to project instead of personal directory
+stag eval create --project
+
+# Save to ./evals/ for team/community repos
+stag eval create --team
+```
+
+**Destination options:**
+- Default: `~/.config/staghorn/evals/` (personal evals)
+- `--project`: `.staghorn/evals/` (project-specific evals)
+- `--team`: `./evals/` (team/community evals for sharing via git)
+
+Available templates:
+- **security** — Tests for hardcoded secrets, injection vulnerabilities
+- **quality** — Tests for naming conventions, code duplication
+- **language** — Language-specific best practices template
+- **blank** — Minimal template to start from scratch
+
+### Validating Evals
+
+Before running evals (which consume API credits), validate them:
+
+```bash
+# Validate all evals
+stag eval validate
+
+# Validate a specific eval
+stag eval validate my-custom-eval
+```
+
+Validation checks for:
+- Valid assertion types (`llm-rubric`, `contains`, `regex`, etc.)
+- Required fields (`name`, `prompt`, `assert`)
+- Proper YAML structure
+- Naming conventions
+
+Example output:
+```
+Validating 25 eval(s)...
+
+✓ security-secrets (4 tests)
+✓ security-injection (3 tests)
+✗ my-custom-eval (2 tests)
+  error: tests[0].assert[0].type: invalid assertion type "llm_rubric" (did you mean "llm-rubric"?)
+  warning: tests[1]: test "check-patterns" should have a description
+
+24 valid, 1 invalid, 1 warning
+```
+
+The validator provides helpful suggestions for common typos (e.g., `llm_rubric` → `llm-rubric`).
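+
+For reference, a minimal eval that passes all of these checks looks like the following (the eval name, prompt, and rubric text here are purely illustrative):
+
+```yaml
+name: my-custom-eval
+description: Verify responses follow our custom guideline
+tags: [custom]
+
+tests:
+  - name: example-test
+    description: Should produce a helpful response
+    prompt: |
+      Review this code
+    assert:
+      - type: llm-rubric
+        value: Response is specific and actionable
+```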
+
 ## Anatomy of an Eval
 
 ```yaml
@@ -267,6 +337,16 @@ assert:
 
 ## Debugging Failed Tests
 
+### Step 0: Validate First
+
+Before debugging runtime failures, ensure your eval is valid:
+
+```bash
+stag eval validate my-eval
+```
+
+This catches common issues like typos in assertion types (e.g., `llm_rubric` instead of `llm-rubric`) without making API calls.
+
 ### Step 1: Run with `--debug`
 
 ```bash
@@ -485,6 +565,44 @@ Each test case = one API call. To minimize costs:
 
 4. **Leverage Promptfoo caching** - Repeated runs with same prompts use cached responses
+
+## Quick Reference
+
+### Commands
+
+```bash
+# Run evals
+stag eval                        # Run all evals
+stag eval security-secrets       # Run specific eval
+stag eval --tag security         # Filter by tag
+stag eval --test "warns-*"       # Filter by test name pattern
+stag eval --debug                # Show full responses
+
+# Create and validate
+stag eval create                 # Interactive wizard
+stag eval create --template security
+stag eval create --project       # Save to .staghorn/evals/
+stag eval create --team          # Save to ./evals/ for team sharing
+stag eval validate               # Validate all evals
+stag eval validate my-eval       # Validate specific eval
+
+# List and inspect
+stag eval list                   # List available evals
+stag eval info security-secrets  # Show eval details
+stag eval init                   # Install starter evals
+```
+
+### Valid Assertion Types
+
+| Type           | Description                          |
+|----------------|--------------------------------------|
+| `llm-rubric`   | AI-graded evaluation (most flexible) |
+| `contains`     | Output contains the string           |
+| `contains-any` | Any of the listed strings            |
+| `contains-all` | All of the listed strings            |
+| `not-contains` | String must not appear               |
+| `regex`        | Regular expression match             |
+| `javascript`   | Custom JS assertion function         |
+
 ## Further Reading
 
 - [Promptfoo Documentation](https://promptfoo.dev/docs/intro)
diff --git a/README.md b/README.md
index 29d683c..156c962 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,8 @@ The layering means you get shared standards _plus_ your
personal style. You neve
 | `stag eval` | Run behavioral evals against your config |
 | `stag eval init` | Install starter evals |
 | `stag eval list` | List available evals |
+| `stag eval validate` | Validate eval definitions without running |
+| `stag eval create` | Create a new eval from a template |
 | `stag project` | Manage project-level config |
 | `stag team` | Bootstrap or validate a team standards repo |
 | `stag version` | Print version number |
@@ -676,6 +678,13 @@ stag eval list --source team  # Filter by source
 stag eval info                # Show eval details
 stag eval init                # Install starter evals
 stag eval init --project      # Install to project directory
+stag eval validate            # Validate all eval definitions
+stag eval validate <name>     # Validate specific eval
+stag eval create              # Create new eval (interactive)
+stag eval create --template security  # Create from template
+stag eval create --from <name>        # Copy from existing eval
+stag eval create --project    # Save to .staghorn/evals/
+stag eval create --team       # Save to ./evals/ for team sharing
 ```
 
 ## Installation
diff --git a/example/README.md b/example/README.md
index e89279a..2f773ae 100644
--- a/example/README.md
+++ b/example/README.md
@@ -20,6 +20,10 @@ team-repo/
 │   ├── refactor.md          # Suggest refactoring improvements
 │   ├── security-audit.md    # Security vulnerability scan
 │   └── test-gen.md          # Generate unit tests
+├── evals/                   # Behavioral tests
+│   ├── team-security.yaml   # Security guidelines tests
+│   ├── team-quality.yaml    # Code quality tests
+│   └── team-git.yaml        # Git conventions tests
 ├── languages/               # Language-specific configs
 │   ├── python.md
 │   ├── go.md
@@ -43,6 +47,7 @@ To use this as your team's standards repo:
 ## Customization
 
 - **CLAUDE.md**: Add your team's general coding standards
+- **evals/**: Write tests to verify Claude follows your guidelines
 - **languages/**: Add configs for languages your team uses
 - **commands/**: Create prompts for common workflows
 - **templates/**: Add project templates for different project types
diff --git
a/example/team-repo/evals/team-git.yaml b/example/team-repo/evals/team-git.yaml
new file mode 100644
index 0000000..5697b35
--- /dev/null
+++ b/example/team-repo/evals/team-git.yaml
@@ -0,0 +1,41 @@
+name: team-git
+description: Verify team git conventions are followed
+tags: [git, team]
+
+# Tests aligned with the Git Conventions section of CLAUDE.md:
+# - Write commit messages in imperative mood
+# - Keep commits atomic and focused
+# - Reference issue numbers when applicable
+
+tests:
+  - name: uses-imperative-mood
+    description: Should write commit messages in imperative mood
+    prompt: |
+      Write a commit message for adding user authentication
+    assert:
+      - type: llm-rubric
+        value: Commit message uses imperative mood (e.g., "Add" not "Added" or "Adding")
+      - type: not-contains
+        value: Added
+
+  - name: keeps-commits-atomic
+    description: Should recommend atomic commits
+    prompt: |
+      I made these changes in one commit:
+      - Fixed login bug
+      - Added new dashboard feature
+      - Updated README
+      - Refactored database layer
+
+      Is this commit okay?
+    assert:
+      - type: llm-rubric
+        value: Response should recommend splitting into separate, focused commits
+
+  - name: references-issues
+    description: Should reference issue numbers when applicable
+    prompt: |
+      Write a commit message for fixing the bug described in issue #123
+    assert:
+      - type: contains-any
+        value: ["#123", "issue 123", "closes #123", "fixes #123"]
diff --git a/example/team-repo/evals/team-quality.yaml b/example/team-repo/evals/team-quality.yaml
new file mode 100644
index 0000000..ee7f14f
--- /dev/null
+++ b/example/team-repo/evals/team-quality.yaml
@@ -0,0 +1,50 @@
+name: team-quality
+description: Verify team code quality guidelines are followed
+tags: [quality, team]
+
+# Tests aligned with the Code Style section of CLAUDE.md:
+# - Write clear, self-documenting code
+# - Prefer explicit over implicit
+# - Keep functions small and focused (under 50 lines)
+# - Use meaningful variable and function names
+
+tests:
+  - name: uses-descriptive-names
+    description: Should use meaningful variable and function names
+    prompt: |
+      Write a function to calculate the total price of items in a shopping cart
+    assert:
+      - type: llm-rubric
+        value: Response uses descriptive names like total_price, cart_items, or similar - not single letters or vague names like x, data, or result
+
+  - name: keeps-functions-focused
+    description: Should keep functions small and single-purpose
+    prompt: |
+      Review this function:
+      ```python
+      def process_order(order):
+          # Validate order (10 lines)
+          # Calculate totals (15 lines)
+          # Apply discounts (20 lines)
+          # Update inventory (15 lines)
+          # Send confirmation email (10 lines)
+          # Log analytics (10 lines)
+          pass  # 80+ lines total
+      ```
+    assert:
+      - type: llm-rubric
+        value: Response should recommend splitting into smaller, focused functions
+
+  - name: prefers-explicit-code
+    description: Should prefer explicit over implicit patterns
+    prompt: |
+      Review this Python code:
+      ```python
+      from utils import *
+
+      def
process(d):
+          return {k: v for k, v in d.items() if v}
+      ```
+    assert:
+      - type: llm-rubric
+        value: Response should recommend explicit imports and more descriptive variable names
diff --git a/example/team-repo/evals/team-security.yaml b/example/team-repo/evals/team-security.yaml
new file mode 100644
index 0000000..acb9551
--- /dev/null
+++ b/example/team-repo/evals/team-security.yaml
@@ -0,0 +1,44 @@
+name: team-security
+description: Verify team security guidelines are followed
+tags: [security, team]
+
+# Tests aligned with the Security section of CLAUDE.md:
+# - Never commit secrets, API keys, or credentials
+# - Use environment variables for configuration
+# - Validate all user input
+
+tests:
+  - name: warns-about-hardcoded-secrets
+    description: Should warn when code contains hardcoded credentials
+    prompt: |
+      Review this code:
+      ```python
+      DATABASE_PASSWORD = "super_secret_123"
+      conn = connect(password=DATABASE_PASSWORD)
+      ```
+    assert:
+      - type: llm-rubric
+        value: Response must warn about hardcoded password and recommend environment variables
+      - type: not-contains
+        value: looks good
+
+  - name: recommends-env-vars-for-config
+    description: Should recommend environment variables for sensitive config
+    prompt: |
+      How should I configure the API endpoint and auth token for my service?
+    assert:
+      - type: contains-any
+        value: ["environment variable", "os.environ", "os.getenv", ".env", "env var"]
+
+  - name: validates-user-input
+    description: Should recommend input validation for user data
+    prompt: |
+      Review this endpoint:
+      ```python
+      @app.route("/users/<user_id>")
+      def get_user(user_id):
+          return db.query(f"SELECT * FROM users WHERE id = {user_id}")
+      ```
+    assert:
+      - type: llm-rubric
+        value: Response must identify SQL injection risk and recommend parameterized queries or input validation
diff --git a/internal/cli/eval.go b/internal/cli/eval.go
index 1692b76..b7b3922 100644
--- a/internal/cli/eval.go
+++ b/internal/cli/eval.go
@@ -61,6 +61,8 @@ merged config. This helps ensure your team's guidelines are actually followed.`,
 	cmd.AddCommand(NewEvalListCmd())
 	cmd.AddCommand(NewEvalInitCmd())
 	cmd.AddCommand(NewEvalInfoCmd())
+	cmd.AddCommand(NewEvalValidateCmd())
+	cmd.AddCommand(NewEvalCreateCmd())
 
 	return cmd
 }
@@ -572,3 +574,363 @@ func countTests(evals []*eval.Eval) int {
 	}
 	return total
 }
+
+// NewEvalValidateCmd creates the 'eval validate' command.
+func NewEvalValidateCmd() *cobra.Command {
+	return &cobra.Command{
+		Use:   "validate [name]",
+		Short: "Validate eval definitions without running them",
+		Long: `Validates eval YAML files for correctness before running.
+
+Checks for:
+- Valid assertion types (llm-rubric, contains, regex, etc.)
+- Required fields (name, prompt, assert) +- Proper YAML structure +- Naming conventions + +Returns exit code 1 if any errors are found.`, + Example: ` staghorn eval validate # Validate all evals + staghorn eval validate security-secrets # Validate specific eval`, + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + name := "" + if len(args) == 1 { + name = args[0] + } + return runEvalValidate(name) + }, + } +} + +func runEvalValidate(name string) error { + evals, err := loadEvals() + if err != nil { + return err + } + + if len(evals) == 0 { + fmt.Println("No evals found.") + fmt.Println() + fmt.Println("Run " + info("staghorn eval init") + " to install starter evals.") + return nil + } + + // Filter by name if provided + if name != "" { + var filtered []*eval.Eval + for _, e := range evals { + if e.Name == name { + filtered = append(filtered, e) + break + } + } + if len(filtered) == 0 { + return fmt.Errorf("eval '%s' not found", name) + } + evals = filtered + } + + fmt.Printf("Validating %d eval(s)...\n", len(evals)) + fmt.Println() + + totalErrors := 0 + totalWarnings := 0 + validCount := 0 + invalidCount := 0 + + for _, e := range evals { + errors := e.Validate() + errorCount, warningCount := eval.CountByLevel(errors) + totalErrors += errorCount + totalWarnings += warningCount + + if errorCount > 0 { + invalidCount++ + fmt.Printf("%s %s (%d tests)\n", danger("✗"), e.Name, e.TestCount()) + for _, err := range errors { + prefix := " " + if err.Level == eval.ValidationLevelError { + prefix += danger("error: ") + } else { + prefix += warning("warning: ") + } + fmt.Printf("%s%s: %s\n", prefix, err.Field, err.Message) + } + } else if warningCount > 0 { + validCount++ + fmt.Printf("%s %s (%d tests)\n", success("✓"), e.Name, e.TestCount()) + for _, err := range errors { + fmt.Printf(" %s%s: %s\n", warning("warning: "), err.Field, err.Message) + } + } else { + validCount++ + fmt.Printf("%s %s (%d tests)\n", success("✓"), e.Name, 
e.TestCount()) + } + } + + fmt.Println() + summary := fmt.Sprintf("%d valid", validCount) + if invalidCount > 0 { + summary += fmt.Sprintf(", %s", danger(fmt.Sprintf("%d invalid", invalidCount))) + } + if totalWarnings > 0 { + summary += fmt.Sprintf(", %s", warning(fmt.Sprintf("%d warning(s)", totalWarnings))) + } + fmt.Println(summary) + + if totalErrors > 0 { + return fmt.Errorf("%d eval(s) have validation errors", invalidCount) + } + + return nil +} + +// NewEvalCreateCmd creates the 'eval create' command. +func NewEvalCreateCmd() *cobra.Command { + var project bool + var team bool + var templateName string + var fromEval string + var evalName string + var description string + + cmd := &cobra.Command{ + Use: "create", + Short: "Create a new eval from a template", + Long: `Creates a new eval file from a template or existing eval. + +Available templates: +- security: Security-focused eval for testing security guidelines +- quality: Code quality eval for testing style and best practices +- language: Language-specific eval template +- blank: Minimal blank template to start from scratch + +Destination options: +- Default: ~/.config/staghorn/evals/ (personal evals) +- --project: .staghorn/evals/ (project-specific evals) +- --team: ./evals/ (team/community evals for sharing via git)`, + Example: ` staghorn eval create # Interactive wizard + staghorn eval create --template security # Use security template + staghorn eval create --from security-secrets # Copy existing eval + staghorn eval create --project # Save to project directory + staghorn eval create --team # Save to ./evals/ for team sharing`, + RunE: func(cmd *cobra.Command, args []string) error { + return runEvalCreate(project, team, templateName, fromEval, evalName, description) + }, + } + + cmd.Flags().BoolVar(&project, "project", false, "Save to project directory (.staghorn/evals/) instead of personal") + cmd.Flags().BoolVar(&team, "team", false, "Save to ./evals/ for team/community sharing via git") + 
cmd.Flags().StringVar(&templateName, "template", "", "Template to use (security, quality, language, blank)") + cmd.Flags().StringVar(&fromEval, "from", "", "Copy from an existing eval") + cmd.Flags().StringVar(&evalName, "name", "", "Name for the new eval") + cmd.Flags().StringVar(&description, "description", "", "Description for the new eval") + + return cmd +} + +func runEvalCreate(project, team bool, templateName, fromEval, evalName, description string) error { + paths := config.NewPaths() + + // Check for conflicting flags + if project && team { + return fmt.Errorf("cannot use both --project and --team flags") + } + + // Determine target directory + var targetDir string + var targetLabel string + if team { + // Team evals go to ./evals/ in the current directory + cwd, err := os.Getwd() + if err != nil { + return fmt.Errorf("failed to get current directory: %w", err) + } + targetDir = cwd + "/evals" + targetLabel = "./evals/" + } else if project { + projectRoot := findProjectRoot() + if projectRoot == "" { + return fmt.Errorf("no project root found (looking for .git or .staghorn directory)") + } + targetDir = config.ProjectEvalsDir(projectRoot) + targetLabel = ".staghorn/evals/" + } else { + targetDir = paths.PersonalEvals + targetLabel = "~/.config/staghorn/evals/" + } + + // Ensure target directory exists + if err := os.MkdirAll(targetDir, 0755); err != nil { + return fmt.Errorf("failed to create directory: %w", err) + } + + var content string + var err error + + if fromEval != "" { + // Copy from existing eval + content, err = copyFromExistingEval(fromEval, evalName, description) + if err != nil { + return err + } + } else { + // Use template (interactive if not specified) + content, err = createFromTemplate(templateName, evalName, description) + if err != nil { + return err + } + } + + // Parse to get the name for the filename + parsedEval, err := eval.Parse(content, eval.SourcePersonal, "") + if err != nil { + return fmt.Errorf("generated invalid eval: %w", 
err) + } + + // Check if file already exists + filename := parsedEval.Name + ".yaml" + filepath := targetDir + "/" + filename + if _, err := os.Stat(filepath); err == nil { + return fmt.Errorf("eval file already exists: %s", filepath) + } + + // Write file + if err := os.WriteFile(filepath, []byte(content), 0644); err != nil { + return fmt.Errorf("failed to write file: %w", err) + } + + printSuccess("Created %s%s", targetLabel, filename) + fmt.Println() + fmt.Println("Next steps:") + fmt.Printf(" 1. Edit the eval file to customize tests\n") + fmt.Printf(" 2. Run: %s\n", info("staghorn eval validate "+parsedEval.Name)) + fmt.Printf(" 3. Run: %s\n", info("staghorn eval "+parsedEval.Name)) + + return nil +} + +func copyFromExistingEval(fromName, newName, description string) (string, error) { + evals, err := loadEvals() + if err != nil { + return "", err + } + + var source *eval.Eval + for _, e := range evals { + if e.Name == fromName { + source = e + break + } + } + if source == nil { + return "", fmt.Errorf("eval '%s' not found", fromName) + } + + // Read the original file + content, err := os.ReadFile(source.FilePath) + if err != nil { + return "", fmt.Errorf("failed to read source eval: %w", err) + } + + // Get new name interactively if not provided + if newName == "" { + fmt.Print("Name for new eval: ") + if _, err := fmt.Scanln(&newName); err != nil { + return "", fmt.Errorf("failed to read input: %w", err) + } + if newName == "" { + return "", fmt.Errorf("name is required") + } + } + + // Get description interactively if not provided + if description == "" { + fmt.Print("Description (press enter to keep original): ") + var input string + // Ignore error here - empty input on enter is expected + _, _ = fmt.Scanln(&input) + if input != "" { + description = input + } + } + + // Replace name in content + result := strings.Replace(string(content), "name: "+source.Name, "name: "+newName, 1) + + // Replace description if provided + if description != "" && 
source.Description != "" { + result = strings.Replace(result, "description: "+source.Description, "description: "+description, 1) + } + + return result, nil +} + +func createFromTemplate(templateName, evalName, description string) (string, error) { + // Interactive mode if template not specified + if templateName == "" { + fmt.Println("Select a template:") + for i, t := range eval.Templates { + fmt.Printf(" %d. %s - %s\n", i+1, t.Name, t.Description) + } + fmt.Print("Choice (1-4): ") + var choice int + if _, err := fmt.Scanln(&choice); err != nil { + return "", fmt.Errorf("failed to read input: %w", err) + } + if choice < 1 || choice > len(eval.Templates) { + return "", fmt.Errorf("invalid choice") + } + templateName = eval.Templates[choice-1].Name + } + + // Get name interactively if not provided + if evalName == "" { + fmt.Print("Name for new eval: ") + if _, err := fmt.Scanln(&evalName); err != nil { + return "", fmt.Errorf("failed to read input: %w", err) + } + if evalName == "" { + return "", fmt.Errorf("name is required") + } + } + + // Get description interactively if not provided + if description == "" { + fmt.Print("Description: ") + // Ignore error - empty input on enter uses default + _, _ = fmt.Scanln(&description) + if description == "" { + description = "Custom eval" + } + } + + // Get tags + fmt.Print("Tags (comma-separated, or press enter for none): ") + var tagsInput string + // Ignore error - empty input on enter is expected + _, _ = fmt.Scanln(&tagsInput) + var tags []string + if tagsInput != "" { + for _, tag := range strings.Split(tagsInput, ",") { + tag = strings.TrimSpace(tag) + if tag != "" { + tags = append(tags, tag) + } + } + } + + // Render template + vars := eval.TemplateVars{ + Name: evalName, + Description: description, + Tags: tags, + } + + content, err := eval.RenderTemplateByName(templateName, vars) + if err != nil { + return "", err + } + + return content, nil +} diff --git a/internal/cli/root.go b/internal/cli/root.go index 
2923b85..2e88b4a 100644 --- a/internal/cli/root.go +++ b/internal/cli/root.go @@ -20,6 +20,7 @@ var ( success = color.New(color.FgGreen).SprintFunc() warning = color.New(color.FgYellow).SprintFunc() + danger = color.New(color.FgRed).SprintFunc() info = color.New(color.FgCyan).SprintFunc() dim = color.New(color.Faint).SprintFunc() ) diff --git a/internal/eval/eval.go b/internal/eval/eval.go index 480385a..c1f69dc 100644 --- a/internal/eval/eval.go +++ b/internal/eval/eval.go @@ -5,11 +5,38 @@ import ( "fmt" "os" "path/filepath" + "regexp" "strings" "gopkg.in/yaml.v3" ) +// ValidAssertionTypes lists all valid assertion types supported by Promptfoo. +var ValidAssertionTypes = []string{ + "llm-rubric", + "contains", + "contains-any", + "contains-all", + "not-contains", + "regex", + "javascript", +} + +// ValidationLevel indicates the severity of a validation issue. +type ValidationLevel string + +const ( + ValidationLevelError ValidationLevel = "error" + ValidationLevelWarning ValidationLevel = "warning" +) + +// ValidationError represents a single validation issue. +type ValidationError struct { + Field string // e.g., "tests[0].assert[0].type" + Message string // Human-readable error message + Level ValidationLevel // error or warning +} + // Source indicates where an eval came from. type Source string @@ -255,3 +282,238 @@ func (e *Eval) FilterTests(testFilter string) *Eval { result.Tests = filtered return &result } + +// Validate performs detailed validation of the eval and returns any issues found. +// Unlike Parse, which fails fast on critical errors, Validate collects all issues +// including warnings for non-critical problems. 
+func (e *Eval) Validate() []ValidationError { + var errors []ValidationError + + // Check eval-level fields + if e.Name == "" { + errors = append(errors, ValidationError{ + Field: "name", + Message: "eval must have a name", + Level: ValidationLevelError, + }) + } else if !isValidName(e.Name) { + errors = append(errors, ValidationError{ + Field: "name", + Message: "name should contain only lowercase letters, numbers, and hyphens", + Level: ValidationLevelWarning, + }) + } + + if e.Description == "" { + errors = append(errors, ValidationError{ + Field: "description", + Message: "eval should have a description", + Level: ValidationLevelWarning, + }) + } + + // Validate tags + for i, tag := range e.Tags { + if !isValidTag(tag) { + errors = append(errors, ValidationError{ + Field: fmt.Sprintf("tags[%d]", i), + Message: fmt.Sprintf("tag %q should contain only lowercase letters, numbers, and hyphens", tag), + Level: ValidationLevelWarning, + }) + } + } + + // Check tests + if len(e.Tests) == 0 { + errors = append(errors, ValidationError{ + Field: "tests", + Message: "eval must have at least one test", + Level: ValidationLevelError, + }) + } + + for i, test := range e.Tests { + testErrors := validateTest(test, i) + errors = append(errors, testErrors...) + } + + return errors +} + +// validateTest validates a single test and returns any issues. 
+func validateTest(test Test, index int) []ValidationError { + var errors []ValidationError + prefix := fmt.Sprintf("tests[%d]", index) + + if test.Name == "" { + errors = append(errors, ValidationError{ + Field: prefix + ".name", + Message: "test must have a name", + Level: ValidationLevelError, + }) + } + + if test.Description == "" { + errors = append(errors, ValidationError{ + Field: prefix, + Message: fmt.Sprintf("test %q should have a description", test.Name), + Level: ValidationLevelWarning, + }) + } + + if test.Prompt == "" { + errors = append(errors, ValidationError{ + Field: prefix + ".prompt", + Message: fmt.Sprintf("test %q must have a prompt", test.Name), + Level: ValidationLevelError, + }) + } else if strings.TrimSpace(test.Prompt) == "" { + errors = append(errors, ValidationError{ + Field: prefix + ".prompt", + Message: fmt.Sprintf("test %q has an empty prompt (whitespace only)", test.Name), + Level: ValidationLevelError, + }) + } + + if len(test.Assert) == 0 { + errors = append(errors, ValidationError{ + Field: prefix + ".assert", + Message: fmt.Sprintf("test %q must have at least one assertion", test.Name), + Level: ValidationLevelError, + }) + } + + for j, assertion := range test.Assert { + assertErrors := validateAssertion(assertion, prefix, j) + errors = append(errors, assertErrors...) + } + + return errors +} + +// validateAssertion validates a single assertion and returns any issues. 
+func validateAssertion(assertion Assertion, testPrefix string, index int) []ValidationError { + var errors []ValidationError + field := fmt.Sprintf("%s.assert[%d]", testPrefix, index) + + if assertion.Type == "" { + errors = append(errors, ValidationError{ + Field: field + ".type", + Message: "assertion must have a type", + Level: ValidationLevelError, + }) + return errors + } + + if !isValidAssertionType(assertion.Type) { + suggestion := suggestAssertionType(assertion.Type) + msg := fmt.Sprintf("invalid assertion type %q", assertion.Type) + if suggestion != "" { + msg += fmt.Sprintf(" (did you mean %q?)", suggestion) + } + errors = append(errors, ValidationError{ + Field: field + ".type", + Message: msg, + Level: ValidationLevelError, + }) + } + + if assertion.Value == nil { + errors = append(errors, ValidationError{ + Field: field + ".value", + Message: "assertion must have a value", + Level: ValidationLevelError, + }) + } + + return errors +} + +// isValidAssertionType checks if the assertion type is valid. +func isValidAssertionType(t string) bool { + for _, valid := range ValidAssertionTypes { + if t == valid { + return true + } + } + return false +} + +// suggestAssertionType suggests a valid assertion type based on the invalid input. 
+func suggestAssertionType(invalid string) string { + // Normalize input for comparison + normalized := strings.ToLower(strings.ReplaceAll(invalid, "_", "-")) + + // Check for exact match after normalization + for _, valid := range ValidAssertionTypes { + if normalized == valid { + return valid + } + } + + // Check for partial matches + for _, valid := range ValidAssertionTypes { + if strings.Contains(normalized, strings.ReplaceAll(valid, "-", "")) || + strings.Contains(strings.ReplaceAll(valid, "-", ""), normalized) { + return valid + } + } + + // Check for common typos + typoMap := map[string]string{ + "llm_rubric": "llm-rubric", + "llmrubric": "llm-rubric", + "rubric": "llm-rubric", + "contain": "contains", + "notcontains": "not-contains", + "not_contains": "not-contains", + "containsall": "contains-all", + "containsany": "contains-any", + "contains_all": "contains-all", + "contains_any": "contains-any", + "js": "javascript", + "regexp": "regex", + } + + if suggestion, ok := typoMap[normalized]; ok { + return suggestion + } + + return "" +} + +// isValidName checks if a name follows naming conventions. +func isValidName(name string) bool { + // Allow lowercase letters, numbers, and hyphens + matched, _ := regexp.MatchString(`^[a-z0-9][a-z0-9-]*[a-z0-9]$|^[a-z0-9]$`, name) + return matched +} + +// isValidTag checks if a tag follows naming conventions. +func isValidTag(tag string) bool { + // Allow lowercase letters, numbers, and hyphens + matched, _ := regexp.MatchString(`^[a-z0-9][a-z0-9-]*[a-z0-9]$|^[a-z0-9]$`, tag) + return matched +} + +// HasErrors returns true if there are any errors (not just warnings). +func HasErrors(errors []ValidationError) bool { + for _, err := range errors { + if err.Level == ValidationLevelError { + return true + } + } + return false +} + +// CountByLevel counts errors and warnings separately. 
+func CountByLevel(errors []ValidationError) (errorCount, warningCount int) { + for _, err := range errors { + if err.Level == ValidationLevelError { + errorCount++ + } else { + warningCount++ + } + } + return +} diff --git a/internal/eval/eval_test.go b/internal/eval/eval_test.go new file mode 100644 index 0000000..4649de2 --- /dev/null +++ b/internal/eval/eval_test.go @@ -0,0 +1,378 @@ +package eval + +import ( + "testing" +) + +func TestValidate_ValidEval(t *testing.T) { + e := &Eval{ + Name: "test-eval", + Description: "A test eval", + Tags: []string{"security", "test"}, + Tests: []Test{ + { + Name: "test-case", + Description: "A test case", + Prompt: "Review this code", + Assert: []Assertion{ + {Type: "llm-rubric", Value: "Response is helpful"}, + }, + }, + }, + } + + errors := e.Validate() + if HasErrors(errors) { + t.Errorf("expected no errors for valid eval, got: %v", errors) + } +} + +func TestValidate_InvalidAssertionType(t *testing.T) { + e := &Eval{ + Name: "test-eval", + Description: "A test eval", + Tests: []Test{ + { + Name: "test-case", + Description: "A test case", + Prompt: "Review this code", + Assert: []Assertion{ + {Type: "llm_rubric", Value: "Response is helpful"}, + }, + }, + }, + } + + errors := e.Validate() + if !HasErrors(errors) { + t.Error("expected error for invalid assertion type") + } + + // Check for suggestion + found := false + for _, err := range errors { + if err.Level == ValidationLevelError && err.Field == "tests[0].assert[0].type" { + found = true + if err.Message == "" { + t.Error("expected error message") + } + // Should suggest llm-rubric + if !contains(err.Message, "llm-rubric") { + t.Errorf("expected suggestion for llm-rubric, got: %s", err.Message) + } + } + } + if !found { + t.Error("expected error for tests[0].assert[0].type") + } +} + +func TestValidate_MissingName(t *testing.T) { + e := &Eval{ + Description: "A test eval", + Tests: []Test{ + { + Name: "test-case", + Prompt: "Review this code", + Assert: []Assertion{ 
+ {Type: "contains", Value: "test"}, + }, + }, + }, + } + + errors := e.Validate() + if !HasErrors(errors) { + t.Error("expected error for missing name") + } + + found := false + for _, err := range errors { + if err.Field == "name" && err.Level == ValidationLevelError { + found = true + } + } + if !found { + t.Error("expected error for name field") + } +} + +func TestValidate_MissingPrompt(t *testing.T) { + e := &Eval{ + Name: "test-eval", + Description: "A test eval", + Tests: []Test{ + { + Name: "test-case", + Description: "A test case", + Prompt: "", // Empty prompt + Assert: []Assertion{ + {Type: "contains", Value: "test"}, + }, + }, + }, + } + + errors := e.Validate() + if !HasErrors(errors) { + t.Error("expected error for missing prompt") + } + + found := false + for _, err := range errors { + if err.Field == "tests[0].prompt" && err.Level == ValidationLevelError { + found = true + } + } + if !found { + t.Error("expected error for tests[0].prompt field") + } +} + +func TestValidate_EmptyAssertions(t *testing.T) { + e := &Eval{ + Name: "test-eval", + Description: "A test eval", + Tests: []Test{ + { + Name: "test-case", + Description: "A test case", + Prompt: "Review this code", + Assert: []Assertion{}, // Empty assertions + }, + }, + } + + errors := e.Validate() + if !HasErrors(errors) { + t.Error("expected error for empty assertions") + } + + found := false + for _, err := range errors { + if err.Field == "tests[0].assert" && err.Level == ValidationLevelError { + found = true + } + } + if !found { + t.Error("expected error for tests[0].assert field") + } +} + +func TestValidate_MissingDescription(t *testing.T) { + e := &Eval{ + Name: "test-eval", + // Missing description + Tests: []Test{ + { + Name: "test-case", + Prompt: "Review this code", + Assert: []Assertion{ + {Type: "contains", Value: "test"}, + }, + }, + }, + } + + errors := e.Validate() + // Should not have errors, only warnings + if HasErrors(errors) { + t.Error("expected only warnings for missing 
description, not errors") + } + + // Should have warnings + _, warningCount := CountByLevel(errors) + if warningCount == 0 { + t.Error("expected warnings for missing descriptions") + } + + // Check for eval description warning + foundEvalWarning := false + foundTestWarning := false + for _, err := range errors { + if err.Field == "description" && err.Level == ValidationLevelWarning { + foundEvalWarning = true + } + if err.Field == "tests[0]" && err.Level == ValidationLevelWarning { + foundTestWarning = true + } + } + if !foundEvalWarning { + t.Error("expected warning for eval description") + } + if !foundTestWarning { + t.Error("expected warning for test description") + } +} + +func TestValidate_MultipleErrors(t *testing.T) { + e := &Eval{ + Name: "", // Error 1: missing name + Tests: []Test{ + { + Name: "", // Error 2: missing test name + Prompt: "", // Error 3: missing prompt + Assert: nil, // Error 4: missing assertions + }, + }, + } + + errors := e.Validate() + errorCount, _ := CountByLevel(errors) + + // Should have at least 4 errors + if errorCount < 4 { + t.Errorf("expected at least 4 errors, got %d", errorCount) + } +} + +func TestValidate_AllAssertionTypes(t *testing.T) { + for _, assertType := range ValidAssertionTypes { + e := &Eval{ + Name: "test-eval", + Description: "A test eval", + Tests: []Test{ + { + Name: "test-case", + Description: "A test case", + Prompt: "Review this code", + Assert: []Assertion{ + {Type: assertType, Value: "test value"}, + }, + }, + }, + } + + errors := e.Validate() + if HasErrors(errors) { + t.Errorf("expected no errors for valid assertion type %q, got: %v", assertType, errors) + } + } +} + +func TestSuggestAssertionType(t *testing.T) { + tests := []struct { + input string + expected string + }{ + {"llm_rubric", "llm-rubric"}, + {"LLM-RUBRIC", "llm-rubric"}, + {"rubric", "llm-rubric"}, + {"contain", "contains"}, + {"not_contains", "not-contains"}, + {"contains_all", "contains-all"}, + {"contains_any", "contains-any"}, + 
{"js", "javascript"}, + {"regexp", "regex"}, + {"invalid", ""}, + {"xyz", ""}, + } + + for _, tc := range tests { + result := suggestAssertionType(tc.input) + if result != tc.expected { + t.Errorf("suggestAssertionType(%q) = %q, expected %q", tc.input, result, tc.expected) + } + } +} + +func TestIsValidName(t *testing.T) { + tests := []struct { + name string + valid bool + }{ + {"security-secrets", true}, + {"test", true}, + {"test-123", true}, + {"a", true}, + {"Test", false}, // Uppercase + {"test_name", false}, // Underscore + {"-test", false}, // Leading hyphen + {"test-", false}, // Trailing hyphen + {"test name", false}, // Space + {"test.name", false}, // Dot + } + + for _, tc := range tests { + result := isValidName(tc.name) + if result != tc.valid { + t.Errorf("isValidName(%q) = %v, expected %v", tc.name, result, tc.valid) + } + } +} + +func TestIsValidTag(t *testing.T) { + tests := []struct { + tag string + valid bool + }{ + {"security", true}, + {"python", true}, + {"code-quality", true}, + {"Security", false}, // Uppercase + {"code_quality", false}, // Underscore + } + + for _, tc := range tests { + result := isValidTag(tc.tag) + if result != tc.valid { + t.Errorf("isValidTag(%q) = %v, expected %v", tc.tag, result, tc.valid) + } + } +} + +func TestCountByLevel(t *testing.T) { + errors := []ValidationError{ + {Level: ValidationLevelError}, + {Level: ValidationLevelError}, + {Level: ValidationLevelWarning}, + {Level: ValidationLevelWarning}, + {Level: ValidationLevelWarning}, + } + + errorCount, warningCount := CountByLevel(errors) + if errorCount != 2 { + t.Errorf("expected 2 errors, got %d", errorCount) + } + if warningCount != 3 { + t.Errorf("expected 3 warnings, got %d", warningCount) + } +} + +func TestHasErrors(t *testing.T) { + // Only warnings + warnings := []ValidationError{ + {Level: ValidationLevelWarning}, + } + if HasErrors(warnings) { + t.Error("expected HasErrors to return false for warnings only") + } + + // With errors + withErrors := 
[]ValidationError{
+		{Level: ValidationLevelWarning},
+		{Level: ValidationLevelError},
+	}
+	if !HasErrors(withErrors) {
+		t.Error("expected HasErrors to return true when errors present")
+	}
+
+	// Empty
+	if HasErrors(nil) {
+		t.Error("expected HasErrors to return false for nil")
+	}
+}
+
+// contains reports whether substr is within s (kept local to avoid extra imports).
+func contains(s, substr string) bool {
+	return containsSubstring(s, substr)
+}
+
+func containsSubstring(s, substr string) bool {
+	for i := 0; i <= len(s)-len(substr); i++ {
+		if s[i:i+len(substr)] == substr {
+			return true
+		}
+	}
+	return false
+}
diff --git a/internal/eval/templates.go b/internal/eval/templates.go
new file mode 100644
index 0000000..e3b8331
--- /dev/null
+++ b/internal/eval/templates.go
@@ -0,0 +1,193 @@
+package eval
+
+import (
+	"bytes"
+	"fmt"
+	"text/template"
+)
+
+// EvalTemplate represents a starter template for creating evals.
+type EvalTemplate struct {
+	Name        string // Template identifier (e.g., "security")
+	Description string // Human-readable description
+	Content     string // YAML template with Go template placeholders
+}
+
+// TemplateVars holds variables for template rendering.
+type TemplateVars struct {
+	Name        string
+	Description string
+	Tags        []string
+}
+
+// Templates contains all available eval templates.
+var Templates = []EvalTemplate{ + { + Name: "security", + Description: "Security-focused eval for testing security guidelines", + Content: `name: {{.Name}} +description: {{.Description}} +tags: +{{- range .Tags}} + - {{.}} +{{- end}} + +tests: + - name: warns-about-hardcoded-secrets + description: Verifies Claude warns about hardcoded API keys and secrets + prompt: | + Review this code: + API_KEY = "sk-12345abcdef" + db_password = "super_secret_123" + assert: + - type: llm-rubric + value: Response warns about hardcoded secrets and suggests using environment variables + + - name: identifies-injection-risks + description: Verifies Claude identifies potential injection vulnerabilities + prompt: | + Review this code: + query = f"SELECT * FROM users WHERE id = {user_input}" + assert: + - type: llm-rubric + value: Response identifies SQL injection risk and suggests parameterized queries +`, + }, + { + Name: "quality", + Description: "Code quality eval for testing style and best practices", + Content: `name: {{.Name}} +description: {{.Description}} +tags: +{{- range .Tags}} + - {{.}} +{{- end}} + +tests: + - name: suggests-clear-naming + description: Verifies Claude suggests better variable names + prompt: | + Review this code: + def f(x, y, z): + a = x + y + b = a * z + return b + assert: + - type: llm-rubric + value: Response suggests more descriptive function and variable names + + - name: identifies-code-duplication + description: Verifies Claude identifies duplicated code + prompt: | + Review this code: + def process_user(user): + if user.age > 18: + print("Adult") + return user.name.upper() + + def process_admin(admin): + if admin.age > 18: + print("Adult") + return admin.name.upper() + assert: + - type: llm-rubric + value: Response identifies code duplication and suggests refactoring +`, + }, + { + Name: "language", + Description: "Language-specific eval template", + Content: `name: {{.Name}} +description: {{.Description}} +tags: +{{- range .Tags}} + - {{.}} +{{- 
end}} + +tests: + - name: follows-language-conventions + description: Verifies Claude follows language-specific conventions + prompt: | + Write a function that calculates the factorial of a number. + assert: + - type: llm-rubric + value: Response follows idiomatic patterns for the target language + + - name: uses-appropriate-error-handling + description: Verifies Claude uses proper error handling + prompt: | + Write code to read a file and parse its JSON contents. + assert: + - type: llm-rubric + value: Response includes proper error handling for file and JSON operations +`, + }, + { + Name: "blank", + Description: "Minimal blank template to start from scratch", + Content: `name: {{.Name}} +description: {{.Description}} +tags: +{{- range .Tags}} + - {{.}} +{{- end}} + +tests: + - name: example-test + description: Replace with your test description + prompt: | + Your prompt here + assert: + - type: llm-rubric + value: Your assertion here +`, + }, +} + +// GetTemplate returns a template by name. +func GetTemplate(name string) (*EvalTemplate, error) { + for _, t := range Templates { + if t.Name == name { + return &t, nil + } + } + return nil, fmt.Errorf("template %q not found", name) +} + +// ListTemplates returns all available template names. +func ListTemplates() []string { + names := make([]string, len(Templates)) + for i, t := range Templates { + names[i] = t.Name + } + return names +} + +// RenderTemplate renders a template with the given variables. 
+func RenderTemplate(t *EvalTemplate, vars TemplateVars) (string, error) { + // Ensure tags is not nil + if vars.Tags == nil { + vars.Tags = []string{} + } + + tmpl, err := template.New(t.Name).Parse(t.Content) + if err != nil { + return "", fmt.Errorf("failed to parse template: %w", err) + } + + var buf bytes.Buffer + if err := tmpl.Execute(&buf, vars); err != nil { + return "", fmt.Errorf("failed to execute template: %w", err) + } + + return buf.String(), nil +} + +// RenderTemplateByName gets a template by name and renders it. +func RenderTemplateByName(name string, vars TemplateVars) (string, error) { + t, err := GetTemplate(name) + if err != nil { + return "", err + } + return RenderTemplate(t, vars) +} diff --git a/internal/eval/templates_test.go b/internal/eval/templates_test.go new file mode 100644 index 0000000..134286a --- /dev/null +++ b/internal/eval/templates_test.go @@ -0,0 +1,204 @@ +package eval + +import ( + "strings" + "testing" +) + +func TestGetTemplate_ValidName(t *testing.T) { + for _, name := range []string{"security", "quality", "language", "blank"} { + tmpl, err := GetTemplate(name) + if err != nil { + t.Errorf("GetTemplate(%q) returned error: %v", name, err) + continue + } + if tmpl == nil { + t.Errorf("GetTemplate(%q) returned nil template", name) + continue + } + if tmpl.Name != name { + t.Errorf("GetTemplate(%q).Name = %q, expected %q", name, tmpl.Name, name) + } + if tmpl.Description == "" { + t.Errorf("GetTemplate(%q) has empty description", name) + } + if tmpl.Content == "" { + t.Errorf("GetTemplate(%q) has empty content", name) + } + } +} + +func TestGetTemplate_InvalidName(t *testing.T) { + tmpl, err := GetTemplate("nonexistent") + if err == nil { + t.Error("expected error for nonexistent template") + } + if tmpl != nil { + t.Error("expected nil template for nonexistent name") + } +} + +func TestListTemplates(t *testing.T) { + names := ListTemplates() + if len(names) == 0 { + t.Error("expected at least one template") + } + + // 
Check that expected templates are present + expected := []string{"security", "quality", "language", "blank"} + for _, exp := range expected { + found := false + for _, name := range names { + if name == exp { + found = true + break + } + } + if !found { + t.Errorf("expected template %q in list", exp) + } + } +} + +func TestRenderTemplate_WithVars(t *testing.T) { + tmpl, err := GetTemplate("blank") + if err != nil { + t.Fatalf("GetTemplate failed: %v", err) + } + + vars := TemplateVars{ + Name: "my-custom-eval", + Description: "My custom evaluation", + Tags: []string{"custom", "test"}, + } + + result, err := RenderTemplate(tmpl, vars) + if err != nil { + t.Fatalf("RenderTemplate failed: %v", err) + } + + // Check that variables were substituted + if !strings.Contains(result, "name: my-custom-eval") { + t.Error("expected name to be substituted") + } + if !strings.Contains(result, "description: My custom evaluation") { + t.Error("expected description to be substituted") + } + if !strings.Contains(result, "- custom") { + t.Error("expected custom tag to be present") + } + if !strings.Contains(result, "- test") { + t.Error("expected test tag to be present") + } +} + +func TestRenderTemplate_EmptyTags(t *testing.T) { + tmpl, err := GetTemplate("blank") + if err != nil { + t.Fatalf("GetTemplate failed: %v", err) + } + + vars := TemplateVars{ + Name: "my-eval", + Description: "Test eval", + Tags: nil, // No tags + } + + result, err := RenderTemplate(tmpl, vars) + if err != nil { + t.Fatalf("RenderTemplate failed: %v", err) + } + + // Should still produce valid YAML with empty tags + if !strings.Contains(result, "name: my-eval") { + t.Error("expected name to be substituted") + } +} + +func TestRenderTemplateByName(t *testing.T) { + vars := TemplateVars{ + Name: "test-eval", + Description: "Test description", + Tags: []string{"test"}, + } + + result, err := RenderTemplateByName("security", vars) + if err != nil { + t.Fatalf("RenderTemplateByName failed: %v", err) + } + + if 
!strings.Contains(result, "name: test-eval") { + t.Error("expected name to be substituted") + } +} + +func TestRenderTemplateByName_InvalidTemplate(t *testing.T) { + vars := TemplateVars{ + Name: "test-eval", + Description: "Test description", + } + + _, err := RenderTemplateByName("nonexistent", vars) + if err == nil { + t.Error("expected error for nonexistent template") + } +} + +func TestAllTemplates_AreValid(t *testing.T) { + vars := TemplateVars{ + Name: "test-eval", + Description: "Test eval for validation", + Tags: []string{"test"}, + } + + for _, tmpl := range Templates { + t.Run(tmpl.Name, func(t *testing.T) { + // Render the template + rendered, err := RenderTemplate(&tmpl, vars) + if err != nil { + t.Fatalf("RenderTemplate failed for %q: %v", tmpl.Name, err) + } + + // Parse the rendered YAML as an eval + eval, err := Parse(rendered, SourcePersonal, "test.yaml") + if err != nil { + t.Fatalf("Parse failed for rendered template %q: %v\nRendered:\n%s", tmpl.Name, err, rendered) + } + + // Validate the parsed eval + errors := eval.Validate() + if HasErrors(errors) { + t.Errorf("Validation errors for template %q:", tmpl.Name) + for _, e := range errors { + if e.Level == ValidationLevelError { + t.Errorf(" %s: %s", e.Field, e.Message) + } + } + } + }) + } +} + +func TestAllTemplates_HaveRequiredFields(t *testing.T) { + for _, tmpl := range Templates { + t.Run(tmpl.Name, func(t *testing.T) { + if tmpl.Name == "" { + t.Error("template has empty name") + } + if tmpl.Description == "" { + t.Error("template has empty description") + } + if tmpl.Content == "" { + t.Error("template has empty content") + } + + // Check that template contains required placeholders + if !strings.Contains(tmpl.Content, "{{.Name}}") { + t.Error("template missing {{.Name}} placeholder") + } + if !strings.Contains(tmpl.Content, "{{.Description}}") { + t.Error("template missing {{.Description}} placeholder") + } + }) + } +} From 0e42034dda3e232510930126043cade734d8f2d2 Mon Sep 17 00:00:00 
2001 From: Cody Hart Date: Sun, 18 Jan 2026 09:22:10 -0500 Subject: [PATCH 2/2] update changelog --- CHANGELOG.md | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 689767f..9049529 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.5.0] - 2026-01-18 + +### Added + +- `stag eval validate` command to validate eval YAML files before running + - Checks assertion types, required fields, YAML structure, and naming conventions + - Provides helpful suggestions for common typos (e.g., `llm_rubric` → `llm-rubric`) + - Distinguishes between errors (blocking) and warnings (non-blocking) +- `stag eval create` command to create new evals from templates + - Interactive wizard for guided eval creation + - Four built-in templates: security, quality, language, blank + - `--template` flag to skip wizard and use template directly + - `--from` flag to copy and customize existing evals + - `--name` and `--description` flags for non-interactive creation +- `--project` flag for `stag eval create` to save evals to `.staghorn/evals/` +- `--team` flag for `stag eval create` to save evals to `./evals/` for team/community sharing +- Example evals in `example/team-repo/evals/` demonstrating team eval patterns + +### Changed + +- Updated EVALS_GUIDE.md with comprehensive documentation for validate and create commands +- Expanded CLI flags reference in README.md with new eval commands + ## [0.4.0] - 2026-01-17 ### Added @@ -75,7 +98,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Support for team, personal, and project configuration layers - Automatic CLAUDE.md generation with layered content -[Unreleased]: https://github.com/HartBrook/staghorn/compare/v0.4.0...HEAD +[Unreleased]: https://github.com/HartBrook/staghorn/compare/v0.5.0...HEAD +[0.5.0]: 
https://github.com/HartBrook/staghorn/compare/v0.4.0...v0.5.0 [0.4.0]: https://github.com/HartBrook/staghorn/compare/v0.3.0...v0.4.0 [0.3.0]: https://github.com/HartBrook/staghorn/compare/v0.2.0...v0.3.0 [0.2.0]: https://github.com/HartBrook/staghorn/compare/v0.1.0...v0.2.0