
JSLEEKR/agentspec



agentspec

YAML-driven behavioral testing for AI agents.

Define what your agent should do in YAML. Run it against execution logs. Get pass/fail results. No LLM calls required.


Why This Exists

Everyone builds AI agents. Nobody tests them systematically.

Developers test LLM output quality (promptfoo, DeepEval) or mock LLM calls (mocklm). But nobody tests agent behavior -- did the agent call the right tools? In the right order? Did it avoid calling dangerous tools?

agentspec fills this gap. It reads YAML behavior specs and checks them against execution logs. Deterministic. Fast. CI-friendly.

The problem in one sentence: You can test that GPT returns good text, but you can't test that your agent followed the correct workflow.


How It Works

+------------------+     +------------------+     +------------------+
|   YAML Spec      | --> |   agentspec      | <-- |  Execution Log   |
|   (expected)     |     |   (engine)       |     |  (actual)        |
+------------------+     +------------------+     +------------------+
                               |
                               v
                    +------------------+
                    |   Pass / Fail    |
                    |   Report         |
                    +------------------+
  1. Write a YAML spec defining expected agent behavior
  2. Run your agent and capture the execution log (JSON)
  3. agentspec run specs/ compares spec vs log
  4. Get structured pass/fail results (table or JSON)

Installation

go install github.com/JSLEEKR/agentspec/cmd/agentspec@latest

Or build from source:

git clone https://github.com/JSLEEKR/agentspec.git
cd agentspec
go build -o agentspec ./cmd/agentspec

Quick Start

1. Create example specs

agentspec init

This creates specs/file-reader.yaml and specs/file-reader.log.json.

2. Run specs

agentspec run specs/

Output:

agentspec -- Agent Behavior Testing
===================================

specs/file-reader.yaml
  PASS Tool call: read_file (path=main.go)
  PASS Response contains "package main"
  PASS Constraint: no write_file calls
  PASS Constraint: max 3 tool calls (1 used)
  PASS Constraint: ordered execution

Results: 5 passed, 0 failed (1 specs)

3. Validate specs without running

agentspec validate specs/

Writing Specs

Basic Spec

name: "File reader agent reads requested files"
input:
  message: "Read the contents of main.go"

expect:
  tools:
    - name: read_file
      args:
        path: "main.go"

  response:
    contains: "package main"

  constraints:
    - no_tool: "write_file"
    - max_tools: 3
    - ordered: true

Matching Modes

agentspec supports 5 matching modes for tool arguments:

Exact Match (default)

args:
  path: "main.go"  # must be exactly "main.go"

Contains

args:
  query:
    contains: "weather"  # must contain "weather"

Regex

args:
  query:
    regex: "weather.*tokyo"  # must match regex

Schema (Type Check)

args:
  count:
    type: "number"  # must be a number

Supported types: string, number, boolean, null, array, object.

Any

args:
  session_id:
    any: true  # matches any value
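All five modes can be expressed as one matching function over JSON-decoded argument values. The sketch below is illustrative only, not the code in internal/matcher/pattern.go: the Pattern type and the matchArg and jsonType functions are hypothetical, and it assumes log arguments were decoded with encoding/json, so numbers arrive as float64, arrays as []any and objects as map[string]any.

```go
package main

import (
	"fmt"
	"reflect"
	"regexp"
	"strings"
)

// Pattern is a hypothetical in-memory form of one argument expectation.
// At most one field is expected to be set; Exact is used when none of the
// others are, mirroring "exact match is the default".
type Pattern struct {
	Exact    any
	Contains string
	Regex    string
	Type     string
	Any      bool
}

// jsonType names the JSON type of a value decoded by encoding/json into `any`.
// Values not produced by encoding/json (e.g. a Go int) report "unknown".
func jsonType(v any) string {
	switch v.(type) {
	case nil:
		return "null"
	case string:
		return "string"
	case float64:
		return "number"
	case bool:
		return "boolean"
	case []any:
		return "array"
	case map[string]any:
		return "object"
	default:
		return "unknown"
	}
}

// matchArg reports whether an actual argument value satisfies p.
// Exact uses reflect.DeepEqual, so both sides must be decoded the same way.
func matchArg(p Pattern, actual any) (bool, error) {
	switch {
	case p.Any:
		return true, nil
	case p.Contains != "":
		s, ok := actual.(string)
		return ok && strings.Contains(s, p.Contains), nil
	case p.Regex != "":
		re, err := regexp.Compile(p.Regex)
		if err != nil {
			return false, err // invalid pattern in the spec
		}
		s, ok := actual.(string)
		return ok && re.MatchString(s), nil
	case p.Type != "":
		return jsonType(actual) == p.Type, nil
	default:
		return reflect.DeepEqual(p.Exact, actual), nil
	}
}

func main() {
	ok, _ := matchArg(Pattern{Regex: "weather.*tokyo"}, "weather in tokyo today")
	fmt.Println(ok)
}
```

Go's regexp.MatchString is unanchored, so in this sketch `weather.*tokyo` matches anywhere in the value; adding `^` and `$`, as the git_diff example further down does, forces a whole-value match.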

Response Matching

expect:
  response:
    contains: "package main"     # substring match
    exact: "Done."               # exact match
    regex: "\\d+ results found"  # regex match
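A sketch of response checking, assuming every field that is set must pass independently. Under that assumption the three lines above could not all hold at once (a response equal to "Done." cannot contain "package main"), so read them as alternatives. ResponseExpect and checkResponse are hypothetical names. The YAML string "\\d+ results found" reaches the regex engine as \d+ results found, which is why the Go literal below is a raw string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ResponseExpect is a hypothetical mirror of the `response` block.
// Empty fields are treated as "not specified" and skipped.
type ResponseExpect struct {
	Contains string
	Exact    string
	Regex    string
}

// checkResponse returns one message per failed matcher; an empty slice
// means every specified matcher passed. An invalid regex is an error.
func checkResponse(e ResponseExpect, response string) ([]string, error) {
	var failures []string
	if e.Contains != "" && !strings.Contains(response, e.Contains) {
		failures = append(failures, fmt.Sprintf("response does not contain %q", e.Contains))
	}
	if e.Exact != "" && response != e.Exact {
		failures = append(failures, fmt.Sprintf("response is not exactly %q", e.Exact))
	}
	if e.Regex != "" {
		re, err := regexp.Compile(e.Regex)
		if err != nil {
			return nil, fmt.Errorf("invalid regex %q: %w", e.Regex, err)
		}
		if !re.MatchString(response) {
			failures = append(failures, fmt.Sprintf("response does not match /%s/", e.Regex))
		}
	}
	return failures, nil
}

func main() {
	failures, err := checkResponse(ResponseExpect{Regex: `\d+ results found`}, "12 results found")
	fmt.Println(len(failures), err)
}
```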

Constraints

expect:
  constraints:
    - no_tool: "write_file"   # agent must NOT call write_file
    - max_tools: 3            # at most 3 tool calls total
    - ordered: true           # tools must be called in listed order
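The three constraint kinds reduce to simple checks over the sequence of tool names in the log. This is a minimal sketch with a hypothetical checkConstraints function. In particular it assumes `ordered: true` means the tools listed in the spec must appear as a subsequence of the actual calls (other calls may occur in between), which may be looser or stricter than agentspec's actual rule.

```go
package main

import "fmt"

// checkConstraints sketches the three constraint kinds.
// called:    tool names from the execution log, in call order
// expected:  the spec's `tools` list, in listed order
// forbidden: every `no_tool` entry
// maxTools:  the `max_tools` value (0 is treated as "not set" here)
func checkConstraints(called, expected, forbidden []string, maxTools int, ordered bool) []string {
	var failures []string

	// no_tool: a forbidden tool must not appear anywhere in the log.
	counts := make(map[string]int)
	for _, name := range called {
		counts[name]++
	}
	for _, f := range forbidden {
		if counts[f] > 0 {
			failures = append(failures, fmt.Sprintf("forbidden tool %q was called", f))
		}
	}

	// max_tools: total number of calls, repeats included.
	if maxTools > 0 && len(called) > maxTools {
		failures = append(failures, fmt.Sprintf("%d tool calls exceed max %d", len(called), maxTools))
	}

	// ordered: walk the log once, advancing through `expected` on each match.
	if ordered {
		i := 0
		for _, name := range called {
			if i < len(expected) && name == expected[i] {
				i++
			}
		}
		if i < len(expected) {
			failures = append(failures, "expected tools were not called in order")
		}
	}
	return failures
}

func main() {
	called := []string{"web_search", "read_file", "summarize"}
	fmt.Println(checkConstraints(called, []string{"web_search", "summarize"}, []string{"write_file"}, 3, true))
}
```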

Execution Logs

agentspec reads JSON execution logs that describe what the agent actually did:

{
  "input": "Read the contents of main.go",
  "tool_calls": [
    {
      "name": "read_file",
      "arguments": {"path": "main.go"},
      "result": "package main\n\nfunc main() {}"
    }
  ],
  "response": "Here are the contents of main.go: package main..."
}
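The log format maps directly onto Go structs. Below is a minimal decoding sketch with encoding/json that uses Decoder.DisallowUnknownFields to reject unexpected keys, in the spirit of the strict parsing mentioned under Security. The ToolCall and ExecutionLog type names and the parseLog helper are hypothetical; only the JSON keys come from the example above. `result` is typed as `any` on the assumption that tools may return non-string results.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ToolCall is one entry of "tool_calls".
type ToolCall struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
	Result    any            `json:"result"`
}

// ExecutionLog is the top-level log object.
type ExecutionLog struct {
	Input     string     `json:"input"`
	ToolCalls []ToolCall `json:"tool_calls"`
	Response  string     `json:"response"`
}

// parseLog decodes a log strictly: any key not declared above is an error.
func parseLog(data string) (*ExecutionLog, error) {
	dec := json.NewDecoder(strings.NewReader(data))
	dec.DisallowUnknownFields()
	var execLog ExecutionLog
	if err := dec.Decode(&execLog); err != nil {
		return nil, err
	}
	return &execLog, nil
}

func main() {
	execLog, err := parseLog(`{
  "input": "Read the contents of main.go",
  "tool_calls": [{"name": "read_file", "arguments": {"path": "main.go"}, "result": "package main"}],
  "response": "Here are the contents of main.go: package main..."
}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(execLog.ToolCalls[0].Name, execLog.ToolCalls[0].Arguments["path"])
}
```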

Log Discovery

By default, agentspec looks for logs alongside specs:

specs/
  file-reader.yaml       <-- spec
  file-reader.log.json   <-- execution log (auto-discovered)
  search-agent.yaml
  search-agent.log.json

Or specify a log directory:

agentspec run specs/ --logs logs/

Or a single log for all specs:

agentspec run specs/ --logs execution.json

CLI Reference

agentspec run <spec-path>

Run specs against execution logs.

agentspec run specs/                    # run all specs in directory
agentspec run specs/file-reader.yaml    # run single spec
agentspec run specs/ --format json      # JSON output for CI
agentspec run specs/ --format table     # table output (default)
agentspec run specs/ --parallel 4       # parallel execution
agentspec run specs/ --logs logs/       # specify log directory

Exit code 1 if any spec fails (CI-friendly).

agentspec validate <spec-path>

Validate spec syntax without running.

agentspec validate specs/

agentspec init

Create example spec and execution log files.

agentspec init

agentspec version

Show version.

agentspec version

Output Formats

Table (default)

agentspec -- Agent Behavior Testing
===================================

specs/file-reader.yaml
  PASS Tool call: read_file (path=main.go)
  PASS Response contains "package main"
  PASS Constraint: no write_file calls
  PASS Constraint: max 3 tool calls (1 used)

specs/search-agent.yaml
  PASS Tool call: web_search (query matches /weather.*tokyo/)
  FAIL Tool call: summarize -- expected but not called
  PASS Constraint: ordered execution

Results: 6 passed, 1 failed (2 specs)

JSON

{
  "summary": {
    "total": 7,
    "passed": 6,
    "failed": 1,
    "specs": 2
  },
  "specs": [
    {
      "name": "File reader agent",
      "path": "specs/file-reader.yaml",
      "passed": true,
      "checks": [
        {"passed": true, "message": "Tool call: read_file (path=main.go)"}
      ]
    }
  ]
}

Architecture

cmd/agentspec/main.go           -- CLI entry (cobra)
internal/spec/
  parser.go                     -- YAML spec parser
  types.go                      -- Spec data structures
  validator.go                  -- Spec syntax validation
internal/matcher/
  matcher.go                    -- Tool call matching engine
  pattern.go                    -- Exact/contains/regex/schema/any
internal/runner/
  runner.go                     -- Spec execution runner
  parallel.go                   -- Parallel execution
internal/reporter/
  table.go                      -- Table output
  json.go                       -- JSON output
  summary.go                    -- Pass/fail summary
internal/loader/
  loader.go                     -- Load execution logs (JSON)

Use Cases

CI Pipeline

# .github/workflows/agent-test.yml (steps excerpt)
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
  with:
    go-version: stable
- name: Test agent behavior
  run: |
    go install github.com/JSLEEKR/agentspec/cmd/agentspec@latest
    agentspec run specs/ --format json > results.json

MCP Server Testing

Test that your MCP server agent calls the right tools:

name: "Code review agent uses diff tool"
input:
  message: "Review this pull request"
expect:
  tools:
    - name: git_diff
      args:
        ref:
          regex: "^(main|master)\\.\\.\\.HEAD$"
    - name: read_file
      args:
        path:
          any: true
  constraints:
    - no_tool: "git_push"
    - max_tools: 10
    - ordered: true

Agent Workflow Validation

Ensure agents follow the correct sequence:

name: "Research agent follows search-then-summarize pattern"
input:
  message: "Research quantum computing advances in 2026"
expect:
  tools:
    - name: web_search
      args:
        query:
          contains: "quantum"
    - name: summarize
  constraints:
    - ordered: true
    - no_tool: "write_file"

Security

  • No network calls -- reads local files only
  • No LLM API calls -- decoupled by design
  • YAML parsing with 1MB size limit per spec file
  • No code execution from specs
  • Strict JSON parsing mode rejects unknown fields

Development

# Run tests
go test ./...

# Build
go build -o agentspec ./cmd/agentspec

# Run example (using the binary built above)
./agentspec init
./agentspec run specs/

License

MIT License. See LICENSE.

About

Agent behavioral testing -- YAML specs for tool calls, sequences, constraints
