hex-foundation — Test Matrix

This document describes the test suite, what each test verifies, and how to run it locally.

Test categories

Category	Files	Needs API key
Static / unit	`test_skill_frontmatter.sh`, `test_skill_refs.sh`, `test_path_mapping.bats`, `test_hex_doctor_version_sync.bats`, `test_hex_doctor_hex_binary_version_sync.bats`, `test-upgrade-deletion.sh`, `test-upgrade-binary-swap.sh`	No
Core E2E (containerized)	`tests/core-e2e/run-all.sh`	BOI suites only
Live eval — Claude Code	`test_skill_discovery.sh`, `test_e2e.sh`, `test_fullstack.sh`	Yes
Live eval — Codex	`test_skill_discovery_codex.sh`, `test_codex_onboarding.sh`	Yes
Codex parity (containerized)	`tests/codex-parity/run-all.sh`	No (structural); `OPENAI_API_KEY` for live
Migration	`tests/migrate/test-migrate.sh`	No
Memory	`test_memory.py`	No

Core E2E suite (`tests/core-e2e/`)

Auto-discovers all tests/core-e2e/suites/*.sh files and runs them. Non-BOI suites run inside the tests/core-e2e/Dockerfile container; BOI integration suites run on the host (they need Docker access to spin up their own containers).

CI runs both jobs on every PR and blocks merges on failure (see .github/workflows/core-e2e.yml).

# All suites (host must have Docker)
bash tests/core-e2e/run-all.sh

# Filter by pattern — useful when iterating on a specific suite
bash tests/core-e2e/run-all.sh --include boi          # BOI suites only
bash tests/core-e2e/run-all.sh --exclude boi          # skip BOI (e.g. inside Docker)
bash tests/core-e2e/run-all.sh --include 'install|upgrade'  # regex match on suite name

Current suites:

Suite	What it verifies
`test-boi-install`	Fresh BOI install: binary builds, `--help`/`--version`, smoke dispatch
`test-boi-upgrade`	Upgrade path: version bump, stale-symlink detection, doctor catches dangling link
`test-assets`	Asset registry CRUD via `hex asset` subcommands
`test-cli`	All `hex` subcommands reachable; version matches `Cargo.toml`
`test-events`	Event emit, policy firing, trace via `hex events`
`test-messaging`	Message send/receive/filter with SQLite verification
`test-sse`	SSE subscribe/publish, topic filtering, heartbeat
`test-telemetry`	Telemetry JSONL written to `.hex/telemetry/`
`test-doctor-events-coverage`	`hex-doctor` fails loudly on broken policies (parse errors named), passes on valid policies

Tests added in v0.2.4

`tests/test_skill_frontmatter.sh`

Validates every system/skills/*/SKILL.md without running any agent. Checks:

Frontmatter block exists at the top of the file.
name field is present and matches the skill directory name.
description field is present and non-empty.
If allowed-tools is present, it is a YAML list of strings.

Exit 0 = all valid. Exit 1 = summary of failures.

`tests/test_skill_refs.sh`

Installs hex to a temp dir and verifies that every path reference inside SKILL.md files resolves on disk. Catches broken references to scripts, templates, or commands before they reach users.

`tests/test_skill_discovery.sh`

Runs Claude Code in --print mode inside a fresh hex install and asserts:

All 11 shipped skills appear in Claude's response to a discovery prompt.
At least 3 skills (/hex-doctor, /hex-decide, /hex-triage) can be invoked without crashing.

Requires ~/.hex-test.env with ANTHROPIC_API_KEY.

`tests/test_skill_discovery_codex.sh`

Mirror of the above for Codex. Because Codex reads AGENTS.md rather than SKILL.md files directly, this test verifies that the 11 skill names surface via AGENTS.md context and that Codex can perform the same three invocations.

Codex parity suite (`tests/codex-parity/`)

Seven tests that verify behavioral parity between the Claude Code and Codex runtimes. Runs inside a Docker container with Node.js + Codex CLI installed. Structural tests run without an API key; live-dispatch tests are skipped automatically when OPENAI_API_KEY is absent.

bash tests/codex-parity/run-all.sh

Test	What it verifies	API key
`test-install-shape.sh`	Fresh hex install produces `.hex/scripts/`, `.hex/skills/`, `.hex/bin/`, `CLAUDE.md`, `AGENTS.md`	No
`test-agents-md-complete.sh`	`AGENTS.md` covers all sections present in `CLAUDE.md`	No
`test-skill-discovery.sh`	All skills are discoverable from `.hex/skills/*/SKILL.md` under Codex	No
`test-doctor-codex.sh`	`doctor.sh` includes and passes the Codex CLI check	No
`test-upgrade-codex.sh`	`upgrade.sh` preserves `AGENTS.md` user customizations	No
`test-boi-dispatch-codex.sh`	Minimal spec with `runtime=codex` completes and produces output	Yes
`test-memory-search.sh`	Memory search index and CLI work identically under the Codex runtime	No

Gate 5 in system/scripts/release.sh runs this suite and blocks on failure; structural tests always run, live tests are skipped when no key is present.

Running locally

Prerequisites

Docker (for Docker eval suite)
Tart (for macOS eval suite — Apple Silicon only)
~/.hex-test.env containing at minimum:
```
ANTHROPIC_API_KEY=sk-ant-...
```

Static tests (no API key)

cd /path/to/hex-foundation

bash tests/test_skill_frontmatter.sh
bash tests/test_skill_refs.sh
bash tests/migrate/test-migrate.sh
python3 tests/test_memory.py
bats tests/test_hex_doctor_version_sync.bats
bats tests/test_hex_doctor_hex_binary_version_sync.bats
bash tests/test-upgrade-deletion.sh
bash tests/test-upgrade-binary-swap.sh

Full Docker eval suite

bash tests/eval/run_eval_docker.sh --live

Individual cases:

bash tests/eval/run_eval_docker.sh --live --case skill-frontmatter
bash tests/eval/run_eval_docker.sh --live --case skill-refs
bash tests/eval/run_eval_docker.sh --live --case skill-discovery
bash tests/eval/run_eval_docker.sh --live --case skill-discovery-codex

Full macOS Tart eval suite

bash tests/eval/run_eval_macos.sh

Shipped skills (as of v0.2.4)

The 11 skills installed under .hex/skills/ (verified by test_skill_discovery.sh):

hex-startup
hex-checkpoint
hex-shutdown
hex-reflect
hex-consolidate
hex-debrief
hex-decide
hex-triage
hex-doctor
landings
memory

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hex-foundation — Test Matrix

Test categories

Core E2E suite (`tests/core-e2e/`)

Tests added in v0.2.4

`tests/test_skill_frontmatter.sh`

`tests/test_skill_refs.sh`

`tests/test_skill_discovery.sh`

`tests/test_skill_discovery_codex.sh`

Codex parity suite (`tests/codex-parity/`)

Running locally

Prerequisites

Static tests (no API key)

Full Docker eval suite

Full macOS Tart eval suite

Shipped skills (as of v0.2.4)

FilesExpand file tree

testing.md

Latest commit

History

testing.md

File metadata and controls

hex-foundation — Test Matrix

Test categories

Core E2E suite (tests/core-e2e/)

Tests added in v0.2.4

tests/test_skill_frontmatter.sh

tests/test_skill_refs.sh

tests/test_skill_discovery.sh

tests/test_skill_discovery_codex.sh

Codex parity suite (tests/codex-parity/)

Running locally

Prerequisites

Static tests (no API key)

Full Docker eval suite

Full macOS Tart eval suite

Shipped skills (as of v0.2.4)

Core E2E suite (`tests/core-e2e/`)

`tests/test_skill_frontmatter.sh`

`tests/test_skill_refs.sh`

`tests/test_skill_discovery.sh`

`tests/test_skill_discovery_codex.sh`

Codex parity suite (`tests/codex-parity/`)