ARPAHLS · rosspeili · Jun 13, 2026
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -28,7 +28,7 @@ Humans: Please describe what this PR does and why it's needed.
 ## Checklist (all PRs)
 
 - [ ] My code follows the **Agent Code of Conduct**.
-- [ ] I have run `python -m flake8 .` and `pytest tests/` locally (or the subset relevant to this change).
+- [ ] I have run `python -m flake8 .`, `pytest skills/`, and `pytest tests/` locally (or the subset relevant to this change).
 - [ ] `CHANGELOG.md` updated under `[Unreleased]` if this PR changes user-visible behavior.
 - [ ] `examples/README.md` is updated if this PR adds, renames, or removes a runnable script under `examples/`.
 

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -36,9 +36,14 @@ jobs:
         # strict check against our config
         flake8 . --count --statistics
 
-    - name: Test with pytest
+    - name: Skill bundle tests
       env:
         ANTHROPIC_API_KEY: "dummy_key_for_ci"
         ETHERSCAN_API_KEY: "dummy_key_for_ci"
-      run: |
-        pytest tests/
+      run: pytest skills/
+
+    - name: Framework and maintainer tests
+      env:
+        ANTHROPIC_API_KEY: "dummy_key_for_ci"
+        ETHERSCAN_API_KEY: "dummy_key_for_ci"
+      run: pytest tests/
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,6 +12,7 @@ Contributors add user-facing entries under `[Unreleased]` in the same PR. Mainta
 - **Tests**: Backfilled `test_skill.py` for six registry skills (`mica_module`, `pii_masker`, `synthetic_generator`, `wallet_screening`, `pdf_form_filler`, `prompt_rewriter`); all registry skills now ship co-located bundle tests. Fixed `prompt_rewriter` package export so pytest can collect the bundle (#158).
 
 ### Changed
+- **CI**: GitHub Actions runs `pytest skills/` then `pytest tests/` after lint (bundle + framework/maintainer tests; closes #90) (#159).
 - **CI**: CodeQL GitHub Action upgraded from v3 to v4.
 - **Dependencies**: Extended `[all]` with registry skill runtime deps (`web3`, `fastembed`, `numpy`); added `[defi]` and `[embeddings]` optional extras. Documented manifest ↔ `pyproject.toml` convention in CONTRIBUTING and TESTING.md.
 - **Documentation**: [TESTING.md](docs/TESTING.md), [CONTRIBUTING.md](CONTRIBUTING.md), [ai_native_workflow.md](docs/contributing/ai_native_workflow.md), and README architecture tree document the bundle / framework / maintainer / example testing model. Pytest collects `tests/` and `skills/` only (`examples/` ignored).

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -111,20 +111,21 @@ Follow the [Agent Code of Conduct](CODE_OF_CONDUCT.md): deterministic skill outp
 ### Tests and CI
 
 - Add or update tests in the correct layer when behavior changes (see [TESTING.md](docs/TESTING.md)).
-- **Skill bundle test** — `skills/<category>/<name>/test_skill.py` (required for new skills; ships in the wheel; run locally before skill PRs).
+- **Skill bundle test** — `skills/<category>/<name>/test_skill.py` (required for new skills; ships in the wheel; runs in CI via `pytest skills/`).
 - **Framework test** — `tests/test_*.py` at repo root (loader, CLI, issuer rules).
 - **Maintainer skill test** — optional `tests/skills/<category>/test_<name>.py` for extra loader or edge-case coverage.
 - **Usage examples** — `examples/*.py` are not tests and are not run in CI.
-- **GitHub Actions** installs `pip install -e ".[dev,all]"`, runs `python -m black --check .`, then `flake8 .`, then **`pytest tests/`** (framework + maintainer tests). Do not add per-skill pip lines or test paths to `.github/workflows/ci.yml`.
+- **GitHub Actions** installs `pip install -e ".[dev,all]"`, runs `python -m black --check .`, then `flake8 .`, then **`pytest skills/`** (bundle tests), then **`pytest tests/`** (framework + maintainer tests). Do not add per-skill pip lines or hardcoded skill paths to `.github/workflows/ci.yml`.
 - Run locally before opening a PR:
 
   ```bash
   python -m black --check .
   python -m flake8 .
+  python -m pytest skills/
   python -m pytest tests/
   ```
 
-  For skill work, also run:
+  For a single skill:
 
   ```bash
   python -m pytest skills/<category>/<skill_name>/test_skill.py
@@ -153,6 +154,7 @@ Agents must follow [Agent Contribution Workflow](docs/contributing/ai_native_wor
    ```bash
    python -m black --check .
    python -m flake8 .
+   pytest skills/
    pytest tests/
    ```
 

diff --git a/docs/TESTING.md b/docs/TESTING.md
@@ -22,7 +22,7 @@ pip install -r requirements.txt
 
 | Layer | Location | Shipped in pip wheel? | CI on PR? |
 | :--- | :--- | :---: | :---: |
-| **Skill bundle test** | `skills/<category>/<skill_name>/test_skill.py` | Yes | No — run locally for skill PRs |
+| **Skill bundle test** | `skills/<category>/<skill_name>/test_skill.py` | Yes | Yes |
 | **Framework test** | `tests/test_*.py` (not under `tests/skills/`) | No (clone only) | Yes |
 | **Maintainer skill test** | `tests/skills/<category>/test_<name>.py` | No (clone only) | Yes when present |
 | **Usage example** | `examples/*.py` | No | No — not pytest |
@@ -62,7 +62,7 @@ pip install -r requirements.txt
 | Loader, CLI, registry issuer rules | Framework test | `tests/test_loader.py`, `tests/test_skill_issuer.py` |
 | End-to-end provider demo script | Usage example | `examples/gemini_tos_evaluator.py` |
 
-**Rule of thumb:** if it ships with the skill and must pass before merge → **bundle test** (run locally). If it is extra regression depth for clone-repo work → **maintainer test** (optional). If it proves provider integration → **example**, not pytest.
+**Rule of thumb:** if it ships with the skill and must pass before merge → **bundle test** (CI + local). If it is extra regression depth for clone-repo work → **maintainer test** (optional). If it proves provider integration → **example**, not pytest.
 
 ## 1. Code Formatting (Black)
 
@@ -121,20 +121,21 @@ GitHub Actions installs `pip install -e ".[dev,all]"`, then runs:
 ```bash
 python -m black --check .
 python -m flake8 .
+python -m pytest skills/
 python -m pytest tests/
 ```
 
-That covers **framework tests** and **maintainer skill tests** under `tests/`. It does not run `examples/` or skill bundle tests. Do not add per-skill pip lines or test paths to `.github/workflows/ci.yml`.
+That covers **skill bundle tests** under `skills/` and **framework + maintainer tests** under `tests/`. It does not run `examples/`. Do not add per-skill pip lines or hardcoded skill paths to `.github/workflows/ci.yml`.
 
 The `[all]` extra includes optional SDK groups plus registry skill runtime deps (`web3`, `fastembed`, `numpy`, …) so `pytest skills/` works after `pip install -e ".[dev,all]"`. When a skill adds new `manifest.yaml` `requirements`, add the same packages to the matching optional extra and to `[all]` in `pyproject.toml`.
 
 ### Local commands
 
-Match CI, and run bundle tests when you touch skills:
+Match CI:
 
 ```bash
-python -m pytest tests/
 python -m pytest skills/
+python -m pytest tests/
 ```
 
 Single skill bundle test:
@@ -164,5 +165,6 @@ Before pushing your code, run the following commands:
 1. `skillware list` (verify install and path resolution)
 2. `python -m black --check .` (verify formatting; use `python -m black .` to fix)
 3. `python -m flake8 .` (check quality)
-4. `python -m pytest tests/` (framework + maintainer tests — same scope as CI)
-5. `python -m pytest skills/<category>/<skill_name>/test_skill.py` when your PR adds or changes a skill bundle test (or `pytest skills/` for broad skill changes)
+4. `python -m pytest skills/` (bundle tests — same scope as CI)
+5. `python -m pytest tests/` (framework + maintainer tests — same scope as CI)
+6. `python -m pytest skills/<category>/<skill_name>/test_skill.py` when you want a single-skill subset
diff --git a/docs/contributing/ai_native_workflow.md b/docs/contributing/ai_native_workflow.md
@@ -132,6 +132,7 @@ You must:
 ```bash
 python -m black .
 python -m flake8 .
+pytest skills/
 pytest tests/
 ```
 
@@ -160,7 +161,7 @@ Run a **pre-PR audit** on yourself:
 1. Map every acceptance criterion in the issue to a file or test in your diff.
 2. Complete the [verification checklist](#verification-checklists-by-contribution-type) for your contribution type.
 3. If the change is user-visible, confirm [CHANGELOG.md](../../CHANGELOG.md) has entries under `[Unreleased]` (same rule as [CONTRIBUTING.md](../../CONTRIBUTING.md)).
-4. Run `flake8` and `pytest tests/`; for skill work also run the relevant `pytest skills/.../test_skill.py`. Report actual command output to your operator—do not claim success without evidence.
+4. Run `flake8`, `pytest skills/`, and `pytest tests/`; while iterating on a single skill, the relevant `pytest skills/.../test_skill.py` is a faster subset of the bundle run. Report actual command output to your operator; do not claim success without evidence.
 5. Draft PR template answers: check only boxes that apply; fill the skill section only if `skills/` changed.
 
 If anything fails, return to Stage 4, fix, and audit again.
@@ -200,7 +201,7 @@ You should:
 
 1. Draft the PR description (why, not only what; link the issue).
 2. Map changed files to the [pull request template](../../.github/PULL_REQUEST_TEMPLATE.md)—skill checklist only when `skills/` changed.
-3. Monitor CI (lint and `pytest tests/`). If checks fail, diagnose, fix in Stage 4, and push to the same branch.
+3. Monitor CI (lint, `pytest skills/`, and `pytest tests/`). If checks fail, diagnose, fix in Stage 4, and push to the same branch.
 4. Address review comments with focused follow-up commits.
 
 Do not force-push shared branches unless a maintainer instructs you.
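
Reviewer note: the local command sequence this change documents (`black --check`, `flake8`, `pytest skills/`, `pytest tests/`) could be wrapped in a small helper so contributors run the same order as CI. The sketch below is only an illustration and is not part of this PR: `CI_STEPS`, `run_ci_locally`, and the injectable `runner` parameter are hypothetical names. It assumes the tools are installed via `pip install -e ".[dev,all]"`, and it stops at the first failing step, as a GitHub Actions job does by default.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Hypothetical helper: mirrors the documented local order
# (format check, lint, bundle tests, framework + maintainer tests).
CI_STEPS: List[List[str]] = [
    [sys.executable, "-m", "black", "--check", "."],
    [sys.executable, "-m", "flake8", "."],
    [sys.executable, "-m", "pytest", "skills/"],
    [sys.executable, "-m", "pytest", "tests/"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_ci_locally(
    steps: Sequence[Sequence[str]] = CI_STEPS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> int:
    """Run each step in order; return the first non-zero exit code, else 0."""
    for cmd in steps:
        print("$", " ".join(cmd))
        code = runner(cmd)
        if code != 0:
            # Stop early so later steps do not bury the first failure.
            return code
    return 0
```

Calling `sys.exit(run_ci_locally())` from the repository root would run all four steps. The `runner` argument exists only so the ordering can be checked without invoking the real tools.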