[RFC]: Testing architecture - bundle, framework, and maintainer tests in CI #156

Description

@rosspeili

Summary

Define Skillware's testing model and track implementation across docs, CI, packaging, layout, and backfill work.

Four layers:

| Layer | Path | Shipped in pip wheel? | CI on PR? |
|---|---|---|---|
| Skill bundle test | `skills/<cat>/<name>/test_skill.py` | Yes | Yes; required for every registry skill |
| Framework test | `tests/test_*.py` (not under `tests/skills/`) | No (clone only) | Yes |
| Maintainer skill test | `tests/skills/<cat>/test_<name>.py` | No (clone only) | Yes; optional per skill, but runs in CI when present |
| Usage example | `examples/*.py` | No | No; not pytest, not CI |
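
To make the path rules in the table concrete, here is a minimal sketch of a classifier that maps a repo-relative path to its layer. The function name `classify`, the `"other"` fallback, and the category and skill names in the usage lines are hypothetical; the sketch also assumes that anything matching `test_*.py` under `tests/` but outside `tests/skills/` counts as a framework test.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Map a repo-relative path to one of the four layers, or 'other'."""
    p = PurePosixPath(path)
    parts = p.parts
    is_test_module = p.name.startswith("test_") and p.suffix == ".py"

    # skills/<cat>/<name>/test_skill.py
    if len(parts) == 4 and parts[0] == "skills" and p.name == "test_skill.py":
        return "bundle"
    # tests/skills/<cat>/test_<name>.py (checked before the framework rule)
    if parts[:2] == ("tests", "skills") and is_test_module:
        return "maintainer"
    # tests/test_*.py, not under tests/skills/
    if parts[:1] == ("tests",) and is_test_module:
        return "framework"
    # examples/*.py
    if len(parts) == 2 and parts[0] == "examples" and p.suffix == ".py":
        return "example"
    return "other"
```

For instance, `classify("tests/test_loader.py")` returns `"framework"`, while `classify("tests/skills/compliance/test_tos_evaluator.py")` returns `"maintainer"`.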

Vocabulary: use bundle / framework / maintainer / example — not "Tier 1 / Tier 2".

Target CI (end state):

black --check . && flake8 .
pytest skills/
pytest tests/

(examples/ is never collected; optional hardening via pytest config.)
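
One possible form of that optional hardening, sketched as a root-level `conftest.py`. The file placement is an assumption; `collect_ignore` is pytest's standard conftest hook for excluding paths from collection, and it mainly matters when someone runs a bare `pytest` from the repo root rather than the two targeted commands.

```python
# conftest.py at the repository root (assumed placement, not part of this RFC).
# pytest reads this module-level list during collection and skips the listed
# paths, which are resolved relative to the directory containing this file.
collect_ignore = ["examples"]
```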

Two separate pytest steps in Actions (same commands, clearer failure output):

  1. Skill bundle tests — pytest skills/
  2. Framework and maintainer tests — pytest tests/ (includes tests/skills/ and root framework tests)

What changes vs today:

What stays out of CI: examples/ only (manual / documented demos; may need API keys).

Supersedes the interim #151 scope adjustment (CI on pytest tests/ only, bundle tests local-only). Install cleanup from #152 stays.

Related open issues: #106, #90, #86, #83

Child issues (opened after this RFC): pyproject [all] for bundle-test deps, bundle-test backfill, CI workflow steps, layout (#86), issuer enforcement, examples≠tests docs.

Motivation

Contributors, agents, and pip users need one clear rule: every skill ships a bundle test that CI enforces. Today the repo inverts part of that — pytest tests/ runs in CI (framework + maintainer skill tests), but skills/**/test_skill.py does not (#90).

Docs, #151, and #152 describe overlapping but conflicting expectations. Without a single tracked architecture, #106, #86, and #90 duplicate effort and contradict each other.

Maintainer tests under tests/skills/ should remain in CI for now: they provide loader integration and edge-case coverage, especially for skills that do not yet have bundle tests and for skills where maintainer tests are deeper than the bundle test (e.g. tos_evaluator). Removing them from CI would be a coverage regression until bundle tests are backfilled and rebalanced.

Detailed Design

Roles (documentation contract)

Bundle test (skills/.../test_skill.py)

  • Required for every registry skill (enforced after backfill + issuer check).
  • Offline / mockable; manifest and execute contract.
  • Ships with the skill via MANIFEST.in; pip users can run with pip install "skillware[dev,all]".
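
A minimal sketch of what "offline / mockable; manifest and execute contract" could look like in a `test_skill.py`. Everything named here (`MANIFEST`, `EchoPriceSkill`, `fetch_price`) is a hypothetical stand-in defined inline so the example runs on its own; it does not reflect Skillware's actual base classes or manifest schema, and a real bundle test would import the skill and its manifest from the bundle.

```python
import json
from unittest import mock

# Hypothetical manifest; a real bundle test would load the file shipped
# alongside the skill instead of defining it inline.
MANIFEST = {"name": "echo_price", "parameters": {"symbol": {"type": "string"}}}


def fetch_price(symbol: str) -> float:
    """Placeholder for a network call; the tests below never reach it."""
    raise RuntimeError("network access is not allowed in bundle tests")


class EchoPriceSkill:
    """Hypothetical skill whose only external dependency is injectable."""

    def __init__(self, fetch=fetch_price):
        self._fetch = fetch

    def execute(self, params: dict) -> dict:
        symbol = params["symbol"]
        return {"symbol": symbol, "price": self._fetch(symbol)}


def test_manifest_contract():
    # The manifest names the skill and declares the parameter execute() reads.
    assert MANIFEST["name"] == "echo_price"
    assert "symbol" in MANIFEST["parameters"]


def test_execute_contract_offline():
    # Replace the network call with a mock so the test runs without API keys.
    fake_fetch = mock.Mock(return_value=1.5)
    result = EchoPriceSkill(fetch=fake_fetch).execute({"symbol": "ETH"})
    fake_fetch.assert_called_once_with("ETH")
    assert result == {"symbol": "ETH", "price": 1.5}
    json.dumps(result)  # the result should be JSON-serialisable
```

Injecting the dependency keeps the test independent of import paths, which may matter when the same file is run both from a clone and from an installed wheel.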

Framework test (tests/test_loader.py, test_cli.py, test_skill_issuer.py, test_version_policy.py, …)

  • Core engine health; not skill-specific.
  • Clone-repo only; always CI.

Maintainer skill test (tests/skills/<cat>/test_<name>.py)

  • Optional depth: loader wiring, heavy mocks, regressions, niche edge cases.
  • Not required for every skill; when present, runs in CI as part of pytest tests/.
  • Clone-repo only; not shipped via pip.

Usage example (examples/*.py)

  • Runnable provider demos; not tests.
  • Never pytest; never CI.

CI workflow shape

- name: Install
  run: pip install -e ".[dev,all]"

- name: Check formatting with black
  run: python -m black --check .

- name: Lint with flake8
  run: flake8 ...

- name: Skill bundle tests
  run: pytest skills/
  env: { dummy keys as needed }

- name: Framework and maintainer tests
  run: pytest tests/
  env: { same dummy keys }

No per-skill hardcoded paths. Extend [all] so bundle tests have manifest deps (web3, fastembed, numpy, …) — separate child issue.
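
For contributors who want to reproduce the workflow locally, the steps could be mirrored by a small script along these lines. This is a sketch only: `CI_STEPS` and `run_steps` are hypothetical names, the `flake8 .` arguments are taken from the target CI commands in the Summary, and unlike a default Actions job the runner keeps going after a failure so that every failing step is reported.

```python
import subprocess

# Step names follow the workflow above; the commands follow the target CI list.
CI_STEPS = [
    ("Check formatting with black", ["python", "-m", "black", "--check", "."]),
    ("Lint with flake8", ["flake8", "."]),
    ("Skill bundle tests", ["pytest", "skills/"]),
    ("Framework and maintainer tests", ["pytest", "tests/"]),
]


def run_steps(steps, runner=subprocess.run):
    """Run every step, even after a failure, and return the names that failed."""
    failed = []
    for name, cmd in steps:
        print(f"==> {name}: {' '.join(cmd)}")
        if runner(cmd).returncode != 0:
            failed.append(name)
    return failed
```

Calling `run_steps(CI_STEPS)` from the repo root after `pip install -e ".[dev,all]"` returns an empty list when all four steps pass.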

Packaging / pip extras (v1)

| Install | Purpose |
|---|---|
| `pip install skillware` | Core + skill bundles (incl. test_skill.py files) |
| `pip install "skillware[dev]"` | pytest, black, flake8 |
| `pip install "skillware[all]"` | Skill runtime deps across registry |
| `pip install -e ".[dev,all]"` | Contributor / CI install |

No [tests] or [examples] extra in v1 — tests/ and examples/ are git-clone workflows.

Open decisions (child issues)

  1. Grandfathering: should pytest skills/ be enabled in CI before or after the 6 missing bundle tests are backfilled?
  2. Rebalance bundle vs maintainer depth for tos_evaluator, evm_tx_handler — with skill authors.
  3. Future: after full bundle backfill, revisit whether any maintainer tests are redundant (do not block this RFC).
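
To track the backfill in decision 1, the list of skills still lacking a bundle test could be produced by a scan like the one below. This is a sketch under assumptions: it treats every `skills/<cat>/<name>/` directory as a registry skill (the real registry may identify skills differently, for example by a manifest file), and the function name is hypothetical.

```python
from pathlib import Path


def skills_missing_bundle_test(skills_root: Path) -> list[str]:
    """Return '<cat>/<name>' for each skill directory without test_skill.py."""
    missing = []
    for skill_dir in sorted(skills_root.glob("*/*")):
        # Skip files and private or cache directories such as __pycache__.
        if not skill_dir.is_dir():
            continue
        if skill_dir.name.startswith(("_", ".")):
            continue
        if skill_dir.parent.name.startswith(("_", ".")):
            continue
        if not (skill_dir / "test_skill.py").is_file():
            missing.append(f"{skill_dir.parent.name}/{skill_dir.name}")
    return missing
```

Running `skills_missing_bundle_test(Path("skills"))` from the repo root would give the backfill worklist; the same check could later back the issuer enforcement step.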

Planned execution order

  1. This RFC + GitHub Project
  2. Comments realigning #106 ("[Docs]: Clarify two-tier testing model — skill-local vs tests/ integration suite"), #90 ("[Bug]: CI only runs pytest tests/ — skill-local test_skill.py files are never exercise"), #86 ("[Feat]: Standardise two-tier test layout and document skill-local vs integration test convention"), and #83 ("[Feat]: CLI command to run skill tests (sub-issue of #16)")
  3. Child issues: [all] / deps → backfill bundle tests → CI two-step pytest → #86 layout → issuer enforcement → examples≠tests docs → #83 CLI
  4. PRs in that dependency order

Drawbacks

Metadata

Assignees

No one assigned

    Labels

    core framework: Changes to loader, env, or base classes.
    discussion: Open discussion for RFCs and proposals.
    enhancement: New feature or request

    Projects

    Status
    In Progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests