Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions .claude/skills/nasde-benchmark-creator/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -336,10 +336,14 @@ variants/gemini-baseline/

> **Codex/Gemini skills are registered natively** (Harbor `config.agent.skills`),
> not via `sandbox_files` — these CLIs auto-discover skills only from a
> HOME-scoped dir, never from a `/app` cwd dir. Each `SKILL.md` **must start with
> a `---` YAML frontmatter line**: Codex's loader rejects a file that opens with
> anything else (`missing YAML frontmatter delimited by ---`) and silently skips
> the skill. Put any provenance comment *below* the closing `---`. See
> HOME-scoped dir, never from a `/app` cwd dir. This applies to **all** ways a
> skill is supplied to a Codex/Gemini variant: the `agents_skills/` /
> `gemini_skills/` snapshot above, a `[[skill]]` by-reference entry in
> `variant.toml`, and a `[nasde.plugin]`'s own `skills/`. Each `SKILL.md`
> **must start with a `---` YAML frontmatter line**: Codex's loader rejects a
> file that opens with anything else (`missing YAML frontmatter delimited by
> ---`) and silently skips the skill. Put any provenance comment *below* the
> closing `---`. See
> [ADR-012](../../../docs/adr/012-native-codex-gemini-skill-injection.md).

If no `harbor_config.json` exists, `nasde` auto-generates one from `variant.toml`. To customize (e.g., add MCP servers), create it explicitly:
Expand Down
18 changes: 18 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,23 @@ See [docs/RELEASING.md](docs/RELEASING.md) for the release procedure.
max_turns` in `nasde.toml`. ([#54])

### Fixed
- **`[[skill]]` by-reference and `[nasde.plugin]` skills now register natively for Codex/Gemini ([ADR-012](docs/adr/012-native-codex-gemini-skill-injection.md)).**
PR #65 fixed the *snapshot* path (`agents_skills/`, `gemini_skills/`) but left
the two other ways a skill reaches an agent — `[[skill]]` by-reference in
`variant.toml` and a `[nasde.plugin]`'s own `skills/` — still hardcoding
`/app/.claude/skills/` via `stage_skill_dir`. Correct for Claude (cwd discovery
root), but for a **Codex/Gemini** variant the skill landed where the CLI never
scans, so it was never natively registered (the same silent "skill-as-document"
failure). Both paths now resolve to skill *directories*
(`collect_referenced_skill_dirs` / `collect_plugin_skill_dirs`) routed through
Harbor's native `config.agent.skills` for Codex/Gemini, unioned with the
snapshot dirs under one `_nasde_derived_skills` marker (stale-drop +
hand-authored preservation work across both sources); Claude keeps its
unchanged `sandbox_files` path. The frontmatter warning now also fires on this
path, and a new warning flags two derived skill dirs sharing a basename (Harbor
registers only the last). New example variants `codex-nasde-dev-with-arch` and
`gemini-nasde-dev-with-arch` exercise `[[skill]]` by-reference on a non-Claude
agent. ([#67])
- **Codex/Gemini skills are now natively registered ([ADR-012](docs/adr/012-native-codex-gemini-skill-injection.md)).**
Codex and Gemini auto-discover skills only from a HOME-scoped directory
(`$HOME/.agents/skills`, `~/.gemini/skills`), never from a `.agents/skills` /
Expand Down Expand Up @@ -522,4 +539,5 @@ Initial release under the **nasde-toolkit** name (rebrand from
[#59]: https://github.com/NoesisVision/nasde-toolkit/pull/59
[#61]: https://github.com/NoesisVision/nasde-toolkit/pull/61
[#65]: https://github.com/NoesisVision/nasde-toolkit/pull/65
[#67]: https://github.com/NoesisVision/nasde-toolkit/pull/67
[gh-litellm-2026-04]: https://github.com/BerriAI/litellm/security/advisories/GHSA-xqmj-j6mv-4862
24 changes: 14 additions & 10 deletions CLAUDE.md

Large diffs are not rendered by default.

47 changes: 42 additions & 5 deletions docs/adr/012-native-codex-gemini-skill-injection.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,36 @@ skills and `[[skill]]` by-reference (ADR-009) via `stage_skill_dir`. Migrating
it onto `config.agent.skills` would be a larger, riskier change with no
behavioral payoff. Scope is kept to the actual bug.

### All three skill paths, not just the snapshot

A skill reaches an agent three ways: the variant **snapshot**
(`agents_skills/`, `gemini_skills/`), **`[[skill]]` by-reference** in
`variant.toml`, and a **`[nasde.plugin]`**'s own `skills/`. The first revision
of this decision fixed only the snapshot path; the latter two still went
through `stage_skill_dir`'s hardcoded `/app/.claude/skills/` and so were broken
for Codex/Gemini in exactly the same way.

All three are now routed natively for Codex/Gemini. `[[skill]]`/plugin skills
are resolved to skill *directories* by `collect_referenced_skill_dirs` /
`collect_plugin_skill_dirs` (the dir-list counterparts of the unchanged
Claude-only `stage_referenced_skills` / `register_plugin_skills`), threaded as
`extra_skill_dirs` and **unioned** with the snapshot dirs inside
`_refresh_agent_skills`. The union is tracked by the single
`_nasde_derived_skills` marker, so stale-drop and hand-authored preservation
work across both sources. For a Claude agent `extra_skill_dirs` is dropped at
merge time (those skills already ride `sandbox_files`, so feeding the native
list too would double-inject). Because Harbor's `resolve_skills` keys skills by
basename (last-wins), nasde warns when two derived dirs share a basename so the
loser does not silently vanish.

**Coverage note (honest).** The `[[skill]]` path is verified end-to-end on a
live agent (see Consequences). The `[nasde.plugin]` path shares the same channel
from `extra_skill_dirs` onward, but there is no example benchmark with a
`[nasde.plugin]` to run it e2e — so it is covered by an **integration test** that
drives the real `docker.ensure_task_plugin` staging → `collect_plugin_skill_dirs`
→ harbor_config (no Docker build), plus unit tests on the resolver. Full
live-agent e2e for a plugin is a follow-up for when a plugin benchmark exists.

### Frontmatter guardrail

Codex's loader is strict: a `SKILL.md` that does not **start** with a `---`
Expand Down Expand Up @@ -123,18 +153,25 @@ comments moved below the closing `---`.
its native mechanism, not by reading the file.
- Skill×model matrix results for Codex/Gemini collected under the old behavior
are invalid (skill-as-document, not skill-as-native) and must be re-run.
- `sandbox_files` no longer carries skill files for Codex/Gemini; it carries
only `AGENTS.md` / `GEMINI.md` (and, for Claude, the unchanged skill files).
- For Codex/Gemini, skills (snapshot, `[[skill]]`, and plugin) are all carried
by `config.agent.skills`; the agent's effective `sandbox_files` is just
`AGENTS.md` / `GEMINI.md`. The Claude `sandbox_files` skill map is still built
unconditionally (so Claude stays byte-identical and tested) but a Codex/Gemini
agent never consumes it.
- The brittle `_expand_skill_targets` workaround in `ConfigurableGemini` (which
hardcoded `/root/.gemini/skills/`) is removed — Harbor's native command uses
the shell-expanded `$HOME`, correct for non-root users too.

## References

- `runner._collect_native_skill_dirs`, `runner._refresh_agent_skills`,
`runner._agent_type_from_import_path`
`runner._agent_type_from_import_path`, `runner._referenced_skill_dirs_for_agent`,
`runner._warn_skill_basename_collisions`
- `plugin_registration.collect_referenced_skill_dirs`,
`plugin_registration.collect_plugin_skill_dirs` (dir-list channel),
`stage_referenced_skills` / `register_plugin_skills` (unchanged Claude sink)
- Harbor `agents/installed/codex.py::_build_register_skills_command`,
`gemini_cli.py::_build_register_skills_command`, `trial.py` skill upload
- `openai/codex` `core-skills/src/loader.rs` (native discovery roots)
- [ADR-009](009-plugin-and-skill-by-reference.md) (the Claude skill path this
decision deliberately leaves untouched)
- [ADR-009](009-plugin-and-skill-by-reference.md) (the `[[skill]]` / plugin
mechanism whose Claude sink this decision leaves untouched)
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Agent Instructions

## Approach

1. Begin by consulting the **nasde-dev** skill and apply its guidance throughout this task.
2. Before writing any code, explore the existing codebase to understand its architecture and conventions.
3. Follow existing architectural patterns exactly — match naming, namespaces, directory structure, and code style.
4. When modifying nasde-toolkit functionality (CLI, runner, evaluator, config, agents), follow the `nasde-dev` skill's verification protocol, including documentation sync requirements.
5. After making changes, verify documentation is consistent (CLAUDE.md, README.md, ARCHITECTURE.md).
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
agent = "codex"
model = "gpt-5.4"

# nasde-dev skill by-reference (ADR-009) from the live source. For a Codex
# variant this is registered through Harbor's native config.agent.skills so
# the skill lands in $HOME/.agents/skills and Codex auto-discovers it — NOT in
# the /app cwd (which Codex never scans). See ADR-012.
[[skill]]
path = "../../../../.claude/skills/nasde-dev"
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# Agent Instructions

## Approach

1. Begin by consulting the **nasde-dev** skill and apply its guidance throughout this task.
2. Before writing any code, explore the existing codebase to understand its architecture and conventions.
3. Follow existing architectural patterns exactly — match naming, namespaces, directory structure, and code style.
4. When modifying nasde-toolkit functionality (CLI, runner, evaluator, config, agents), follow the `nasde-dev` skill's verification protocol, including documentation sync requirements.
5. After making changes, verify documentation is consistent (CLAUDE.md, README.md, ARCHITECTURE.md).
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
agent = "gemini"
model = "google/gemini-3-flash-preview"

# nasde-dev skill by-reference (ADR-009) from the live source. For a Gemini
# variant this is registered through Harbor's native config.agent.skills so
# the skill lands in ~/.gemini/skills and Gemini auto-discovers it — NOT in
# the /app cwd (which Gemini never scans). See ADR-012.
[[skill]]
path = "../../../../.claude/skills/nasde-dev"
64 changes: 59 additions & 5 deletions src/nasde_toolkit/plugin_registration.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,11 +11,17 @@
2. The ``[[skill]]`` array in variant.toml references a skill from a source
path instead of copying it into ``variants/<v>/skills/``.

Both feed :func:`stage_skill_dir`, which expands a skill directory into the
flat ``{container_path: host_file}`` mapping ConfigurableClaude uploads
(it only uploads regular files, mirroring ``_collect_codex_skills``). The
plugin path additionally calls :func:`inject_mcp_server` to register the
plugin's MCP server in the task's Harbor config.
For **Claude** both feed :func:`stage_skill_dir`, which expands a skill
directory into the flat ``{container_path: host_file}`` mapping
ConfigurableClaude uploads to ``/app/.claude/skills`` (Claude's cwd discovery
root). The plugin path additionally calls :func:`inject_mcp_server` to register
the plugin's MCP server in the task's Harbor config.

For **Codex/Gemini** the same skill *directories* are resolved by
:func:`collect_referenced_skill_dirs` / :func:`collect_plugin_skill_dirs` and
fed to Harbor's native ``config.agent.skills`` instead — those CLIs scan only a
HOME-scoped dir (``$HOME/.agents/skills``, ``~/.gemini/skills``), never the
``/app`` cwd the sandbox map targets. See ADR-012.
"""

from __future__ import annotations
Expand Down Expand Up @@ -119,6 +125,54 @@ def stage_referenced_skills(
console.print(f" [dim]Registered referenced skill '{resolved.name}'[/dim]")


def collect_referenced_skill_dirs(
variant_dir: Path,
worktree_factory: WorktreeFactory,
) -> list[Path]:
"""Resolve the ``[[skill]]`` array to a list of whole skill directories.

Same resolution as :func:`stage_referenced_skills` (each entry's ``path``
relative to the variant dir, optional ``ref`` read from a temporary git
worktree) but returns the resolved *directories* instead of flattening them
into a ``sandbox_files`` map under ``/app/.claude/skills``. Codex and Gemini
auto-discover skills only from a HOME-scoped dir, never from the agent's
``/app`` cwd, so they take this dir list through Harbor's native
``config.agent.skills`` mechanism. Claude keeps using
:func:`stage_referenced_skills`. See ADR-012.

Paths are fully ``resolve()``d so they compare equal to the snapshot dirs
from ``runner._collect_native_skill_dirs`` — the union/dedup, stale-drop and
basename-collision logic in ``_refresh_agent_skills`` key on string equality,
which would break on a symlinked ``/tmp`` (macOS) or a ``ref`` worktree path.
"""
dirs: list[Path] = []
for entry in _read_skill_entries(variant_dir):
resolved = _resolve_skill_source(variant_dir, entry["path"], entry.get("ref", ""), worktree_factory)
if not (resolved / "SKILL.md").exists():
raise RuntimeError(f"[[skill]] '{resolved}' has no SKILL.md")
dirs.append(resolved.resolve())
return dirs


def collect_plugin_skill_dirs(staged: StagedPlugin) -> list[Path]:
"""Resolve a shipped plugin's own ``skills/`` to a list of skill dirs.

The dir-list counterpart of :func:`register_plugin_skills`, for the same
reason as :func:`collect_referenced_skill_dirs`: Codex/Gemini take these
dirs through Harbor's native ``config.agent.skills`` rather than a
cwd ``sandbox_files`` mapping. Paths are ``resolve()``d for the same
string-equality reason as :func:`collect_referenced_skill_dirs`.
"""
skills_root = staged.staged_dir / "skills"
if not skills_root.is_dir():
return []
return [
skill_dir.resolve()
for skill_dir in sorted(skills_root.iterdir())
if skill_dir.is_dir() and (skill_dir / "SKILL.md").exists()
]


def inject_mcp_server(task_dir: Path, staged: StagedPlugin) -> None:
"""Write the plugin's MCP servers into the task's task.toml.

Expand Down
Loading
Loading