diff --git a/.claude/skills/nasde-benchmark-creator/SKILL.md b/.claude/skills/nasde-benchmark-creator/SKILL.md
index 5d8b487..478abe0 100644
--- a/.claude/skills/nasde-benchmark-creator/SKILL.md
+++ b/.claude/skills/nasde-benchmark-creator/SKILL.md
@@ -336,10 +336,14 @@ variants/gemini-baseline/
 
 > **Codex/Gemini skills are registered natively** (Harbor `config.agent.skills`),
 > not via `sandbox_files` — these CLIs auto-discover skills only from a
-> HOME-scoped dir, never from a `/app` cwd dir. Each `SKILL.md` **must start with
-> a `---` YAML frontmatter line**: Codex's loader rejects a file that opens with
-> anything else (`missing YAML frontmatter delimited by ---`) and silently skips
-> the skill. Put any provenance comment *below* the closing `---`. See
+> HOME-scoped dir, never from a `/app` cwd dir. This applies to **all** ways a
+> skill is supplied to a Codex/Gemini variant: the `agents_skills/` /
+> `gemini_skills/` snapshot above, a `[[skill]]` by-reference entry in
+> `variant.toml`, and a `[nasde.plugin]`'s own `skills/`. Each `SKILL.md`
+> **must start with a `---` YAML frontmatter line**: Codex's loader rejects a
+> file that opens with anything else (`missing YAML frontmatter delimited by
+> ---`) and silently skips the skill. Put any provenance comment *below* the
+> closing `---`. See
 > [ADR-012](../../../docs/adr/012-native-codex-gemini-skill-injection.md).
 
 If no `harbor_config.json` exists, `nasde` auto-generates one from `variant.toml`. To customize (e.g., add MCP servers), create it explicitly:
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5b37b97..8207ad9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -104,6 +104,23 @@ See [docs/RELEASING.md](docs/RELEASING.md) for the release procedure.
   max_turns` in `nasde.toml`. ([#54])
 
 ### Fixed
+- **`[[skill]]` by-reference and `[nasde.plugin]` skills now register natively for Codex/Gemini ([ADR-012](docs/adr/012-native-codex-gemini-skill-injection.md)).**
+  PR #65 fixed the *snapshot* path (`agents_skills/`, `gemini_skills/`) but left
+  the two other ways a skill reaches an agent — `[[skill]]` by-reference in
+  `variant.toml` and a `[nasde.plugin]`'s own `skills/` — still hardcoding
+  `/app/.claude/skills/` via `stage_skill_dir`. Correct for Claude (cwd discovery
+  root), but for a **Codex/Gemini** variant the skill landed where the CLI never
+  scans, so it was never natively registered (the same silent "skill-as-document"
+  failure). Both paths now resolve to skill *directories*
+  (`collect_referenced_skill_dirs` / `collect_plugin_skill_dirs`) routed through
+  Harbor's native `config.agent.skills` for Codex/Gemini, unioned with the
+  snapshot dirs under one `_nasde_derived_skills` marker (stale-drop +
+  hand-authored preservation work across both sources); Claude keeps its
+  unchanged `sandbox_files` path. The frontmatter warning now also fires on this
+  path, and a new warning flags two derived skill dirs sharing a basename (Harbor
+  registers only the last). New example variants `codex-nasde-dev-with-arch` and
+  `gemini-nasde-dev-with-arch` exercise `[[skill]]` by-reference on a non-Claude
+  agent. ([#67])
 - **Codex/Gemini skills are now natively registered ([ADR-012](docs/adr/012-native-codex-gemini-skill-injection.md)).**
   Codex and Gemini auto-discover skills only from a HOME-scoped directory
   (`$HOME/.agents/skills`, `~/.gemini/skills`), never from a `.agents/skills` /
@@ -522,4 +539,5 @@ Initial release under the **nasde-toolkit** name (rebrand from
 [#59]: https://github.com/NoesisVision/nasde-toolkit/pull/59
 [#61]: https://github.com/NoesisVision/nasde-toolkit/pull/61
 [#65]: https://github.com/NoesisVision/nasde-toolkit/pull/65
+[#67]: https://github.com/NoesisVision/nasde-toolkit/pull/67
 [gh-litellm-2026-04]: https://github.com/BerriAI/litellm/security/advisories/GHSA-xqmj-j6mv-4862
diff --git a/CLAUDE.md b/CLAUDE.md
index 26f29f9..adfdbab 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -76,15 +76,15 @@ See [ARCHITECTURE.md](ARCHITECTURE.md) for the full system architecture with dia
 - **Configuration**: Two-layer config — `nasde.toml` for project-level settings, `task.toml` per task (single file, shared with Harbor; nasde-specific fields live under `[nasde.*]`). Both parsed into `@dataclass` models in `config.py`. Task discovery walks `tasks/` (or `.nasde/tasks/`) automatically.
 - **Benchmark runner**: Uses Harbor Python API (`Job`, `JobConfig`) directly instead of subprocess. The runner merges variant config with task registry into a dict, passes it to `JobConfig.model_validate()`, then runs `await job.run()`. Opik tracking via `track_harbor()` (monkey-patches Harbor at runtime).
 - **Evaluator**: Subprocess-based assessment evaluation with pluggable backends. The evaluator spawns a CLI agent (`claude -p` or `codex exec`) as a subprocess to read trial artifacts and score them against assessment criteria. Backend selection via `[evaluation] backend` in `nasde.toml` (`"claude"` default, `"codex"` available). Claude backend uses `--output-format json` (without `--bare` — preserves OAuth/keychain auth for subscription billing and skills auto-discovery). Codex backend uses `--json --quiet --full-auto` for JSONL event streaming. Both backends respect the user's existing CLI authentication (OAuth subscription or API key) — no Agent SDK required, which avoids Anthropic's SDK-billing restrictions on subscription accounts. Configurable via `[evaluation]` in `nasde.toml` — backend, model, tools, MCP servers, skills, system prompt, and trajectory inclusion can all be customized. When `include_trajectory = true`, the evaluator also has access to the agent's ATIF trajectory (`agent/trajectory.json`). Default Claude model is `claude-opus-4-7`. Skills dir is mounted via temp-dir + `cwd` with `--add-dir <workspace>` so Claude's auto-discovery picks them up. **Repeated evaluations (output contract):** the judge is non-deterministic, so each trial is evaluated `eval_repetitions` times (default 3, `[evaluation] eval_repetitions` in `nasde.toml`, override `--eval-repetitions` on `run`/`eval`). The N evaluations of one trial run concurrently via `asyncio.gather` under the shared `--max-concurrent-eval` semaphore; the gather barrier ensures aggregation runs only after all N finish. Each is written **append-only** as `assessment_eval_<N>.json` (never overwritten — the bug being fixed was a single `assessment_eval.json` clobbered on every re-eval). A derived `assessment_summary.json` holds per-cluster mean/std(sample,n-1)/min/max aggregates: averages are computed **only within one `(evaluator_model, dimensions_fingerprint)` cluster**. A different judge model OR a changed `assessment_dimensions.json` rubric (added/removed dimension, changed `max_score`, OR changed `description`) is a different benchmark and is never averaged together — the fingerprint is the first 12 hex chars of the sha256 of the normalized (`sort_keys`) rubric JSON, stamped onto each `assessment_eval_<N>.json` at eval time (legacy files with no fingerprint → `""` cluster). Clustering by fingerprint also makes the per-group `members[0]` dimension/max_score read correct by construction. The largest cluster (tie → newest) is `dominant`. Opik upload uses the **dominant cluster's mean** (plus per-dimension std and `eval_n`), not the last evaluation. Legacy jobs are normalized by `eval_migration.py`, exposed as the **hidden** one-shot command `nasde migrate-evals` (`@app.command(hidden=True)` — invokable, but kept out of `nasde --help` and user docs because it is a migration tool, not a routine command; its tests stay to guard the logic). It takes a `jobs/` root (recurses) or a single trial dir, not a single job dir.
-- **Variant system**: Each variant is a directory under `variants/` with a required `variant.toml` declaring the agent type (`agent = "claude"`, `agent = "codex"`, or `agent = "gemini"`). For Claude Code variants, `CLAUDE.md` is injected into `/app/CLAUDE.md`; for Codex variants, `AGENTS.md` is injected into `/app/AGENTS.md`; for Gemini CLI variants, `GEMINI.md` is injected into `/app/GEMINI.md`. An optional `skills/` subdirectory contains Claude skill snapshots — each `skills/<name>/` is injected **whole** (incl. `references/` and sibling files, via the shared `plugin_registration.stage_skill_dir`) into `/app/.claude/skills/<name>/`. A `variant.toml` may also declare a `[[skill]]` array (skill-by-reference, ADR-009): each entry points at a source skill dir (optional `ref`) staged whole into `/app/.claude/skills/<name>/` without a copy under `variants/`. An optional `agents_skills/` subdirectory contains Codex skill snapshots and an optional `gemini_skills/` subdirectory contains Gemini skill snapshots — each `<name>/` (a dir with a `SKILL.md`) is registered through **Harbor's native skill-injection** (`config.agent.skills`), NOT via `sandbox_files`. See the "Codex/Gemini native skill injection" decision below for why: Codex and Gemini auto-discover skills only from a HOME-scoped dir (`$HOME/.agents/skills`, `~/.gemini/skills`), never from a cwd `/app/.agents/skills` or `/app/.gemini/skills` dir. If no `harbor_config.json` exists, one is auto-generated from `variant.toml`.
-- **Codex/Gemini native skill injection (ADR-012)**: Codex and Gemini auto-discover skills **only from a HOME-scoped directory** — `$HOME/.agents/skills` for Codex, `~/.gemini/skills` for Gemini — never from a `.agents/skills` / `.gemini/skills` dir sitting in the agent's `/app` cwd. The old path uploaded skill files into `/app/.agents/skills/` and `/app/.gemini/skills/` via `sandbox_files`, where the CLI never scans, so the skill was **never registered as a native slash-command** — the agent only stumbled on the file by reading it by hand. The fix routes `agents_skills/<name>/` and `gemini_skills/<name>/` through **Harbor's native `config.agent.skills`** list (`runner._collect_native_skill_dirs`): at trial time Harbor uploads each skill dir into the container's `skills_dir`, passes `skills_dir` to the agent ctor, and the agent's `_build_register_skills_command` copies them into the correct HOME-scoped location *inside* `run()` — the only point with the right timing (an earlier `setup()`/`sandbox_files` upload is in the wrong session relative to the CLI's skill scan, empirically verified). The `skills` list on the harbor_config agent is regenerated each run (tracked via `_nasde_derived_skills`), so a removed skill dir drops out and a hand-authored `skills` entry is preserved — mirroring the `sandbox_files` / `_nasde_derived_keys` contract. **Claude is intentionally NOT routed here**: its `variants/<v>/skills/` → `sandbox_files` → `/app/.claude/skills/` (+ `~/.claude/skills/`) path already lands where Claude Code auto-discovers from cwd, is tested, and also carries `[nasde.plugin]` / `[[skill]]` entries. See ADR-012.
+- **Variant system**: Each variant is a directory under `variants/` with a required `variant.toml` declaring the agent type (`agent = "claude"`, `agent = "codex"`, or `agent = "gemini"`). For Claude Code variants, `CLAUDE.md` is injected into `/app/CLAUDE.md`; for Codex variants, `AGENTS.md` is injected into `/app/AGENTS.md`; for Gemini CLI variants, `GEMINI.md` is injected into `/app/GEMINI.md`. An optional `skills/` subdirectory contains Claude skill snapshots — each `skills/<name>/` is injected **whole** (incl. `references/` and sibling files, via the shared `plugin_registration.stage_skill_dir`) into `/app/.claude/skills/<name>/`. A `variant.toml` may also declare a `[[skill]]` array (skill-by-reference, ADR-009): each entry points at a source skill dir (optional `ref`) without a copy under `variants/` — for Claude staged whole into `/app/.claude/skills/<name>/`, for Codex/Gemini routed through Harbor's native `config.agent.skills` (ADR-012, see the native-injection note below). An optional `agents_skills/` subdirectory contains Codex skill snapshots and an optional `gemini_skills/` subdirectory contains Gemini skill snapshots — each `<name>/` (a dir with a `SKILL.md`) is registered through **Harbor's native skill-injection** (`config.agent.skills`), NOT via `sandbox_files`. See the "Codex/Gemini native skill injection" decision below for why: Codex and Gemini auto-discover skills only from a HOME-scoped dir (`$HOME/.agents/skills`, `~/.gemini/skills`), never from a cwd `/app/.agents/skills` or `/app/.gemini/skills` dir. If no `harbor_config.json` exists, one is auto-generated from `variant.toml`.
+- **Codex/Gemini native skill injection (ADR-012)**: Codex and Gemini auto-discover skills **only from a HOME-scoped directory** — `$HOME/.agents/skills` for Codex, `~/.gemini/skills` for Gemini — never from a `.agents/skills` / `.gemini/skills` dir sitting in the agent's `/app` cwd. The old path uploaded skill files into `/app/.agents/skills/` and `/app/.gemini/skills/` via `sandbox_files`, where the CLI never scans, so the skill was **never registered as a native slash-command** — the agent only stumbled on the file by reading it by hand. The fix routes `agents_skills/<name>/` and `gemini_skills/<name>/` through **Harbor's native `config.agent.skills`** list (`runner._collect_native_skill_dirs`): at trial time Harbor uploads each skill dir into the container's `skills_dir`, passes `skills_dir` to the agent ctor, and the agent's `_build_register_skills_command` copies them into the correct HOME-scoped location *inside* `run()` — the only point with the right timing (an earlier `setup()`/`sandbox_files` upload is in the wrong session relative to the CLI's skill scan, empirically verified). The `skills` list on the harbor_config agent is regenerated each run (tracked via `_nasde_derived_skills`), so a removed skill dir drops out and a hand-authored `skills` entry is preserved — mirroring the `sandbox_files` / `_nasde_derived_keys` contract. This covers **all three** skill paths for Codex/Gemini: the variant snapshot (`agents_skills/`, `gemini_skills/`) plus `[[skill]]` by-reference and `[nasde.plugin]` skills — the latter two are resolved to skill dirs by `plugin_registration.collect_referenced_skill_dirs` / `collect_plugin_skill_dirs` and **unioned** into the same `skills` list (one `_nasde_derived_skills` marker tracks both; nasde warns when two derived dirs share a basename since Harbor's `resolve_skills` is last-wins). **Claude is intentionally NOT routed here**: its `variants/<v>/skills/` → `sandbox_files` → `/app/.claude/skills/` (+ `~/.claude/skills/`) path already lands where Claude Code auto-discovers from cwd, is tested, and is the carrier for its own `[nasde.plugin]` / `[[skill]]` skills (via `stage_skill_dir`); a Claude agent drops `extra_skill_dirs` at merge time to avoid double-injecting. See ADR-012.
 - **Codex ChatGPT OAuth**: Harbor's `Codex` agent uploads `~/.codex/auth.json` into the sandbox as `$CODEX_HOME/auth.json` — but **harbor 0.13 made this opt-in**: the agent now *defaults to `OPENAI_API_KEY`* and only uploads the OAuth `auth.json` when `CODEX_AUTH_JSON_PATH` (a specific file) or `CODEX_FORCE_AUTH_JSON` (truthy → `~/.codex/auth.json`) is set. Without an API key it would otherwise write an **empty** key into the sandbox (`Incorrect API key provided: ''` on `wss://api.openai.com/v1/responses`). So `_ensure_auth` opts into OAuth automatically: when no `OPENAI_API_KEY`/`CODEX_API_KEY` is set but `~/.codex/auth.json` exists, it sets `CODEX_FORCE_AUTH_JSON=true` (respecting a user-set `CODEX_AUTH_JSON_PATH`/`CODEX_FORCE_AUTH_JSON`). API key (`OPENAI_API_KEY`) always takes priority over OAuth. Tokens are not extracted back from sandbox after runs. Validate with `source scripts/export_codex_oauth_token.sh`. The runner bridges `CODEX_API_KEY` → `OPENAI_API_KEY` in `_ensure_auth` for users who prefer the Codex-specific env var name.
 - **Gemini Google OAuth**: When `~/.gemini/oauth_creds.json` exists (created by `gemini login`) and no API key env var is set (`GEMINI_API_KEY`, `GOOGLE_API_KEY`, `GOOGLE_APPLICATION_CREDENTIALS`), `ConfigurableGemini` injects the OAuth credentials into the sandbox via env var. API key always takes priority over OAuth. Validate with `source scripts/export_gemini_oauth_token.sh`.
 - **All dependencies are core**: `harbor`, `opik` are in `[project.dependencies]`. The evaluator depends on having `claude` CLI (default) or `codex` CLI installed and authenticated on the host — not bundled. No optional extras — `uv tool install .` gives full functionality. Assessment evaluation is on by default (`--without-eval` to skip).
 - **Versioning**: Derived from git tags via `hatch-vcs` plugin. Tag `v0.1.0` → version `0.1.0`. Commits after a tag produce dev versions like `0.1.1.dev3+gabcdef`. `_version.py` is auto-generated at build time and gitignored. See ADR-007.
 - **Bundled authoring skills**: `.claude/skills/nasde-benchmark-*` are shipped inside the wheel via `[tool.hatch.build.targets.wheel.force-include]` → `nasde_toolkit/_bundled_skills/`. `nasde install-skills` discovers them via `importlib.resources` (wheel install) with a fallback to the live `.claude/skills/` when running from an editable / source checkout. `nasde-dev` is deliberately excluded from the bundle (internal dev skill, not for end users).
 - **Auto-generated Dockerfile**: When a task has no `environment/Dockerfile`, nasde generates one from `source.git` + `[docker]`. For local paths, also generates `docker-compose.yaml` to override the build context. See `docker.py:ensure_task_environment()`.
-- **`[nasde.plugin]` (ADR-009)**: Ships a local Claude Code plugin (dir with `.claude-plugin/plugin.json`) into the sandbox with one `task.toml` declaration. `docker.py:ensure_task_plugin()` stages the plugin tree (at `ref` via a temp worktree) into a gitignored `_nasde-plugin/` inside the **active** build context (Harbor pins context to `environment/`; with `[nasde.source]` the context is the source repo/worktree — nasde reads it back from the generated compose and stages there), then appends a sentinel-fenced `COPY`+build stage to the Dockerfile (generated base if none, hand-written preserved). `plugin_registration.py` then registers the plugin's own `skills/` (whole dirs) and injects its MCP server (from `<plugin>/.mcp.json`, env-wrapped) into the task's `task.toml` — idempotent, fenced, never clobbers an author-declared same-name server. Skill-by-reference (`[[skill]]` in `variant.toml`) feeds the same skill-registration machinery. Derived sandbox files are merged into the variant's `harbor_config.json` each run. MCP injection writes `task.toml` because Harbor reads MCP servers only from there (`trial.py` → `task.config.environment.mcp_servers`).
+- **`[nasde.plugin]` (ADR-009)**: Ships a local Claude Code plugin (dir with `.claude-plugin/plugin.json`) into the sandbox with one `task.toml` declaration. `docker.py:ensure_task_plugin()` stages the plugin tree (at `ref` via a temp worktree) into a gitignored `_nasde-plugin/` inside the **active** build context (Harbor pins context to `environment/`; with `[nasde.source]` the context is the source repo/worktree — nasde reads it back from the generated compose and stages there), then appends a sentinel-fenced `COPY`+build stage to the Dockerfile (generated base if none, hand-written preserved). `plugin_registration.py` then registers the plugin's own `skills/` (whole dirs — for Claude into `/app/.claude/skills/`, for Codex/Gemini through Harbor's native `config.agent.skills` via `collect_plugin_skill_dirs`, ADR-012) and injects its MCP server (from `<plugin>/.mcp.json`, env-wrapped) into the task's `task.toml` — idempotent, fenced, never clobbers an author-declared same-name server. Skill-by-reference (`[[skill]]` in `variant.toml`) feeds the same agent-aware skill-registration machinery. Derived sandbox files are merged into the variant's `harbor_config.json` each run. MCP injection writes `task.toml` because Harbor reads MCP servers only from there (`trial.py` → `task.config.environment.mcp_servers`).
 - **Pass-through CLI**: `nasde harbor ...` delegates to Harbor's Typer app via `add_typer()`. `nasde opik ...` forwards args to Opik's Click CLI via `ctx.args`.
 - **Rubric calibration (ADR-010)**: `nasde calibrate publish PATHS...` / `pull-comments` close the loop between the LLM-as-a-Judge and a human reviewer by publishing trial diffs + assessments as PRs/MRs and pulling review comments back for rubric tuning. Two layers, deliberately separated: **GIT** (`git_platform_backends/git_ops.py` — `git push`/`ls-remote`, platform-agnostic, subprocess pattern from `docker.py`, not behind a Protocol) and **PLATFORM** (`git_platform_backends/` behind a `@runtime_checkable GitPlatformBackend` Protocol — `repo_exists`/`find_open_pr_for_branch`/`create_pr`/`fetch_pr_comments`/`validate_cli_installed`/`validate_auth`, mirroring `evaluator_backends/`). The base is keyed on `(repo, commit)` as an **orphan branch** `base/<repo>-<sha>` seeded once via `git archive HEAD` from the trial workspace (git deduplicates blobs by content across orphan bases — no shared ancestor needed); each trial is a feature branch `calib/<repo>-<sha>/<trial>` = base + the agent's `changes.patch` applied as a real commit + `.calibration/` files (no trajectory — secrets/clutter). `.calibration/` carries the reviewer's context: the task's `instruction.md` + `assessment_criteria.md` + `assessment_dimensions.json` (resolved from `result.json` `task_name`/`source`, trying both `tasks/<task>` and `evals/<source>/tasks/<task>` layouts), all `assessment_eval_<N>.json`, `assessment_summary.json`, and `metrics.json`. Idempotency is **open-only**: `find_open_pr_for_branch` matches only OPEN PRs/MRs (`gh pr list --state open`, `glab mr list` default), so a re-run skips a live round but lets a fresh round publish once the prior one is closed. The PR body is a pure transform of the dominant `AssessmentSummary` cluster (`calibration_publisher._render_pr_body`). Backend is **auto-detected from the sink repo URL host** (`github.com`→`gh`, `*gitlab*`→`glab`; `[calibration] platform` overrides for self-hosted) — no `backend` config field, eliminating the backend≠host mismatch. Preflight before any work: detect → `validate_cli_installed` (`shutil.which`, precise per-platform message) → `validate_auth` (`gh|glab auth status` exit code) → `repo_exists` (parses OUTPUT — `gh repo view` exits 0 even for a missing repo). Repo creation is out of scope (push creates branches ad-hoc in an existing repo). Reuses `_expand_to_trials`/`_capture_patch`/`_build_metrics` from `results_exporter.py` and `_aggregate_evaluations`/`_load_raw_evaluations`/`AssessmentSummary` from `evaluator.py`. Prerequisites mirror the evaluator's CLI requirement (ADR-002): `git` + `gh`/`glab` + login, no SDK, CLI keyring holds auth. The `nasde-benchmark-calibration` skill orchestrates the human-in-the-loop flow.
 - **Results export (EXPERIMENTAL)**: `results_exporter.py` + `nasde results-export PATHS... --to DIR` copy the analytic *essence* of trial artifacts out of the gitignored `jobs/` tree into a flat per-trial layout (`DIR/<job>__<trial>/` with `metrics.json`, `assessment_eval_*.json` (all repetitions), `assessment_summary.json`, `trajectory.json`, `changes.patch`, `verifier_stdout.txt`, `reward.txt`). Re-export **merges**: missing eval files are copied and the summary/metrics refreshed, while immutable files (trajectory, patch) are left as-is — so evaluations added after a first export are picked up. A legacy bare `assessment_eval.json` (pre-migration trial) is exported as `assessment_eval_1.json` with a `nasde migrate-evals` hint, so the export is never silently empty. Filesystem-as-interface: `DIR` is any plain path (iCloud/Dropbox/git repo) — no cloud SDK. It scans Harbor artifacts (`result.json`/`config.json`/`assessment_eval*.json`/`agent/trajectory.json`/workspace), **not** the best-effort `EXPERIMENT_LOG.md`. `metrics.json` is a self-contained summary composed from `result.json`+`config.json`+`agent/trajectory.json` — including **token & cost economics** (`token_usage`, `cost_usd`, `pricing_as_of`, `reasoning_effort`); see the token-cost note below and [ADR-011](docs/adr/011-token-cost-metrics.md). The code diff is captured as a patch (`git diff HEAD` + untracked via `git ls-files -z` + `git diff --no-index`, never `git add` — the workspace `.git` index is left untouched; `-z`/NUL parsing means non-ASCII untracked filenames are not dropped under `core.quotepath`). Selection is a mixed positional list of job and/or trial dirs (auto-classified: a dir whose children have `result.json` is a job; else a dir whose own `result.json` carries a `trial_name` key is a trial; a dir with a job-level `result.json` but no trial-shaped children/`trial_name` is skipped with a warning rather than mis-exported as garbage); re-export is idempotent and merge-based (a trial is reported `exported` only when something new was copied, else `skipped`). Reuses `_collect_trial_dirs`/`_load_json`/`_compute_duration_sec`/`_resolve_agent_name` from `evaluator.py`. Deliberately does **not** model "experiments" (one job can belong to many — a future UI layer's concern).
@@ -290,7 +290,7 @@ reasoning_effort = "high"           # Optional. Passed straight to the agent (no
 
 tasks = ["my-benchmark/task-a"]     # Optional: task-scope. Restrict this variant to specific tasks.
 
-[[skill]]                           # Optional (ADR-009): skill-by-reference, Claude only.
+[[skill]]                           # Optional (ADR-009): skill-by-reference. Works for any agent.
 path = "../../../src/plugins/my-plugin/skills/my-skill"   # source skill dir (required)
 ref  = "abc1234"                    # optional git ref, same semantics as [nasde.source]
 ```
@@ -311,12 +311,16 @@ the variant's scope the run aborts with a clear error. Either way the scope wins
 over an explicit `--tasks` filter. Absent or empty → unscoped (runs against every
 task, the default).
 
-**`[[skill]]` (skill-by-reference)**: each entry stages the **whole** skill dir
-(incl. `references/`) into `/app/.claude/skills/<name>/` from a source path —
-no copy under `variants/<v>/skills/`. Optional `ref` reads from a temp git
-worktree at that commit. The legacy `variants/<v>/skills/<name>/` copy path
-still works (and now also carries `references/`). Both feed the same
-`plugin_registration.stage_skill_dir` machinery as `[nasde.plugin]`.
+**`[[skill]]` (skill-by-reference)**: each entry references the **whole** skill
+dir (incl. `references/`) from a source path — no copy under
+`variants/<v>/skills/`. Optional `ref` reads from a temp git worktree at that
+commit. **Routing is agent-aware** (ADR-012): for **Claude** the dir is staged
+into `/app/.claude/skills/<name>/` via `plugin_registration.stage_skill_dir`
+(same machinery as `[nasde.plugin]`); for **Codex/Gemini** the dir is resolved
+by `collect_referenced_skill_dirs` and routed through Harbor's native
+`config.agent.skills` (→ `$HOME/.agents/skills`, `~/.gemini/skills`), because
+those CLIs never scan the `/app` cwd. The legacy `variants/<v>/skills/<name>/`
+copy path still works (and carries `references/`).
 
 ### harbor_config.json (per variant)
 
diff --git a/docs/adr/012-native-codex-gemini-skill-injection.md b/docs/adr/012-native-codex-gemini-skill-injection.md
index f9db717..4232b80 100644
--- a/docs/adr/012-native-codex-gemini-skill-injection.md
+++ b/docs/adr/012-native-codex-gemini-skill-injection.md
@@ -94,6 +94,36 @@ skills and `[[skill]]` by-reference (ADR-009) via `stage_skill_dir`. Migrating
 it onto `config.agent.skills` would be a larger, riskier change with no
 behavioral payoff. Scope is kept to the actual bug.
 
+### All three skill paths, not just the snapshot
+
+A skill reaches an agent three ways: the variant **snapshot**
+(`agents_skills/`, `gemini_skills/`), **`[[skill]]` by-reference** in
+`variant.toml`, and a **`[nasde.plugin]`**'s own `skills/`. The first revision
+of this decision fixed only the snapshot path; the latter two still went
+through `stage_skill_dir`'s hardcoded `/app/.claude/skills/` and so were broken
+for Codex/Gemini in exactly the same way.
+
+All three are now routed natively for Codex/Gemini. `[[skill]]`/plugin skills
+are resolved to skill *directories* by `collect_referenced_skill_dirs` /
+`collect_plugin_skill_dirs` (the dir-list counterparts of the unchanged
+Claude-only `stage_referenced_skills` / `register_plugin_skills`), threaded as
+`extra_skill_dirs` and **unioned** with the snapshot dirs inside
+`_refresh_agent_skills`. The union is tracked by the single
+`_nasde_derived_skills` marker, so stale-drop and hand-authored preservation
+work across both sources. For a Claude agent `extra_skill_dirs` is dropped at
+merge time (those skills already ride `sandbox_files`, so feeding the native
+list too would double-inject). Because Harbor's `resolve_skills` keys skills by
+basename (last-wins), nasde warns when two derived dirs share a basename so the
+loser does not silently vanish.
+
+**Coverage note (honest).** The `[[skill]]` path is verified end-to-end on a
+live agent (see Consequences). The `[nasde.plugin]` path shares the same channel
+from `extra_skill_dirs` onward, but there is no example benchmark with a
+`[nasde.plugin]` to run it e2e — so it is covered by an **integration test** that
+drives the real `docker.ensure_task_plugin` staging → `collect_plugin_skill_dirs`
+→ harbor_config (no Docker build), plus unit tests on the resolver. Full
+live-agent e2e for a plugin is a follow-up for when a plugin benchmark exists.
+
 ### Frontmatter guardrail
 
 Codex's loader is strict: a `SKILL.md` that does not **start** with a `---`
@@ -123,8 +153,11 @@ comments moved below the closing `---`.
     its native mechanism, not by reading the file.
 - Skill×model matrix results for Codex/Gemini collected under the old behavior
   are invalid (skill-as-document, not skill-as-native) and must be re-run.
-- `sandbox_files` no longer carries skill files for Codex/Gemini; it carries
-  only `AGENTS.md` / `GEMINI.md` (and, for Claude, the unchanged skill files).
+- For Codex/Gemini, skills (snapshot, `[[skill]]`, and plugin) are all carried
+  by `config.agent.skills`; the agent's effective `sandbox_files` is just
+  `AGENTS.md` / `GEMINI.md`. The Claude `sandbox_files` skill map is still built
+  unconditionally (so Claude stays byte-identical and tested) but a Codex/Gemini
+  agent never consumes it.
 - The brittle `_expand_skill_targets` workaround in `ConfigurableGemini` (which
   hardcoded `/root/.gemini/skills/`) is removed — Harbor's native command uses
   the shell-expanded `$HOME`, correct for non-root users too.
@@ -132,9 +165,13 @@ comments moved below the closing `---`.
 ## References
 
 - `runner._collect_native_skill_dirs`, `runner._refresh_agent_skills`,
-  `runner._agent_type_from_import_path`
+  `runner._agent_type_from_import_path`, `runner._referenced_skill_dirs_for_agent`,
+  `runner._warn_skill_basename_collisions`
+- `plugin_registration.collect_referenced_skill_dirs`,
+  `plugin_registration.collect_plugin_skill_dirs` (dir-list channel),
+  `stage_referenced_skills` / `register_plugin_skills` (unchanged Claude sink)
 - Harbor `agents/installed/codex.py::_build_register_skills_command`,
   `gemini_cli.py::_build_register_skills_command`, `trial.py` skill upload
 - `openai/codex` `core-skills/src/loader.rs` (native discovery roots)
-- [ADR-009](009-plugin-and-skill-by-reference.md) (the Claude skill path this
-  decision deliberately leaves untouched)
+- [ADR-009](009-plugin-and-skill-by-reference.md) (the `[[skill]]` / plugin
+  mechanism whose Claude sink this decision leaves untouched)
diff --git a/examples/nasde-dev-skill/variants/codex-nasde-dev-with-arch/AGENTS.md b/examples/nasde-dev-skill/variants/codex-nasde-dev-with-arch/AGENTS.md
new file mode 100644
index 0000000..dcbde60
--- /dev/null
+++ b/examples/nasde-dev-skill/variants/codex-nasde-dev-with-arch/AGENTS.md
@@ -0,0 +1,9 @@
+# Agent Instructions
+
+## Approach
+
+1. Begin by consulting the **nasde-dev** skill and apply its guidance throughout this task.
+2. Before writing any code, explore the existing codebase to understand its architecture and conventions.
+3. Follow existing architectural patterns exactly — match naming, namespaces, directory structure, and code style.
+4. When modifying nasde-toolkit functionality (CLI, runner, evaluator, config, agents), follow the `nasde-dev` skill's verification protocol, including documentation sync requirements.
+5. After making changes, verify documentation is consistent (CLAUDE.md, README.md, ARCHITECTURE.md).
diff --git a/examples/nasde-dev-skill/variants/codex-nasde-dev-with-arch/variant.toml b/examples/nasde-dev-skill/variants/codex-nasde-dev-with-arch/variant.toml
new file mode 100644
index 0000000..31a8c55
--- /dev/null
+++ b/examples/nasde-dev-skill/variants/codex-nasde-dev-with-arch/variant.toml
@@ -0,0 +1,9 @@
+agent = "codex"
+model = "gpt-5.4"
+
+# nasde-dev skill by-reference (ADR-009) from the live source. For a Codex
+# variant this is registered through Harbor's native config.agent.skills so
+# the skill lands in $HOME/.agents/skills and Codex auto-discovers it — NOT in
+# the /app cwd (which Codex never scans). See ADR-012.
+[[skill]]
+path = "../../../../.claude/skills/nasde-dev"
diff --git a/examples/nasde-dev-skill/variants/gemini-nasde-dev-with-arch/GEMINI.md b/examples/nasde-dev-skill/variants/gemini-nasde-dev-with-arch/GEMINI.md
new file mode 100644
index 0000000..dcbde60
--- /dev/null
+++ b/examples/nasde-dev-skill/variants/gemini-nasde-dev-with-arch/GEMINI.md
@@ -0,0 +1,9 @@
+# Agent Instructions
+
+## Approach
+
+1. Begin by consulting the **nasde-dev** skill and apply its guidance throughout this task.
+2. Before writing any code, explore the existing codebase to understand its architecture and conventions.
+3. Follow existing architectural patterns exactly — match naming, namespaces, directory structure, and code style.
+4. When modifying nasde-toolkit functionality (CLI, runner, evaluator, config, agents), follow the `nasde-dev` skill's verification protocol, including documentation sync requirements.
+5. After making changes, verify documentation is consistent (CLAUDE.md, README.md, ARCHITECTURE.md).
diff --git a/examples/nasde-dev-skill/variants/gemini-nasde-dev-with-arch/variant.toml b/examples/nasde-dev-skill/variants/gemini-nasde-dev-with-arch/variant.toml
new file mode 100644
index 0000000..df000e7
--- /dev/null
+++ b/examples/nasde-dev-skill/variants/gemini-nasde-dev-with-arch/variant.toml
@@ -0,0 +1,9 @@
+agent = "gemini"
+model = "google/gemini-3-flash-preview"
+
+# nasde-dev skill by-reference (ADR-009) from the live source. For a Gemini
+# variant this is registered through Harbor's native config.agent.skills so
+# the skill lands in ~/.gemini/skills and Gemini auto-discovers it — NOT in
+# the /app cwd (which Gemini never scans). See ADR-012.
+[[skill]]
+path = "../../../../.claude/skills/nasde-dev"
diff --git a/src/nasde_toolkit/plugin_registration.py b/src/nasde_toolkit/plugin_registration.py
index 82f5bbb..83d841c 100644
--- a/src/nasde_toolkit/plugin_registration.py
+++ b/src/nasde_toolkit/plugin_registration.py
@@ -11,11 +11,17 @@
 2. The ``[[skill]]`` array in variant.toml references a skill from a source
    path instead of copying it into ``variants/<v>/skills/``.
 
-Both feed :func:`stage_skill_dir`, which expands a skill directory into the
-flat ``{container_path: host_file}`` mapping ConfigurableClaude uploads
-(it only uploads regular files, mirroring ``_collect_codex_skills``). The
-plugin path additionally calls :func:`inject_mcp_server` to register the
-plugin's MCP server in the task's Harbor config.
+For **Claude** both feed :func:`stage_skill_dir`, which expands a skill
+directory into the flat ``{container_path: host_file}`` mapping
+ConfigurableClaude uploads to ``/app/.claude/skills`` (Claude's cwd discovery
+root). The plugin path additionally calls :func:`inject_mcp_server` to register
+the plugin's MCP server in the task's Harbor config.
+
+For **Codex/Gemini** the same skill *directories* are resolved by
+:func:`collect_referenced_skill_dirs` / :func:`collect_plugin_skill_dirs` and
+fed to Harbor's native ``config.agent.skills`` instead — those CLIs scan only a
+HOME-scoped dir (``$HOME/.agents/skills``, ``~/.gemini/skills``), never the
+``/app`` cwd the sandbox map targets. See ADR-012.
 """
 
 from __future__ import annotations
@@ -119,6 +125,54 @@ def stage_referenced_skills(
         console.print(f"  [dim]Registered referenced skill '{resolved.name}'[/dim]")
 
 
+def collect_referenced_skill_dirs(
+    variant_dir: Path,
+    worktree_factory: WorktreeFactory,
+) -> list[Path]:
+    """Resolve the ``[[skill]]`` array to a list of whole skill directories.
+
+    Same resolution as :func:`stage_referenced_skills` (each entry's ``path``
+    relative to the variant dir, optional ``ref`` read from a temporary git
+    worktree) but returns the resolved *directories* instead of flattening them
+    into a ``sandbox_files`` map under ``/app/.claude/skills``. Codex and Gemini
+    auto-discover skills only from a HOME-scoped dir, never from the agent's
+    ``/app`` cwd, so they take this dir list through Harbor's native
+    ``config.agent.skills`` mechanism. Claude keeps using
+    :func:`stage_referenced_skills`. See ADR-012.
+
+    Paths are fully ``resolve()``d so they compare equal to the snapshot dirs
+    from ``runner._collect_native_skill_dirs`` — the union/dedup, stale-drop and
+    basename-collision logic in ``_refresh_agent_skills`` key on string equality,
+    which would break on a symlinked ``/tmp`` (macOS) or a ``ref`` worktree path.
+    """
+    dirs: list[Path] = []
+    for entry in _read_skill_entries(variant_dir):
+        resolved = _resolve_skill_source(variant_dir, entry["path"], entry.get("ref", ""), worktree_factory)
+        if not (resolved / "SKILL.md").exists():
+            raise RuntimeError(f"[[skill]] '{resolved}' has no SKILL.md")
+        dirs.append(resolved.resolve())
+    return dirs
+
+
+def collect_plugin_skill_dirs(staged: StagedPlugin) -> list[Path]:
+    """Resolve a shipped plugin's own ``skills/`` to a list of skill dirs.
+
+    The dir-list counterpart of :func:`register_plugin_skills`, for the same
+    reason as :func:`collect_referenced_skill_dirs`: Codex/Gemini take these
+    dirs through Harbor's native ``config.agent.skills`` rather than a
+    cwd ``sandbox_files`` mapping. Paths are ``resolve()``d for the same
+    string-equality reason as :func:`collect_referenced_skill_dirs`.
+    """
+    skills_root = staged.staged_dir / "skills"
+    if not skills_root.is_dir():
+        return []
+    return [
+        skill_dir.resolve()
+        for skill_dir in sorted(skills_root.iterdir())
+        if skill_dir.is_dir() and (skill_dir / "SKILL.md").exists()
+    ]
+
+
 def inject_mcp_server(task_dir: Path, staged: StagedPlugin) -> None:
     """Write the plugin's MCP servers into the task's task.toml.
 
diff --git a/src/nasde_toolkit/runner.py b/src/nasde_toolkit/runner.py
index c43610d..a49ad2c 100644
--- a/src/nasde_toolkit/runner.py
+++ b/src/nasde_toolkit/runner.py
@@ -28,6 +28,8 @@
     ensure_task_plugin,
 )
 from nasde_toolkit.plugin_registration import (
+    collect_plugin_skill_dirs,
+    collect_referenced_skill_dirs,
     inject_mcp_server,
     register_plugin_skills,
     stage_referenced_skills,
@@ -74,9 +76,9 @@ async def run_benchmark(
 
     variant_dir = resolve_variant_dir(config.project_dir, variant)
 
-    extra_sandbox_files = _prepare_task_environments(config, variant_dir)
+    extra_sandbox_files, extra_skill_dirs = _prepare_task_environments(config, variant_dir)
 
-    harbor_config_path = _ensure_harbor_config(variant_dir, variant, extra_sandbox_files)
+    harbor_config_path = _ensure_harbor_config(variant_dir, variant, extra_sandbox_files, extra_skill_dirs)
 
     _ensure_auth(_read_agent_import_path(harbor_config_path))
 
@@ -121,15 +123,30 @@ async def run_benchmark(
 # ---------------------------------------------------------------------------
 
 
-def _prepare_task_environments(config: ProjectConfig, variant_dir: Path) -> dict[str, str]:
+def _prepare_task_environments(config: ProjectConfig, variant_dir: Path) -> tuple[dict[str, str], list[str]]:
     """Generate per-task Docker environments and collect agent skill files.
 
     Per task: auto-generate the source Dockerfile/compose ([nasde.source]),
     then stage the [nasde.plugin] tree into the build context, register the
     plugin's own skills, and inject its MCP server into the task's task.toml.
-    Then add the variant's referenced [[skill]] dirs. Returns the extra
-    ``{container_path: host_file}`` entries to merge into the variant's
-    harbor_config.json sandbox_files.
+    Then add the variant's referenced [[skill]] dirs.
+
+    Returns ``(extra_sandbox_files, extra_skill_dirs)``:
+
+    - ``extra_sandbox_files`` — the ``{container_path: host_file}`` map for the
+      **Claude** carrier (``/app/.claude/skills/...``), built unconditionally so
+      Claude stays byte-identical and tested.
+    - ``extra_skill_dirs`` — the same plugin / ``[[skill]]`` skill directories
+      as a host-path list, fed to **Codex/Gemini** through Harbor's native
+      ``config.agent.skills`` (those CLIs scan only a HOME-scoped dir, never the
+      ``/app`` cwd the sandbox map targets). The claude-vs-native choice is made
+      per agent at merge time in ``_refresh_agent_skills``. See ADR-012.
+
+    The ``[[skill]]`` array is resolved twice (once per channel — the Claude
+    ``stage_referenced_skills`` and the native ``collect_referenced_skill_dirs``).
+    A ``ref`` worktree is created by each call; all worktrees live in
+    ``_active_worktrees`` until ``cleanup_worktrees()`` runs in the ``finally``
+    after the job, well after Harbor has uploaded the skill dirs.
 
     Plugin skills go to a single variant-wide ``/app/.claude/skills/`` path,
     so heterogeneous ``[nasde.plugin]`` across tasks in one project would
@@ -140,6 +157,7 @@ def _prepare_task_environments(config: ProjectConfig, variant_dir: Path) -> dict
     _require_homogeneous_plugins(config)
 
     extra_sandbox_files: dict[str, str] = {}
+    extra_skill_dirs: list[str] = []
 
     for task in config.tasks:
         if task.source is not None:
@@ -152,10 +170,12 @@ def _prepare_task_environments(config: ProjectConfig, variant_dir: Path) -> dict
                 has_source=task.source is not None,
             )
             register_plugin_skills(staged, extra_sandbox_files)
+            extra_skill_dirs.extend(str(d) for d in collect_plugin_skill_dirs(staged))
             inject_mcp_server(task.path, staged)
 
     stage_referenced_skills(variant_dir, extra_sandbox_files, create_ref_worktree)
-    return extra_sandbox_files
+    extra_skill_dirs.extend(str(d) for d in collect_referenced_skill_dirs(variant_dir, create_ref_worktree))
+    return extra_sandbox_files, extra_skill_dirs
 
 
 def _require_homogeneous_plugins(config: ProjectConfig) -> None:
@@ -535,6 +555,7 @@ def _ensure_harbor_config(
     variant_dir: Path,
     variant: str,
     extra_sandbox_files: dict[str, str],
+    extra_skill_dirs: list[str] | None = None,
 ) -> Path:
     """Generate harbor_config.json if absent, then refresh sandbox_files.
 
@@ -544,12 +565,16 @@ def _ensure_harbor_config(
     ADR-009). This is unconditional so removed/renamed ``[[skill]]`` or
     ``[nasde.plugin]`` entries between runs do not linger as stale mappings
     pointing at now-deleted worktree paths.
+
+    ``extra_skill_dirs`` carries the same plugin / ``[[skill]]`` directories as
+    a host-path list for the Codex/Gemini native channel (``config.agent.skills``);
+    Claude ignores it (its copies ride ``extra_sandbox_files``). See ADR-012.
     """
     harbor_config_path = variant_dir / "harbor_config.json"
     if not harbor_config_path.exists():
         _generate_harbor_config(variant_dir, variant)
 
-    _refresh_sandbox_files(harbor_config_path, variant_dir, extra_sandbox_files)
+    _refresh_sandbox_files(harbor_config_path, variant_dir, extra_sandbox_files, extra_skill_dirs or [])
     return harbor_config_path
 
 
@@ -583,6 +608,7 @@ def _refresh_sandbox_files(
     harbor_config_path: Path,
     variant_dir: Path,
     extra: dict[str, str],
+    extra_skill_dirs: list[str] | None = None,
 ) -> None:
     """Rebuild each agent's sandbox_files, preserving hand-written entries.
 
@@ -609,7 +635,7 @@ def _refresh_sandbox_files(
     config = json.loads(harbor_config_path.read_text())
     agent_type = load_variant_agent_type(variant_dir)
     for agent in config.get("agents", []):
-        _refresh_agent_sandbox_files(agent, variant_dir, agent_type, extra)
+        _refresh_agent_sandbox_files(agent, variant_dir, agent_type, extra, extra_skill_dirs or [])
     harbor_config_path.write_text(json.dumps(config, indent=2))
 
 
@@ -618,6 +644,7 @@ def _refresh_agent_sandbox_files(
     variant_dir: Path,
     agent_type: str,
     extra: dict[str, str],
+    extra_skill_dirs: list[str],
 ) -> None:
     kwargs = agent.setdefault("kwargs", {})
     current = kwargs.get("sandbox_files", {}) or {}
@@ -635,27 +662,46 @@ def _refresh_agent_sandbox_files(
 
     kwargs["sandbox_files"] = {**extra, **authored, **handwritten}
     agent["_nasde_derived_keys"] = sorted(extra.keys())
-    _refresh_agent_skills(agent, variant_dir, agent_type)
+    _refresh_agent_skills(agent, variant_dir, agent_type, extra_skill_dirs)
 
 
-def _refresh_agent_skills(agent: dict, variant_dir: Path, agent_type: str) -> None:
+def _refresh_agent_skills(
+    agent: dict,
+    variant_dir: Path,
+    agent_type: str,
+    extra_skill_dirs: list[str] | None = None,
+) -> None:
     """Rebuild the agent's ``skills`` list, preserving hand-authored entries.
 
-    nasde owns the entries derived from the variant's ``agents_skills/`` /
-    ``gemini_skills/`` subdirs (tracked via ``_nasde_derived_skills``). A
-    removed skill dir disappears from the list on the next run, exactly like a
-    removed ``[[skill]]`` drops from ``sandbox_files``. Any other entry was put
-    in ``harbor_config.json`` by hand and is kept untouched, so an author who
-    wires Harbor-native ``skills`` themselves is never clobbered.
+    nasde owns the entries derived from **two** sources, both routed through
+    Harbor's native ``config.agent.skills`` for Codex/Gemini:
+
+    1. the variant's ``agents_skills/`` / ``gemini_skills/`` snapshot subdirs
+       (``_collect_native_skill_dirs``), and
+    2. ``[[skill]]`` by-reference + ``[nasde.plugin]`` skills resolved this run
+       (``extra_skill_dirs``), which for Claude instead ride ``sandbox_files``.
+
+    The union is tracked via a single ``_nasde_derived_skills`` marker, so a
+    removed skill from *either* source drops on the next run while a
+    hand-authored ``skills`` entry is preserved. Any entry that is neither
+    derived this run nor was nasde-derived last run was put in
+    ``harbor_config.json`` by hand and is kept untouched.
 
     The skill subdir is keyed on *this agent's own* type, read back from its
     ``import_path`` (falling back to the variant's declared type). A variant is
     one agent type by contract, but a hand-written multi-agent config can mix
-    types — keying per agent keeps each one reading the right subdir instead of
-    the variant's first declared type.
+    types — keying per agent keeps each one reading the right subdir. For a
+    Claude agent ``extra_skill_dirs`` is intentionally dropped: those skills are
+    already carried in ``sandbox_files`` to ``/app/.claude/skills`` (and would
+    double-inject otherwise). See ADR-012.
     """
     effective_type = _agent_type_from_import_path(agent.get("import_path")) or agent_type
-    derived = _collect_native_skill_dirs(variant_dir, effective_type)
+    snapshot = _collect_native_skill_dirs(variant_dir, effective_type)
+    referenced = _referenced_skill_dirs_for_agent(effective_type, extra_skill_dirs or [])
+    derived = list(dict.fromkeys(snapshot + referenced))
+
+    _warn_skill_basename_collisions(str(agent.get("name", "<unnamed>")), derived)
+
     prev_derived = set(agent.get("_nasde_derived_skills", []) or [])
     current = agent.get("skills", []) or []
     handwritten = [s for s in current if s not in prev_derived and s not in derived]
@@ -663,6 +709,43 @@ def _refresh_agent_skills(agent: dict, variant_dir: Path, agent_type: str) -> No
     agent["_nasde_derived_skills"] = sorted(derived)
 
 
+def _referenced_skill_dirs_for_agent(effective_type: str, extra_skill_dirs: list[str]) -> list[str]:
+    """``[[skill]]``/plugin dirs for the native channel — empty for Claude.
+
+    Claude's referenced/plugin skills already ride ``sandbox_files`` to its
+    cwd ``/app/.claude/skills`` discovery root, so feeding them here too would
+    double-inject. Codex/Gemini get the dir list and warn on bad frontmatter.
+    """
+    if effective_type == "claude":
+        return []
+    for skill_dir in extra_skill_dirs:
+        skill_md = Path(skill_dir) / "SKILL.md"
+        if skill_md.exists():
+            _warn_if_skill_frontmatter_malformed(skill_md, effective_type)
+    return extra_skill_dirs
+
+
+def _warn_skill_basename_collisions(agent_name: str, skill_dirs: list[str]) -> None:
+    """Warn when two derived skill dirs share a basename (Harbor last-wins).
+
+    Harbor's ``resolve_skills`` keys skills by ``dir.name`` and the last one
+    wins, so a snapshot ``agents_skills/foo`` and a referenced ``.../foo`` with
+    the same basename silently collapse to one in the container. Surface it so
+    the loser doesn't vanish unnoticed.
+    """
+    by_name: dict[str, list[str]] = {}
+    for skill_dir in skill_dirs:
+        by_name.setdefault(Path(skill_dir).name, []).append(skill_dir)
+    for name, dirs in sorted(by_name.items()):
+        if len(dirs) > 1:
+            joined = "\n  ".join(dirs)
+            console.print(
+                f"[yellow]WARNING ({agent_name}): {len(dirs)} skill dirs share the basename "
+                f"'{name}' — Harbor registers only the last (the others vanish):\n  {joined}\n"
+                "  → rename one skill dir so each has a unique name.[/yellow]"
+            )
+
+
 def _agent_type_from_import_path(import_path: str | None) -> str | None:
     """Map an agent import_path back to its ``agent`` type, or None if unknown."""
     if _is_codex_agent(import_path):
diff --git a/tests/test_plugin_registration.py b/tests/test_plugin_registration.py
index 3acf4d6..c8646fe 100644
--- a/tests/test_plugin_registration.py
+++ b/tests/test_plugin_registration.py
@@ -11,6 +11,8 @@
 
 from nasde_toolkit.docker import StagedPlugin, cleanup_worktrees, create_ref_worktree
 from nasde_toolkit.plugin_registration import (
+    collect_plugin_skill_dirs,
+    collect_referenced_skill_dirs,
     inject_mcp_server,
     register_plugin_skills,
     stage_referenced_skills,
@@ -361,6 +363,126 @@ def test_stage_referenced_skills_no_entries_is_noop(tmp_path: Path) -> None:
     assert sandbox == {}
 
 
+# --- collect_referenced_skill_dirs / collect_plugin_skill_dirs (ADR-012) ---
+# Dir-list counterparts feeding the Codex/Gemini native config.agent.skills path.
+
+
+def test_collect_referenced_skill_dirs_returns_dirs_not_sandbox(tmp_path: Path) -> None:
+    src_skills = tmp_path / "src" / "skills"
+    src_skills.mkdir(parents=True)
+    skill = _make_skill(src_skills, "analyze-conversation")
+
+    variant_dir = tmp_path / "variants" / "with-skill"
+    variant_dir.mkdir(parents=True)
+    (variant_dir / "variant.toml").write_text(
+        'agent = "codex"\nmodel = "m"\n\n[[skill]]\npath = "../../src/skills/analyze-conversation"\n'
+    )
+
+    dirs = collect_referenced_skill_dirs(variant_dir, create_ref_worktree)
+
+    assert dirs == [skill.resolve()]
+
+
+def test_collect_referenced_skill_dirs_no_entries_is_empty(tmp_path: Path) -> None:
+    variant_dir = tmp_path / "variants" / "vanilla"
+    variant_dir.mkdir(parents=True)
+    (variant_dir / "variant.toml").write_text('agent = "codex"\nmodel = "m"\n')
+
+    assert collect_referenced_skill_dirs(variant_dir, create_ref_worktree) == []
+
+
+def test_collect_referenced_skill_dirs_returns_canonical_path(tmp_path: Path) -> None:
+    """Returned dirs must be fully resolved (no symlink, no `..`) so they
+    string-compare equal to runner._collect_native_skill_dirs' resolved snapshot
+    dirs — otherwise dedup / stale-drop / basename-collision break on a
+    symlinked /tmp (macOS) or a ref worktree path. See ADR-012."""
+    src_skills = tmp_path / "src" / "skills"
+    src_skills.mkdir(parents=True)
+    skill = _make_skill(src_skills, "analyze-conversation")
+
+    variant_dir = tmp_path / "variants" / "with-skill"
+    variant_dir.mkdir(parents=True)
+    (variant_dir / "variant.toml").write_text(
+        'agent = "codex"\nmodel = "m"\n\n[[skill]]\npath = "../../src/skills/analyze-conversation"\n'
+    )
+
+    dirs = collect_referenced_skill_dirs(variant_dir, create_ref_worktree)
+
+    assert dirs == [d.resolve() for d in dirs]
+    assert dirs == [skill.resolve()]
+
+
+def test_collect_plugin_skill_dirs_returns_canonical_path(tmp_path: Path) -> None:
+    staged = _staged_plugin(tmp_path)
+
+    dirs = collect_plugin_skill_dirs(staged)
+
+    assert dirs == [d.resolve() for d in dirs]
+
+
+def test_collect_referenced_skill_dirs_missing_skill_md_raises(tmp_path: Path) -> None:
+    bad = tmp_path / "src" / "not-a-skill"
+    bad.mkdir(parents=True)
+    variant_dir = tmp_path / "variants" / "v"
+    variant_dir.mkdir(parents=True)
+    (variant_dir / "variant.toml").write_text(
+        'agent = "codex"\nmodel = "m"\n\n[[skill]]\npath = "../../src/not-a-skill"\n'
+    )
+
+    with pytest.raises(RuntimeError, match="no SKILL.md"):
+        collect_referenced_skill_dirs(variant_dir, create_ref_worktree)
+
+
+def test_collect_referenced_skill_dirs_ref_uses_snapshot(tmp_path: Path) -> None:
+    repo = tmp_path / "repo"
+    skills = repo / "src" / "skills"
+    skills.mkdir(parents=True)
+    _make_skill(skills, "analyze-conversation")
+    subprocess.run(["git", "init"], cwd=repo, capture_output=True, check=True)
+    subprocess.run(["git", "config", "user.email", "t@t.com"], cwd=repo, capture_output=True, check=True)
+    subprocess.run(["git", "config", "user.name", "T"], cwd=repo, capture_output=True, check=True)
+    subprocess.run(["git", "add", "."], cwd=repo, capture_output=True, check=True)
+    subprocess.run(["git", "commit", "-m", "init"], cwd=repo, capture_output=True, check=True)
+    commit = subprocess.run(
+        ["git", "rev-parse", "HEAD"], cwd=repo, capture_output=True, text=True, check=True
+    ).stdout.strip()
+    (skills / "analyze-conversation" / "SKILL.md").write_text("---\nname: analyze-conversation\n---\nCHANGED")
+    subprocess.run(["git", "add", "."], cwd=repo, capture_output=True, check=True)
+    subprocess.run(["git", "commit", "-m", "later"], cwd=repo, capture_output=True, check=True)
+
+    variant_dir = tmp_path / "variants" / "with-skill"
+    variant_dir.mkdir(parents=True)
+    (variant_dir / "variant.toml").write_text(
+        'agent = "codex"\nmodel = "m"\n\n'
+        "[[skill]]\n"
+        f'path = "../../repo/src/skills/analyze-conversation"\nref = "{commit}"\n'
+    )
+
+    try:
+        dirs = collect_referenced_skill_dirs(variant_dir, create_ref_worktree)
+        assert len(dirs) == 1
+        assert "CHANGED" not in (dirs[0] / "SKILL.md").read_text()
+    finally:
+        cleanup_worktrees()
+
+
+def test_collect_plugin_skill_dirs_returns_each_skill(tmp_path: Path) -> None:
+    staged = _staged_plugin(tmp_path)
+
+    dirs = collect_plugin_skill_dirs(staged)
+
+    assert dirs == [staged.staged_dir / "skills" / "analyze-conversation"]
+
+
+def test_collect_plugin_skill_dirs_noop_without_skills(tmp_path: Path) -> None:
+    staged_dir = tmp_path / "_nasde-plugin"
+    (staged_dir / ".claude-plugin").mkdir(parents=True)
+    (staged_dir / ".claude-plugin" / "plugin.json").write_text("{}")
+    staged = StagedPlugin(staged_dir=staged_dir, install_root="/opt/x", plugin_name="x", env={})
+
+    assert collect_plugin_skill_dirs(staged) == []
+
+
 def test_stage_referenced_skills_ref_uses_snapshot(tmp_path: Path) -> None:
     repo = tmp_path / "repo"
     skills = repo / "src" / "skills"
diff --git a/tests/test_runner_skills.py b/tests/test_runner_skills.py
index 97adfff..2bf959d 100644
--- a/tests/test_runner_skills.py
+++ b/tests/test_runner_skills.py
@@ -13,12 +13,13 @@
 
 import pytest
 
-from nasde_toolkit.config import PluginConfig, ProjectConfig, TaskConfig
+from nasde_toolkit.config import DockerConfig, PluginConfig, ProjectConfig, TaskConfig
 from nasde_toolkit.runner import (
     _collect_native_skill_dirs,
     _collect_sandbox_files,
     _ensure_harbor_config,
     _generate_harbor_config,
+    _prepare_task_environments,
     _require_homogeneous_plugins,
 )
 
@@ -482,7 +483,7 @@ def test_collect_native_skill_dirs_warns_on_leading_comment_frontmatter(
     dirs = _collect_native_skill_dirs(variant_dir, "codex")
 
     assert dirs == [str(skill.resolve())]
-    assert "missing YAML frontmatter" in capsys.readouterr().out
+    assert "missing YAML frontmatter" in " ".join(capsys.readouterr().out.split())
 
 
 def test_collect_native_skill_dirs_no_warning_on_clean_frontmatter(
@@ -493,7 +494,7 @@ def test_collect_native_skill_dirs_no_warning_on_clean_frontmatter(
 
     _collect_native_skill_dirs(variant_dir, "codex")
 
-    assert "missing YAML frontmatter" not in capsys.readouterr().out
+    assert "missing YAML frontmatter" not in " ".join(capsys.readouterr().out.split())
 
 
 def test_refresh_keys_skills_per_agent_in_mixed_config(tmp_path: Path) -> None:
@@ -520,3 +521,182 @@ def test_refresh_keys_skills_per_agent_in_mixed_config(tmp_path: Path) -> None:
 
     assert _skills(harbor_path, agent_index=0) == [str(codex_skill.resolve())]
     assert _skills(harbor_path, agent_index=1) == [str(gemini_skill.resolve())]
+
+
+# ---------------------------------------------------------------------------
+# extra_skill_dirs — [[skill]] / [nasde.plugin] native channel (ADR-012)
+# ---------------------------------------------------------------------------
+#
+# For Codex/Gemini these dirs flow into agent["skills"] (Harbor native), NOT
+# into /app/.claude/skills sandbox_files. For Claude they are dropped here
+# (they already ride sandbox_files via stage_referenced_skills/register).
+
+
+def _standalone_skill(parent: Path, name: str) -> Path:
+    skill = parent / name
+    skill.mkdir(parents=True)
+    (skill / "SKILL.md").write_text(f"---\nname: {name}\n---\nbody")
+    return skill
+
+
+def test_extra_skill_dirs_go_to_native_skills_for_codex(tmp_path: Path) -> None:
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    referenced = _standalone_skill(tmp_path / "src", "tactical-ddd")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(referenced)])
+
+    assert _skills(harbor_path) == [str(referenced)]
+
+
+def test_extra_skill_dirs_dropped_for_claude(tmp_path: Path) -> None:
+    """Claude's [[skill]]/plugin skills ride sandbox_files (stage_skill_dir),
+    so the native channel must stay empty — feeding it here would double-inject."""
+    variant_dir = _bare_variant(tmp_path / "variants" / "v")
+    referenced = _standalone_skill(tmp_path / "src", "tactical-ddd")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(referenced)])
+
+    assert _skills(harbor_path) == []
+
+
+def test_extra_skill_dirs_union_with_snapshot(tmp_path: Path) -> None:
+    """snapshot (agents_skills/) ∪ [[skill]]/plugin (extra_skill_dirs) — both
+    land in agent.skills, tracked by one _nasde_derived_skills marker."""
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    snapshot = _make_native_skill(variant_dir, "agents_skills", "snap-skill")
+    referenced = _standalone_skill(tmp_path / "src", "ref-skill")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(referenced)])
+
+    assert set(_skills(harbor_path)) == {str(snapshot.resolve()), str(referenced)}
+
+
+def test_extra_skill_dirs_stale_drop_between_runs(tmp_path: Path) -> None:
+    """A [[skill]] removed between runs drops from agent.skills next run."""
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    referenced = _standalone_skill(tmp_path / "src", "tactical-ddd")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(referenced)])
+    assert _skills(harbor_path) == [str(referenced)]
+
+    _ensure_harbor_config(variant_dir, "v", {}, [])
+    assert _skills(harbor_path) == []
+
+
+def test_extra_skill_dirs_preserves_handwritten(tmp_path: Path) -> None:
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    referenced = _standalone_skill(tmp_path / "src", "tactical-ddd")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+    config = json.loads(harbor_path.read_text())
+    config["agents"][0]["skills"] = ["/host/hand/authored-skill"]
+    harbor_path.write_text(json.dumps(config))
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(referenced)])
+
+    skills = _skills(harbor_path)
+    assert str(referenced) in skills
+    assert "/host/hand/authored-skill" in skills
+
+
+def test_extra_skill_dirs_warns_on_basename_collision(tmp_path: Path, capsys: pytest.CaptureFixture[str]) -> None:
+    """snapshot agents_skills/foo and referenced .../foo share a basename —
+    Harbor registers only the last (last-wins). Warn so the loser isn't lost."""
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    _make_native_skill(variant_dir, "agents_skills", "tactical-ddd")
+    collide = _standalone_skill(tmp_path / "elsewhere", "tactical-ddd")
+    _generate_harbor_config(variant_dir, "v")
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(collide)])
+
+    out = " ".join(capsys.readouterr().out.split())
+    assert "share the basename" in out
+    assert "tactical-ddd" in out
+
+
+def test_extra_skill_dirs_warns_on_malformed_frontmatter(tmp_path: Path, capsys: pytest.CaptureFixture[str]) -> None:
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    bad = tmp_path / "src" / "tactical-ddd"
+    bad.mkdir(parents=True)
+    (bad / "SKILL.md").write_text("<!-- Source: x -->\n---\nname: tactical-ddd\n---\nbody")
+    _generate_harbor_config(variant_dir, "v")
+
+    _ensure_harbor_config(variant_dir, "v", {}, [str(bad)])
+
+    assert "missing YAML frontmatter" in " ".join(capsys.readouterr().out.split())
+
+
+# ---------------------------------------------------------------------------
+# [nasde.plugin] skills — integration through the REAL ensure_task_plugin (ADR-012)
+# ---------------------------------------------------------------------------
+#
+# Unlike the [[skill]] e2e (verified on a live agent), there is no example with
+# a [nasde.plugin], so these tests drive the real ensure_task_plugin → staging
+# → collect_plugin_skill_dirs → harbor_config chain (no Docker) to prove a
+# plugin's own skills route natively for codex/gemini and to sandbox for claude.
+
+
+def _make_local_plugin(root: Path, plugin_name: str, skill_name: str) -> Path:
+    plugin = root / plugin_name
+    (plugin / ".claude-plugin").mkdir(parents=True)
+    (plugin / ".claude-plugin" / "plugin.json").write_text(json.dumps({"name": plugin_name}))
+    skill = plugin / "skills" / skill_name
+    skill.mkdir(parents=True)
+    (skill / "SKILL.md").write_text(f"---\nname: {skill_name}\n---\nbody")
+    return plugin
+
+
+def _project_with_plugin(tmp_path: Path, plugin: Path) -> ProjectConfig:
+    task_path = tmp_path / "tasks" / "t"
+    (task_path / "environment").mkdir(parents=True)
+    (task_path / "environment" / "Dockerfile").write_text("FROM ubuntu:22.04\n")
+    (task_path / "task.toml").write_text('version = "1.0"\n\n[task]\nname = "p/t"\n')
+    task = TaskConfig(
+        name="t",
+        path=task_path,
+        plugin=PluginConfig(path=str(plugin)),
+    )
+    return ProjectConfig(name="p", project_dir=tmp_path, tasks=[task], docker=DockerConfig())
+
+
+def test_plugin_skills_route_native_for_codex(tmp_path: Path) -> None:
+    """A [nasde.plugin]'s own skills/ must reach a Codex agent through the
+    native config.agent.skills (resolved via the REAL ensure_task_plugin
+    staging). The Claude sandbox map is still built unconditionally, but a
+    Codex agent consumes the native skills list, not that map."""
+    plugin = _make_local_plugin(tmp_path / "plugins", "my-plugin", "plug-skill")
+    config = _project_with_plugin(tmp_path, plugin)
+    variant_dir = _codex_variant(tmp_path / "variants" / "v")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+
+    extra_sandbox, extra_skill_dirs = _prepare_task_environments(config, variant_dir)
+    _ensure_harbor_config(variant_dir, "v", extra_sandbox, extra_skill_dirs)
+
+    skills = _skills(harbor_path)
+    assert any(Path(s).name == "plug-skill" for s in skills)
+    assert skills == [str(Path(s).resolve()) for s in skills]
+
+
+def test_plugin_skills_route_sandbox_for_claude(tmp_path: Path) -> None:
+    """The same plugin on a Claude variant rides sandbox_files to
+    /app/.claude/skills (unchanged), and the native skills list stays empty."""
+    plugin = _make_local_plugin(tmp_path / "plugins", "my-plugin", "plug-skill")
+    config = _project_with_plugin(tmp_path, plugin)
+    variant_dir = _bare_variant(tmp_path / "variants" / "v")
+    harbor_path = variant_dir / "harbor_config.json"
+    _generate_harbor_config(variant_dir, "v")
+
+    extra_sandbox, extra_skill_dirs = _prepare_task_environments(config, variant_dir)
+    _ensure_harbor_config(variant_dir, "v", extra_sandbox, extra_skill_dirs)
+
+    assert "/app/.claude/skills/plug-skill/SKILL.md" in _sandbox(harbor_path)
+    assert _skills(harbor_path) == []