feat(pricing): convention-based layered price override (ADR-013) by szjanikowski · Pull Request #71 · NoesisVision/nasde-toolkit

szjanikowski · 2026-06-22T14:00:58Z

Closes the last blocker before v0.5.0. Model prices are now overridable post-install without hacking the wheel — the core need behind the cost-migration product (rates change faster than releases; users have enterprise/Azure/Bedrock rates).

What

Drop a file literally named pricing.toml at a known location and it's auto-detected and merged onto the bundled catalog. Convention, not config — no [pricing] key, mirroring assessment_dimensions.json.

Precedence (higher wins, per-model whole-entry merge):

<project>/pricing.toml (next to nasde.toml)
~/.nasde/pricing.toml (user-wide)
bundled pricing.toml (the floor)

An override lists only the models it changes/adds; the rest fall through. Applied layers print a dim transparency line.
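The precedence rule above can be sketched in a few lines of Python. This is an illustration only, not the code in pricing.py: the function name `merge_layers`, the dict-of-dicts catalog shape, the model names, and the `cached_per_1m` field are all assumptions made for the example. Only the per-model whole-entry semantics come from the description.

```python
from typing import Dict, Mapping

# Hypothetical catalog shape: model name -> {field name -> rate}.
Catalog = Dict[str, Dict[str, float]]


def merge_layers(*layers: Mapping[str, Dict[str, float]]) -> Catalog:
    """Merge catalogs passed lowest precedence first (bundled, user, project)."""
    merged: Catalog = {}
    for layer in layers:
        for model, entry in layer.items():
            # Whole-entry replacement: the higher layer's entry replaces the
            # lower one outright; fields are never combined across layers.
            merged[model] = dict(entry)
    return merged


bundled = {
    "model-a": {"input_per_1m": 3.0, "output_per_1m": 15.0, "cached_per_1m": 0.3},
    "model-b": {"input_per_1m": 1.0, "output_per_1m": 5.0},
}
user = {"model-a": {"input_per_1m": 2.5, "output_per_1m": 12.0}}
project = {
    "model-a": {"input_per_1m": 2.0, "output_per_1m": 10.0},
    "model-c": {"input_per_1m": 0.5, "output_per_1m": 1.5},
}

effective = merge_layers(bundled, user, project)
```

In this sketch `model-a` takes the project entry as a whole (so the bundled `cached_per_1m` does not leak through), `model-b` falls through from the bundled layer, and `model-c` is added by the project layer.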

Why `~/.nasde/` not platformdirs

platformdirs.user_config_dir maps to ~/Library/Application Support on macOS — where Electron app-state (cookies/cache) lives, not a file a human edits. Every agent CLI the user works with keeps user config in a HOME dotfolder (~/.claude, ~/.codex, ~/.gemini). So config → ~/.nasde/; platformdirs stays for cache (update_check.py).
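As a rough sketch of the convention, the two override locations could be resolved with nothing but `pathlib`. The helper names `override_candidates` and `existing_overrides`, and the optional `home` parameter (added here so the lookup can be pointed at a temporary directory), are assumptions for illustration and not the toolkit's actual API.

```python
from pathlib import Path
from typing import List, Optional


def override_candidates(project_dir: Path, home: Optional[Path] = None) -> List[Path]:
    """Return the override locations, highest precedence first."""
    home = home if home is not None else Path.home()
    return [
        project_dir / "pricing.toml",      # project layer, next to nasde.toml
        home / ".nasde" / "pricing.toml",  # user-wide layer in a HOME dotfolder
    ]


def existing_overrides(project_dir: Path, home: Optional[Path] = None) -> List[Path]:
    """Keep only the candidates that exist; a missing file is simply skipped."""
    return [path for path in override_candidates(project_dir, home) if path.is_file()]
```

Because detection is by file name and location alone, no configuration key is needed to switch a layer on or off.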

Both write paths

build_trial_economics is the single extractor feeding assessment_summary.json (run) and metrics.json (export). Both now thread project_dir, so a trial reports the same cost in the run summary and a later export — the ADR-011 invariant.

Implementation

pricing.py: new load_pricing_layered(project_dir); bundled lru_cache + load_pricing(path) untouched; merged catalog not cached (cheap per-job re-read).
Wired through evaluator.py/runner.py, results_exporter.py/cli.py, calibration_publisher.py, eval_migration.py.
Zero new dependencies (uv.lock unchanged).

Tests (14 new)

7 unit (precedence, partial override, new model, missing-file fallback, whole-entry replacement)
3-layer compose with user↔project overlap → project wins; whole-entry on the overlap
run + export e2e (evaluator spine and export_results) pick up the override
All 411 tests, ruff, mypy green.

Verified manually

nasde results-export on a synthetic trial with both user + project pricing.toml → two transparency lines, cost_usd = the project rate (beating user, beating an unpriced bundled). Full CLI plumbing confirmed.

Docs

ADR-013, website token-cost.md (shipped, replacing "planned improvement") + configuration.md, CLAUDE.md.

🤖 Generated with Claude Code

Model prices are now overridable post-install without hacking the wheel. A file literally named pricing.toml at a known location is auto-detected and merged onto the bundled catalog, with no [pricing] config key:

<project>/pricing.toml > ~/.nasde/pricing.toml > bundled

Per-model whole-entry merge (higher wins): an override lists only the models it changes/adds; the rest fall through. The user layer is a HOME dotfolder (~/.nasde/, like ~/.claude, ~/.codex, ~/.gemini), deliberately NOT platformdirs (which maps to ~/Library/Application Support on macOS: app state, not user config). An applied layer prints a dim transparency line.

Both write paths thread project_dir so run (assessment_summary.json) and export (metrics.json) agree on cost: the ADR-011 single-extractor invariant.

New load_pricing_layered(project_dir) in pricing.py; wired through evaluator/runner, results_exporter/cli, calibration_publisher, and eval_migration. Bundled lru_cache and load_pricing(path) unchanged; merged catalog is not cached (cheap per-job re-read).

14 new tests incl. a three-layer compose case with user↔project overlap (project wins) and run+export e2e. ADR-013 + docs (token-cost.md, configuration.md) + CLAUDE.md. 411 tests, ruff, mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CI runs `ruff format --check` in addition to `ruff check`; the new three-layer test had a long line the formatter wraps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- bug_001 (normal): migrate-evals dropped project_dir, so its assessment_summary.json cost used bundled+user only, disagreeing with the other three write paths (ADR-013 invariant). Thread project_dir through cli → migrate_job_evals → migrate_trial_evals → load_pricing_layered.
- bug_006 (normal): switching the exporter to load_pricing_layered made pre-existing tests read the developer's real ~/.nasde/pricing.toml. Move empty_user_layer to an autouse fixture in tests/conftest.py so the whole suite is hermetic by default; layered tests opt in by name.
- bug_004 (nit): calibration publish re-read layered pricing per-trial (N×L transparency lines). Hoist load_pricing_layered above the loop, thread pricing through _publish_one_trial → _open_pr_for_trial.
- bug_003 (nit): move the orphaned cost_efficiency/token_efficiency hasattr guards back into test_assessment_summary_includes_economics.

+1 regression test (migrate-evals threads project pricing). 412 tests, ruff check + format, mypy green. Verified with a HOME-override hermeticity check and a migrate-evals project-override smoke.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ort)

Layered overrides (ADR-013) had no way to inspect the merged result. Add layer provenance and three read surfaces:

- pricing.py: resolve_pricing_layers(project_dir) → ordered PricingLayer stack; effective_pricing_with_source(project_dir) → {model: (price, layer)}. load_pricing_layered reimplemented on the same core (DRY, unchanged behaviour + transparency line). load_pricing(path) and bundled lru_cache untouched.
- nasde pricing show [--show-source]: new `pricing` sub-app printing the effective catalog (Model / In / Out / as_of, +Layer with --show-source). Sub-app leaves room for future pricing validate/path.
- nasde run: "Pricing used (effective)" table at the end of the summary, filtered to the models actually in the run, with source layer.
- results-export: pricing_used.json next to the trials, holding the effective rate + source layer per priced model, so a report is a self-contained cost audit.
- pricing_report.py: shared Rich table renderer (show + run).

Docs: token-cost.md (verifying the catalog), CLAUDE.md CLI reference + architecture note, ADR-013 provenance note. Website builds clean.

423 tests, ruff check + format, mypy green; smoke-verified pricing show and pricing_used.json on a real trial.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
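A minimal sketch of what per-model provenance amounts to, assuming a simplified stack in which each layer maps a model name to a single number. The `Layer` tuple and `effective_with_source` function below are hypothetical stand-ins, not the real `PricingLayer` or `effective_pricing_with_source`, whose exact shapes are not shown in this PR description.

```python
from typing import Dict, List, NamedTuple, Tuple


class Layer(NamedTuple):
    name: str                 # "bundled", "user", or "project"
    models: Dict[str, float]  # model -> rate (simplified to one number)


def effective_with_source(layers: List[Layer]) -> Dict[str, Tuple[float, str]]:
    """Layers are ordered lowest precedence first; a later layer wins."""
    effective: Dict[str, Tuple[float, str]] = {}
    for layer in layers:
        for model, rate in layer.models.items():
            # Record which layer supplied the winning rate.
            effective[model] = (rate, layer.name)
    return effective


stack = [
    Layer("bundled", {"model-a": 3.0, "model-b": 1.0}),
    Layer("user", {"model-a": 2.5}),
    # "model-bb" is a deliberate misspelling of "model-b".
    Layer("project", {"model-a": 2.0, "model-bb": 0.8}),
]
table = effective_with_source(stack)
```

Here `table["model-a"]` is attributed to `project`, while `model-b` is still attributed to `bundled` because the misspelled `model-bb` key created a separate row instead of overriding it. A source column makes that kind of miss visible.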

Code-review follow-up (two findings from the /code-review pass):

- ModelPrice is now @dataclass(frozen=True). It was a mutable dataclass shared via the bundled lru_cache (resolve_pricing_layers does dict(load_pricing()), a shallow copy sharing the ModelPrice objects), so an in-place field mutation would silently corrupt every later lookup in the process. A rate is an immutable fact; freezing turns a latent cache-corruption footgun into an immediate FrozenInstanceError at the mutation site. No code mutates ModelPrice fields (verified), so this is safe; build a new instance via dataclasses.replace to adjust a rate.
- pricing_report._fmt_rate no longer uses ${rate:g}, which renders scientific notation at the extremes ($1e+06, $5e-05). It now trims a fixed 4-decimal format, giving $3 / $2.5 in the normal range and $0.0001 / $1000000 at the edges. (.2f was rejected: it drops sub-cent cached rates to $0.00.)

+2 regression tests (frozen guard, _fmt_rate edge table). 433 tests, ruff check + format, mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
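Both findings are easy to reproduce in isolation. The sketch below is not the toolkit's code: the two `ModelPrice` fields are assumed from the field names mentioned elsewhere in this PR, and `fmt_rate` is one plausible way to trim a fixed 4-decimal format, written to match the outputs quoted above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ModelPrice:
    # Field names are assumptions for this sketch.
    input_per_1m: float
    output_per_1m: float


def fmt_rate(rate: float) -> str:
    """Format with 4 fixed decimals, then trim trailing zeros and a bare dot."""
    text = f"{rate:.4f}".rstrip("0").rstrip(".")
    return f"${text}"


price = ModelPrice(input_per_1m=3.0, output_per_1m=15.0)

try:
    price.input_per_1m = 2.5  # type: ignore[misc]  # fails on a frozen dataclass
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Adjusting a rate means building a new instance; the original is untouched.
discounted = replace(price, input_per_1m=2.5)
```

With this version `fmt_rate(3)` gives `$3`, `fmt_rate(2.5)` gives `$2.5`, and `fmt_rate(1000000)` gives `$1000000`, whereas the `g` presentation type renders the last value as `1e+06`.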

Final code-review nits (#1/#2/#3):

- _print_pricing_used now derives the used-models set from the economics `rows` it is handed, instead of re-walking every trial dir and re-parsing every assessment_summary.json a second time in the same _print_job_summary call (the rows already carry model_name). _models_used_in_job is deleted. Removes a redundant per-trial I/O pass and the caller-after-callee ordering.
- _finalize_economics_row now exposes "model" (already destructured) so the pricing-used table can read it without a second source.
- _write_pricing_used: replaced the walrus + `for price, layer in [entry]` dict-comprehension with a plain readable loop.

432 tests, ruff check + format, mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…aveat)

Improve the experience of authoring a pricing.toml override:

- Malformed override files now fail fast with a clear Rich message naming the file (so you know project vs ~/.nasde layer), the cause, and a hint, instead of a raw TOMLDecodeError/KeyError traceback. A decimal comma (2,5) and a missing input_per_1m/output_per_1m are the common cases. _load_override_models wraps the per-layer load; SystemExit(1), no crash.
- nasde init now scaffolds a fully-commented pricing.toml.example (a real bundled model name to copy, the decimal-point hint, and the model-name caveat). Named .example so it's inert until copied to pricing.toml.
- token-cost.md: a :::caution that the model name MUST match variant.toml's `model` or the override is SILENTLY ignored; verify with `pricing show --show-source` (model under `bundled` not `project` = typo). Plus a note that malformed files fail loudly.

Out of scope (deliberately later): `nasde pricing validate` (check all entries + flag unknown model names up front) and `nasde pricing set` (add a single override via CLI).

436 tests, ruff check + format, mypy, website build green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Rewrite the "Verifying the effective catalog" section of token-cost.md to include actual command output: a --show-source table for a working override, a side-by-side example of the silent model-name-typo failure (the real model stays `bundled` while the typo'd key sits as a dead `project` row), the loud malformed-file errors (decimal comma, missing field), and a sample pricing_used.json. Theory alone didn't make the silent-miss case obvious. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The Unreleased section was missing everything merged after the ADR-010/011/012 batch. Added entries for: layered pricing override + visibility (#71, ADR-013), results-export + repeated-evaluation accumulation (#57), the NASDE→Nasde rebrand and Starlight docs migration (#64/#69), the parallel-run job_dir race fix (#62), and the June-2026 CVE dependency pins (#70, new ### Security section). All [#NN] references now have link targets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The Unreleased section drifted 5 PRs behind before v0.5.0 even though the nasde-dev skill already mentioned updating it — the rule was buried mid-list. Surface it: a new "Development workflow" section in CLAUDE.md (which had no release guidance), and a prominent "Definition of done — CHANGELOG first" callout at the top of the skill's doc-consistency step. A user-visible change is not done until it has an [Unreleased] entry with a [#NN] ref, in the same PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Szymon Janikowski and others added 10 commits June 22, 2026 16:00

style: ruff format test_pricing.py

bcc9639


szjanikowski merged commit 4d96bf0 into main Jun 24, 2026
9 checks passed

szjanikowski deleted the feat/layered-pricing-override branch June 24, 2026 11:54

szjanikowski mentioned this pull request Jun 24, 2026

chore: release v0.5.0 #72

Merged
