feat(pricing): convention-based layered price override (ADR-013) #71
Merged
Model prices are now overridable post-install without hacking the wheel. A file literally named pricing.toml at a known location is auto-detected and merged onto the bundled catalog, with no [pricing] config key:

<project>/pricing.toml > ~/.nasde/pricing.toml > bundled

Per-model whole-entry merge (higher wins): an override lists only the models it changes or adds; the rest fall through.

User layer is a HOME dotfolder (~/.nasde/, like ~/.claude, ~/.codex, ~/.gemini), deliberately NOT platformdirs (which maps to ~/Library/Application Support on macOS: app state, not user config). An applied layer prints a dim transparency line.

Both write paths thread project_dir so run (assessment_summary.json) and export (metrics.json) agree on cost: the ADR-011 single-extractor invariant.

New load_pricing_layered(project_dir) in pricing.py; wired through evaluator/runner, results_exporter/cli, calibration_publisher, and eval_migration. Bundled lru_cache and load_pricing(path) unchanged; merged catalog is not cached (cheap per-job re-read).

14 new tests incl. a three-layer compose case with user↔project overlap (project wins) and run+export e2e. ADR-013 + docs (token-cost.md, configuration.md) + CLAUDE.md.

411 tests, ruff, mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
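The per-model whole-entry merge described in this commit can be pictured with a minimal sketch. This is not the actual `load_pricing_layered` code: the `Price` class, the `merge_layers` helper, and the model names are hypothetical stand-ins chosen only to show the precedence rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical stand-in for a catalog entry (USD per 1M tokens)."""

    input_per_1m: float
    output_per_1m: float


def merge_layers(*layers: dict[str, Price]) -> dict[str, Price]:
    """Merge catalogs given lowest precedence first; later layers win.

    The merge is per model and whole-entry: a higher layer replaces the
    entire entry for a model and never patches individual fields.
    """
    merged: dict[str, Price] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Illustrative data only; these are not real model names or rates.
bundled = {"model-a": Price(3.0, 15.0), "model-b": Price(1.0, 5.0)}
user = {"model-a": Price(2.5, 12.0)}
project = {"model-a": Price(2.0, 10.0), "model-c": Price(0.5, 1.5)}

# bundled < user < project: the project entry for model-a wins,
# model-b falls through from bundled, model-c is added by the project.
effective = merge_layers(bundled, user, project)
```

Because the merge returns a new dict, the bundled catalog itself is left untouched, which fits the note that the bundled cache is unchanged and only the merged result is rebuilt per job.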
CI runs `ruff format --check` in addition to `ruff check`; the new three-layer test had a long line the formatter wraps. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- bug_001 (normal): migrate-evals dropped project_dir, so its assessment_summary.json cost used bundled+user only, disagreeing with the other three write paths (ADR-013 invariant). Thread project_dir through cli → migrate_job_evals → migrate_trial_evals → load_pricing_layered.
- bug_006 (normal): switching the exporter to load_pricing_layered made pre-existing tests read the developer's real ~/.nasde/pricing.toml. Move empty_user_layer to an autouse fixture in tests/conftest.py so the whole suite is hermetic by default; layered tests opt in by name.
- bug_004 (nit): calibration publish re-read layered pricing per-trial (N×L transparency lines). Hoist load_pricing_layered above the loop, thread pricing through _publish_one_trial → _open_pr_for_trial.
- bug_003 (nit): move the orphaned cost_efficiency/token_efficiency hasattr guards back into test_assessment_summary_includes_economics.

+1 regression test (migrate-evals threads project pricing). 412 tests, ruff check + format, mypy green. Verified with a HOME-override hermeticity check and a migrate-evals project-override smoke.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
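The hermeticity problem in bug_006 comes down to tests resolving the real home directory. The sketch below shows the general idea with the standard library only; the real fix is an autouse pytest fixture named `empty_user_layer`, whose body is not shown in this PR, so `empty_home` and `user_pricing_path` here are hypothetical helpers.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path
from unittest import mock


@contextmanager
def empty_home():
    """Point HOME (and USERPROFILE on Windows) at a fresh empty directory."""
    with tempfile.TemporaryDirectory() as tmp:
        env = {"HOME": tmp, "USERPROFILE": tmp}
        # patch.dict restores the original environment on exit.
        with mock.patch.dict(os.environ, env):
            yield Path(tmp)


def user_pricing_path() -> Path:
    """Assumed location of the user layer, per the commit message."""
    return Path.home() / ".nasde" / "pricing.toml"


with empty_home() as home:
    inside = user_pricing_path()
    # No user override can exist in a directory created a moment ago.
    found_inside = inside.exists()
```

Wrapping every test in something equivalent by default means a developer's own `~/.nasde/pricing.toml` cannot change expected costs, while tests that exercise the user layer create the file explicitly.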
…ort)
Layered overrides (ADR-013) had no way to inspect the merged result. Add
layer provenance and three read surfaces:
- pricing.py: resolve_pricing_layers(project_dir) → ordered PricingLayer
stack; effective_pricing_with_source(project_dir) → {model: (price, layer)}.
load_pricing_layered reimplemented on the same core (DRY, unchanged behaviour
+ transparency line). load_pricing(path) and bundled lru_cache untouched.
- nasde pricing show [--show-source]: new `pricing` sub-app printing the
effective catalog (Model / In / Out / as_of, +Layer with --show-source).
Sub-app leaves room for future pricing validate/path.
- nasde run: "Pricing used (effective)" table at the end of the summary,
filtered to the models actually in the run, with source layer.
- results-export: pricing_used.json next to the trials — effective rate +
source layer per priced model, so a report is a self-contained cost audit.
- pricing_report.py: shared Rich table renderer (show + run).
Docs: token-cost.md (verifying the catalog), CLAUDE.md CLI reference +
architecture note, ADR-013 provenance note. Website builds clean.
423 tests, ruff check + format, mypy green; smoke-verified pricing show
and pricing_used.json on a real trial.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
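The provenance idea in this commit, a mapping of model to (price, layer), can be sketched as follows. The helper names, the `(input, output)` tuple shape, and the sample data are assumptions for illustration; the actual `effective_pricing_with_source` and `PricingLayer` definitions are not shown in this PR text.

```python
Rates = tuple[float, float]  # assumed shape: (input_per_1m, output_per_1m)


def effective_with_source(
    layers: list[tuple[str, dict[str, Rates]]],
) -> dict[str, tuple[Rates, str]]:
    """Resolve an ordered layer stack (lowest precedence first).

    Each model maps to its winning rates plus the name of the layer
    that supplied them.
    """
    effective: dict[str, tuple[Rates, str]] = {}
    for layer_name, models in layers:
        for model, rates in models.items():
            effective[model] = (rates, layer_name)
    return effective


def filter_to_used(
    effective: dict[str, tuple[Rates, str]], models_in_run: set[str]
) -> dict[str, tuple[Rates, str]]:
    """Keep only the priced models that actually appear in a run."""
    return {m: effective[m] for m in sorted(models_in_run) if m in effective}


# Illustrative data only.
stack = [
    ("bundled", {"model-a": (3.0, 15.0), "model-b": (1.0, 5.0)}),
    ("user", {"model-a": (2.5, 12.0)}),
    ("project", {"model-b": (0.8, 4.0)}),
]
resolved = effective_with_source(stack)
used = filter_to_used(resolved, {"model-b", "model-unpriced"})
```

Keeping the layer name next to the rate is what lets one resolved structure feed the `pricing show --show-source` table, the end-of-run table, and the exported audit file.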
Code-review follow-up (two findings from the /code-review pass):

- ModelPrice is now @dataclass(frozen=True). It was a mutable dataclass shared via the bundled lru_cache (resolve_pricing_layers does dict(load_pricing()), a shallow copy sharing the ModelPrice objects), so an in-place field mutation would silently corrupt every later lookup in the process. A rate is an immutable fact; freezing turns a latent cache-corruption footgun into an immediate FrozenInstanceError at the mutation site. No code mutates ModelPrice fields (verified), so this is safe; build a new instance via dataclasses.replace to adjust a rate.
- pricing_report._fmt_rate no longer uses ${rate:g}, which renders scientific notation at the extremes ($1e+06, $5e-05). It now trims a fixed 4-decimal format, giving $3 / $2.5 in the normal range and $0.0001 / $1000000 at the edges. (.2f was rejected because it drops sub-cent cached rates to $0.00.)

+2 regression tests (frozen guard, _fmt_rate edge table). 433 tests, ruff check + format, mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
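Both findings are easy to reproduce in isolation. The sketch below uses a simplified `ModelPrice` with assumed fields and a standalone `fmt_rate`; the real class and `pricing_report._fmt_rate` may differ in detail.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ModelPrice:
    """Simplified stand-in; the real class may carry more fields."""

    input_per_1m: float
    output_per_1m: float


def fmt_rate(rate: float) -> str:
    """Format with 4 fixed decimals, then trim trailing zeros and the dot."""
    text = f"{rate:.4f}".rstrip("0").rstrip(".")
    return f"${text}"


cached = ModelPrice(3.0, 15.0)

# Mutating a shared instance now fails loudly at the mutation site.
try:
    cached.input_per_1m = 1.0  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

# Adjusting a rate means building a new instance; the original is untouched.
discounted = replace(cached, input_per_1m=2.5)

# The general format specifier switches to scientific notation at the extremes.
general_format = f"${1000000:g}"
```

Trimming after a fixed-width format keeps sub-cent rates visible while still printing whole-dollar rates without a trailing `.0000`.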
Final code-review nits (#1/#2/#3):

- _print_pricing_used now derives the used-models set from the economics `rows` it is handed, instead of re-walking every trial dir and re-parsing every assessment_summary.json a second time in the same _print_job_summary call (the rows already carry model_name). _models_used_in_job is deleted. Removes a redundant per-trial I/O pass and the caller-after-callee ordering.
- _finalize_economics_row now exposes "model" (already destructured) so the pricing-used table can read it without a second source.
- _write_pricing_used: replaced the walrus + `for price, layer in [entry]` dict-comprehension with a plain readable loop.

432 tests, ruff check + format, mypy green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aveat)

Improve the experience of authoring a pricing.toml override:

- Malformed override files now fail fast with a clear Rich message naming the file (so you know project vs ~/.nasde layer), the cause, and a hint, instead of a raw TOMLDecodeError/KeyError traceback. A decimal comma (2,5) and a missing input_per_1m/output_per_1m are the common cases. _load_override_models wraps the per-layer load; SystemExit(1), no crash.
- nasde init now scaffolds a fully-commented pricing.toml.example (a real bundled model name to copy, the decimal-point hint, and the model-name caveat). Named .example so it's inert until copied to pricing.toml.
- token-cost.md: a :::caution that the model name MUST match variant.toml's `model` or the override is SILENTLY ignored. Verify with `pricing show --show-source` (model under `bundled` not `project` = typo). Plus a note that malformed files fail loudly.

Out of scope (deliberately later): `nasde pricing validate` (check all entries + flag unknown model names up front) and `nasde pricing set` (add a single override via CLI).

436 tests, ruff check + format, mypy, website build green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite the "Verifying the effective catalog" section of token-cost.md to include actual command output: a --show-source table for a working override, a side-by-side example of the silent model-name-typo failure (the real model stays `bundled` while the typo'd key sits as a dead `project` row), the loud malformed-file errors (decimal comma, missing field), and a sample pricing_used.json. Theory alone didn't make the silent-miss case obvious. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Unreleased section was missing everything merged after the ADR-010/011/012 batch. Added entries for: layered pricing override + visibility (#71, ADR-013), results-export + repeated-evaluation accumulation (#57), the NASDE→Nasde rebrand and Starlight docs migration (#64/#69), the parallel-run job_dir race fix (#62), and the June-2026 CVE dependency pins (#70, new ### Security section). All [#NN] references now have link targets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Unreleased section drifted 5 PRs behind before v0.5.0 even though the nasde-dev skill already mentioned updating it — the rule was buried mid-list. Surface it: a new "Development workflow" section in CLAUDE.md (which had no release guidance), and a prominent "Definition of done — CHANGELOG first" callout at the top of the skill's doc-consistency step. A user-visible change is not done until it has an [Unreleased] entry with a [#NN] ref, in the same PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the last blocker before v0.5.0. Model prices are now overridable post-install without hacking the wheel: the core need behind the cost-migration product (rates change faster than releases; users have enterprise/Azure/Bedrock rates).

What

Drop a file literally named `pricing.toml` at a known location and it's auto-detected and merged onto the bundled catalog. Convention, not config: no `[pricing]` key, mirroring `assessment_dimensions.json`.

Precedence (higher wins, per-model whole-entry merge):

- `<project>/pricing.toml` (next to `nasde.toml`)
- `~/.nasde/pricing.toml` (user-wide)
- bundled `pricing.toml` (the floor)

An override lists only the models it changes/adds; the rest fall through. Applied layers print a dim transparency line.
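The convention-based discovery can be sketched as a pure path lookup. The `candidate_override_paths` helper and its `home` parameter are hypothetical, added here so the lookup can be exercised without touching a real home directory; only the two file locations come from the description above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def candidate_override_paths(
    project_dir: Optional[Path], home: Optional[Path] = None
) -> list[Path]:
    """Return existing override files, lowest precedence first.

    The user layer comes before the project layer, so applying them in
    order lets the project file win.
    """
    home = home or Path.home()
    paths = [home / ".nasde" / "pricing.toml"]
    if project_dir is not None:
        paths.append(project_dir / "pricing.toml")
    return [p for p in paths if p.is_file()]


# Build a throwaway home and project so the lookup is observable.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fake_home = root / "home"
    project = root / "proj"
    (fake_home / ".nasde").mkdir(parents=True)
    project.mkdir()

    none_found = candidate_override_paths(project, home=fake_home)

    (fake_home / ".nasde" / "pricing.toml").write_text("")
    (project / "pricing.toml").write_text("")
    both_found = [
        p.relative_to(root).as_posix()
        for p in candidate_override_paths(project, home=fake_home)
    ]
```

Since a missing file simply drops out of the list, an install with no overrides resolves to the bundled catalog alone.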
Why `~/.nasde/` not platformdirs

`platformdirs.user_config_dir` maps to `~/Library/Application Support` on macOS, where Electron app-state (cookies/cache) lives, not a file a human edits. Every agent CLI the user works with keeps user config in a HOME dotfolder (`~/.claude`, `~/.codex`, `~/.gemini`). So config → `~/.nasde/`; platformdirs stays for cache (`update_check.py`).

Both write paths

`build_trial_economics` is the single extractor feeding `assessment_summary.json` (run) and `metrics.json` (export). Both now thread `project_dir`, so a trial reports the same cost in the run summary and a later export: the ADR-011 invariant.

Implementation

- `pricing.py`: new `load_pricing_layered(project_dir)`; bundled `lru_cache` + `load_pricing(path)` untouched; merged catalog not cached (cheap per-job re-read).
- Wired through `evaluator.py`/`runner.py`, `results_exporter.py`/`cli.py`, `calibration_publisher.py`, `eval_migration.py`.
- `uv.lock` unchanged.

Tests (14 new)

export_results) pick up the override

Verified manually

`nasde results-export` on a synthetic trial with both user + project `pricing.toml` → two transparency lines, `cost_usd` = the project rate (beating user, beating an unpriced bundled). Full CLI plumbing confirmed.

Docs

ADR-013, website token-cost.md (shipped, replacing "planned improvement") + configuration.md, CLAUDE.md.

🤖 Generated with Claude Code