docs: migrate README into Starlight docs site (website/) on GitHub Pages #64
Merged
Split the 877-line README into a searchable Starlight (Astro) docs site living in website/, deployed to GitHub Pages via a path-filtered workflow.
- Starlight 0.39 + Astro 6, base /nasde-toolkit/, Pagefind search (built-in), llms.txt (starlight-llms-txt), mermaid (astro-mermaid), and starlight-links-validator as a build-time broken-link gate.
- 25 content pages across Getting Started / Concepts / Reference / Guides, migrated faithfully from README sections + docs/use-cases.md + docs/benchmark-results.md. ADR/ARCHITECTURE/RELEASING stay internal in the repo and are linked out to GitHub.
- README slimmed 877 → 108 lines: branding, four-steps, quick-start, skills table, and a prominent link to the docs site. Everything else lives only on the site (single source of truth). docs/use-cases.md + docs/benchmark-results.md become pointer stubs. pyproject Documentation URL → live site.
- .github/workflows/docs-deploy.yml: path-filtered (website/**) push-to-main Pages deploy, Node 20, npm ci. NOT a required check (avoids PR-wedge).
- .gitignore: website/node_modules, website/dist, website/.astro.
- Logo resized 4.99MB/2816px → 505KB/800px (sips), reused in site + README.

Validation: website build green (all internal links valid, 26 pages, mermaid renders, llms.txt generated). Python CI intact: ruff/mypy/pytest (369) all pass; tooling is scoped to src/ + tests/ and never touches website/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nding
UX/polish pass on the Starlight site after live review:
- **Logo**: replace the broken default favicon-as-logo with the Noesis logo
(src/assets/noesis-logo.png), shown alongside the "NASDE Toolkit" title
(no more replacesTitle, no "NASDE Toolkit | NASDE Toolkit" dup).
- **Brand theme**: src/styles/custom.css sets a teal/cyan accent from the
Noesis logo, consistent with noesis-docs, replacing Starlight's default
purple. Works in light + dark.
- **Top-level nav**: HeaderNav.astro overrides SocialIcons to add
"Documentation" + "Changelog" header links (the Docusaurus-style navbar
users expect), with active-state highlighting.
- **Landing reworked**: the splash now reads as documentation, not a product
page — hero tagline says "the official documentation", logo as hero image,
Get Started / What is NASDE? / GitHub actions, and pictogram LinkCards for
Getting Started / Concepts / Reference / Guides. The "what is NASDE / four
steps" product blurb lives in the docs (getting-started/overview), not on
the landing.
Search verified working: "calibration" returns "Calibrating the Rubric"
first; the earlier "call" result came from a typo in the query ("callibr"),
not a config issue (Pagefind has no fuzzy typo matching).
Verified live in Chrome (light + dark): logo, teal accent, nav links,
landing cards, sidebar active state, and the mermaid pipeline diagram all
render. Build green, all internal links valid, 26 pages.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e hero contrast
Follow-up polish from live review:
- **Branding split**: NASDE product logo (nasde-toolkit-logo.png) as the hero
image on the landing; Noesis Vision logo in the navbar and as the favicon
(public/favicon.png). Product on the page, company in the chrome.
- **Title**: "Nasde Toolkit Docs" (normal case, not all-caps NASDE) in the
site title, hero, and "What is Nasde?" action.
- **Fix light-mode hero invisible title**: custom.css overrode
--sl-color-white globally but didn't set it for the light theme, so the
hero <h1> rendered white-on-white. Light theme now sets --sl-color-white
to the dark heading color (Starlight's inverted-scale convention).
Verified: hero title rgb(24,26,31) on white (light), rgb(255,255,255) on
dark; deep-page body text contrast intact in both.
- **Hero logo card**: rounded corners + soft card background so the
light-background NASDE artwork sits cleanly in dark mode.
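The light-mode bug (a white hero <h1> on a white page) and its fix can be quantified with the WCAG 2.x contrast-ratio formula. This is an illustrative sketch, not code from the site; it only reuses the two colors quoted in the verification note above.

```python
def _linear(channel: int) -> float:
    """sRGB 0-255 channel -> linear-light value (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio: 1.0 (invisible) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
print(round(contrast_ratio(WHITE, WHITE), 2))         # the bug: 1.0, text is invisible
print(round(contrast_ratio((24, 26, 31), WHITE), 2))  # the fix: dark heading color on white
```

Because the formula only compares luminances, swapping foreground and background gives the same ratio, which is why the fix has to change the light-theme value of --sl-color-white rather than the page background.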
Search note: Pagefind has no typo tolerance by design ("callibrate" won't
match "calibration") — accepted as the standard static-search trade-off;
correct spelling returns the right page first.
Verified live in Chrome, light + dark.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d TOC

Live review showed the imbalance vs noesis-docs: too many thin pages made the left sidebar long (~23 items) while the right "On this page" TOC was often empty (pages were continuous prose with no ## sections). noesis-docs does the opposite: few dense pages, each with a rich TOC. This rebalances to match.

Approved "B-corrected" structure, 13 pages, each with a coherent flow:
- Getting Started (2): Overview · Quick Start (← Prerequisites + Installation + Quick Start)
- Concepts (4): How It Works (← Scoring + Pipeline, de-duplicated into one staged narrative) · A Real Task · Token & Cost · Calibrating the Rubric
- Reference (3): CLI Reference (← Cheatsheet + Commands) · Configuration (← Project Structure + variant.toml + task.toml) · Authentication & Opik (← Authentication + Verifying Opik)
- Guides (4): Running & Configuring Runs (← Local Repo + Cloud + Reviewer + Exporting) · Plugins & Skills (← Plugin + Skill-by-ref + Scoping) · Use Cases · Benchmark Results

Every page now has ## sections, so the right-hand TOC fills on all of them (Token & Cost and Calibration also got headings added). All internal links and anchors were rewritten to the merged slugs.

Verified live in Chrome: short sidebar + populated TOC on calibration and how-it-works; mermaid renders in How It Works. Build green, all internal links valid, 13 doc pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ymmetry, glossary, boundaries

Information-architecture review (from a "never heard of NASDE" perspective: one subagent pass + a first-time-user walkthrough) surfaced gaps beyond the earlier polish. This addresses them and finalizes the skeleton.

## Structural fixes
- **Agent ↔ reviewer symmetry (the big one)**: running-benchmarks.md now has a "Configuring the agent under test" section (instructions / skills / MCP / reasoning effort / scoping) mirroring the reviewer section, with a tip box spelling out the parallel. Configuring the agent under test was previously scattered across three pages.
- **General → specific**: Overview opens with "Why NASDE? — the problem" (the hook) and adds "What NASDE is — and is not" (boundaries: not a CI replacement, not production, needs Docker/subscription) before the mechanics.

## New pages (5)
- getting-started/reading-results: the run summary table, jobs/ layout, interpreting assessment_summary.json, agent-noise vs judge-noise.
- concepts/key-terms: glossary (variant, trial, job, rubric, dimension, reviewer, trajectory, Harbor, Opik, ...).
- creating-benchmarks/ (NEW GROUP): anatomy (what a benchmark is made of + task-files mermaid) and assessment-criteria (how to write dimensions + score ladders, the core value).
- guides/troubleshooting: Docker/auth/OOM(137)/flaky-eval/rate-limit + what-to-expect (time, cost, trials) + FAQ.

## Images / diagrams
- How It Works: per-dimension radar (existing asset) at the scoring section.
- Configuration + Anatomy: "what each task file does" mermaid.
- Calibration: mermaid loop flowchart (measure→diagnose→fix→re-measure).
- Token & Cost: section retitled "Quality vs. cost: the Pareto frontier", value-led rewrite + image placeholder (pareto.png, TODO: user generates).
- Reading Your Results: placeholder for a real `nasde run` screenshot (TODO).

## Reference
- Configuration: "Quick reference: configuring a variant" checklist at top.

Sidebar: 18 pages across 5 groups (Getting Started · Concepts · Creating Benchmarks · Reference · Guides). Build green, all internal links valid, 20 pages, 4 mermaid diagrams render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…CLI images
Accuracy pass after a fact-check (the troubleshooting/results content was
partly written from project memory; this verifies it against the code and
corrects the parts that conflated two distinct knobs).
- **Fix attempts vs eval-repetitions conflation** (troubleshooting + reading-
results): `--attempts`/`-n` is the number of independent AGENT runs per task
(the source of the between-trial `mean ±std`); `--eval-repetitions` is the
number of REVIEWER passes per trial (judge noise). These were blurred into
"trials/repetitions" and "run × repetitions". Now stated precisely, with the
two noise sources mapped to where each appears. (Verified against cli.py:
--attempts/-n and --eval-repetitions are separate flags; timeout_sec 1800,
memory_mb 4096 confirmed in CLAUDE.md/scaffold.)
- **Real images replace placeholders**:
- concepts/token-cost.md — the actual quality-vs-cost / quality-vs-tokens
Pareto chart from a real skill×model matrix, with a caption explaining the
shared cost panel + per-provider token panels.
- getting-started/reading-results.md — the real `nasde run` startup banner;
the section is reworded so the prose matches what the screenshot actually
shows (the config banner) vs. the end-of-run summary table (described in
text, since that's a separate screen).
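To make the two knobs concrete, here is a minimal sketch of how the two noise sources separate. It assumes a hypothetical `scores[trial][pass]` layout (one row per `--attempts` agent run, one column per `--eval-repetitions` reviewer pass); it is not NASDE's actual aggregation code, and the use of sample standard deviation and the averaging of per-trial judge spread are illustrative choices.

```python
from statistics import mean, stdev


def summarize(scores: list[list[float]]) -> dict[str, float]:
    """Split score spread into agent noise and judge noise.

    scores[i][j] is the reviewer score for agent trial i (one per --attempts run)
    on reviewer pass j (one per --eval-repetitions pass). Hypothetical layout.
    """
    # Each trial's score is the mean over its reviewer passes.
    trial_means = [mean(passes) for passes in scores]
    # Agent noise: spread BETWEEN independent agent runs (the ± next to the mean).
    agent_std = stdev(trial_means) if len(trial_means) > 1 else 0.0
    # Judge noise: spread WITHIN a trial (same work re-scored), averaged over trials.
    judge_std = mean([stdev(p) if len(p) > 1 else 0.0 for p in scores])
    return {"mean": mean(trial_means), "agent_std": agent_std, "judge_std": judge_std}


# Two agent runs that did genuinely different work, each scored consistently:
print(summarize([[7, 7, 7], [9, 9, 9]]))
# Two agent runs that score the same on average, but the reviewer disagrees with itself:
print(summarize([[6, 8], [6, 8]]))
```

The first case shows only agent noise and the second only judge noise, which is why raising `--eval-repetitions` tightens the judge estimate but cannot shrink the between-trial `mean ±std`.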
Build green, all internal links valid, images optimized to webp.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… styling

Systematic technical-accuracy audit of every page against the source code (cli.py, config.py, runner/evaluator/results_exporter, scaffold). Fixes all findings; the core claims (effort priority, rubric fingerprint, cost formula, eval clustering, CLI set, scaffold defaults) were audited as correct.

WRONG (a user would hit these):
- reading-results: the trial dir is `verifier/` (reward.txt + test-stdout.txt), not `logs/verifier/` (that's the in-container path test.sh writes to).
- configuration + running-benchmarks: `[nasde.source]` was shown as JSON; it's a TOML table in task.toml. Pasting the JSON would break the file.
- use-cases: the per-task config file is `task.toml`, not `task.json` (×3).

IMPRECISE:
- cli-reference: the run options table was missing --all-variants, --attempts/-n, --eval-repetitions, --max-concurrent-eval, --job-suffix. Added.
- authentication: Codex auth checks file *presence* (sets CODEX_FORCE_AUTH_JSON), not `auth_mode:"chatgpt"`. Gemini Vertex env vars corrected to the ones the code actually checks (GOOGLE_API_KEY / GOOGLE_APPLICATION_CREDENTIALS, not GOOGLE_CLOUD_PROJECT). Opik feedback-score list completed (_std, eval_n).
- configuration: the nasde.toml sample now shows eval_repetitions + project_name.
- calibration: the .calibration/ list now includes assessment_summary.json.

Pricing honesty:
- token-cost: softened "edit pricing.toml". The catalog is bundled, so after a PyPI install editing it is impractical (wiped on upgrade); editing works from a source checkout, and a per-project/user override is flagged as planned. (load_pricing() is called with no path at both call sites, so no override is wired today; tracked as a follow-up.)

Styling:
- Inline `code` in headings scaled to 0.85em (was oversized).
- Previous/Next pager link titles toned down to h5.

Build green, all internal links valid, verified live (light + dark).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Previous/Next pager titles (.link-title) were still ~24px, louder than the page's own text. Target .link-title specifically (the earlier rule hit the wrong span) and set it to body size (--sl-text-base, weight 600), so the pager sits quietly under its Previous/Next labels. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The "On this page" panel read smaller than the left sidebar: its heading was 13px (below its own 14px links) and the rows were tight. Bump the starlight-toc heading to --sl-text-sm and relax the link rows (line-height + padding) so the TOC is as legible as the left nav. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The header nav links were --sl-text-sm, reading small next to the rest of the navbar. Bump to --sl-text-base so they sit as first-class navigation. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…guage

Two sections read like expert shorthand: dense, jargon-heavy, and assuming the reader already knows the two noise sources and prompt-cache mechanics. Rewritten for a newcomer who's never used NASDE.
- "A mean is never reported bare" → "Why scores come with a ± (and why that matters)". Opens with the concrete question (is A really better than B, or did it get lucky?), explains "wobble" before using it, splits the two noise sources into bolded bullets, and drops the dense parenthetical jargon.
- "How cost is computed" → leads with what it means for the user (consistent, comparable cost) and explains prompt caching in plain terms and why ignoring it keeps comparisons fair, instead of stating the billing rule tersely.

No factual changes: same behavior, clearer prose with emphasis on key terms. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-use credit

NASDE drives every agent non-interactively (claude -p / codex exec / Gemini CLI equivalent), verified in code (claude_subprocess.py uses `claude -p`; the agent under test runs headless via Harbor). Document this clearly, since it's a real current limitation (interactive mode is planned) and it touches Anthropic's programmatic-use terms.

Per Anthropic's announcement, from 2026-06-15 paid Claude plans include a dedicated monthly credit for programmatic usage (covering `claude -p`, the Agent SDK, Claude Code GitHub Actions), so running NASDE on a paid plan is supported, not restricted. Framed accurately (a credit, not a block) with a link to Anthropic's terms.

Added in three places:
- Overview → new "How NASDE drives the agents" subsection under the boundaries.
- Authentication & Opik → a note box at the top.
- Troubleshooting FAQ → "interactive mode?" and "programmatic-use?" Q&As.

Verified live, build green, links valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The README's Documentation section linked to pre-consolidation slugs that now 404 on the live site:
- concepts/scoring → concepts/how-it-works
- reference/commands → reference/cli-reference
- getting-started/installation (merged into quick-start) → dropped

The README lives outside website/, so the build's link validator never caught these. Verified that every README docs link now resolves to an existing page.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
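Since starlight-links-validator only sees pages under website/, README links need a separate check. A hypothetical sketch of one: it assumes the site is served at `https://<owner>.github.io/nasde-toolkit/` and that the set of existing slugs is collected elsewhere (for example from the built site); the function name and slug set here are illustrative, not part of the repo.

```python
import re

# Assumption: docs links in the README look like
# https://<owner>.github.io/nasde-toolkit/<slug>/ (owner is not hard-coded).
DOCS_LINK = re.compile(
    r"https://[a-z0-9-]+\.github\.io/nasde-toolkit/([a-z0-9-]+(?:/[a-z0-9-]+)*)"
)


def broken_docs_links(readme: str, existing_slugs: set[str]) -> list[str]:
    """Return README doc slugs that do not match any page in the built site."""
    return [slug for slug in DOCS_LINK.findall(readme) if slug not in existing_slugs]


# Hypothetical slug set; a real check would collect it from the build output.
EXISTING = {"getting-started/quick-start", "concepts/how-it-works", "reference/cli-reference"}

readme = """
- [How scoring works](https://example.github.io/nasde-toolkit/concepts/scoring/)
- [CLI](https://example.github.io/nasde-toolkit/reference/cli-reference/)
- [Install](https://example.github.io/nasde-toolkit/getting-started/installation/)
"""
print(broken_docs_links(readme, EXISTING))  # the two pre-consolidation slugs
```

Anchors (`#...`) are not validated in this sketch; only the page slug is compared.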
…frontmatter rule

After rebasing onto main (ADR-012, #65/66/67), the docs needed two additions to stay accurate with how Codex/Gemini skills now work:
- Plugins & Skills: new "How skills reach each agent" section. Claude discovers skills from /app/.claude/skills/, while Codex/Gemini auto-discover only from a HOME-scoped dir ($HOME/.agents/skills, ~/.gemini/skills), where NASDE now routes them natively (covers agents_skills/, [[skill]], and plugin skills).
- A caution box on the strict requirement: a Codex/Gemini SKILL.md must START with a `---` frontmatter line, or the loader silently drops the skill (a leading comment above the frontmatter is the common trap); NASDE warns at run time. Cross-linked from Configuration.

Our docs never named the old, wrong /app/.agents/skills destination, so nothing was contradicted: these are additions, not corrections.

Build green, links valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
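The frontmatter rule can be checked before a run. This is a hypothetical pre-flight sketch, not NASDE's own run-time warning: it assumes "must START with `---`" means the very first line is exactly `---`, and it tolerates a CRLF line ending (whether the real Codex/Gemini loaders do is an assumption).

```python
from pathlib import Path


def starts_with_frontmatter(skill_md: str) -> bool:
    """True if the very first line is exactly '---' (CRLF tolerated; assumption)."""
    first_line = skill_md.split("\n", 1)[0].removesuffix("\r")
    return first_line == "---"


def skills_that_would_be_dropped(skills_dir: Path) -> list[Path]:
    """Hypothetical helper: list SKILL.md files a strict loader would skip."""
    return [
        p
        for p in sorted(skills_dir.rglob("SKILL.md"))
        if not starts_with_frontmatter(p.read_text(encoding="utf-8"))
    ]


print(starts_with_frontmatter("---\nname: my-skill\n---\nBody"))  # True
# The common trap: a comment above the frontmatter.
print(starts_with_frontmatter("<!-- internal note -->\n---\nname: my-skill\n---\n"))  # False
```

The field names inside the frontmatter (`name: my-skill`) are illustrative only; the check looks at nothing but the first line.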
Force-pushed from 867ffa1 to 7410c74.
## What
Splits the 877-line README into a searchable Starlight (Astro) documentation site living in `website/`, deployed to GitHub Pages via a path-filtered workflow. README becomes a ~108-line landing page that links to the site.

## Why
The README had grown to 877 lines / 48 KB across 26 H2 sections: a manual without search, versioning, or navigation. This extracts the full documentation into a proper site (single source of truth) while keeping README as the landing page.

## Structure
`base: /nasde-toolkit/`, built-in Pagefind search, `llms.txt` (starlight-llms-txt), mermaid (astro-mermaid), and starlight-links-validator as a build-time broken-link gate. `docs/use-cases.md` + `docs/benchmark-results.md`.

## README & cross-refs
`docs/use-cases.md` + `docs/benchmark-results.md` → pointer stubs to the site (no duplicate source of truth). `pyproject.toml` Documentation URL → live site. `sips`), reused in site + README.

## Deploy
`.github/workflows/docs-deploy.yml`: path-filtered (`website/**`) push-to-main Pages deploy, Node 20, `npm ci`.

## Validation
`website` build green: all internal links valid, 26 pages, mermaid renders, `llms.txt` generated, images optimized to webp. `ruff`/`mypy`/`pytest` (369 passed) all pass; tooling is scoped to `src/` + `tests/` and never touches `website/`. The `docs-deploy` workflow does not run on PRs (push-to-main only), so it won't block this PR.

🤖 Generated with Claude Code