docs: migrate README into Starlight docs site (website/) on GitHub Pages by szjanikowski · Pull Request #64 · NoesisVision/nasde-toolkit

szjanikowski · 2026-06-09T16:40:40Z

What

Splits the 877-line README into a searchable Starlight (Astro) documentation site in website/, deployed to GitHub Pages by a path-filtered workflow. The README becomes a ~108-line landing page that links to the site.

Why

The README had grown to 877 lines (48 KB) across 26 H2 sections: a manual with no search, versioning, or navigation. This PR moves the full documentation into a proper site, which becomes the single source of truth, and keeps the README as the landing page.

Structure

- Generator: Starlight 0.39 + Astro 6, with base: /nasde-toolkit/, built-in Pagefind search, llms.txt (starlight-llms-txt), mermaid (astro-mermaid), and starlight-links-validator as a build-time broken-link gate.
- 25 content pages across Getting Started / Concepts / Reference / Guides, migrated faithfully from README + docs/use-cases.md + docs/benchmark-results.md.
- Internal-only docs (ADR, ARCHITECTURE.md, RELEASING.md, superpowers) stay in the repo and are linked out to GitHub; the site does not host them.

README & cross-refs

- README slimmed from 877 → 108 lines: branding, the four steps, quick start, skills table, and a prominent link to the docs site. Everything else lives only on the site.
- docs/use-cases.md and docs/benchmark-results.md become pointer stubs to the site (no duplicate source of truth).
- pyproject.toml Documentation URL → live site.
- Logo resized from 4.99 MB / 2816 px to 505 KB / 800 px (with sips) and reused in the site and the README.

Deploy

- .github/workflows/docs-deploy.yml: path-filtered (website/**) Pages deploy on push to main, Node 20, npm ci.
- ⚠️ One-time manual step (human): set Settings → Pages → Source to GitHub Actions. Until that is done, the first deploy cannot publish and the site returns 404.
- Deploy is not a required status check. A required check behind a path filter never reports on PRs that don't touch website/, so those PRs would wait on it indefinitely (the path-filter PR wedge).
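The PR-wedge reasoning can be modeled in a few lines. This is a hypothetical sketch, not GitHub's implementation: the path filter is approximated with Python's `fnmatch`, and the file paths and check names are illustrative only.

```python
from fnmatch import fnmatchcase

# Assumption: approximates the workflow's `website/**` path filter. Python's
# fnmatch "*" also matches "/", so "website/*" covers nested files here.
# GitHub evaluates its own glob syntax; this is only a model.
DOCS_PATH_FILTER = "website/*"


def triggers_docs_workflow(changed_files: list[str]) -> bool:
    """True if any changed file falls under the docs path filter."""
    return any(fnmatchcase(path, DOCS_PATH_FILTER) for path in changed_files)


def merge_blocked(required_checks: set[str], reported_checks: set[str]) -> bool:
    """A PR stays blocked while any required check has never reported."""
    return bool(required_checks - reported_checks)


# A Python-only PR (illustrative paths): the docs workflow never runs, so it never reports.
changed = ["src/nasde/cli.py", "tests/test_cli.py"]
reported = {"ruff", "mypy", "pytest"}
if triggers_docs_workflow(changed):
    reported.add("docs-deploy")

print(merge_blocked({"ruff", "mypy", "pytest", "docs-deploy"}, reported))  # True: wedged
print(merge_blocked({"ruff", "mypy", "pytest"}, reported))                 # False
```

Keeping docs-deploy out of the required set is what makes the second case the real one.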

Validation

- ✅ website build green: all internal links valid, 26 pages, mermaid renders, llms.txt generated, images optimized to webp.
- ✅ Python CI intact: ruff / mypy / pytest (369 passed) all pass; tooling is scoped to src/ + tests/ and never touches website/.
- The docs-deploy workflow does not run on PRs (push-to-main only), so it won't block this PR.

🤖 Generated with Claude Code

Split the 877-line README into a searchable Starlight (Astro) docs site living in website/, deployed to GitHub Pages via a path-filtered workflow.

- Starlight 0.39 + Astro 6, base /nasde-toolkit/, Pagefind search (built-in), llms.txt (starlight-llms-txt), mermaid (astro-mermaid), and starlight-links-validator as a build-time broken-link gate.
- 25 content pages across Getting Started / Concepts / Reference / Guides, migrated faithfully from README sections + docs/use-cases.md + docs/benchmark-results.md. ADR/ARCHITECTURE/RELEASING stay internal in the repo and are linked out to GitHub.
- README slimmed 877 → 108 lines: branding, the four steps, quick start, skills table, and a prominent link to the docs site. Everything else lives only on the site (single source of truth). docs/use-cases.md + docs/benchmark-results.md become pointer stubs. pyproject Documentation URL → live site.
- .github/workflows/docs-deploy.yml: path-filtered (website/**) push-to-main Pages deploy, Node 20, npm ci. NOT a required check (avoids the PR wedge).
- .gitignore: website/node_modules, website/dist, website/.astro.
- Logo resized 4.99 MB / 2816 px → 505 KB / 800 px (sips), reused in site + README.

Validation: website build green (all internal links valid, 26 pages, mermaid renders, llms.txt generated). Python CI intact: ruff/mypy/pytest (369) all pass; tooling is scoped to src/ + tests/ and never touches website/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…nding

UX/polish pass on the Starlight site after live review:

- **Logo**: replace the broken default favicon-as-logo with the Noesis logo (src/assets/noesis-logo.png), shown alongside the "NASDE Toolkit" title (no more replacesTitle, and no "NASDE Toolkit | NASDE Toolkit" duplication).
- **Brand theme**: src/styles/custom.css sets a teal/cyan accent taken from the Noesis logo, consistent with noesis-docs, replacing Starlight's default purple. Works in light and dark mode.
- **Top-level nav**: HeaderNav.astro overrides SocialIcons to add "Documentation" and "Changelog" header links (the Docusaurus-style navbar users expect), with active-state highlighting.
- **Landing reworked**: the splash now reads as documentation, not a product page. The hero tagline says "the official documentation", the logo is the hero image, the actions are Get Started / What is NASDE? / GitHub, and pictogram LinkCards point to Getting Started / Concepts / Reference / Guides. The "what is NASDE / four steps" product blurb lives in the docs (getting-started/overview), not on the landing page.

Search verified working: "calibration" returns "Calibrating the Rubric" first. The earlier "call" result came from a typo ("callibr"), not a config issue (Pagefind has no fuzzy typo matching).

Verified live in Chrome (light + dark): logo, teal accent, nav links, landing cards, sidebar active state, and the mermaid pipeline diagram all render. Build green, all internal links valid, 26 pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e hero contrast

Follow-up polish from live review:

- **Branding split**: the NASDE product logo (nasde-toolkit-logo.png) is the hero image on the landing page; the Noesis Vision logo appears in the navbar and as the favicon (public/favicon.png). Product on the page, company in the chrome.
- **Title**: "Nasde Toolkit Docs" (normal case, not all-caps NASDE) in the site title, the hero, and the "What is Nasde?" action.
- **Fix invisible light-mode hero title**: custom.css overrode --sl-color-white globally but didn't set it for the light theme, so the hero <h1> rendered white on white. The light theme now sets --sl-color-white to the dark heading color (Starlight's inverted-scale convention). Verified: the hero title is rgb(24,26,31) on white in light mode and rgb(255,255,255) on dark; deep-page body text contrast is intact in both.
- **Hero logo card**: rounded corners and a soft card background so the light-background NASDE artwork sits cleanly in dark mode.

Search note: Pagefind has no typo tolerance by design ("callibrate" won't match "calibration"). This is accepted as the standard static-search trade-off; the correct spelling returns the right page first.

Verified live in Chrome, light + dark.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
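The light-mode title fix can be cross-checked numerically with the WCAG 2.x contrast-ratio formula. This is a minimal sketch of the standard relative-luminance definition, not part of the site build; the helper names are illustrative. For the reported hero color rgb(24,26,31) on white it gives roughly 17.4:1, well above the 7:1 AAA threshold, while the original white-on-white bug is exactly 1:1.

```python
def _linearize(channel: int) -> float:
    """sRGB channel (0-255) to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) up to 21.0 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
print(round(contrast_ratio((24, 26, 31), WHITE), 1))  # fixed light-mode hero title
print(contrast_ratio(WHITE, WHITE))                   # the bug, white on white: 1.0
```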

…d TOC

Live review showed an imbalance compared with noesis-docs: too many thin pages made the left sidebar long (~23 items), while the right-hand "On this page" TOC was often empty (pages were continuous prose with no ## sections). noesis-docs does the opposite: few dense pages, each with a rich TOC. This rebalances to match.

Approved "B-corrected" structure, 13 pages, each with a coherent flow:

- Getting Started (2): Overview · Quick Start (← Prerequisites + Installation + Quick Start)
- Concepts (4): How It Works (← Scoring + Pipeline, de-duplicated into one staged narrative) · A Real Task · Token & Cost · Calibrating the Rubric
- Reference (3): CLI Reference (← Cheatsheet + Commands) · Configuration (← Project Structure + variant.toml + task.toml) · Authentication & Opik (← Authentication + Verifying Opik)
- Guides (4): Running & Configuring Runs (← Local Repo + Cloud + Reviewer + Exporting) · Plugins & Skills (← Plugin + Skill-by-ref + Scoping) · Use Cases · Benchmark Results

Every page now has ## sections, so the right-hand TOC fills on all of them (Token & Cost and Calibration also got headings added). All internal links and anchors were rewritten to the merged slugs.

Verified live in Chrome: short sidebar and populated TOC on calibration and how-it-works; mermaid renders in How It Works. Build green, all internal links valid, 13 doc pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ymmetry, glossary, boundaries

Information-architecture review (from a "never heard of NASDE" perspective: one subagent pass plus a first-time-user walkthrough) surfaced gaps beyond the earlier polish. This commit addresses them and finalizes the skeleton.

## Structural fixes

- **Agent ↔ reviewer symmetry (the big one)**: running-benchmarks.md now has a "Configuring the agent under test" section (instructions / skills / MCP / reasoning effort / scoping) mirroring the reviewer section, with a tip box spelling out the parallel. Configuration of the agent under test was previously scattered across three pages.
- **General → specific**: Overview opens with "Why NASDE? — the problem" (the hook) and adds "What NASDE is — and is not" (boundaries: not a CI replacement, not for production, needs Docker/subscription) before the mechanics.

## New pages (5)

- getting-started/reading-results: the run summary table, the jobs/ layout, interpreting assessment_summary.json, agent noise vs judge noise.
- concepts/key-terms: glossary (variant, trial, job, rubric, dimension, reviewer, trajectory, Harbor, Opik, ...).
- creating-benchmarks/ (NEW GROUP): anatomy (what a benchmark is made of, plus a task-files mermaid diagram) and assessment-criteria (how to write dimensions and score ladders, the core value).
- guides/troubleshooting: Docker / auth / OOM (137) / flaky eval / rate limit, plus what to expect (time, cost, trials) and a FAQ.

## Images / diagrams

- How It Works: per-dimension radar (existing asset) at the scoring section.
- Configuration + Anatomy: "what each task file does" mermaid diagram.
- Calibration: mermaid loop flowchart (measure → diagnose → fix → re-measure).
- Token & Cost: section retitled "Quality vs. cost: the Pareto frontier", value-led rewrite + image placeholder (pareto.png, TODO: user generates).
- Reading Your Results: placeholder for a real `nasde run` screenshot (TODO).

## Reference

- Configuration: "Quick reference: configuring a variant" checklist at the top.

Sidebar: 18 pages across 5 groups (Getting Started · Concepts · Creating Benchmarks · Reference · Guides). Build green, all internal links valid, 20 pages, 4 mermaid diagrams render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
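The OOM (137) entry in the troubleshooting page rests on a shell and Docker convention: an exit status above 128 means the process was killed by signal (status - 128), and 137 - 128 = 9 is SIGKILL, the signal the kernel's OOM killer sends. A container that exceeds its memory limit therefore typically exits with 137, although a manual `docker kill` produces the same code. A minimal decoding sketch using only the standard library (POSIX signal numbers assumed; the helper name is hypothetical):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a container/shell exit status; 128 + N means "killed by signal N"."""
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:  # not a signal number on this platform
            pass
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL (POSIX)
print(describe_exit_code(1))    # exited with status 1
```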

…CLI images

Accuracy pass after a fact-check (the troubleshooting/results content was partly written from project memory; this verifies it against the code and corrects the parts that conflated two distinct knobs).

- **Fix attempts vs eval-repetitions conflation** (troubleshooting + reading-results): `--attempts`/`-n` is the number of independent AGENT runs per task (the source of the between-trial `mean ±std`); `--eval-repetitions` is the number of REVIEWER passes per trial (judge noise). These were blurred into "trials/repetitions" and "run × repetitions". Both are now stated precisely, with each noise source mapped to where it appears. (Verified against cli.py: --attempts/-n and --eval-repetitions are separate flags; timeout_sec 1800 and memory_mb 4096 confirmed in CLAUDE.md/scaffold.)
- **Real images replace placeholders**:
  - concepts/token-cost.md: the actual quality-vs-cost / quality-vs-tokens Pareto chart from a real skill × model matrix, with a caption explaining the shared cost panel and the per-provider token panels.
  - getting-started/reading-results.md: the real `nasde run` startup banner. The section is reworded so the prose matches what the screenshot actually shows (the config banner), while the end-of-run summary table is described in text, since it's a separate screen.

Build green, all internal links valid, images optimized to webp.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
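To make the two knobs concrete, the sketch below separates the two noise sources for one task. It is a hypothetical aggregation for illustration, not NASDE's actual scoring code: it assumes each attempt's score is the mean of its reviewer passes, that the reported `± std` is the spread across attempts, and it estimates judge noise as the average spread among passes on the same attempt.

```python
from statistics import mean, stdev


def summarize(scores: list[list[float]]) -> tuple[float, float, float]:
    """scores[i][j] = reviewer score for agent attempt i, eval repetition j.

    Hypothetical aggregation for illustration, not NASDE's exact code:
    - each attempt's score is the mean over its reviewer passes;
    - agent noise is the spread of those per-attempt scores (the "± std");
    - judge noise is the average spread among passes on the same attempt.
    """
    per_attempt = [mean(reps) for reps in scores]
    agent_std = stdev(per_attempt) if len(per_attempt) > 1 else 0.0
    judge_std = mean(stdev(reps) if len(reps) > 1 else 0.0 for reps in scores)
    return mean(per_attempt), agent_std, judge_std


# 3 agent attempts (--attempts 3), each judged twice (--eval-repetitions 2)
overall, agent_std, judge_std = summarize([[7, 8], [5, 6], [9, 9]])
print(f"{overall:.2f} ± {agent_std:.2f} (judge noise ≈ {judge_std:.2f})")
# 7.33 ± 1.76 (judge noise ≈ 0.47)
```

Raising `--attempts` tightens the estimate of the agent's spread; raising `--eval-repetitions` only averages out reviewer disagreement within each attempt.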

… styling

Systematic technical-accuracy audit of every page against the source code (cli.py, config.py, runner/evaluator/results_exporter, scaffold). Fixes all findings; the core claims (effort priority, rubric fingerprint, cost formula, eval clustering, CLI set, scaffold defaults) were audited as correct.

WRONG (a user would hit these):
- reading-results: the trial dir is `verifier/` (reward.txt + test-stdout.txt), not `logs/verifier/` (that's the in-container path test.sh writes to).
- configuration + running-benchmarks: `[nasde.source]` was shown as JSON; it's a TOML table in task.toml. Pasting the JSON would break the file.
- use-cases: the per-task config file is `task.toml`, not `task.json` (×3).

IMPRECISE:
- cli-reference: the run options table was missing --all-variants, --attempts/-n, --eval-repetitions, --max-concurrent-eval, and --job-suffix. Added.
- authentication: Codex auth checks file *presence* (sets CODEX_FORCE_AUTH_JSON), not `auth_mode:"chatgpt"`. Gemini Vertex env vars corrected to the ones the code actually checks (GOOGLE_API_KEY / GOOGLE_APPLICATION_CREDENTIALS, not GOOGLE_CLOUD_PROJECT). Opik feedback-score list completed (_std, eval_n).
- configuration: the nasde.toml sample now shows eval_repetitions + project_name.
- calibration: the .calibration/ list now includes assessment_summary.json.

Pricing honesty:
- token-cost: softened "edit pricing.toml". The catalog is bundled, so after a PyPI install editing it is impractical (it's wiped on upgrade); editing works from a source checkout, and a per-project/user override is flagged as planned. (load_pricing() is called with no path at both call sites, so no override is wired today; tracked as a follow-up.)

Styling:
- Inline `code` in headings scaled to 0.85em (was oversized).
- Previous/Next pager link titles toned down to h5.

Build green, all internal links valid, verified live (light + dark).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Previous/Next pager titles (.link-title) were still ~24px, louder than the page's own text. Target .link-title specifically (the earlier rule hit the wrong span) and set it to body size (--sl-text-base, weight 600), so the pager sits quietly under its Previous/Next labels. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The "On this page" panel read smaller than the left sidebar: its heading was 13px (below its own 14px links) and the rows were tight. Bump the starlight-toc heading to --sl-text-sm and relax the link rows (line-height + padding) so the TOC is as legible as the left nav. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The header nav links were --sl-text-sm, reading small next to the rest of the navbar. Bump them to --sl-text-base so they sit as first-class navigation. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…guage

Two sections read like expert shorthand: dense, jargon-heavy, and assuming the reader already knows the two noise sources and prompt-cache mechanics. They are rewritten for a newcomer who has never used NASDE.

- "A mean is never reported bare" → "Why scores come with a ± (and why that matters)". Opens with the concrete question (is A really better than B, or did it get lucky?), explains "wobble" before using it, splits the two noise sources into bolded bullets, and drops the dense parenthetical jargon.
- "How cost is computed" → leads with what it means for the user (consistent, comparable cost), then explains prompt caching in plain terms and why ignoring it keeps comparisons fair, instead of stating the billing rule tersely.

No factual changes: same behavior, clearer prose with emphasis on key terms. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…-use credit

NASDE drives every agent non-interactively (claude -p / codex exec / Gemini CLI equivalent). Verified in code: claude_subprocess.py uses `claude -p`, and the agent under test runs headless via Harbor. This is now documented clearly, since it's a real current limitation (interactive mode is planned) and it touches Anthropic's programmatic-use terms.

Per Anthropic's announcement, from 2026-06-15 paid Claude plans include a dedicated monthly credit for programmatic usage (covering `claude -p`, the Agent SDK, and Claude Code GitHub Actions), so running NASDE on a paid plan is supported, not restricted. Framed accurately (a credit, not a block) with a link to Anthropic's terms.

Added in three places:
- Overview → new "How NASDE drives the agents" subsection under the boundaries.
- Authentication & Opik → a note box at the top.
- Troubleshooting FAQ → "interactive mode?" and "programmatic-use?" Q&As.

Verified live, build green, links valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The README's Documentation section linked to pre-consolidation slugs that now 404 on the live site:

- concepts/scoring → concepts/how-it-works
- reference/commands → reference/cli-reference
- getting-started/installation (merged into quick-start) → dropped

The README lives outside website/, so the build's link validator never caught these. Verified that every README docs link now resolves to an existing page.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
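Since starlight-links-validator only sees files inside website/, README links need a separate check. A minimal sketch of one, assuming the site is served under the /nasde-toolkit/ base and that the list of valid slugs is available (for example, from the built page list). The host below is a placeholder because the Pages URL is not stated here, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumptions: placeholder host, the /nasde-toolkit/ base path, and a
# `known_slugs` set supplied from the built site's page list.
DOCS_HOST = "example.github.io"
DOCS_BASE = "/nasde-toolkit/"
MD_LINK = re.compile(r"\]\((https?://[^)\s]+)\)")


def broken_docs_links(readme: str, known_slugs: set[str]) -> list[str]:
    """Return README links into the docs site whose slug has no page."""
    broken = []
    for url in MD_LINK.findall(readme):
        parsed = urlparse(url)
        if parsed.netloc != DOCS_HOST or not parsed.path.startswith(DOCS_BASE):
            continue  # not a docs-site link
        slug = parsed.path[len(DOCS_BASE):].strip("/")
        if slug not in known_slugs:
            broken.append(url)
    return broken


slugs = {"", "concepts/how-it-works", "reference/cli-reference", "getting-started/quick-start"}
readme = (
    "[Scoring](https://example.github.io/nasde-toolkit/concepts/scoring/) "
    "[CLI](https://example.github.io/nasde-toolkit/reference/cli-reference/)"
)
print(broken_docs_links(readme, slugs))
# ['https://example.github.io/nasde-toolkit/concepts/scoring/']
```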

…frontmatter rule

After rebasing onto main (ADR-012, #65/66/67), the docs needed two additions to stay accurate with how Codex/Gemini skills now work:

- Plugins & Skills: new "How skills reach each agent" section. Claude discovers skills from /app/.claude/skills/, while Codex/Gemini auto-discover only from a HOME-scoped dir ($HOME/.agents/skills, ~/.gemini/skills), where NASDE now routes them natively (covers agents_skills/, [[skill]], and plugin skills).
- A caution box on the strict requirement: a Codex/Gemini SKILL.md must START with a `---` frontmatter line, or the loader silently drops the skill (a leading comment above the frontmatter is the common trap); NASDE warns at run time. Cross-linked from Configuration.

Our docs never named the old, wrong /app/.agents/skills destination, so nothing was contradicted; these are additions, not corrections. Build green, links valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
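The strict frontmatter rule lends itself to a pre-flight check over a skills directory before a run. This sketch encodes only the rule stated above (the very first line must be `---`); it is not NASDE's run-time warning code, and the function names are hypothetical.

```python
from pathlib import Path


def starts_with_frontmatter(skill_md: str) -> bool:
    """True only if the very first line is exactly '---'."""
    first_line = skill_md.split("\n", 1)[0].rstrip("\r")
    return first_line == "---"


def skills_missing_frontmatter(root: str) -> list[Path]:
    """SKILL.md files under `root` that a strict loader would silently drop."""
    return sorted(
        path
        for path in Path(root).rglob("SKILL.md")
        if not starts_with_frontmatter(path.read_text(encoding="utf-8"))
    )


ok = "---\nname: demo-skill\ndescription: example\n---\nBody text.\n"
trap = "<!-- maintainer note -->\n---\nname: demo-skill\n---\nBody text.\n"
print(starts_with_frontmatter(ok), starts_with_frontmatter(trap))  # True False
```

The `trap` case is the common mistake described above: the frontmatter itself is valid, but the comment above it means the file no longer starts with `---`.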

Szymon Janikowski and others added 14 commits June 10, 2026 15:54

szjanikowski force-pushed the docs/starlight-site branch from 867ffa1 to 7410c74 on June 10, 2026 13:59

szjanikowski merged commit bc4a2f4 into main Jun 10, 2026
9 checks passed

szjanikowski deleted the docs/starlight-site branch June 10, 2026 14:12
