docs: migrate README into Starlight docs site (website/) on GitHub Pages#64

Merged
szjanikowski merged 14 commits into main from docs/starlight-site on Jun 10, 2026

@szjanikowski

What

Splits the 877-line README into a searchable Starlight (Astro) documentation site living in website/, deployed to GitHub Pages via a path-filtered workflow. README becomes a ~108-line landing that links to the site.

Why

The README had grown to 877 lines / 48 KB across 26 H2 sections — a manual without search, versioning, or navigation. This extracts the full documentation into a proper site (single source of truth) while keeping README as the landing.

Structure

  • Generator: Starlight 0.39 + Astro 6 with base path /nasde-toolkit/, built-in Pagefind search, llms.txt (via starlight-llms-txt), Mermaid diagrams (via astro-mermaid), and starlight-links-validator as a build-time broken-link gate.
  • 25 content pages across Getting Started / Concepts / Reference / Guides, migrated faithfully from README + docs/use-cases.md + docs/benchmark-results.md.
  • Internal-only docs (ADR, ARCHITECTURE.md, RELEASING.md, superpowers) stay in the repo and are linked out to GitHub — the site does not host them.

README & cross-refs

  • README slimmed 877 → 108 lines: branding, four-steps, quick-start, skills table, prominent docs-site link. Everything else lives only on the site.
  • docs/use-cases.md + docs/benchmark-results.md → pointer stubs to the site (no duplicate source of truth).
  • pyproject.toml Documentation URL → live site.
  • Logo resized from 4.99 MB / 2816px to 505 KB / 800px (with macOS sips) and reused in the site and README.

Deploy

  • .github/workflows/docs-deploy.yml: path-filtered (website/**) push-to-main Pages deploy, Node 20, npm ci.
  • ⚠️ One-time manual step (human): set Settings → Pages → Source to "GitHub Actions" before the first deploy can publish. Until then, the site returns 404.
  • Deploy is not a required status check (avoids the path-filter PR-wedge).
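The path-filter PR-wedge can be illustrated with a toy model in Python. It is a simplification, not GitHub's real branch-protection logic: it assumes a PR merges only after every required check has reported, and that a workflow filtered on website/** never starts for a PR that touches nothing under website/. The check names python-ci and docs-deploy and the file paths are hypothetical stand-ins.

```python
"""Toy model of the path-filter PR-wedge (not GitHub's real logic)."""


def triggered(changed_files: list[str], prefix: str = "website/") -> bool:
    # Approximates a `paths: ["website/**"]` filter: the workflow starts only
    # if at least one changed file lives under website/.
    return any(path.startswith(prefix) for path in changed_files)


def reported_checks(changed_files: list[str]) -> set[str]:
    # "python-ci" stands in for an unfiltered check that runs on every PR;
    # "docs-deploy" stands in for the path-filtered workflow.
    checks = {"python-ci"}
    if triggered(changed_files):
        checks.add("docs-deploy")
    return checks


def mergeable(changed_files: list[str], required: set[str]) -> bool:
    # Simplified branch protection: every required check must have reported.
    # A check that never starts never reports, so the PR waits indefinitely.
    return required <= reported_checks(changed_files)


code_only_pr = ["src/example.py", "tests/test_example.py"]
print(mergeable(code_only_pr, required={"python-ci", "docs-deploy"}))  # False: wedged
print(mergeable(code_only_pr, required={"python-ci"}))  # True
```

Since docs-deploy here also runs only on push to main, it would never report on any PR at all, so making it required would block every PR, not just code-only ones.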

Validation

  • website build green — all internal links valid, 26 pages, mermaid renders, llms.txt generated, images optimized to webp.
  • ✅ Python CI intact — ruff / mypy / pytest (369 passed) all pass; tooling is scoped to src/+tests/ and never touches website/.
  • This docs-deploy workflow does not run on PRs (push-to-main only), so it won't block this PR.

🤖 Generated with Claude Code

Szymon Janikowski and others added 14 commits June 10, 2026 15:54
Split the 877-line README into a searchable Starlight (Astro) docs site
living in website/, deployed to GitHub Pages via a path-filtered workflow.

- Starlight 0.39 + Astro 6, base /nasde-toolkit/, Pagefind search (built-in),
  llms.txt (starlight-llms-txt), mermaid (astro-mermaid), and
  starlight-links-validator as a build-time broken-link gate.
- 25 content pages across Getting Started / Concepts / Reference / Guides,
  migrated faithfully from README sections + docs/use-cases.md +
  docs/benchmark-results.md. ADR/ARCHITECTURE/RELEASING stay internal in the
  repo and are linked out to GitHub.

- README slimmed 877 → 108 lines: branding, four-steps, quick-start, skills table,
  prominent link to the docs site. Everything else lives only on the site
  (single source of truth). docs/use-cases.md + docs/benchmark-results.md
  become pointer stubs. pyproject Documentation URL → live site.

- .github/workflows/docs-deploy.yml: path-filtered (website/**) push-to-main
  Pages deploy, Node 20, npm ci. NOT a required check (avoids PR-wedge).
- .gitignore: website/node_modules, website/dist, website/.astro.

- Logo resized 4.99MB/2816px → 505KB/800px (sips), reused in site + README.

Validation: website build green (all internal links valid, 26 pages, mermaid
renders, llms.txt generated). Python CI intact — ruff/mypy/pytest (369) all
pass; tooling is scoped to src/+tests/ and never touches website/.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nding

UX/polish pass on the Starlight site after live review:

- **Logo**: replace the broken default favicon-as-logo with the Noesis logo
  (src/assets/noesis-logo.png), shown alongside the "NASDE Toolkit" title
  (no more replacesTitle, no "NASDE Toolkit | NASDE Toolkit" dup).
- **Brand theme**: src/styles/custom.css sets a teal/cyan accent from the
  Noesis logo, consistent with noesis-docs, replacing Starlight's default
  purple. Works in light + dark.
- **Top-level nav**: HeaderNav.astro overrides SocialIcons to add
  "Documentation" + "Changelog" header links (the Docusaurus-style navbar
  users expect), with active-state highlighting.
- **Landing reworked**: the splash now reads as documentation, not a product
  page — hero tagline says "the official documentation", logo as hero image,
  Get Started / What is NASDE? / GitHub actions, and pictogram LinkCards for
  Getting Started / Concepts / Reference / Guides. The "what is NASDE / four
  steps" product blurb lives in the docs (getting-started/overview), not on
  the landing.

Search verified working: "calibration" returns "Calibrating the Rubric"
first. The earlier "call" result came from a typo in the query ("callibr"),
not a config issue (Pagefind has no fuzzy typo matching).

Verified live in Chrome (light + dark): logo, teal accent, nav links,
landing cards, sidebar active state, and the mermaid pipeline diagram all
render. Build green, all internal links valid, 26 pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e hero contrast

Follow-up polish from live review:

- **Branding split**: NASDE product logo (nasde-toolkit-logo.png) as the hero
  image on the landing; Noesis Vision logo in the navbar and as the favicon
  (public/favicon.png). Product on the page, company in the chrome.
- **Title**: "Nasde Toolkit Docs" (normal case, not all-caps NASDE) in the
  site title, hero, and "What is Nasde?" action.
- **Fix light-mode hero invisible title**: custom.css overrode
  --sl-color-white globally but didn't set it for the light theme, so the
  hero <h1> rendered white-on-white. Light theme now sets --sl-color-white
  to the dark heading color (Starlight's inverted-scale convention).
  Verified: hero title rgb(24,26,31) on white (light), rgb(255,255,255) on
  dark; deep-page body text contrast intact in both.
- **Hero logo card**: rounded corners + soft card background so the
  light-background NASDE artwork sits cleanly in dark mode.
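The contrast numbers in the light-mode fix can be reproduced with the WCAG 2.x relative-luminance formula. This is a minimal standard-library sketch, not the tooling used in the live review; it compares the fixed title color rgb(24,26,31) and the buggy white-on-white case against a white background.

```python
def _linearize(channel: int) -> float:
    # Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition).
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    # Lighter luminance over darker luminance, each offset by 0.05.
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
print(f"{contrast_ratio((24, 26, 31), WHITE):.1f}:1")  # fixed light-mode hero title
print(f"{contrast_ratio(WHITE, WHITE):.1f}:1")  # the bug: white on white
```

WCAG AA asks for at least 4.5:1 for normal text; the fixed title clears that comfortably, while white on white is exactly 1:1, which is why the title vanished.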

Search note: Pagefind has no typo tolerance by design ("callibrate" won't
match "calibration") — accepted as the standard static-search trade-off;
correct spelling returns the right page first.
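The missing typo tolerance can be illustrated with a toy prefix-matching search. This is not Pagefind's implementation; it only assumes that query words must match the start of an indexed word exactly, with no edit-distance fuzziness. The page bodies are made up for the example.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def search(pages: dict[str, str], query: str) -> list[str]:
    # A page matches when every query word is a prefix of some word on the page.
    # There is no edit-distance tolerance, so one extra letter breaks the match.
    terms = tokenize(query)
    hits = []
    for title, body in pages.items():
        words = tokenize(f"{title} {body}")
        if terms and all(any(word.startswith(term) for word in words) for term in terms):
            hits.append(title)
    return hits


pages = {
    "Calibrating the Rubric": "measure, diagnose, fix and re-measure the rubric",
    "CLI Reference": "nasde run options and commands",
}
print(search(pages, "calibr"))  # ['Calibrating the Rubric']
print(search(pages, "callibr"))  # []
```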

Verified live in Chrome, light + dark.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…d TOC

Live review showed the imbalance vs noesis-docs: too many thin pages made the
left sidebar long (~23 items) while the right "On this page" TOC was often
empty (pages were continuous prose with no ## sections). noesis-docs does the
opposite — few dense pages, each with a rich TOC. This rebalances to match.

Approved "B-corrected" structure — 13 pages, each with a coherent flow:

- Getting Started (2): Overview · Quick Start (← Prerequisites + Installation
  + Quick Start)
- Concepts (4): How It Works (← Scoring + Pipeline, de-duplicated into one
  staged narrative) · A Real Task · Token & Cost · Calibrating the Rubric
- Reference (3): CLI Reference (← Cheatsheet + Commands) · Configuration
  (← Project Structure + variant.toml + task.toml) · Authentication & Opik
  (← Authentication + Verifying Opik)
- Guides (4): Running & Configuring Runs (← Local Repo + Cloud + Reviewer +
  Exporting) · Plugins & Skills (← Plugin + Skill-by-ref + Scoping) ·
  Use Cases · Benchmark Results

Every page now has ## sections, so the right-hand TOC fills on all of them
(Token & Cost and Calibration also got headings added). All internal links
and anchors rewritten to the merged slugs.

Verified live in Chrome: short sidebar + populated TOC on calibration and
how-it-works; mermaid renders in How It Works. Build green, all internal
links valid, 13 doc pages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ymmetry, glossary, boundaries

Information-architecture review (from a "never heard of NASDE" perspective,
one subagent pass + first-time-user walkthrough) surfaced gaps beyond the
earlier polish. This addresses them and finalizes the skeleton.

## Structural fixes
- **Agent ↔ reviewer symmetry (the big one)**: running-benchmarks.md now has a
  "Configuring the agent under test" section (instructions / skills / MCP /
  reasoning effort / scoping) mirroring the reviewer section, with a tip box
  spelling out the parallel. Configuring the agent under test was previously
  scattered across three pages.
- **General → specific**: Overview opens with "Why NASDE? — the problem" (the
  hook) and adds "What NASDE is — and is not" (boundaries: not a CI
  replacement, not production, needs Docker/subscription) before the mechanics.

## New pages (5)
- getting-started/reading-results — the run summary table, jobs/ layout,
  interpreting assessment_summary.json, agent-noise vs judge-noise.
- concepts/key-terms — glossary (variant, trial, job, rubric, dimension,
  reviewer, trajectory, Harbor, Opik, ...).
- creating-benchmarks/ (NEW GROUP): anatomy (what a benchmark is made of +
  task-files mermaid) and assessment-criteria (how to write dimensions +
  score ladders — the core value).
- guides/troubleshooting — Docker/auth/OOM(137)/flaky-eval/rate-limit +
  what-to-expect (time, cost, trials) + FAQ.

## Images / diagrams
- How It Works: per-dimension radar (existing asset) at the scoring section.
- Configuration + Anatomy: "what each task file does" mermaid.
- Calibration: mermaid loop flowchart (measure→diagnose→fix→re-measure).
- Token & Cost: section retitled "Quality vs. cost: the Pareto frontier",
  value-led rewrite + image placeholder (pareto.png — TODO, user generates).
- Reading Your Results: placeholder for a real `nasde run` screenshot (TODO).
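The Pareto-frontier idea behind "Quality vs. cost" fits in a few lines: a configuration is on the frontier if no other configuration is at least as cheap and at least as good while being strictly better on one of the two. This is a generic sketch, not NASDE's plotting code; the variant names and numbers are invented for illustration and are not benchmark results.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost_usd: float  # lower is better
    quality: float  # higher is better


def dominates(a: Point, b: Point) -> bool:
    # a dominates b if it is no worse on both axes and strictly better on one.
    no_worse = a.cost_usd <= b.cost_usd and a.quality >= b.quality
    strictly_better = a.cost_usd < b.cost_usd or a.quality > b.quality
    return no_worse and strictly_better


def pareto_frontier(points: list[Point]) -> list[Point]:
    # Keep only non-dominated points, ordered from cheapest to most expensive.
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.cost_usd)


# Hypothetical numbers, for illustration only.
runs = [
    Point("variant-a", cost_usd=0.40, quality=6.1),
    Point("variant-b", cost_usd=0.90, quality=7.8),
    Point("variant-c", cost_usd=1.10, quality=7.2),  # pricier than b and scores lower
    Point("variant-d", cost_usd=2.50, quality=8.0),
]
print([p.name for p in pareto_frontier(runs)])  # ['variant-a', 'variant-b', 'variant-d']
```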

## Reference
- Configuration: "Quick reference: configuring a variant" checklist at top.

Sidebar: 18 pages across 5 groups (Getting Started · Concepts · Creating
Benchmarks · Reference · Guides). Build green, all internal links valid,
20 pages, 4 mermaid diagrams render.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…CLI images

Accuracy pass after a fact-check (the troubleshooting/results content was
partly written from project memory; this verifies it against the code and
corrects the parts that conflated two distinct knobs).

- **Fix attempts vs eval-repetitions conflation** (troubleshooting + reading-
  results): `--attempts`/`-n` is the number of independent AGENT runs per task
  (the source of the between-trial `mean ±std`); `--eval-repetitions` is the
  number of REVIEWER passes per trial (judge noise). These were blurred into
  "trials/repetitions" and "run × repetitions". Now stated precisely, with the
  two noise sources mapped to where each appears. (Verified against cli.py:
  --attempts/-n and --eval-repetitions are separate flags; timeout_sec 1800,
  memory_mb 4096 confirmed in CLAUDE.md/scaffold.)
- **Real images replace placeholders**:
  - concepts/token-cost.md — the actual quality-vs-cost / quality-vs-tokens
    Pareto chart from a real skill×model matrix, with a caption explaining the
    shared cost panel + per-provider token panels.
  - getting-started/reading-results.md — the real `nasde run` startup banner;
    the section is reworded so the prose matches what the screenshot actually
    shows (the config banner) vs. the end-of-run summary table (described in
    text, since that's a separate screen).
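The distinction between the two knobs can be sketched in Python. Assumptions, not taken from NASDE's code: scores arrive as a nested list indexed [attempt][reviewer pass], a trial's score is the mean of its reviewer passes, and both spreads use the sample standard deviation. The real aggregation in the evaluator and results exporter may differ.

```python
from statistics import mean, stdev


def summarize(scores: list[list[float]]) -> dict[str, float]:
    """Summarize scores[i][j] = reviewer pass j on agent attempt i (hypothetical layout).

    len(scores) plays the role of --attempts/-n (independent agent runs);
    len(scores[i]) plays the role of --eval-repetitions (reviewer passes per trial).
    """
    # One score per trial: average the reviewer passes for that trial.
    trial_scores = [mean(passes) for passes in scores]

    # Judge noise: spread among reviewer passes on the SAME trial, averaged over trials.
    per_trial_spread = [stdev(passes) for passes in scores if len(passes) > 1]
    judge_std = mean(per_trial_spread) if per_trial_spread else 0.0

    # Agent noise: spread of per-trial scores across independent attempts.
    agent_std = stdev(trial_scores) if len(trial_scores) > 1 else 0.0

    return {"mean": mean(trial_scores), "agent_std": agent_std, "judge_std": judge_std}


# Hypothetical run: 3 attempts x 2 reviewer passes each.
result = summarize([[7.0, 7.0], [5.0, 6.0], [8.0, 8.0]])
print(f"{result['mean']:.2f} ± {result['agent_std']:.2f} (judge ±{result['judge_std']:.2f})")
```

Raising --attempts sharpens the agent-side ±; raising --eval-repetitions only tightens the judge-side spread within each trial.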

Build green, all internal links valid, images optimized to webp.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… styling

Systematic technical-accuracy audit of every page against the source code
(cli.py, config.py, runner/evaluator/results_exporter, scaffold). Fixes all
findings; the core claims (effort priority, rubric fingerprint, cost formula,
eval clustering, CLI set, scaffold defaults) audited as correct.

WRONG (a user would hit these):
- reading-results: trial dir is `verifier/` (reward.txt + test-stdout.txt),
  not `logs/verifier/` (that's the in-container path test.sh writes to).
- configuration + running-benchmarks: `[nasde.source]` shown as JSON; it's a
  TOML table in task.toml. Pasting the JSON would break the file.
- use-cases: the per-task config file is `task.toml`, not `task.json` (×3).

IMPRECISE:
- cli-reference: run options table was missing --all-variants, --attempts/-n,
  --eval-repetitions, --max-concurrent-eval, --job-suffix. Added.
- authentication: Codex auth checks file *presence* (sets CODEX_FORCE_AUTH_JSON),
  not `auth_mode:"chatgpt"`. Gemini Vertex env vars corrected to the ones the
  code actually checks (GOOGLE_API_KEY / GOOGLE_APPLICATION_CREDENTIALS, not
  GOOGLE_CLOUD_PROJECT). Opik feedback-score list completed (_std, eval_n).
- configuration: nasde.toml sample now shows eval_repetitions + project_name.
- calibration: .calibration/ list now includes assessment_summary.json.

Pricing honesty:
- token-cost: softened "edit pricing.toml" — the catalog is bundled, so after
  a PyPI install editing it is impractical (wiped on upgrade); editing works
  from a source checkout, and a per-project/user override is flagged as
  planned. (load_pricing() is called with no path at both call-sites, so no
  override is wired today — tracked as a follow-up.)

Styling:
- Inline `code` in headings scaled to 0.85em (was oversized).
- Previous/Next pager link titles toned down to h5.

Build green, all internal links valid, verified live (light + dark).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Previous/Next pager titles (.link-title) were still ~24px — louder than
the page's own text. Target .link-title specifically (the earlier rule hit
the wrong span) and set it to body size (--sl-text-base, weight 600), so the
pager sits quietly under its Previous/Next labels. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The "On this page" panel read smaller than the left sidebar: its heading was
13px (below its own 14px links) and the rows were tight. Bump the
starlight-toc heading to --sl-text-sm and relax the link rows (line-height +
padding) so the TOC is as legible as the left nav. Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The header nav links were --sl-text-sm, reading small next to the rest of the
navbar. Bump to --sl-text-base so they sit as first-class navigation. Verified
live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…guage

Two sections read like expert shorthand — dense, jargon-heavy, assuming the
reader already knows the two noise sources and prompt-cache mechanics. Rewritten
for a newcomer who's never used NASDE.

- "A mean is never reported bare" → "Why scores come with a ± (and why that
  matters)". Opens with the concrete question (is A really better than B, or
  did it get lucky?), explains "wobble" before using it, splits the two noise
  sources into bolded bullets, drops the dense parenthetical jargon.
- "How cost is computed" → leads with what it means for the user (consistent,
  comparable cost), explains prompt caching in plain terms and why ignoring it
  keeps comparisons fair — instead of stating the billing rule tersely.

No factual changes — same behavior, clearer prose with emphasis on key terms.
Verified live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-use credit

NASDE drives every agent non-interactively (claude -p / codex exec / Gemini CLI
equivalent) — verified in code (claude_subprocess.py uses `claude -p`; the
agent under test runs headless via Harbor). Document this clearly, since it's a
real current limitation (interactive mode is planned) and it touches Anthropic's
programmatic-use terms.

Per Anthropic's announcement, from 2026-06-15 paid Claude plans include a
dedicated monthly credit for programmatic usage (covering `claude -p`, the
Agent SDK, Claude Code GitHub Actions) — so running NASDE on a paid plan is
supported, not restricted. Framed accurately (a credit, not a block) with a
link to Anthropic's terms.

Added in three places:
- Overview → new "How NASDE drives the agents" subsection under the boundaries.
- Authentication & Opik → a note box at the top.
- Troubleshooting FAQ → "interactive mode?" and "programmatic-use?" Q&As.

Verified live, build green, links valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The README's Documentation section linked to pre-consolidation slugs that now
404 on the live site:
- concepts/scoring → concepts/how-it-works
- reference/commands → reference/cli-reference
- getting-started/installation (merged into quick-start) → dropped

README lives outside website/, so the build's link-validator never caught
these. Verified every README docs link now resolves to an existing page.
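Because README.md lives outside website/, starlight-links-validator cannot see its links, so a small standalone check could close that gap. A sketch under stated assumptions: docs links use a GitHub Pages project URL of the form https://<owner>.github.io/nasde-toolkit/<slug>/, and every page is a .md/.mdx file under website/src/content/docs/ (Starlight's default content directory) whose path, minus the extension and any trailing index, is its slug. The function names are hypothetical, and custom slugs set in frontmatter are ignored.

```python
import re
from pathlib import Path

# Assumed link shape: https://<owner>.github.io/nasde-toolkit/<slug>/[#anchor]
# The capture stops at ')', whitespace, quotes, '>' or '#'.
DOCS_LINK = re.compile(r"\.github\.io/nasde-toolkit/([^)\s\"'>#]*)")


def known_slugs(content_dir: Path) -> set[str]:
    """Map every .md/.mdx page to a slug, e.g. concepts/how-it-works.md -> 'concepts/how-it-works'."""
    slugs: set[str] = set()
    for page in content_dir.rglob("*"):
        if page.suffix not in {".md", ".mdx"}:
            continue
        parts = page.relative_to(content_dir).with_suffix("").parts
        if parts and parts[-1] == "index":
            parts = parts[:-1]  # index.mdx -> '' (site root), guides/index.md -> 'guides'
        slugs.add("/".join(parts))
    return slugs


def broken_docs_links(readme: str, slugs: set[str]) -> list[str]:
    """Return the README's docs-site slugs that have no matching page, sorted."""
    found = {m.group(1).strip("/") for m in DOCS_LINK.finditer(readme)}
    return sorted(found - slugs)
```

Usage would be roughly broken_docs_links(Path("README.md").read_text(), known_slugs(Path("website/src/content/docs"))), failing CI when the returned list is non-empty.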

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…frontmatter rule

After rebasing onto main (ADR-012, #65/66/67), the docs needed two additions to
stay accurate with how Codex/Gemini skills now work:

- Plugins & Skills: new "How skills reach each agent" section — Claude
  discovers from /app/.claude/skills/, while Codex/Gemini auto-discover only
  from a HOME-scoped dir ($HOME/.agents/skills, ~/.gemini/skills), where NASDE
  now routes them natively (covers agents_skills/, [[skill]], and plugin skills).
- A caution box on the strict requirement: a Codex/Gemini SKILL.md must START
  with a `---` frontmatter line, or the loader silently drops the skill (a
  leading comment above the frontmatter is the common trap); NASDE warns at run
  time. Cross-linked from Configuration.
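The routing and the frontmatter rule above reduce to a small lookup plus a first-line check. A hypothetical sketch, not NASDE's implementation: the agent identifiers are made up, the directories are the ones named above, and treating trailing whitespace on the --- line as acceptable is an assumption.

```python
from pathlib import Path


def skill_dir(agent: str, home: Path) -> Path:
    # Where each agent auto-discovers skills (paths from the commit above).
    if agent == "claude":
        return Path("/app/.claude/skills")
    if agent == "codex":
        return home / ".agents" / "skills"
    if agent == "gemini":
        return home / ".gemini" / "skills"
    raise ValueError(f"unknown agent: {agent!r}")


def starts_with_frontmatter(skill_md: str) -> bool:
    # Codex/Gemini drop a SKILL.md unless its very first line is '---'; a comment
    # above the frontmatter is the common trap. Ignoring trailing whitespace
    # (including a carriage return) on that line is an assumption of this sketch.
    first_line = skill_md.split("\n", 1)[0]
    return first_line.rstrip() == "---"


print(starts_with_frontmatter("---\nname: demo\n---\n"))  # True
print(starts_with_frontmatter("<!-- note -->\n---\nname: demo\n---\n"))  # False
```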

Our docs never named the old wrong /app/.agents/skills destination, so nothing
was contradicted — these are additions, not corrections. Build green, links valid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@szjanikowski szjanikowski force-pushed the docs/starlight-site branch from 867ffa1 to 7410c74 June 10, 2026 13:59
@szjanikowski szjanikowski merged commit bc4a2f4 into main Jun 10, 2026
9 checks passed
@szjanikowski szjanikowski deleted the docs/starlight-site branch June 10, 2026 14:12