docs: README v0.11 highlights + honest v1.4 framing + readability reorganization #35

Open
AIMLPM wants to merge 4 commits into `main` from `docs/readme-v011-prose-update`

AIMLPM commented on May 15, 2026

Summary

Three concerns rolled into one README pass so the result is review-ready:

1. v0.11 highlights (replaces stale v0.10 section)

  • Aggregator URL filter (v0.11.1) — headline retrieval fix
  • Binary downloads (v0.11.0) — capability addition
  • v0.10.1 local-embedder + Tenacity retry items kept (still accurate)

2. Honest v1.4 benchmark framing

Rewritten for zero-context readers. Replaces the original "anchor-bias" jargon with a plain-language explanation of:

  1. What v0.11.1 fixes (+0.02 to +0.04 MRR target)
  2. What upcoming releases target (crawl-strategy)
  3. What the bench itself is improving (~5–10% misses recovered)

Plus an explicit goal: 7th → mid-pack on the next benchmark cycle (+0.10 to +0.20 MRR).
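
For readers new to the metric, the MRR deltas above can be read against the standard definition of mean reciprocal rank: the average over queries of 1/rank of the first relevant result, with a miss scoring 0. The sketch below is only that textbook definition. The benchmark's actual scoring code is not shown in this PR, and the function name and the sample ranks are hypothetical.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_hit_ranks: Sequence[Optional[int]]) -> float:
    """Textbook MRR: mean of 1/rank of the first relevant hit per query.

    A rank of None means no relevant result was retrieved (contributes 0).
    Ranks are 1-based.
    """
    if not first_hit_ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for rank in first_hit_ranks:
        if rank is None:
            continue  # a miss adds nothing to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_hit_ranks)


# Hypothetical ranks for four queries, before and after recovering one miss.
baseline = [1, 2, None, 4]
improved = [1, 2, 4, 4]
delta = mean_reciprocal_rank(improved) - mean_reciprocal_rank(baseline)
print(f"{delta:+.4f}")  # +0.0625
```

In this toy case, recovering a single miss at rank 4 out of four queries moves MRR by about +0.06, which gives a rough feel for the scale of the targets quoted above.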

3. Readability reorganization

  • Resolve duplicate `## Installation` headers — rename the first one to `## What's New` (it was a highlights section, not install instructions). Fixes anchor-link collision.
  • Add top-of-README `Latest: v0.11.1` pointer — single source of truth for version queries.
  • Add `pages.jsonl` schema table in Quickstart — 9 fields documented; downstream code-gen + LLM tooling can rely on it.
  • Extract Common Recipes (254 lines) to `docs/RECIPES.md` with proper `### ` headings (no more bash-comments-as-section-markers). README keeps a 14-line teaser. The benchmark `<details>` comparison block stays in README.
  • Consolidate 4 micro-sections (Contributing, Security, Privacy, License — 14 lines) into one `## Project info` block.
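
As an illustration of how downstream tooling might lean on the documented `pages.jsonl` schema, here is a minimal sketch that flags records missing any of the nine field names listed in this PR's commit message. It assumes every field is present on every record, which the PR does not state, and it checks names only because field types are not given here. The helper name `find_missing_fields` is hypothetical and not part of markcrawl.

```python
import json
from typing import Iterable, Iterator, Set, Tuple

# Field names as listed in this PR's commit message; types are not specified there.
EXPECTED_FIELDS = frozenset({
    "url", "title", "crawled_at", "citation", "tool",
    "text", "downloads", "images", "screenshot",
})


def find_missing_fields(lines: Iterable[str]) -> Iterator[Tuple[int, Set[str]]]:
    """Yield (1-based line number, missing field names) for incomplete records.

    Assumption: each non-blank line is a JSON object and all nine fields
    are expected on every record.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = EXPECTED_FIELDS - set(record)
        if missing:
            yield lineno, set(missing)


# Usage with an in-memory sample; a real run would iterate over an open file.
sample = ['{"url": "https://example.com", "title": "Example"}']
for lineno, missing in find_missing_fields(sample):
    print(lineno, sorted(missing))
```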

Net effect

| | Before | After |
|---|---|---|
| README.md line count | 814 | 581 (-28%) |
| Duplicate H2 anchors | yes (`## Installation` × 2) | no |
| `pages.jsonl` formal schema | no (one example only) | 9-field table |
| Recipes location | inline (289 lines) | `docs/RECIPES.md` (organized by category) |
| Footer micro-sections | 4 (3 lines each) | 1 consolidated bullet block |
| v1.4 bench standing | not mentioned in prose | explicitly + with active-improvement note |

Verification

  • All internal links verified to resolve (`CHANGELOG.md`, `docs/ARCHITECTURE.md`, `docs/BENCHMARKS.md`, `docs/LLM_PROMPT.md`, `docs/RECIPES.md`, `docs/SUPABASE.md`)
  • No duplicate H2 anchors
  • Section flow follows: hook → quickstart → what's new → recipes → install → crawling → optional → reference → footer
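
The first two checks above could be scripted roughly as follows. This is a sketch, not the procedure actually used for this PR: both helper names are hypothetical, the regexes handle only simple `[text](path)` links, and `## ` lines inside fenced code blocks would be miscounted as headings.

```python
import re
from collections import Counter
from pathlib import Path
from typing import List

# Matches level-2 headings only ("## Title"), not "###" or deeper.
H2_RE = re.compile(r"^## +(.+?)\s*$", re.MULTILINE)
# Captures the path part of inline links, dropping any "#fragment".
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)\s]*)?\)")


def duplicate_h2_headings(markdown: str) -> List[str]:
    """Return lowercased H2 titles that appear more than once."""
    counts = Counter(title.strip().lower() for title in H2_RE.findall(markdown))
    return sorted(title for title, n in counts.items() if n > 1)


def broken_relative_links(markdown: str, root: Path) -> List[str]:
    """Return relative link targets that do not exist under `root`."""
    targets = {
        target for target in LINK_RE.findall(markdown)
        if "://" not in target and not target.startswith("mailto:")
    }
    return sorted(t for t in targets if not (root / t).exists())


if __name__ == "__main__":
    readme = Path("README.md")
    if readme.exists():
        text = readme.read_text(encoding="utf-8")
        print("duplicate H2:", duplicate_h2_headings(text))
        print("broken links:", broken_relative_links(text, readme.parent))
```

Both functions return empty lists when the README is clean, so they could be wired into CI as simple assertions.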

🤖 Generated with Claude Code

AIMLPM added 3 commits May 15, 2026 11:38
Replaces the stale v0.10 highlights section with v0.11.0 (binary
downloads + filters) and v0.11.1 (aggregator URL filter) bullets,
keeping the v0.10.1 local-embedder and Tenacity retry items.

Adds an honest framing block on the v1.4 leaderboard result: markcrawl
1st on cost but 7th of 7 on answer quality + retrieval MRR. Notes the
active improvement work (v0.11.1 aggregator filter targets a measured
retrieval failure mode; v0.12 track + bench v1.5 methodology hardening
underway).

The benchmark table on lines 339-359 was auto-updated by PR #33; this
commit only changes the prose highlights.
…raction, schema table, footer consolidation

Five changes to improve human + LLM readability:

1. Rename '## Installation / Upgrading' (line 22) → '## What's New' to
   resolve the duplicate-header collision with the canonical '## Installation'
   section. Anchor links + LLM section parsing now disambiguate cleanly.

2. Add a top-of-README 'Latest: v0.11.1 (2026-05-12)' pointer so version
   queries don't require triangulating across the file.

3. Add a 'pages.jsonl' schema table in the Quickstart section — 9 rows
   covering all current fields (url, title, crawled_at, citation, tool,
   text, downloads, images, screenshot). Lets downstream code-gen and LLM
   tooling rely on the schema without inferring from one example.

4. Extract Common Recipes (254 lines, 30% of the README) to a new
   docs/RECIPES.md file with table-of-contents and proper '### <recipe name>'
   headings (replacing the bash-comments-as-headings pattern). README
   retains a 14-line teaser pointing to the new file. The benchmark
   '<details>' comparison block stays in README.

5. Consolidate 4 micro-sections (Contributing, Security, Privacy,
   License — 14 lines combined) into a single '## Project info' bulleted
   block.

Net result: 814 → 581 lines (-28%), no duplicate section anchors,
all internal links verified to resolve.
AIMLPM changed the title from "docs(readme): v0.11 highlights + honest v1.4 bench framing" to "docs: README v0.11 highlights + honest v1.4 framing + readability reorganization" on May 16, 2026
Pre-existing lint failure on main since PR #34 (v0.11.1 ship) — ruff
wants underscore-prefixed names sorted first in the import list. Applied
`ruff check --fix`. No semantic change to the tests.