
release: v0.21.0 — Connector Expansion + Multimodal + Observability#53

Merged
johnnichev merged 17 commits into main from v0.21.0-connectors
Apr 8, 2026


@johnnichev
Owner

Summary

v0.21.0 is the Connector Expansion release: three new vector stores, the Azure OpenAI provider, OpenTelemetry + Langfuse observers, multimodal image support across all four LLM providers, four new document loaders, and nine new toolbox tools.

Vector stores (3 new)

  • FAISSVectorStore — in-process FAISS with save/load persistence, thread-safe
  • QdrantVectorStore — REST + gRPC connector, auto-collection management
  • PgVectorStore — PostgreSQL pgvector extension, JSONB metadata, auto-table creation

Document loaders (4 new)

DocumentLoader.from_csv, from_json, from_html, from_url — stdlib-only by default, optional beautifulsoup4 for CSS selectors.
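The stdlib-only claim is easy to picture with a sketch. This is a hypothetical reconstruction of what a `from_csv`-style loader does (the function name, the dict shape, and the `content_field` / `metadata_fields` parameters are illustrative, not the library's actual signature):

```python
import csv
import io

def documents_from_csv(text, content_field, metadata_fields=()):
    """Sketch of a from_csv-style loader: each CSV row becomes one
    document with a content string plus selected metadata columns.
    Pure stdlib, mirroring the 'stdlib-only by default' note above."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        docs.append({
            "content": row[content_field],
            "metadata": {k: row[k] for k in metadata_fields if k in row},
        })
    return docs

faq = "question,answer,topic\nHow do I install?,pip install selectools,setup\n"
docs = documents_from_csv(faq, content_field="answer", metadata_fields=("topic",))
```

The same row-to-document shape generalizes to the JSON and HTML loaders; only the parsing step changes.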

Toolbox (9 new tools)

  • Code: execute_python, execute_shell (subprocess-isolated, 10KB output cap, shell metacharacter blocklist)
  • Search: web_search (DuckDuckGo, no API key), scrape_url (SSRF guards)
  • GitHub: github_search_repos, github_get_file, github_list_issues (GITHUB_TOKEN optional)
  • Database: query_sqlite (PRAGMA query_only = ON), query_postgres
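The `PRAGMA query_only = ON` guard on `query_sqlite` is worth a concrete sketch: it makes the connection itself reject writes, so a tool-calling LLM cannot mutate the database even if it emits DML. The function below is an illustrative shape, not the tool's real signature:

```python
import sqlite3

def query_sqlite(db_path, sql):
    """Sketch of a read-only SQL tool. PRAGMA query_only = ON causes
    any write statement on this connection to raise OperationalError,
    while SELECTs work normally."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA query_only = ON")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

A `SELECT` succeeds, while a `CREATE TABLE` or `INSERT` through the same helper fails with `sqlite3.OperationalError`.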

Multimodal messages

New ContentPart dataclass + image_message(image, prompt) helper. Message.content now accepts list[ContentPart] in addition to str. Works on OpenAI GPT-4o, Azure OpenAI, Anthropic Claude, Gemini 2.5 Flash, and Ollama vision models.
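A minimal sketch of the shape described above, assuming field names that may differ from the real dataclass (`type`, `text`, `data` are guesses; only `ContentPart` and `image_message` come from the release notes):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ContentPart:
    # Hypothetical field layout for a multimodal content part.
    type: str                  # "text" | "image_base64" | "image_url"
    text: Optional[str] = None
    data: Optional[str] = None  # base64 payload or URL

def image_message(image: str, prompt: str) -> List[ContentPart]:
    """Sketch: pair one image reference with a text prompt.
    URLs pass through as image_url; anything else is treated as base64."""
    kind = "image_url" if image.startswith(("http://", "https://")) else "image_base64"
    return [ContentPart(type=kind, data=image),
            ContentPart(type="text", text=prompt)]

parts = image_message("https://example.com/cat.png", "What animal is this?")
```

Each provider then converts this neutral list into its own wire format, which is exactly the per-provider loop the Bug 7/8 fixes below add.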

Providers (1 new)

  • AzureOpenAIProvider — extends OpenAIProvider with Azure deployment-name routing, AZURE_OPENAI_* env-var fallback, and AAD token auth.

Observers (2 new)

  • OTelObserver — emits GenAI semantic-convention spans to Jaeger / Tempo / Datadog / Honeycomb / any OTLP backend
  • LangfuseObserver — ships traces + generations + spans to Langfuse Cloud or self-hosted (rewritten for the Langfuse 3.x API: start_span / start_generation / update_current_trace)

Bug fixes (all pre-existing, surfaced by real-call simulations during release prep)

| # | Bug | Impact if shipped |
|---|-----|-------------------|
| 5 | QdrantVectorStore.search() called the removed qdrant-client client.search() API | Every user with qdrant-client >=1.13 would have hit AttributeError on first query |
| 6 | @tool() on class methods fundamentally broken (RAGTool / SemanticSearchTool / HybridSearchTool) | RAG broken via the canonical documented API — every user hits TypeError |
| 7 | Gemini provider silently dropped content_parts images | Every Gemini vision user gets "I cannot see images" |
| 8 | Anthropic provider silently dropped content_parts images | Every Claude vision user gets "I don't see any image attached" |
| 9 | LangfuseObserver written for Langfuse 2.x, broken on Langfuse 3.x | Every Langfuse user crashes at runtime |

Fixes: _BoundMethodTool descriptor in tools/decorators.py, query_points() migration in rag/stores/qdrant.py, content_parts loop in both provider _format_messages(), full Langfuse 3.x rewrite with updated mock tests.

Tests

5215 tests (+603 since v0.20.1).

  • Unit (4961): all mock-based tests for every v0.21.0 subsystem
  • Per-subsystem e2e (43): real faiss-cpu bindings, real subprocess.run, real sqlite3, real HTTP, real opentelemetry-sdk, real Qdrant Docker container, real pgvector-enabled Postgres, real OpenAI + Anthropic + Gemini vision calls, real DuckDuckGo + GitHub REST API
  • Integration simulations (4): FAISS RAG + OpenAI + OTel; Gemini multimodal + code execution + OTel; Anthropic SQL + code chaining; Qdrant RAG + OpenAI + OTel
  • App-shaped simulations (10): Skylake docs Q&A bot (3 turns, ConversationMemory); sales data analyst bot (SQL → Python chaining); knowledge base librarian (Qdrant + FAISS variants covering CSV + JSON + HTML loaders)
  • Multimodal coverage: sync + async + image_url URL-path regression tests for all 3 cloud providers (9 tests)

Quality gate

  • mypy src/: zero errors across 150 files (cleaned up 46 pre-existing issues in serve/, mcp/, observer.py, trace.py, _starlette_app.py, etc.)
  • black + isort + flake8: clean
  • bandit -r src/: clean
  • mkdocs build: clean (fixed pre-existing broken anchors in QUICKSTART.md and PARSER.md)
  • Full e2e suite with Qdrant + Postgres running: 70 collected, 64 passed, 6 skipped (Azure OpenAI + Langfuse credential-dependent), 0 failed

Documentation

  • 7 new module docs: FAISS.md, QDRANT.md, PGVECTOR.md, MULTIMODAL.md, OTEL.md, AZURE_OPENAI.md, LANGFUSE.md
  • 12 new examples: 77_faiss_vector_store.py through 88_langfuse_observer.py
  • README.md, CHANGELOG.md, CONTRIBUTING.md, docs/index.md, docs/QUICKSTART.md, ROADMAP.md, docs/llms.txt, docs/llms-full.txt, landing/index.html all updated with v0.21.0 stats, features, and Azure OpenAI enumeration
  • 37 relative ](docs/*.md) links in README.md converted to absolute GitHub URLs per project convention

Test plan

  • CI passes on this PR (lint + tests + security)
  • Review the rewritten observe/langfuse.py for the Langfuse 3.x migration
  • Review the _BoundMethodTool descriptor in tools/decorators.py
  • After merge, tag v0.21.0 to trigger the PyPI publish workflow
  • Verify GitHub Pages docs auto-deploy
  • Verify pip install selectools==0.21.0 in a clean venv

…kers, version bump

- Bump version to 0.21.0 in pyproject.toml and __init__.py
- Add OTelObserver/LangfuseObserver lazy exports to observe/__init__.py
- Export AzureOpenAIProvider and observe submodule from package root
- Add ContentPart/image_message/text_content to public __all__
- Apply @beta to all 9 new toolbox tools (code, search, github, db)
- Extend stability.beta()/stable() with Any overload for Tool objects
- Add qdrant-client/faiss-cpu/beautifulsoup4 to [rag] extras
- Add new [observe] extras with opentelemetry-api/langfuse
…py fix

- CHANGELOG: 0.21.0 entry covering all 7 connector subsystems
- README: What's New in v0.21 section, Azure provider row, FAISS/Qdrant/pgvector imports, test count 4960
- 7 new module docs in docs/modules/: FAISS, QDRANT, PGVECTOR, MULTIMODAL, OTEL, AZURE_OPENAI, LANGFUSE
- mkdocs.yml nav: surfaced new pages in Core/Features/Reference sections
- llms.txt + llms-full.txt: 7 new module pointers, version bumped to v0.21.0, page count 32 -> 39
- Fix pre-existing mypy error in azure_openai_provider.py default_model assignment
Every existing v0.21.0 test file mocks its backend: test_faiss_store.py
injects a fake faiss module, test_code_tools.py mocks subprocess.run,
test_qdrant_store.py mocks qdrant_client, etc. That leaves the real wire
format, real C++ bindings, real subprocesses, real HTTP, and real vision
APIs completely unverified — if our assumptions differ from reality we
ship green tests and broken code.

This commit adds 12 new test files marked @pytest.mark.e2e that exercise
real backends:

Tier 1 — no external services (28 tests, all passing):
- tests/rag/test_e2e_faiss_store.py (real faiss-cpu, 5)
- tests/tools/test_e2e_code_tools.py (real subprocess.run, 8)
- tests/tools/test_e2e_db_tools.py (real sqlite3, 6)
- tests/rag/test_e2e_document_loaders.py (real files + example.com, 6)
- tests/test_e2e_otel_observer.py (real opentelemetry-sdk, 3)

Tier 2 — real API calls, credentials via .env (8 tests, all passing):
- tests/test_e2e_multimodal.py (real OpenAI gpt-4o-mini + Anthropic
  claude-haiku-4-5 + Gemini gemini-2.5-flash with an in-memory 4x4 PNG)
- tests/tools/test_e2e_search_tools.py (real DuckDuckGo + scrape)
- tests/tools/test_e2e_github_tools.py (real GitHub REST API)

Tier 3 — skip-if-missing-deps-or-credentials (7 tests, 2 passing + 5 skip):
- tests/rag/test_e2e_qdrant_store.py (skip if Qdrant not reachable)
- tests/rag/test_e2e_pgvector_store.py (passes against local pgvector)
- tests/providers/test_e2e_azure_openai.py (skip if AZURE_* not set)
- tests/test_e2e_langfuse_observer.py (skip if LANGFUSE_* not set)

Result: pytest --run-e2e → 38 passed, 5 skipped, 0 failed.

Also fix three v0.21.0 module docs whose quickstart examples showed the
wrong VectorStore.search() signature: search() takes a query embedding
(List[float]), not a string. Updated FAISS.md, QDRANT.md, PGVECTOR.md
to show the correct embed-first pattern (matches RAG.md).
…ints()

qdrant-client >=1.13 removed QdrantClient.search() in favour of
query_points(). The new API differs in two ways:

1. The kwarg is `query=` instead of `query_vector=`
2. The return value is a `QueryResponse` object whose `.points`
   attribute holds the list of `ScoredPoint`s, not a flat list
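Both differences fit in a few lines. This sketch uses a stand-in response object rather than a live client, so the shape is illustrative only:

```python
from types import SimpleNamespace

# Stand-in for qdrant-client's QueryResponse: results live on .points.
response = SimpleNamespace(points=[SimpleNamespace(id=1, score=0.93)])

# Old API (removed in qdrant-client >=1.13):
#   hits = client.search(collection_name="docs", query_vector=embedding)
# New API:
#   response = client.query_points(collection_name="docs", query=embedding)
hits = response.points  # unwrap .points; the response is no longer a flat list
```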

The mock-based unit tests in tests/rag/test_qdrant_store.py never
caught this regression because they mocked QdrantClient — the mock
had a `search` attribute that didn't exist on the real client. The
new e2e test in tests/rag/test_e2e_qdrant_store.py exposed the bug
on the first real call against Qdrant 1.17.1.

Also fix a second consistency bug exposed by the e2e test: after
clear() drops the collection, query_points() raises 404 instead of
returning empty results. Caught the 404 in search() and return [] to
match FAISSVectorStore semantics (search-after-clear → []).

Mock unit tests updated to mirror the new API:
- s/client.search/client.query_points/
- Mock return values now wrap a points list in a MagicMock with a
  .points attribute
- Assertions that checked call_kwargs["query_vector"] now check
  call_kwargs["query"]

After fix: 35 mock tests + 2 e2e tests against real Qdrant 1.17.1
all pass. Full e2e suite: 40 passed, 3 skipped (Azure + Langfuse,
no creds). Full non-e2e suite: 4961 passed, 0 regressions.
…ations

Adds four end-to-end integration scenarios in tests/test_e2e_v0_21_0_simulations.py
that wire multiple v0.21.0 features together with real LLM calls:

1. FAISS + real OpenAI embeddings + RAGTool + real OpenAI agent + OTel
2. Multimodal image + execute_python tool + real Gemini agent + OTel
3. query_sqlite + execute_python + real Anthropic Claude agent
4. Qdrant + real OpenAI embeddings + RAGTool + real OpenAI agent + OTel

Running the simulations surfaced three pre-existing shipping blockers that
the entire existing test suite (188 mock-based v0.21.0 tests + 4 "workflow"
tests that never actually call agent.run) had silently hidden:

Bug 6 — @tool() on class methods fundamentally broken
----------------------------------------------------
@tool() applied to a method (def f(self, query: str)) produced a class-level
Tool whose function was the unbound method. When the agent executor called
tool.function(**llm_kwargs) Python raised
TypeError: missing 1 required positional argument: 'self', so the LLM got
back a "Tool Execution Failed" string and gave up.

This broke the canonical RAG pattern documented everywhere in selectools:

    rag_tool = RAGTool(vector_store=store)
    agent = Agent(tools=[rag_tool.search_knowledge_base], provider=...)

RAGTool, SemanticSearchTool, and HybridSearchTool were all affected. The
existing tests/rag/test_rag_workflow.py tests that appeared to exercise
this path only asserted isinstance(agent, Agent) and never actually ran
the agent, so nobody noticed.

Fix: add a _BoundMethodTool descriptor to selectools/tools/decorators.py
that detects method-decorated tools (first param is self) and returns a
per-instance Tool on attribute access. The descriptor wraps the original
function in functools.partial(fn, instance) so the agent executor can
invoke it with only the LLM's kwargs. Class-level access falls through to
a template Tool for introspection (.name, .description, etc.).
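The descriptor mechanics described above can be sketched in miniature. Everything here is illustrative (a stand-in `Tool`, a simplified `BoundMethodTool`, a fake `RAGToolLike` class); only the pattern itself, per-instance binding via `functools.partial` with class-level fallthrough to a template, comes from the commit:

```python
import functools

class Tool:
    # Minimal stand-in for the library's Tool wrapper.
    def __init__(self, function, name):
        self.function = function
        self.name = name

class BoundMethodTool:
    """Sketch of the descriptor: class access returns a template Tool
    for introspection; instance access returns a Tool whose function
    already has the instance bound, so the executor can call it with
    only the LLM's kwargs."""
    def __init__(self, fn, name):
        self.fn = fn
        self.template = Tool(fn, name)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self.template  # class-level access: introspection only
        return Tool(functools.partial(self.fn, instance), self.template.name)

class RAGToolLike:
    def _search(self, query):
        return f"results for {query!r}"
    search_knowledge_base = BoundMethodTool(_search, "search_knowledge_base")

rag = RAGToolLike()
tool = rag.search_knowledge_base
tool.function(query="ports")  # no TypeError: 'self' is already bound
```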

Callers that previously worked around the bug by manually passing the
instance as the first argument to .function (test_rag_workflow.py,
test_hybrid_search.py, test_rag_regression_phase3.py) are updated to the
correct API.

Bug 7 — Gemini provider silently drops images from content_parts
---------------------------------------------------------------
GeminiProvider._format_messages only handled the legacy
message.image_base64 attribute. The v0.21.0 image_message() helper creates
a Message with content_parts=[ContentPart(type="image_base64", ...)] and
explicitly sets message.image_base64 = None, so Gemini received only the
text prompt and replied "I cannot see images".

Fix: add a content_parts loop to GeminiProvider that converts each
ContentPart to types.Part(inline_data=...) or types.Part(file_data=...).

Bug 8 — Anthropic provider has the same bug
-------------------------------------------
Same pattern in AnthropicProvider. Claude replied "I don't see any image
attached". Fix: content_parts loop producing the Anthropic native
{type: image, source: {type: base64, ...}} shape.
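The Anthropic-native shape named above is simple enough to sketch. The conversion function and the input dict keys are assumptions; the output block shape (`{type: image, source: {type: base64, ...}}`) is the one the fix produces:

```python
import base64

def to_anthropic_blocks(parts):
    """Sketch of the content_parts loop for Anthropic: generic parts
    become the provider's native content-block dicts."""
    blocks = []
    for part in parts:
        if part["type"] == "text":
            blocks.append({"type": "text", "text": part["text"]})
        elif part["type"] == "image_base64":
            blocks.append({
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": part.get("media_type", "image/png"),
                    "data": part["data"],
                },
            })
    return blocks

png = base64.b64encode(b"\x89PNG\r\n").decode("ascii")
blocks = to_anthropic_blocks([
    {"type": "image_base64", "data": png},
    {"type": "text", "text": "What colour is this image?"},
])
```

The Gemini fix is the same loop with a different target shape (`types.Part` objects instead of dicts).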

OpenAI already had the right handling in providers/_openai_compat.py, so
only Gemini and Anthropic needed the fix.

Also: tighten tests/test_e2e_multimodal.py assertions so the provider can
never silently drop an image again. Previously the tests only asserted
result.content was non-empty, which passed on "I cannot see images" —
a classic false-green. Now each provider must actually say "red" in its
reply to a 4x4 red PNG.

Finally: move the shared otel_exporter fixture into tests/conftest.py so
every e2e file that needs OTel span capture uses the same singleton
TracerProvider. OpenTelemetry only allows one global TracerProvider per
process, and having each file install its own caused later-loaded files
to silently see empty span lists when run in the same suite.

Verification:
- 47 e2e tests collected → 44 passed, 3 skipped (Azure OpenAI x2 and
  Langfuse x1 skip cleanly when no credentials are set)
- Full non-e2e suite: 4961 passed, 3 skipped, 0 regressions
- The 4 full-release simulations in test_e2e_v0_21_0_simulations.py now
  verify every v0.21.0 subsystem works together with real LLM calls
The previous e2e work proved individual v0.21.0 subsystems work in
isolation (tests/test_e2e_*) and that multiple features compose (the
4 scenarios in tests/test_e2e_v0_21_0_simulations.py). Those are
integration tests — they prove the wiring doesn't throw.

This commit adds something different: **app-shaped** simulations that
match the idiom already used in tests/test_simulation_evals.py. Each
test sets up an agent with a realistic system prompt, drives it through
a plausible user workflow, and asserts on the behaviour a real app
author would care about.

App 1 — Documentation Q&A Bot
-----------------------------
A support bot for a fictional product called "Skylake" backed by a
FAQ CSV. The CSV is loaded via the new DocumentLoader.from_csv, embedded
with real OpenAI text-embedding-3-small, indexed in real FAISS, and
wrapped in a RAGTool. The bot runs on real OpenAI gpt-4o-mini with a
ConversationMemory so it can carry context across turns.

Three asserts:
- Turn 1: bot answers an in-KB install question by quoting KB facts
  (curl URL, version string)
- Turn 2: same agent instance answers a follow-up port question
  (8742) — proves memory + tool calling continue to work across turns
  on a memory-enabled agent
- Turn 3: bot refuses an out-of-KB WebSocket question instead of
  hallucinating a number

App 2 — Data Analyst Bot
------------------------
An analytics assistant over a small SQLite sales database. Real
Anthropic Claude agent with query_sqlite + execute_python. The user
asks a question whose answer requires *chaining*:
  1. SQL query to find the top region by total sales
  2. Python computation for the average
  3. Natural-language explanation
Asserts that "EU" and "2000" both appear in the final answer,
proving the LLM successfully chained two real tool calls end-to-end.

App 3 — Knowledge Base Librarian
---------------------------------
The only simulation that exercises ALL FOUR new document loaders in a
single workflow:
  - DocumentLoader.from_csv (product catalog)
  - DocumentLoader.from_json (release notes)
  - DocumentLoader.from_html (about page)
Real OpenAI embeddings, real Qdrant store, real Gemini gemini-2.5-flash
agent with a RAGTool. Three asserts, one per source format, each
asking for a deliberately unique anchor phrase (THUNDERCAT-7, MOONWALK,
VANTA-NORTH) that exists in exactly one of the loaded files. Proves
that every loader's output is actually retrievable through the full
embed → store → search → LLM pipeline.

Verification
------------
Solo run of tests/test_e2e_v0_21_0_apps.py:
  7 passed in 30.41s

Full e2e suite including new app sims:
  54 collected → 51 passed, 3 skipped (Azure OpenAI x2 + Langfuse x1,
  no creds), 0 failed, 50.67s total

Full non-e2e suite:
  4961 passed, 3 skipped, 239 deselected (+7 from the new app file),
  0 regressions
…oarding

Cross-reference audit run (via the project /audit and /doc-audit-skill
skills with 4 parallel QA sub-agents) found 13 MUST-FIX issues left over
after the earlier release-prep commit. This commit fixes all of them.

CHANGELOG.md
------------
- Add the missing ### Fixed section documenting bugs 6, 7, 8 (RAGTool
  @tool() on methods, Gemini + Anthropic content_parts image drop) and
  the Qdrant query_points() API migration. These landed in commits
  f4401f2 and b047c1a after the initial doc commit but never made it
  into the release notes.
- Add the missing ### Tests section documenting the 345 new e2e tests,
  4 integration simulations, and 7 app-shaped simulations.
- Update Stats: 4,960 -> 5,203 tests.

README.md
---------
- Line 489 and 1111: stale "4960 Tests" -> 5203.
- Line 133: restore the historical "4612 tests total" in the v0.19
  What's New section (I had over-corrected it to 4960 earlier).
- Line 460: "5 LLM Providers" enumeration was missing Azure OpenAI,
  even though it's claimed in the count. Added.
- Line 467: "4 Vector Stores" -> "7 Vector Stores" with FAISS, Qdrant,
  pgvector added to the list.
- Install section: added "pip install selectools[observe]" and
  "pip install selectools[postgres]" extras and updated the [rag]
  extras comment to mention FAISS, Qdrant, and beautifulsoup4.

CONTRIBUTING.md + docs/CONTRIBUTING.md
-------------------------------------
- Main file was stale: v0.20.1 / 4612 tests. Updated to v0.21.0 / 5203.
- docs/CONTRIBUTING.md was stale by TWO releases (v0.19.2, 61 examples,
  24 tools, 100% coverage, different release script examples). Fixed
  by re-copying from the updated CONTRIBUTING.md.

docs/llms.txt
-------------
- Line 3: "4960 tests at 95% coverage" -> "5203 tests at 95% coverage".

docs/QUICKSTART.md
------------------
- Added a v0.21.0 callout under Step 5 (RAG) linking to the new
  FAISS.md, QDRANT.md, and PGVECTOR.md module docs and mentioning the
  new DocumentLoader.from_csv / from_json / from_html / from_url
  loaders. Minimal addition — does not rewrite the working example.

docs/index.md
-------------
- RAG Pipeline feature card: "4 vector store backends" -> "7 vector
  store backends", listed all 7 explicitly, and mentioned the four new
  document loaders.

landing/index.html
------------------
- All 8 occurrences of "4612" / "4,612" in visible text, schema
  descriptions, animated counter targets, and FAQ answers -> "5203" /
  "5,203". Pure text substitution, no visual changes.

Verification
------------
- mkdocs build: clean (only the pre-existing Material "Excluding
  README.md" template warning, unrelated to this release)
- Full non-e2e suite: 4961 passed, 3 skipped, 239 deselected, 0 regressions
- diff CHANGELOG.md docs/CHANGELOG.md: byte-identical
- diff CONTRIBUTING.md docs/CONTRIBUTING.md: byte-identical
- grep for any remaining 4612 / 4960 in user-facing docs: clean
  (only legitimate "up from 4,612" delta reference in the 0.21.0 Stats
  block remains)
…tores, new extras

Second pass on landing/index.html after the earlier stale-count fix
(4612 -> 5203 ×8). This pass catches the v0.21.0-specific content
staleness that the test-count edit missed.

Version strings (3 places)
-------------------------
- Schema.org softwareVersion: 0.20.1 -> 0.21.0
- Hero status bar badge: v0.20.1 -> v0.21.0
- Footer comment: v0.20.1 -> v0.21.0

Azure OpenAI added to every provider enumeration (11 places)
------------------------------------------------------------
- <meta name="description"> SEO tag
- <meta name="twitter:description"> social preview
- Schema.org JSON-LD description field
- Schema.org featureList item
- FAQ item "Which LLM providers does selectools support?" — re-worded
  from "5 LLM providers: OpenAI, Anthropic, Gemini, Ollama, and
  FallbackProvider" to the correct 5 LLMs (OpenAI, Azure OpenAI,
  Anthropic, Gemini, Ollama) plus FallbackProvider as a wrapper
- FAQ item "What's the license?" — added Azure to the token billing
  list
- FAQ intro "What is selectools?"
- Rendered FAQ in the HTML (not just the JSON-LD)
- bento__desc on the fallback provider card
- Five providers FAQ rendered answer
- Visible <span class="provider"> tags in the hero "Works with" row —
  added an Azure OpenAI tag between OpenAI and Anthropic

Vector store counts (4 -> 7, 4 places)
--------------------------------------
- FAQ "Does it include RAG?" — "4 vector store backends" ->
  "7 vector store backends (memory, SQLite, Chroma, Pinecone, FAISS,
  Qdrant, pgvector)"
- Same FAQ rendered in the HTML below the JSON-LD
- Install FAQ answers updated to mention FAISS + Qdrant
- Both RAG FAQ answers now mention the new CSV / JSON / HTML / URL
  document loaders

Install extras (missing [observe] + [postgres])
-----------------------------------------------
- Install FAQ JSON-LD and rendered HTML now document:
  - pip install selectools[rag] (+ FAISS, Qdrant, beautifulsoup4)
  - pip install selectools[observe] (+ OpenTelemetry, Langfuse)
  - pip install selectools[postgres] (for pgvector)

Verification
------------
- grep 4612 / 4,612 / 4960 / 4 vector store / 0.20.1 (excluding the
  one legitimate self-referential JS comment): clean
- Count of "Azure OpenAI" occurrences: 0 -> 11
- No visual layout changes — text-only substitutions within existing
  elements. The hero provider row grows from 4 tags to 5, which is
  the only structural change and fits the existing flex layout.
…note

Two small robustness items surfaced by an "anything even 1% uncertain?"
audit pass before shipping v0.21.0.

1. Qdrant 404 detection was string-based
---------------------------------------
The ``return []`` path in QdrantVectorStore.search that matches
FAISSVectorStore's "search-after-clear returns empty" semantics was
using::

    if "404" in str(exc) or "not found" in str(exc).lower():
        return []

This works against qdrant-client 1.16.1 (which embeds "404 (Not Found)"
in UnexpectedResponse's string form), but it's fragile — any qdrant-client
release that reformats the error message or wraps the exception would
silently break the fallback. Verified on qdrant-client 1.16.1 that
``UnexpectedResponse`` instances carry a ``status_code`` attribute set
from the constructor, so we can check that first and fall back to the
string match only as a safety net.
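The hardened check reads roughly as follows. The helper name and the stand-in exception class are illustrative; the structure (status_code first, string match as the safety net) is the change described above:

```python
def is_missing_collection(exc):
    """Sketch of the hardened 404 detection: prefer the structured
    status_code attribute, fall back to string matching only when the
    attribute is absent."""
    if getattr(exc, "status_code", None) == 404:
        return True
    text = str(exc)
    return "404" in text or "not found" in text.lower()

class FakeUnexpectedResponse(Exception):
    # Stand-in for qdrant-client's UnexpectedResponse.
    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code
```

A reformatted error message no longer breaks the fallback, because the attribute check fires before any string inspection.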

2. image_message(url, …) reachability limitation
------------------------------------------------
Testing exposed that when the URL the user passes to image_message is
an http/https URL, the provider backend (OpenAI, Anthropic, Gemini) is
the one that fetches it — selectools just forwards the URL. Some hosts
(e.g. Wikimedia Commons) block bot User-Agents and return 400/403, which
surfaces as "Unable to download the file" (Anthropic) or "Cannot fetch
content from the provided URL" (Gemini). Not a selectools bug, but worth
warning about in the docs so users don't blame the wrapper. Added a
``!!! warning`` admonition to docs/modules/MULTIMODAL.md recommending
local-file + base64 for host-independent delivery.
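The recommended local-file + base64 path amounts to a few lines; a sketch, with a hypothetical helper name and dict shape:

```python
import base64

def image_part_from_bytes(data: bytes) -> dict:
    """Sketch of host-independent image delivery: base64-encode local
    bytes so the provider never has to fetch a URL that might block
    bot User-Agents."""
    return {
        "type": "image_base64",
        "data": base64.b64encode(data).decode("ascii"),
    }

part = image_part_from_bytes(b"\x89PNG\r\n")
```

For a file on disk, read it with `open(path, "rb")` and pass the bytes through the same helper.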

Verification
------------
- tests/rag/test_qdrant_store.py (35 mock tests): all pass
- tests/rag/test_e2e_qdrant_store.py: skipped (no Qdrant container
  running right now, but the code path is covered by the
  test_clear_empties_collection test I verified earlier with a live
  Qdrant 1.17.1 server)
- Full non-e2e suite: 4961 passed, 0 regressions
The Bug 7/8 fix for content_parts in Gemini and Anthropic providers
lives in _format_messages, which is shared between sync complete()
and async acomplete() / astream(). The existing tier 2 multimodal
tests only exercise sync, so a future change to the async-only path
could silently regress vision input on agent.arun().

Manually verified that the existing fix already works async (all
three providers correctly described a 4x4 red PNG via agent.arun()).
This commit adds three regression tests that lock that in:

- TestMultimodalRealProvidersAsync.test_openai_async_accepts_image
- TestMultimodalRealProvidersAsync.test_anthropic_async_accepts_image
- TestMultimodalRealProvidersAsync.test_gemini_async_accepts_image

Each test asserts "red" appears in the response (same anchor-based
assertion as the sync tests, so they catch silent image-drop failures).

Verification: 6 tests passed in 7.88s (3 sync + 3 async, all real
LLM calls).
Spec and implementation plan for bringing landing/examples/index.html
into the same execution-pointer visual language as the redesigned
landing page. Covers six sections: nav dot, terminal-session header,
proportional category rail, search row, ls -la card rows, and
$ cat card expansion.

All implementation work edits scripts/build_examples_gallery.py (the
generator) — the HTML at landing/examples/index.html is regenerated
from it.

Spec: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Plan: docs/superpowers/plans/2026-04-08-examples-page-overdrive.md
Duplicate the landing page's design tokens, .exec-dot, .exec-caret,
.exec-scan, .sr-only, @keyframes exec-pulse/exec-blink/exec-scan-sweep/
exec-stamp, and prefers-reduced-motion fallbacks into the examples page
generator's inline <style>. No visual change yet — these atoms become
the foundation for the §1–§6 redesign in subsequent commits.

Spec: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Adds a permanent cyan execution-pointer dot to the left of the
selectools wordmark in the examples page nav. Matches the landing
page's wordmark variant 1 — a user clicking between / and /examples/
now sees the same pulse in the same place.

Respects prefers-reduced-motion (becomes a static glow).

Spec §6: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Replaces the bare <h1> + paragraph with a full terminal-window panel
that types out 'ls examples/' on page load and live-mirrors the search
state into the prompt suffix as ' | grep -i <query>'.

Counter format changes from 'N examples' to '# N files match' to
match the monospace comment aesthetic.

The category --tags suffix wiring lands in Task 4 once the rail exists.

Adds typeLine() and syncPrompt() helpers and a bootPrompt() IIFE that
respects prefers-reduced-motion. Mobile collapses to '$ ls examples/'.
Both helpers write only to .textContent — no HTML rendering paths.

Spec §1: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Removes the 18-pill .cb chip row and replaces it with a single bar
of .ex-rail__seg segments sized proportionally to each category's
count. Visually shows the shape of the catalog at a glance.

On viewport entry an IntersectionObserver triggers a left-to-right
stamp sweep (80ms stagger). Clicking a segment filters the list,
re-stamps the segment, and rewrites the terminal prompt's --tags suffix.

Mobile becomes a horizontal scroll-snap strip. Respects
prefers-reduced-motion (no sweep, no on-click stamp).

Spec §2: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
…+ fixed

Final thorough audit pass after the user asked "is there anything you
feel even 1% not confident about?" with explicit instruction to verify
AND fix everything. Nine residual concerns were addressed; two surfaced
real shipping blockers that isolated testing had not caught.

Verified as not a regression (no code change needed):
- #12 RAGTool descriptor pickling: function-based @tool() also fails
  to serialize for the same reason (decorator replaces function in the
  module namespace). Pickling Tools/Agents has never been supported in
  selectools — only cache_redis.py uses pickle, and only for
  (Message, UsageStats) tuples. Documented the limitation in RAGTool's
  class docstring along with a thread-safety note.

Fixes landed:

Bug 9 — Langfuse 3.x rewrite (real shipping blocker)
----------------------------------------------------
mypy caught ``"Langfuse" has no attribute "trace"`` in
src/selectools/observe/langfuse.py:65. Langfuse 3.x removed the top-level
Langfuse.trace() / trace.generation() / trace.span() / trace.update()
API and replaced it with start_span() / start_generation() /
update_current_trace() / update_current_span(). The existing
selectools LangfuseObserver was written for 2.x and would crash at
runtime on every call against Langfuse 3.x (which pyproject.toml's
langfuse>=2.0.0 constraint does not exclude). The existing mock-based
test_langfuse_observer.py never caught it because mocks accept any
method call. The e2e test in tests/test_e2e_langfuse_observer.py
skipped due to missing LANGFUSE_PUBLIC_KEY env var, so the real code
path had never executed.

- Rewrote src/selectools/observe/langfuse.py for Langfuse 3.x API:
  on_run_start now creates a root span via client.start_span(); child
  generations and spans use root.start_generation() / root.start_span()
  (which attach to the same trace); usage info moved from usage= to
  usage_details=, with new cost_details= for dollar cost; every span
  now calls .end() explicitly since Langfuse 3.x is context-manager
  oriented; root span finalization uses update_trace() + update() + end().
- Updated 4 affected mock tests in tests/test_langfuse_observer.py to
  the v3 API (client.start_span, root.start_generation, root.start_span).
  19 Langfuse mock tests now pass.

#13 image_url e2e regression coverage
-------------------------------------
Added TestMultimodalRealProvidersImageUrl in
tests/test_e2e_multimodal.py with three new tests (one per provider)
that send https://github.githubassets.com/favicons/favicon.png through
the ContentPart(type="image_url") path. Verified that OpenAI, Anthropic,
and Gemini all return "GitHub" in their reply. GitHub's CDN serves bot
User-Agents unlike Wikipedia's CDN, which is documented separately in
the MULTIMODAL.md URL-reachability warning.

#14 CHANGELOG clarification
---------------------------
Added a "Note on the three latent bugs below" block before the Fixed
section explaining that bugs 6, 7, 8 (RAGTool @tool() on methods and
both multimodal content_parts drops) were pre-existing in earlier
releases but never surfaced because no test actually exercised them
end-to-end. This pre-empts the reasonable reader question "why didn't
earlier users report these?".

#15 Pre-existing broken mkdocs anchors
--------------------------------------
- QUICKSTART.md: #code-tools-2--v0210 (double dash) was wrong. mkdocs
  Material slugifies the em-dash in "Code Tools (2) — v0.21.0" to a
  single hyphen, producing code-tools-2-v0210. Fixed the link.
- PARSER.md: both #parsing-strategy and #json-extraction anchors were
  broken because a stray unbalanced 3-backtick fence at line 124 was
  greedy-pairing with line 128, shifting every downstream fence pair by
  one and accidentally wrapping ## Parsing Strategy and ## JSON
  Extraction inside a code block. Deleting line 124 plus converting one
  4-backtick close on line 205 to a 3-backtick close rebalanced all the
  fences. Both headings now render as real h2 elements and the
  TOC anchors resolve. mkdocs build: zero broken-anchor warnings.

#16 README relative docs/ links
-------------------------------
README.md is outside docs/ and must use absolute GitHub URLs per
docs/CLAUDE.md. Batch-converted all 37 ](docs/*.md) relative links to
](https://github.com/johnnichev/selectools/blob/main/docs/*.md).
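A batch conversion like this is a one-liner with `re.sub`. A sketch of the approach (the exact script used is not shown in the commit, and this pattern assumes plain `](docs/*.md)` targets without anchors):

```python
import re

BASE = "https://github.com/johnnichev/selectools/blob/main/"

def absolutize_docs_links(markdown: str) -> str:
    """Rewrite relative ](docs/*.md) link targets to absolute GitHub
    blob URLs, leaving every other link untouched."""
    return re.sub(
        r"\]\((docs/[^)]+\.md)\)",
        lambda m: "](" + BASE + m.group(1) + ")",
        markdown,
    )

readme = "See the [RAG guide](docs/RAG.md) and the [site](https://example.com)."
converted = absolutize_docs_links(readme)
```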

#17 Pre-existing mypy errors — all 46 fixed, mypy src/ is now clean
------------------------------------------------------------------
Success: no issues found in 150 source files.

- 20 no-any-return errors across 13 files: added
  # type: ignore[no-any-return] with an explanatory comment at each
  site. These were all external-library Any leaks (json.loads, dict.get
  on Any, psycopg2, the ollama client, openai SDK returns, etc.) where
  the runtime type is correct but the library's type stubs expose Any.
- 14 no-untyped-def errors in observer.py SimpleStepObserver graph
  callbacks (lines 1634-1676): added full type annotations matching the
  AgentObserver base class signatures (str/int/float/Exception/List[str]
  per event). Fixed one Liskov substitution violation where my initial
  annotation used List[str] for new_plan but the base class uses str.
- 8 no-untyped-def errors in serve/app.py BaseHTTPRequestHandler methods
  (do_GET, do_POST, do_OPTIONS, _json_response, _html_response,
  log_message, handle_stream, _stream): added -> None returns and Any /
  str parameter types. Imported Iterator and AsyncIterator from typing.
- pipeline.py:439 astream: added -> AsyncIterator[Any].
- observe/trace_store.py:349 _iter_entries: added -> Iterator[Dict[str, Any]].
- agent/config.py:215 _unpack nested helper: added (Any, type) -> Any.
- trace.py:506: mypy rejected the ``dataclasses.asdict`` call because
  ``dataclasses.is_dataclass`` narrows its argument to
  ``DataclassInstance | type[DataclassInstance]`` (too wide). Added a
  ``not isinstance(obj, type)`` check so mypy sees a plain dataclass
  instance.
- providers/_openai_compat.py:560: expanded existing # type: ignore
  from [return-value] to [return-value,no-any-return] to cover the
  second error code.
- serve/_starlette_app.py:105: eval_dashboard was declared to return
  HTMLResponse but the unauth-redirect branch returns a RedirectResponse.
  Widened the return type to Response to match the neighbouring
  handlers (builder, provider_health).
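The trace.py narrowing pattern is worth a standalone sketch, since it recurs whenever a serializer accepts arbitrary objects (the Step dataclass and as_payload name are hypothetical stand-ins, not the real trace.py code):

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class Step:  # hypothetical stand-in for the objects trace.py serializes
    name: str

def as_payload(obj: Any) -> Dict[str, Any]:
    # dataclasses.is_dataclass narrows obj to
    # DataclassInstance | type[DataclassInstance]; excluding types with
    # isinstance(obj, type) leaves a plain instance, which is the only
    # thing dataclasses.asdict's stub accepts.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"expected a dataclass instance, got {obj!r}")

print(as_payload(Step(name="plan")))  # {'name': 'plan'}
```

Passing the class itself (`as_payload(Step)`) raises TypeError at runtime, which matches what asdict would do anyway; the isinstance check just makes that path explicit for mypy.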

#18 Landing page feature content for v0.21.0
---------------------------------------------
Three text-only bento card updates (no layout changes):

- RAG card: "4 store backends" → "7 store backends" with the full list
  enumerated plus CSV/JSON/HTML/URL loaders mentioned.
- Toolbox card: added explicit v0.21.0 additions (Python + shell
  execution, DuckDuckGo search, GitHub REST API, SQLite + Postgres).
- Audit card retitled to "Audit + observability" and expanded to
  mention OTelObserver (GenAI semantic conventions) and
  LangfuseObserver as the new v0.21.0 shipping surfaces for trace
  export to Datadog / Jaeger / Langfuse Cloud / any OTLP backend.

#19 FAISS variant of App 3 Knowledge Base Librarian
---------------------------------------------------
Added TestApp3b_KnowledgeBaseLibrarianFAISS in
tests/test_e2e_v0_21_0_apps.py — the same CSV + JSON + HTML librarian
persona but backed by FAISSVectorStore instead of Qdrant. Runnable
without Docker, and with different anchor phrases (OSPREY-88,
CRESCENT, AURORA-SOUTH) so it doesn't shadow the Qdrant variant when
both run. Three tests, all passing against real OpenAI embeddings +
real OpenAI gpt-4o-mini.

#20 RAGTool docstring notes
---------------------------
Added a "Notes" block to RAGTool explaining:
- Thread safety: the vector store handles its own locking, but mutating
  top_k / score_threshold / include_scores after attaching to an Agent
  is not thread-safe.
- Cross-process serialization: not supported, same reason function-based
  @tool() tools aren't supported.

Verification
------------
- mypy src/: Success: no issues found in 150 source files
- Full non-e2e suite: 4961 passed, 3 skipped, 248 deselected (+9 from
  new image_url + async multimodal + FAISS librarian tests), 0 regressions
- Full e2e suite with Qdrant + Postgres running: 70 collected, 64 passed,
  6 skipped (Azure x2 + Langfuse x1 credential-dependent + 3 Qdrant
  tests when the container isn't running), 0 failures
- mkdocs build: zero broken-anchor warnings (QUICKSTART + PARSER both
  clean now)
- diff CHANGELOG.md docs/CHANGELOG.md: byte-identical

Roadmap doc updates
-------------------
Updated the v0.21.0 section header from 🟡 to ✅ in both the timeline
summary at the top of the file and the full section header below.
Added a "Shipped" paragraph with the final stats (5215 tests, 88
examples, 5 LLM providers, 7 vector stores, 152 models) so readers
can see what actually landed vs. the original planning matrix below.

Fixed a stale path reference from .private/plans/07-... to the actual
location .private/07-...

The original planning matrices (new loaders, new vector stores, new
toolbox modules) are preserved as-is so the v0.21.0 section remains a
useful record of what was planned vs. what shipped — the CHANGELOG is
the authoritative "what actually shipped" source.
@johnnichev merged commit 98c77b9 into main on Apr 8, 2026. 9 checks passed.