
release: v0.21.0 — Connector Expansion + Multimodal + Observability#53

Merged
johnnichev merged 17 commits into main from v0.21.0-connectors
Apr 8, 2026


@johnnichev
Owner

Summary

v0.21.0 is the Connector Expansion release: three new vector stores, the Azure OpenAI provider, OpenTelemetry + Langfuse observers, multimodal image support across all four LLM providers, four new document loaders, and nine new toolbox tools.

Vector stores (3 new)

  • FAISSVectorStore — in-process FAISS with save/load persistence, thread-safe
  • QdrantVectorStore — REST + gRPC connector, auto-collection management
  • PgVectorStore — PostgreSQL pgvector extension, JSONB metadata, auto-table creation

Document loaders (4 new)

DocumentLoader.from_csv, from_json, from_html, from_url — stdlib-only by default, optional beautifulsoup4 for CSS selectors.
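The stdlib-only claim is easy to picture with a sketch. This is a hypothetical reconstruction of what a `from_csv`-style loader does (the function name, the dict shape, and the `content_field` / `metadata_fields` parameters are illustrative, not the library's actual signature):

```python
import csv
import io

def documents_from_csv(text, content_field, metadata_fields=()):
    """Sketch of a from_csv-style loader: each CSV row becomes one
    document with a content string plus selected metadata columns.
    Pure stdlib, mirroring the 'stdlib-only by default' note above."""
    docs = []
    for row in csv.DictReader(io.StringIO(text)):
        docs.append({
            "content": row[content_field],
            "metadata": {k: row[k] for k in metadata_fields if k in row},
        })
    return docs

faq = "question,answer,topic\nHow do I install?,pip install selectools,setup\n"
docs = documents_from_csv(faq, content_field="answer", metadata_fields=("topic",))
```

The same row-to-document shape generalizes to the JSON and HTML loaders; only the parsing step changes.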

Toolbox (9 new tools)

  • Code: execute_python, execute_shell (subprocess-isolated, 10KB output cap, shell metacharacter blocklist)
  • Search: web_search (DuckDuckGo, no API key), scrape_url (SSRF guards)
  • GitHub: github_search_repos, github_get_file, github_list_issues (GITHUB_TOKEN optional)
  • Database: query_sqlite (PRAGMA query_only = ON), query_postgres
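The `PRAGMA query_only = ON` guard on `query_sqlite` is worth a concrete sketch: it makes the connection itself reject writes, so a tool-calling LLM cannot mutate the database even if it emits DML. The function below is an illustrative shape, not the tool's real signature:

```python
import sqlite3

def query_sqlite(db_path, sql):
    """Sketch of a read-only SQL tool. PRAGMA query_only = ON causes
    any write statement on this connection to raise OperationalError,
    while SELECTs work normally."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA query_only = ON")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

A `SELECT` succeeds, while a `CREATE TABLE` or `INSERT` through the same helper fails with `sqlite3.OperationalError`.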

Multimodal messages

New ContentPart dataclass + image_message(image, prompt) helper. Message.content now accepts list[ContentPart] in addition to str. Works on OpenAI GPT-4o, Azure OpenAI, Anthropic Claude, Gemini 2.5 Flash, and Ollama vision models.
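A minimal sketch of the shape described above, assuming field names that may differ from the real dataclass (`type`, `text`, `data` are guesses; only `ContentPart` and `image_message` come from the release notes):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ContentPart:
    # Hypothetical field layout for a multimodal content part.
    type: str                  # "text" | "image_base64" | "image_url"
    text: Optional[str] = None
    data: Optional[str] = None  # base64 payload or URL

def image_message(image: str, prompt: str) -> List[ContentPart]:
    """Sketch: pair one image reference with a text prompt.
    URLs pass through as image_url; anything else is treated as base64."""
    kind = "image_url" if image.startswith(("http://", "https://")) else "image_base64"
    return [ContentPart(type=kind, data=image),
            ContentPart(type="text", text=prompt)]

parts = image_message("https://example.com/cat.png", "What animal is this?")
```

Each provider then converts this neutral list into its own wire format, which is exactly the per-provider loop the Bug 7/8 fixes below add.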

Providers (1 new)

  • AzureOpenAIProvider — extends OpenAIProvider with Azure deployment-name routing, AZURE_OPENAI_* env-var fallback, and AAD token auth.

Observers (2 new)

  • OTelObserver — emits GenAI semantic-convention spans to Jaeger / Tempo / Datadog / Honeycomb / any OTLP backend
  • LangfuseObserver — ships traces + generations + spans to Langfuse Cloud or self-hosted (rewritten for the Langfuse 3.x API: start_span / start_generation / update_current_trace)

Bug fixes (all pre-existing, surfaced by real-call simulations during release prep)

| # | Bug | Impact if shipped |
|---|-----|-------------------|
| 5 | QdrantVectorStore.search() called the removed qdrant-client client.search() API | Every user with qdrant-client >=1.13 would have hit AttributeError on first query |
| 6 | @tool() on class methods fundamentally broken (RAGTool / SemanticSearchTool / HybridSearchTool) | RAG broken via the canonical documented API — every user hits TypeError |
| 7 | Gemini provider silently dropped content_parts images | Every Gemini vision user gets "I cannot see images" |
| 8 | Anthropic provider silently dropped content_parts images | Every Claude vision user gets "I don't see any image attached" |
| 9 | LangfuseObserver written for Langfuse 2.x, broken on Langfuse 3.x | Every Langfuse user crashes at runtime |

Fixes: _BoundMethodTool descriptor in tools/decorators.py, query_points() migration in rag/stores/qdrant.py, content_parts loop in both provider _format_messages(), full Langfuse 3.x rewrite with updated mock tests.

Tests

5215 tests (+603 since v0.20.1).

  • Unit (4961): all mock-based tests for every v0.21.0 subsystem
  • Per-subsystem e2e (43): real faiss-cpu bindings, real subprocess.run, real sqlite3, real HTTP, real opentelemetry-sdk, real Qdrant Docker container, real pgvector-enabled Postgres, real OpenAI + Anthropic + Gemini vision calls, real DuckDuckGo + GitHub REST API
  • Integration simulations (4): FAISS RAG + OpenAI + OTel; Gemini multimodal + code execution + OTel; Anthropic SQL + code chaining; Qdrant RAG + OpenAI + OTel
  • App-shaped simulations (10): Skylake docs Q&A bot (3 turns, ConversationMemory); sales data analyst bot (SQL → Python chaining); knowledge base librarian (Qdrant + FAISS variants covering CSV + JSON + HTML loaders)
  • Multimodal coverage: sync + async + image_url URL-path regression tests for all 3 cloud providers (9 tests)

Quality gate

  • mypy src/: zero errors across 150 files (cleaned up 46 pre-existing issues in serve/, mcp/, observer.py, trace.py, _starlette_app.py, etc.)
  • black + isort + flake8: clean
  • bandit -r src/: clean
  • mkdocs build: clean (fixed pre-existing broken anchors in QUICKSTART.md and PARSER.md)
  • Full e2e suite with Qdrant + Postgres running: 70 collected, 64 passed, 6 skipped (Azure OpenAI + Langfuse credential-dependent), 0 failed

Documentation

  • 7 new module docs: FAISS.md, QDRANT.md, PGVECTOR.md, MULTIMODAL.md, OTEL.md, AZURE_OPENAI.md, LANGFUSE.md
  • 12 new examples: 77_faiss_vector_store.py through 88_langfuse_observer.py
  • README.md, CHANGELOG.md, CONTRIBUTING.md, docs/index.md, docs/QUICKSTART.md, ROADMAP.md, docs/llms.txt, docs/llms-full.txt, landing/index.html all updated with v0.21.0 stats, features, and Azure OpenAI enumeration
  • 37 relative ](docs/*.md) links in README.md converted to absolute GitHub URLs per project convention

Test plan

  • CI passes on this PR (lint + tests + security)
  • Review the rewritten observe/langfuse.py for the Langfuse 3.x migration
  • Review the _BoundMethodTool descriptor in tools/decorators.py
  • After merge, tag v0.21.0 to trigger the PyPI publish workflow
  • Verify GitHub Pages docs auto-deploy
  • Verify pip install selectools==0.21.0 in a clean venv

…kers, version bump

- Bump version to 0.21.0 in pyproject.toml and __init__.py
- Add OTelObserver/LangfuseObserver lazy exports to observe/__init__.py
- Export AzureOpenAIProvider and observe submodule from package root
- Add ContentPart/image_message/text_content to public __all__
- Apply @beta to all 9 new toolbox tools (code, search, github, db)
- Extend stability.beta()/stable() with Any overload for Tool objects
- Add qdrant-client/faiss-cpu/beautifulsoup4 to [rag] extras
- Add new [observe] extras with opentelemetry-api/langfuse
…py fix

- CHANGELOG: 0.21.0 entry covering all 7 connector subsystems
- README: What's New in v0.21 section, Azure provider row, FAISS/Qdrant/pgvector imports, test count 4960
- 7 new module docs in docs/modules/: FAISS, QDRANT, PGVECTOR, MULTIMODAL, OTEL, AZURE_OPENAI, LANGFUSE
- mkdocs.yml nav: surfaced new pages in Core/Features/Reference sections
- llms.txt + llms-full.txt: 7 new module pointers, version bumped to v0.21.0, page count 32 -> 39
- Fix pre-existing mypy error in azure_openai_provider.py default_model assignment
Every existing v0.21.0 test file mocks its backend: test_faiss_store.py
injects a fake faiss module, test_code_tools.py mocks subprocess.run,
test_qdrant_store.py mocks qdrant_client, etc. That leaves the real wire
format, real C++ bindings, real subprocesses, real HTTP, and real vision
APIs completely unverified — if our assumptions differ from reality we
ship green tests and broken code.

This commit adds 12 new test files marked @pytest.mark.e2e that exercise
real backends:

Tier 1 — no external services (28 tests, all passing):
- tests/rag/test_e2e_faiss_store.py (real faiss-cpu, 5)
- tests/tools/test_e2e_code_tools.py (real subprocess.run, 8)
- tests/tools/test_e2e_db_tools.py (real sqlite3, 6)
- tests/rag/test_e2e_document_loaders.py (real files + example.com, 6)
- tests/test_e2e_otel_observer.py (real opentelemetry-sdk, 3)

Tier 2 — real API calls, credentials via .env (8 tests, all passing):
- tests/test_e2e_multimodal.py (real OpenAI gpt-4o-mini + Anthropic
  claude-haiku-4-5 + Gemini gemini-2.5-flash with an in-memory 4x4 PNG)
- tests/tools/test_e2e_search_tools.py (real DuckDuckGo + scrape)
- tests/tools/test_e2e_github_tools.py (real GitHub REST API)

Tier 3 — skip-if-missing-deps-or-credentials (7 tests, 2 passing + 5 skip):
- tests/rag/test_e2e_qdrant_store.py (skip if Qdrant not reachable)
- tests/rag/test_e2e_pgvector_store.py (passes against local pgvector)
- tests/providers/test_e2e_azure_openai.py (skip if AZURE_* not set)
- tests/test_e2e_langfuse_observer.py (skip if LANGFUSE_* not set)

Result: pytest --run-e2e → 38 passed, 5 skipped, 0 failed.

Also fix three v0.21.0 module docs whose quickstart examples showed the
wrong VectorStore.search() signature: search() takes a query embedding
(List[float]), not a string. Updated FAISS.md, QDRANT.md, PGVECTOR.md
to show the correct embed-first pattern (matches RAG.md).
…ints()

qdrant-client >=1.13 removed QdrantClient.search() in favour of
query_points(). The new API differs in two ways:

1. The kwarg is `query=` instead of `query_vector=`
2. The return value is a `QueryResponse` object whose `.points`
   attribute holds the list of `ScoredPoint`s, not a flat list
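Both differences fit in a few lines. This sketch uses a stand-in response object rather than a live client, so the shape is illustrative only:

```python
from types import SimpleNamespace

# Stand-in for qdrant-client's QueryResponse: results live on .points.
response = SimpleNamespace(points=[SimpleNamespace(id=1, score=0.93)])

# Old API (removed in qdrant-client >=1.13):
#   hits = client.search(collection_name="docs", query_vector=embedding)
# New API:
#   response = client.query_points(collection_name="docs", query=embedding)
hits = response.points  # unwrap .points; the response is no longer a flat list
```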

The mock-based unit tests in tests/rag/test_qdrant_store.py never
caught this regression because they mocked QdrantClient — the mock
had a `search` attribute that didn't exist on the real client. The
new e2e test in tests/rag/test_e2e_qdrant_store.py exposed the bug
on the first real call against Qdrant 1.17.1.

Also fix a second consistency bug exposed by the e2e test: after
clear() drops the collection, query_points() raises 404 instead of
returning empty results. Caught the 404 in search() and return [] to
match FAISSVectorStore semantics (search-after-clear → []).

Mock unit tests updated to mirror the new API:
- s/client.search/client.query_points/
- Mock return values now wrap a points list in a MagicMock with a
  .points attribute
- Assertions that checked call_kwargs["query_vector"] now check
  call_kwargs["query"]

After fix: 35 mock tests + 2 e2e tests against real Qdrant 1.17.1
all pass. Full e2e suite: 40 passed, 3 skipped (Azure + Langfuse,
no creds). Full non-e2e suite: 4961 passed, 0 regressions.
…ations

Adds four end-to-end integration scenarios in tests/test_e2e_v0_21_0_simulations.py
that wire multiple v0.21.0 features together with real LLM calls:

1. FAISS + real OpenAI embeddings + RAGTool + real OpenAI agent + OTel
2. Multimodal image + execute_python tool + real Gemini agent + OTel
3. query_sqlite + execute_python + real Anthropic Claude agent
4. Qdrant + real OpenAI embeddings + RAGTool + real OpenAI agent + OTel

Running the simulations surfaced three pre-existing shipping blockers that
the entire existing test suite (188 mock-based v0.21.0 tests + 4 "workflow"
tests that never actually call agent.run) had silently hidden:

Bug 6 — @tool() on class methods fundamentally broken
----------------------------------------------------
@tool() applied to a method (def f(self, query: str)) produced a class-level
Tool whose function was the unbound method. When the agent executor called
tool.function(**llm_kwargs) Python raised
TypeError: missing 1 required positional argument: 'self', so the LLM got
back a "Tool Execution Failed" string and gave up.

This broke the canonical RAG pattern documented everywhere in selectools:

    rag_tool = RAGTool(vector_store=store)
    agent = Agent(tools=[rag_tool.search_knowledge_base], provider=...)

RAGTool, SemanticSearchTool, and HybridSearchTool were all affected. The
existing tests/rag/test_rag_workflow.py tests that appeared to exercise
this path only asserted isinstance(agent, Agent) and never actually ran
the agent, so nobody noticed.

Fix: add a _BoundMethodTool descriptor to selectools/tools/decorators.py
that detects method-decorated tools (first param is self) and returns a
per-instance Tool on attribute access. The descriptor wraps the original
function in functools.partial(fn, instance) so the agent executor can
invoke it with only the LLM's kwargs. Class-level access falls through to
a template Tool for introspection (.name, .description, etc.).
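The descriptor mechanics described above can be sketched in miniature. Everything here is illustrative (a stand-in `Tool`, a simplified `BoundMethodTool`, a fake `RAGToolLike` class); only the pattern itself, per-instance binding via `functools.partial` with class-level fallthrough to a template, comes from the commit:

```python
import functools

class Tool:
    # Minimal stand-in for the library's Tool wrapper.
    def __init__(self, function, name):
        self.function = function
        self.name = name

class BoundMethodTool:
    """Sketch of the descriptor: class access returns a template Tool
    for introspection; instance access returns a Tool whose function
    already has the instance bound, so the executor can call it with
    only the LLM's kwargs."""
    def __init__(self, fn, name):
        self.fn = fn
        self.template = Tool(fn, name)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self.template  # class-level access: introspection only
        return Tool(functools.partial(self.fn, instance), self.template.name)

class RAGToolLike:
    def _search(self, query):
        return f"results for {query!r}"
    search_knowledge_base = BoundMethodTool(_search, "search_knowledge_base")

rag = RAGToolLike()
tool = rag.search_knowledge_base
tool.function(query="ports")  # no TypeError: 'self' is already bound
```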

Callers that previously worked around the bug by manually passing the
instance as the first argument to .function (test_rag_workflow.py,
test_hybrid_search.py, test_rag_regression_phase3.py) are updated to the
correct API.

Bug 7 — Gemini provider silently drops images from content_parts
---------------------------------------------------------------
GeminiProvider._format_messages only handled the legacy
message.image_base64 attribute. The v0.21.0 image_message() helper creates
a Message with content_parts=[ContentPart(type="image_base64", ...)] and
explicitly sets message.image_base64 = None, so Gemini received only the
text prompt and replied "I cannot see images".

Fix: add a content_parts loop to GeminiProvider that converts each
ContentPart to types.Part(inline_data=...) or types.Part(file_data=...).

Bug 8 — Anthropic provider has the same bug
-------------------------------------------
Same pattern in AnthropicProvider. Claude replied "I don't see any image
attached". Fix: content_parts loop producing the Anthropic native
{type: image, source: {type: base64, ...}} shape.
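The Anthropic-native shape named above is simple enough to sketch. The conversion function and the input dict keys are assumptions; the output block shape (`{type: image, source: {type: base64, ...}}`) is the one the fix produces:

```python
import base64

def to_anthropic_blocks(parts):
    """Sketch of the content_parts loop for Anthropic: generic parts
    become the provider's native content-block dicts."""
    blocks = []
    for part in parts:
        if part["type"] == "text":
            blocks.append({"type": "text", "text": part["text"]})
        elif part["type"] == "image_base64":
            blocks.append({
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": part.get("media_type", "image/png"),
                    "data": part["data"],
                },
            })
    return blocks

png = base64.b64encode(b"\x89PNG\r\n").decode("ascii")
blocks = to_anthropic_blocks([
    {"type": "image_base64", "data": png},
    {"type": "text", "text": "What colour is this image?"},
])
```

The Gemini fix is the same loop with a different target shape (`types.Part` objects instead of dicts).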

OpenAI already had the right handling in providers/_openai_compat.py, so
only Gemini and Anthropic needed the fix.

Also: tighten tests/test_e2e_multimodal.py assertions so the provider can
never silently drop an image again. Previously the tests only asserted
result.content was non-empty, which passed on "I cannot see images" —
a classic false-green. Now each provider must actually say "red" in its
reply to a 4x4 red PNG.

Finally: move the shared otel_exporter fixture into tests/conftest.py so
every e2e file that needs OTel span capture uses the same singleton
TracerProvider. OpenTelemetry only allows one global TracerProvider per
process, and having each file install its own caused later-loaded files
to silently see empty span lists when run in the same suite.

Verification:
- 47 e2e tests collected → 44 passed, 3 skipped (Azure OpenAI x2 and
  Langfuse x1 skip cleanly when no credentials are set)
- Full non-e2e suite: 4961 passed, 3 skipped, 0 regressions
- The 4 full-release simulations in test_e2e_v0_21_0_simulations.py now
  verify every v0.21.0 subsystem works together with real LLM calls
The previous e2e work proved individual v0.21.0 subsystems work in
isolation (tests/test_e2e_*) and that multiple features compose (the
4 scenarios in tests/test_e2e_v0_21_0_simulations.py). Those are
integration tests — they prove the wiring doesn't throw.

This commit adds something different: **app-shaped** simulations that
match the idiom already used in tests/test_simulation_evals.py. Each
test sets up an agent with a realistic system prompt, drives it through
a plausible user workflow, and asserts on the behaviour a real app
author would care about.

App 1 — Documentation Q&A Bot
-----------------------------
A support bot for a fictional product called "Skylake" backed by a
FAQ CSV. The CSV is loaded via the new DocumentLoader.from_csv, embedded
with real OpenAI text-embedding-3-small, indexed in real FAISS, and
wrapped in a RAGTool. The bot runs on real OpenAI gpt-4o-mini with a
ConversationMemory so it can carry context across turns.

Three asserts:
- Turn 1: bot answers an in-KB install question by quoting KB facts
  (curl URL, version string)
- Turn 2: same agent instance answers a follow-up port question
  (8742) — proves memory + tool calling continue to work across turns
  on a memory-enabled agent
- Turn 3: bot refuses an out-of-KB WebSocket question instead of
  hallucinating a number

App 2 — Data Analyst Bot
------------------------
An analytics assistant over a small SQLite sales database. Real
Anthropic Claude agent with query_sqlite + execute_python. The user
asks a question whose answer requires *chaining*:
  1. SQL query to find the top region by total sales
  2. Python computation for the average
  3. Natural-language explanation
Asserts that "EU" and "2000" both appear in the final answer,
proving the LLM successfully chained two real tool calls end-to-end.

App 3 — Knowledge Base Librarian
---------------------------------
The only simulation that exercises ALL FOUR new document loaders in a
single workflow:
  - DocumentLoader.from_csv (product catalog)
  - DocumentLoader.from_json (release notes)
  - DocumentLoader.from_html (about page)
Real OpenAI embeddings, real Qdrant store, real Gemini gemini-2.5-flash
agent with a RAGTool. Three asserts, one per source format, each
asking for a deliberately unique anchor phrase (THUNDERCAT-7, MOONWALK,
VANTA-NORTH) that exists in exactly one of the loaded files. Proves
that every loader's output is actually retrievable through the full
embed → store → search → LLM pipeline.

Verification
------------
Solo run of tests/test_e2e_v0_21_0_apps.py:
  7 passed in 30.41s

Full e2e suite including new app sims:
  54 collected → 51 passed, 3 skipped (Azure OpenAI x2 + Langfuse x1,
  no creds), 0 failed, 50.67s total

Full non-e2e suite:
  4961 passed, 3 skipped, 239 deselected (+7 from the new app file),
  0 regressions
…oarding

Cross-reference audit run (via the project /audit and /doc-audit-skill
skills with 4 parallel QA sub-agents) found 13 MUST-FIX issues left over
after the earlier release-prep commit. This commit fixes all of them.

CHANGELOG.md
------------
- Add the missing ### Fixed section documenting bugs 6, 7, 8 (RAGTool
  @tool() on methods, Gemini + Anthropic content_parts image drop) and
  the Qdrant query_points() API migration. These landed in commits
  f4401f2 and b047c1a after the initial doc commit but never made it
  into the release notes.
- Add the missing ### Tests section documenting the 345 new e2e tests,
  4 integration simulations, and 7 app-shaped simulations.
- Update Stats: 4,960 -> 5,203 tests.

README.md
---------
- Line 489 and 1111: stale "4960 Tests" -> 5203.
- Line 133: restore the historical "4612 tests total" in the v0.19
  What's New section (I had over-corrected it to 4960 earlier).
- Line 460: "5 LLM Providers" enumeration was missing Azure OpenAI,
  even though it's claimed in the count. Added.
- Line 467: "4 Vector Stores" -> "7 Vector Stores" with FAISS, Qdrant,
  pgvector added to the list.
- Install section: added "pip install selectools[observe]" and
  "pip install selectools[postgres]" extras and updated the [rag]
  extras comment to mention FAISS, Qdrant, and beautifulsoup4.

CONTRIBUTING.md + docs/CONTRIBUTING.md
-------------------------------------
- Main file was stale: v0.20.1 / 4612 tests. Updated to v0.21.0 / 5203.
- docs/CONTRIBUTING.md was stale by TWO releases (v0.19.2, 61 examples,
  24 tools, 100% coverage, different release script examples). Fixed
  by re-copying from the updated CONTRIBUTING.md.

docs/llms.txt
-------------
- Line 3: "4960 tests at 95% coverage" -> "5203 tests at 95% coverage".

docs/QUICKSTART.md
------------------
- Added a v0.21.0 callout under Step 5 (RAG) linking to the new
  FAISS.md, QDRANT.md, and PGVECTOR.md module docs and mentioning the
  new DocumentLoader.from_csv / from_json / from_html / from_url
  loaders. Minimal addition — does not rewrite the working example.

docs/index.md
-------------
- RAG Pipeline feature card: "4 vector store backends" -> "7 vector
  store backends", listed all 7 explicitly, and mentioned the four new
  document loaders.

landing/index.html
------------------
- All 8 occurrences of "4612" / "4,612" in visible text, schema
  descriptions, animated counter targets, and FAQ answers -> "5203" /
  "5,203". Pure text substitution, no visual changes.

Verification
------------
- mkdocs build: clean (only the pre-existing Material "Excluding
  README.md" template warning, unrelated to this release)
- Full non-e2e suite: 4961 passed, 3 skipped, 239 deselected, 0 regressions
- diff CHANGELOG.md docs/CHANGELOG.md: byte-identical
- diff CONTRIBUTING.md docs/CONTRIBUTING.md: byte-identical
- grep for any remaining 4612 / 4960 in user-facing docs: clean
  (only legitimate "up from 4,612" delta reference in the 0.21.0 Stats
  block remains)
…tores, new extras

Second pass on landing/index.html after the earlier stale-count fix
(4612 -> 5203 ×8). This pass catches the v0.21.0-specific content
staleness that the test-count edit missed.

Version strings (3 places)
-------------------------
- Schema.org softwareVersion: 0.20.1 -> 0.21.0
- Hero status bar badge: v0.20.1 -> v0.21.0
- Footer comment: v0.20.1 -> v0.21.0

Azure OpenAI added to every provider enumeration (11 places)
------------------------------------------------------------
- <meta name="description"> SEO tag
- <meta name="twitter:description"> social preview
- Schema.org JSON-LD description field
- Schema.org featureList item
- FAQ item "Which LLM providers does selectools support?" — re-worded
  from "5 LLM providers: OpenAI, Anthropic, Gemini, Ollama, and
  FallbackProvider" to the correct 5 LLMs (OpenAI, Azure OpenAI,
  Anthropic, Gemini, Ollama) plus FallbackProvider as a wrapper
- FAQ item "What's the license?" — added Azure to the token billing
  list
- FAQ intro "What is selectools?"
- Rendered FAQ in the HTML (not just the JSON-LD)
- bento__desc on the fallback provider card
- Five providers FAQ rendered answer
- Visible <span class="provider"> tags in the hero "Works with" row —
  added an Azure OpenAI tag between OpenAI and Anthropic

Vector store counts (4 -> 7, 4 places)
--------------------------------------
- FAQ "Does it include RAG?" — "4 vector store backends" ->
  "7 vector store backends (memory, SQLite, Chroma, Pinecone, FAISS,
  Qdrant, pgvector)"
- Same FAQ rendered in the HTML below the JSON-LD
- Install FAQ answers updated to mention FAISS + Qdrant
- Both RAG FAQ answers now mention the new CSV / JSON / HTML / URL
  document loaders

Install extras (missing [observe] + [postgres])
-----------------------------------------------
- Install FAQ JSON-LD and rendered HTML now document:
  - pip install selectools[rag] (+ FAISS, Qdrant, beautifulsoup4)
  - pip install selectools[observe] (+ OpenTelemetry, Langfuse)
  - pip install selectools[postgres] (for pgvector)

Verification
------------
- grep 4612 / 4,612 / 4960 / 4 vector store / 0.20.1 (excluding the
  one legitimate self-referential JS comment): clean
- Count of "Azure OpenAI" occurrences: 0 -> 11
- No visual layout changes — text-only substitutions within existing
  elements. The hero provider row grows from 4 tags to 5, which is
  the only structural change and fits the existing flex layout.
…note

Two small robustness items surfaced by an "anything even 1% uncertain?"
audit pass before shipping v0.21.0.

1. Qdrant 404 detection was string-based
---------------------------------------
The ``return []`` path in QdrantVectorStore.search that matches
FAISSVectorStore's "search-after-clear returns empty" semantics was
using::

    if "404" in str(exc) or "not found" in str(exc).lower():
        return []

This works against qdrant-client 1.16.1 (which embeds "404 (Not Found)"
in UnexpectedResponse's string form), but it's fragile — any qdrant-client
release that reformats the error message or wraps the exception would
silently break the fallback. Verified on qdrant-client 1.16.1 that
``UnexpectedResponse`` instances carry a ``status_code`` attribute set
from the constructor, so we can check that first and fall back to the
string match only as a safety net.
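The hardened check reads roughly as follows. The helper name and the stand-in exception class are illustrative; the structure (status_code first, string match as the safety net) is the change described above:

```python
def is_missing_collection(exc):
    """Sketch of the hardened 404 detection: prefer the structured
    status_code attribute, fall back to string matching only when the
    attribute is absent."""
    if getattr(exc, "status_code", None) == 404:
        return True
    text = str(exc)
    return "404" in text or "not found" in text.lower()

class FakeUnexpectedResponse(Exception):
    # Stand-in for qdrant-client's UnexpectedResponse.
    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code
```

A reformatted error message no longer breaks the fallback, because the attribute check fires before any string inspection.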

2. image_message(url, …) reachability limitation
------------------------------------------------
Testing exposed that when the URL the user passes to image_message is
an http/https URL, the provider backend (OpenAI, Anthropic, Gemini) is
the one that fetches it — selectools just forwards the URL. Some hosts
(e.g. Wikimedia Commons) block bot User-Agents and return 400/403, which
surfaces as "Unable to download the file" (Anthropic) or "Cannot fetch
content from the provided URL" (Gemini). Not a selectools bug, but worth
warning about in the docs so users don't blame the wrapper. Added a
``!!! warning`` admonition to docs/modules/MULTIMODAL.md recommending
local-file + base64 for host-independent delivery.
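The recommended local-file + base64 path amounts to a few lines; a sketch, with a hypothetical helper name and dict shape:

```python
import base64

def image_part_from_bytes(data: bytes) -> dict:
    """Sketch of host-independent image delivery: base64-encode local
    bytes so the provider never has to fetch a URL that might block
    bot User-Agents."""
    return {
        "type": "image_base64",
        "data": base64.b64encode(data).decode("ascii"),
    }

part = image_part_from_bytes(b"\x89PNG\r\n")
```

For a file on disk, read it with `open(path, "rb")` and pass the bytes through the same helper.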

Verification
------------
- tests/rag/test_qdrant_store.py (35 mock tests): all pass
- tests/rag/test_e2e_qdrant_store.py: skipped (no Qdrant container
  running right now, but the code path is covered by the
  test_clear_empties_collection test I verified earlier with a live
  Qdrant 1.17.1 server)
- Full non-e2e suite: 4961 passed, 0 regressions
The Bug 7/8 fix for content_parts in Gemini and Anthropic providers
lives in _format_messages, which is shared between sync complete()
and async acomplete() / astream(). The existing tier 2 multimodal
tests only exercise sync, so a future change to the async-only path
could silently regress vision input on agent.arun().

Manually verified that the existing fix already works async (all
three providers correctly described a 4x4 red PNG via agent.arun()).
This commit adds three regression tests that lock that in:

- TestMultimodalRealProvidersAsync.test_openai_async_accepts_image
- TestMultimodalRealProvidersAsync.test_anthropic_async_accepts_image
- TestMultimodalRealProvidersAsync.test_gemini_async_accepts_image

Each test asserts "red" appears in the response (same anchor-based
assertion as the sync tests, so they catch silent image-drop failures).

Verification: 6 tests passed in 7.88s (3 sync + 3 async, all real
LLM calls).
Spec and implementation plan for bringing landing/examples/index.html
into the same execution-pointer visual language as the redesigned
landing page. Covers six sections: nav dot, terminal-session header,
proportional category rail, search row, ls -la card rows, and
$ cat card expansion.

All implementation work edits scripts/build_examples_gallery.py (the
generator) — the HTML at landing/examples/index.html is regenerated
from it.

Spec: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Plan: docs/superpowers/plans/2026-04-08-examples-page-overdrive.md
Duplicate the landing page's design tokens, .exec-dot, .exec-caret,
.exec-scan, .sr-only, @keyframes exec-pulse/exec-blink/exec-scan-sweep/
exec-stamp, and prefers-reduced-motion fallbacks into the examples page
generator's inline <style>. No visual change yet — these atoms become
the foundation for the §1–§6 redesign in subsequent commits.

Spec: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Adds a permanent cyan execution-pointer dot to the left of the
selectools wordmark in the examples page nav. Matches the landing
page's wordmark variant 1 — a user clicking between / and /examples/
now sees the same pulse in the same place.

Respects prefers-reduced-motion (becomes a static glow).

Spec §6: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Replaces the bare <h1> + paragraph with a full terminal-window panel
that types out 'ls examples/' on page load and live-mirrors the search
state into the prompt suffix as ' | grep -i <query>'.

Counter format changes from 'N examples' to '# N files match' to
match the monospace comment aesthetic.

The category --tags suffix wiring lands in Task 4 once the rail exists.

Adds typeLine() and syncPrompt() helpers and a bootPrompt() IIFE that
respects prefers-reduced-motion. Mobile collapses to '$ ls examples/'.
Both helpers write only to .textContent — no HTML rendering paths.

Spec §1: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Removes the 18-pill .cb chip row and replaces it with a single bar
of .ex-rail__seg segments sized proportionally to each category's
count. Visually shows the shape of the catalog at a glance.

On viewport entry an IntersectionObserver triggers a left-to-right
stamp sweep (80ms stagger). Clicking a segment filters the list,
re-stamps the segment, and rewrites the terminal prompt's --tags suffix.

Mobile becomes a horizontal scroll-snap strip. Respects
prefers-reduced-motion (no sweep, no on-click stamp).

Spec §2: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
…+ fixed

Final thorough audit pass after the user asked "is there anything you
feel even 1% not confident about?" with explicit instruction to verify
AND fix everything. Nine residual concerns were addressed; two surfaced
real shipping blockers that isolated testing had not caught.

Verified as not a regression (no code change needed):
- #12 RAGTool descriptor pickling: function-based @tool() also fails
  to serialize for the same reason (decorator replaces function in the
  module namespace). Pickling Tools/Agents has never been supported in
  selectools — only cache_redis.py uses pickle, and only for
  (Message, UsageStats) tuples. Documented the limitation in RAGTool's
  class docstring along with a thread-safety note.

Fixes landed:

Bug 9 — Langfuse 3.x rewrite (real shipping blocker)
----------------------------------------------------
mypy caught ``"Langfuse" has no attribute "trace"`` in
src/selectools/observe/langfuse.py:65. Langfuse 3.x removed the top-level
Langfuse.trace() / trace.generation() / trace.span() / trace.update()
API and replaced it with start_span() / start_generation() /
update_current_trace() / update_current_span(). The existing
selectools LangfuseObserver was written for 2.x and would crash at
runtime on every call against Langfuse 3.x (which pyproject.toml's
langfuse>=2.0.0 constraint does not exclude). The existing mock-based
test_langfuse_observer.py never caught it because mocks accept any
method call. The e2e test in tests/test_e2e_langfuse_observer.py
skipped due to missing LANGFUSE_PUBLIC_KEY env var, so the real code
path had never executed.

- Rewrote src/selectools/observe/langfuse.py for Langfuse 3.x API:
  on_run_start now creates a root span via client.start_span(); child
  generations and spans use root.start_generation() / root.start_span()
  (which attach to the same trace); usage info moved from usage= to
  usage_details=, with new cost_details= for dollar cost; every span
  now calls .end() explicitly since Langfuse 3.x is context-manager
  oriented; root span finalization uses update_trace() + update() + end().
- Updated 4 affected mock tests in tests/test_langfuse_observer.py to
  the v3 API (client.start_span, root.start_generation, root.start_span).
  19 Langfuse mock tests now pass.

#13 image_url e2e regression coverage
-------------------------------------
Added TestMultimodalRealProvidersImageUrl in
tests/test_e2e_multimodal.py with three new tests (one per provider)
that send https://github.githubassets.com/favicons/favicon.png through
the ContentPart(type="image_url") path. Verified that OpenAI, Anthropic,
and Gemini all return "GitHub" in their reply. GitHub's CDN serves bot
User-Agents unlike Wikipedia's CDN, which is documented separately in
the MULTIMODAL.md URL-reachability warning.

#14 CHANGELOG clarification
---------------------------
Added a "Note on the three latent bugs below" block before the Fixed
section explaining that bugs 6, 7, 8 (RAGTool @tool() on methods and
both multimodal content_parts drops) were pre-existing in earlier
releases but never surfaced because no test actually exercised them
end-to-end. This pre-empts the reasonable reader question "why didn't
earlier users report these?".

#15 Pre-existing broken mkdocs anchors
--------------------------------------
- QUICKSTART.md: #code-tools-2--v0210 (double dash) was wrong. mkdocs
  Material slugifies the em-dash in "Code Tools (2) — v0.21.0" to a
  single hyphen, producing code-tools-2-v0210. Fixed the link.
- PARSER.md: both #parsing-strategy and #json-extraction anchors were
  broken because a stray unbalanced 3-backtick fence at line 124 was
  greedy-pairing with line 128, shifting every downstream fence pair by
  one and accidentally wrapping ## Parsing Strategy and ## JSON
  Extraction inside a code block. Deleting line 124 plus converting one
  4-backtick close on line 205 to a 3-backtick close rebalanced all the
  fences. Both headings now render as real h2 elements and the
  TOC anchors resolve. mkdocs build: zero broken-anchor warnings.

#16 README relative docs/ links
-------------------------------
README.md is outside docs/ and must use absolute GitHub URLs per
docs/CLAUDE.md. Batch-converted all 37 ](docs/*.md) relative links to
](https://github.com/johnnichev/selectools/blob/main/docs/*.md).
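A batch conversion like this is a one-liner with `re.sub`. A sketch of the approach (the exact script used is not shown in the commit, and this pattern assumes plain `](docs/*.md)` targets without anchors):

```python
import re

BASE = "https://github.com/johnnichev/selectools/blob/main/"

def absolutize_docs_links(markdown: str) -> str:
    """Rewrite relative ](docs/*.md) link targets to absolute GitHub
    blob URLs, leaving every other link untouched."""
    return re.sub(
        r"\]\((docs/[^)]+\.md)\)",
        lambda m: "](" + BASE + m.group(1) + ")",
        markdown,
    )

readme = "See the [RAG guide](docs/RAG.md) and the [site](https://example.com)."
converted = absolutize_docs_links(readme)
```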

#17 Pre-existing mypy errors — all 46 fixed, mypy src/ is now clean
------------------------------------------------------------------
Success: no issues found in 150 source files.

- 20 no-any-return errors across 13 files: added
  # type: ignore[no-any-return] with an explanatory comment at each
  site. These were all external-library Any leaks (json.loads, dict.get
  on Any, psycopg2, the ollama client, openai SDK returns, etc.) where
  the runtime type is correct but the library's type stubs expose Any.
- 14 no-untyped-def errors in observer.py SimpleStepObserver graph
  callbacks (lines 1634-1676): added full type annotations matching the
  AgentObserver base class signatures (str/int/float/Exception/List[str]
  per event). Fixed one Liskov substitution violation where my initial
  annotation used List[str] for new_plan but the base class uses str.
- 8 no-untyped-def errors in serve/app.py BaseHTTPRequestHandler methods
  (do_GET, do_POST, do_OPTIONS, _json_response, _html_response,
  log_message, handle_stream, _stream): added -> None returns and Any /
  str parameter types. Imported Iterator and AsyncIterator from typing.
- pipeline.py:439 astream: added -> AsyncIterator[Any].
- observe/trace_store.py:349 _iter_entries: added -> Iterator[Dict[str, Any]].
- agent/config.py:215 _unpack nested helper: added (Any, type) -> Any.
- trace.py:506: mypy rejected the ``dataclasses.asdict`` call because
  ``dataclasses.is_dataclass`` narrows its argument to
  ``DataclassInstance | type[DataclassInstance]`` (too wide). Added a
  ``not isinstance(obj, type)`` check so mypy sees a plain dataclass
  instance.
- providers/_openai_compat.py:560: expanded existing # type: ignore
  from [return-value] to [return-value,no-any-return] to cover the
  second error code.
- serve/_starlette_app.py:105: eval_dashboard was declared to return
  HTMLResponse but the unauth-redirect branch returns a RedirectResponse.
  Widened the return type to Response to match the neighbouring
  handlers (builder, provider_health).
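The trace.py narrowing pattern is worth a standalone sketch, since it recurs whenever a serializer accepts arbitrary objects (the Step dataclass and as_payload name are hypothetical stand-ins, not the real trace.py code):

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class Step:  # hypothetical stand-in for the objects trace.py serializes
    name: str

def as_payload(obj: Any) -> Dict[str, Any]:
    # dataclasses.is_dataclass narrows obj to
    # DataclassInstance | type[DataclassInstance]; excluding types with
    # isinstance(obj, type) leaves a plain instance, which is the only
    # thing dataclasses.asdict's stub accepts.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    raise TypeError(f"expected a dataclass instance, got {obj!r}")

print(as_payload(Step(name="plan")))  # {'name': 'plan'}
```

Passing the class itself (`as_payload(Step)`) raises TypeError at runtime, which matches what asdict would do anyway; the isinstance check just makes that path explicit for mypy.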

#18 Landing page feature content for v0.21.0
---------------------------------------------
Three text-only bento card updates (no layout changes):

- RAG card: "4 store backends" → "7 store backends" with the full list
  enumerated plus CSV/JSON/HTML/URL loaders mentioned.
- Toolbox card: added explicit v0.21.0 additions (Python + shell
  execution, DuckDuckGo search, GitHub REST API, SQLite + Postgres).
- Audit card retitled to "Audit + observability" and expanded to
  mention OTelObserver (GenAI semantic conventions) and
  LangfuseObserver as the new v0.21.0 shipping surfaces for trace
  export to Datadog / Jaeger / Langfuse Cloud / any OTLP backend.

#19 FAISS variant of App 3 Knowledge Base Librarian
---------------------------------------------------
Added TestApp3b_KnowledgeBaseLibrarianFAISS in
tests/test_e2e_v0_21_0_apps.py — the same CSV + JSON + HTML librarian
persona but backed by FAISSVectorStore instead of Qdrant. Runnable
without Docker, and with different anchor phrases (OSPREY-88,
CRESCENT, AURORA-SOUTH) so it doesn't shadow the Qdrant variant when
both run. Three tests, all passing against real OpenAI embeddings +
real OpenAI gpt-4o-mini.

#20 RAGTool docstring notes
---------------------------
Added a "Notes" block to RAGTool explaining:
- Thread safety: the vector store handles its own locking, but mutating
  top_k / score_threshold / include_scores after attaching to an Agent
  is not thread-safe.
- Cross-process serialization: not supported, same reason function-based
  @tool() tools aren't supported.

Verification
------------
- mypy src/: Success: no issues found in 150 source files
- Full non-e2e suite: 4961 passed, 3 skipped, 248 deselected (+9 from
  new image_url + async multimodal + FAISS librarian tests), 0 regressions
- Full e2e suite with Qdrant + Postgres running: 70 collected, 64 passed,
  6 skipped (Azure x2 + Langfuse x1 credential-dependent + 3 Qdrant
  tests when the container isn't running), 0 failures
- mkdocs build: zero broken-anchor warnings (QUICKSTART + PARSER both
  clean now)
- diff CHANGELOG.md docs/CHANGELOG.md: byte-identical

Roadmap doc updates
-------------------
Updated the v0.21.0 section header from 🟡 to ✅ in both the timeline
summary at the top of the file and the full section header below.
Added a "Shipped" paragraph with the final stats (5215 tests, 88
examples, 5 LLM providers, 7 vector stores, 152 models) so readers
can see what actually landed vs. the original planning matrix below.

Fixed a stale path reference from .private/plans/07-... to the actual
location .private/07-...

The original planning matrices (new loaders, new vector stores, new
toolbox modules) are preserved as-is so the v0.21.0 section remains a
useful record of what was planned vs. what shipped — the CHANGELOG is
the authoritative "what actually shipped" source.
@johnnichev merged commit 98c77b9 into main on Apr 8, 2026. 9 checks passed.