release: v0.21.0 — Connector Expansion + Multimodal + Observability #53
Merged
johnnichev merged 17 commits into main on Apr 8, 2026
Conversation
…kers, version bump
- Bump version to 0.21.0 in pyproject.toml and __init__.py
- Add OTelObserver/LangfuseObserver lazy exports to observe/__init__.py
- Export AzureOpenAIProvider and the observe submodule from the package root
- Add ContentPart/image_message/text_content to the public __all__
- Apply @beta to all 9 new toolbox tools (code, search, github, db)
- Extend stability.beta()/stable() with an Any overload for Tool objects
- Add qdrant-client/faiss-cpu/beautifulsoup4 to the [rag] extras
- Add a new [observe] extras group with opentelemetry-api/langfuse
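The Tool-object overload mentioned above can be sketched with typing.overload. Everything below (the fallthrough to Any, the __stability__ attribute) is a hypothetical illustration under assumed names, not the selectools implementation:

```python
from typing import Any, Callable, TypeVar, overload

F = TypeVar("F", bound=Callable[..., Any])

# Sketch: a stability marker that accepts both plain callables and
# Tool-like objects (which mypy does not see as callables).
@overload
def beta(obj: F) -> F: ...
@overload
def beta(obj: Any) -> Any: ...

def beta(obj: Any) -> Any:
    # Tag the object so docs/introspection can read the stability level.
    # __stability__ is an assumed attribute name for this sketch.
    try:
        obj.__stability__ = "beta"
    except AttributeError:
        pass  # objects without a writable __dict__
    return obj

@beta
def my_tool(x: int) -> int:
    return x * 2
```

The overload pair lets mypy keep the precise callable type for decorated functions while still accepting opaque Tool objects.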
…py fix
- CHANGELOG: 0.21.0 entry covering all 7 connector subsystems
- README: "What's New in v0.21" section, Azure provider row, FAISS/Qdrant/pgvector imports, test count 4960
- 7 new module docs in docs/modules/: FAISS, QDRANT, PGVECTOR, MULTIMODAL, OTEL, AZURE_OPENAI, LANGFUSE
- mkdocs.yml nav: surfaced the new pages in the Core/Features/Reference sections
- llms.txt + llms-full.txt: 7 new module pointers, version bumped to v0.21.0, page count 32 -> 39
- Fix a pre-existing mypy error in azure_openai_provider.py (default_model assignment)
Every existing v0.21.0 test file mocks its backend: test_faiss_store.py injects a fake faiss module, test_code_tools.py mocks subprocess.run, test_qdrant_store.py mocks qdrant_client, etc. That leaves the real wire format, real C++ bindings, real subprocesses, real HTTP, and real vision APIs completely unverified — if our assumptions differ from reality we ship green tests and broken code. This commit adds 12 new test files marked @pytest.mark.e2e that exercise real backends:

Tier 1 — no external services (28 tests, all passing):
- tests/rag/test_e2e_faiss_store.py (real faiss-cpu, 5)
- tests/tools/test_e2e_code_tools.py (real subprocess.run, 8)
- tests/tools/test_e2e_db_tools.py (real sqlite3, 6)
- tests/rag/test_e2e_document_loaders.py (real files + example.com, 6)
- tests/test_e2e_otel_observer.py (real opentelemetry-sdk, 3)

Tier 2 — real API calls, credentials via .env (8 tests, all passing):
- tests/test_e2e_multimodal.py (real OpenAI gpt-4o-mini + Anthropic claude-haiku-4-5 + Gemini gemini-2.5-flash with an in-memory 4x4 PNG)
- tests/tools/test_e2e_search_tools.py (real DuckDuckGo + scrape)
- tests/tools/test_e2e_github_tools.py (real GitHub REST API)

Tier 3 — skip if deps or credentials are missing (7 tests, 2 passing + 5 skipped):
- tests/rag/test_e2e_qdrant_store.py (skip if Qdrant not reachable)
- tests/rag/test_e2e_pgvector_store.py (passes against local pgvector)
- tests/providers/test_e2e_azure_openai.py (skip if AZURE_* not set)
- tests/test_e2e_langfuse_observer.py (skip if LANGFUSE_* not set)

Result: pytest --run-e2e → 38 passed, 5 skipped, 0 failed.

Also fix three v0.21.0 module docs whose quickstart examples showed the wrong VectorStore.search() signature: search() takes a query embedding (List[float]), not a string. Updated FAISS.md, QDRANT.md, and PGVECTOR.md to show the correct embed-first pattern (matching RAG.md).
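The embed-first pattern the docs fix describes can be illustrated with a self-contained stand-in. The embedder and TinyVectorStore below are hypothetical, not the selectools API; the point is only that search() receives a List[float], never a raw string:

```python
import math
from typing import List, Tuple

def embed(text: str) -> List[float]:
    # Stand-in embedder; real code would call an embedding model
    # such as OpenAI text-embedding-3-small.
    vec = [0.0] * 8
    for i, byte in enumerate(text.encode()):
        vec[i % 8] += byte / 255.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class TinyVectorStore:
    """Illustrative store: search() takes an embedding, not a string."""
    def __init__(self) -> None:
        self._rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str, embedding: List[float]) -> None:
        self._rows.append((text, embedding))

    def search(self, query_embedding: List[float], top_k: int = 3) -> List[str]:
        # Rank by dot product against the stored embeddings.
        scored = sorted(
            self._rows,
            key=lambda row: -sum(a * b for a, b in zip(query_embedding, row[1])),
        )
        return [text for text, _ in scored[:top_k]]

store = TinyVectorStore()
for doc in ["install with pip", "configure the port", "uninstall notes"]:
    store.add(doc, embed(doc))

# Embed-first: embed the query, then pass the vector to search().
hits = store.search(embed("how do I install?"), top_k=1)
```

Passing the raw string where a vector is expected is exactly the mistake the three quickstart examples showed before the fix.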
…ints()

qdrant-client >=1.13 removed QdrantClient.search() in favour of query_points(). The new API differs in two ways:
1. The kwarg is `query=` instead of `query_vector=`
2. The return value is a `QueryResponse` object whose `.points` attribute holds the list of `ScoredPoint`s, not a flat list

The mock-based unit tests in tests/rag/test_qdrant_store.py never caught this regression because they mocked QdrantClient — the mock had a `search` attribute that didn't exist on the real client. The new e2e test in tests/rag/test_e2e_qdrant_store.py exposed the bug on the first real call against Qdrant 1.17.1.

Also fix a second consistency bug exposed by the e2e test: after clear() drops the collection, query_points() raises 404 instead of returning empty results. Catch the 404 in search() and return [] to match FAISSVectorStore semantics (search-after-clear → []).

Mock unit tests updated to mirror the new API:
- s/client.search/client.query_points/
- Mock return values now wrap a points list in a MagicMock with a .points attribute
- Assertions that checked call_kwargs["query_vector"] now check call_kwargs["query"]

After fix: 35 mock tests + 2 e2e tests against real Qdrant 1.17.1 all pass. Full e2e suite: 40 passed, 3 skipped (Azure + Langfuse, no creds). Full non-e2e suite: 4961 passed, 0 regressions.
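The two API differences above can be sketched with a version-tolerant wrapper. The dataclasses below are illustrative stand-ins whose field names follow the real qdrant-client types, but nothing here imports the real SDK:

```python
from dataclasses import dataclass, field
from typing import Any, List

# Stand-ins mirroring the qdrant-client >=1.13 response shapes.
@dataclass
class ScoredPoint:
    id: int
    score: float
    payload: dict

@dataclass
class QueryResponse:
    points: List[ScoredPoint] = field(default_factory=list)

def search_compat(client: Any, collection: str,
                  embedding: List[float], limit: int) -> List[ScoredPoint]:
    if hasattr(client, "query_points"):
        # qdrant-client >= 1.13: query= kwarg, QueryResponse wrapper.
        response = client.query_points(
            collection_name=collection, query=embedding, limit=limit)
        return list(response.points)
    # Legacy < 1.13: query_vector= kwarg, flat list return.
    return list(client.search(
        collection_name=collection, query_vector=embedding, limit=limit))

class _FakeNewClient:
    def query_points(self, collection_name, query, limit):
        return QueryResponse(points=[ScoredPoint(id=1, score=0.9, payload={})])

hits = search_compat(_FakeNewClient(), "docs", [0.1, 0.2], limit=5)
```

The commit itself migrates fully to query_points() rather than branching, but the wrapper makes the kwarg and return-shape differences concrete.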
…ations

Adds four end-to-end integration scenarios in tests/test_e2e_v0_21_0_simulations.py that wire multiple v0.21.0 features together with real LLM calls:
1. FAISS + real OpenAI embeddings + RAGTool + real OpenAI agent + OTel
2. Multimodal image + execute_python tool + real Gemini agent + OTel
3. query_sqlite + execute_python + real Anthropic Claude agent
4. Qdrant + real OpenAI embeddings + RAGTool + real OpenAI agent + OTel

Running the simulations surfaced three pre-existing shipping blockers that the entire existing test suite (188 mock-based v0.21.0 tests + 4 "workflow" tests that never actually call agent.run) had silently hidden:

Bug 6 — @tool() on class methods fundamentally broken
-----------------------------------------------------
@tool() applied to a method (def f(self, query: str)) produced a class-level Tool whose function was the unbound method. When the agent executor called tool.function(**llm_kwargs), Python raised TypeError: missing 1 required positional argument: 'self', so the LLM got back a "Tool Execution Failed" string and gave up. This broke the canonical RAG pattern documented everywhere in selectools:

    rag_tool = RAGTool(vector_store=store)
    agent = Agent(tools=[rag_tool.search_knowledge_base], provider=...)

RAGTool, SemanticSearchTool, and HybridSearchTool were all affected. The existing tests/rag/test_rag_workflow.py tests that appeared to exercise this path only asserted isinstance(agent, Agent) and never actually ran the agent, so nobody noticed.

Fix: add a _BoundMethodTool descriptor to selectools/tools/decorators.py that detects method-decorated tools (first param is self) and returns a per-instance Tool on attribute access. The descriptor wraps the original function in functools.partial(fn, instance) so the agent executor can invoke it with only the LLM's kwargs. Class-level access falls through to a template Tool for introspection (.name, .description, etc.).
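The descriptor idea can be shown in a minimal sketch, assuming a bare-bones Tool dataclass (the real _BoundMethodTool also inspects the signature for self and carries description metadata):

```python
import functools
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Tool:
    name: str
    function: Callable[..., Any]

class BoundMethodTool:
    """Descriptor sketch: instance access yields a Tool whose function is
    already bound, so an executor can call it with only the LLM kwargs."""
    def __init__(self, fn: Callable[..., Any]) -> None:
        self._fn = fn
        self._template = Tool(name=fn.__name__, function=fn)

    def __get__(self, instance: Optional[object], owner: type) -> Tool:
        if instance is None:
            # Class-level access: template Tool for introspection only.
            return self._template
        return Tool(name=self._fn.__name__,
                    function=functools.partial(self._fn, instance))

class RAGToolSketch:
    def __init__(self, docs: list) -> None:
        self.docs = docs

    @BoundMethodTool
    def search_knowledge_base(self, query: str) -> str:
        return next((d for d in self.docs if query in d), "not found")

tool = RAGToolSketch(["install with pip"]).search_knowledge_base
result = tool.function(query="pip")  # no self needed: already bound
```

Without the descriptor, tool.function would be the unbound method and the same call would raise the TypeError described above.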
Callers that previously worked around the bug by manually passing the instance as the first argument to .function (test_rag_workflow.py, test_hybrid_search.py, test_rag_regression_phase3.py) are updated to the correct API.

Bug 7 — Gemini provider silently drops images from content_parts
----------------------------------------------------------------
GeminiProvider._format_messages only handled the legacy message.image_base64 attribute. The v0.21.0 image_message() helper creates a Message with content_parts=[ContentPart(type="image_base64", ...)] and explicitly sets message.image_base64 = None, so Gemini received only the text prompt and replied "I cannot see images". Fix: add a content_parts loop to GeminiProvider that converts each ContentPart to types.Part(inline_data=...) or file_data=... .

Bug 8 — Anthropic provider has the same bug
-------------------------------------------
Same pattern in AnthropicProvider. Claude replied "I don't see any image attached". Fix: a content_parts loop producing the Anthropic native {type: image, source: {type: base64, ...}} shape. OpenAI already had the right handling in providers/_openai_compat.py, so only Gemini and Anthropic needed the fix.

Also: tighten tests/test_e2e_multimodal.py assertions so a provider can never silently drop an image again. Previously the tests only asserted result.content was non-empty, which passed on "I cannot see images" — a classic false green. Now each provider must actually say "red" in its reply to a 4x4 red PNG.

Finally: move the shared otel_exporter fixture into tests/conftest.py so every e2e file that needs OTel span capture uses the same singleton TracerProvider. OpenTelemetry only allows one global TracerProvider per process, and having each file install its own caused later-loaded files to silently see empty span lists when run in the same suite.
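The content_parts conversion required for Bugs 7 and 8 can be sketched as a pure function. The ContentPart field names here are assumptions for the sketch; the base64 block shape follows Anthropic's documented image content format:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ContentPart:
    type: str                    # "text" | "image_base64" | "image_url"
    text: Optional[str] = None
    data: Optional[str] = None   # base64 payload for image_base64
    media_type: str = "image/png"
    url: Optional[str] = None

def to_anthropic_blocks(parts: List[ContentPart]) -> List[Dict]:
    """Sketch of the provider-side loop: every part is converted, so
    images can no longer be silently dropped."""
    blocks: List[Dict] = []
    for part in parts:
        if part.type == "text":
            blocks.append({"type": "text", "text": part.text})
        elif part.type == "image_base64":
            blocks.append({"type": "image", "source": {
                "type": "base64",
                "media_type": part.media_type,
                "data": part.data,
            }})
        elif part.type == "image_url":
            blocks.append({"type": "image",
                           "source": {"type": "url", "url": part.url}})
    return blocks

blocks = to_anthropic_blocks([
    ContentPart(type="text", text="What colour is this?"),
    ContentPart(type="image_base64", data="iVBORw0...", media_type="image/png"),
])
```

The Gemini fix follows the same loop shape but emits types.Part objects instead of dicts.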
Verification:
- 47 e2e tests collected → 44 passed, 3 skipped (Azure OpenAI x2 and Langfuse x1 skip cleanly when no credentials are set)
- Full non-e2e suite: 4961 passed, 3 skipped, 0 regressions
- The 4 full-release simulations in test_e2e_v0_21_0_simulations.py now verify every v0.21.0 subsystem works together with real LLM calls
The previous e2e work proved individual v0.21.0 subsystems work in isolation (tests/test_e2e_*) and that multiple features compose (the 4 scenarios in tests/test_e2e_v0_21_0_simulations.py). Those are integration tests — they prove the wiring doesn't throw. This commit adds something different: **app-shaped** simulations that match the idiom already used in tests/test_simulation_evals.py. Each test sets up an agent with a realistic system prompt, drives it through a plausible user workflow, and asserts on the behaviour a real app author would care about.

App 1 — Documentation Q&A Bot
-----------------------------
A support bot for a fictional product called "Skylake" backed by a FAQ CSV. The CSV is loaded via the new DocumentLoader.from_csv, embedded with real OpenAI text-embedding-3-small, indexed in real FAISS, and wrapped in a RAGTool. The bot runs on real OpenAI gpt-4o-mini with a ConversationMemory so it can carry context across turns. Three asserts:
- Turn 1: bot answers an in-KB install question by quoting KB facts (curl URL, version string)
- Turn 2: same agent instance answers a follow-up port question (8742) — proves memory + tool calling continue to work across turns on a memory-enabled agent
- Turn 3: bot refuses an out-of-KB WebSocket question instead of hallucinating a number

App 2 — Data Analyst Bot
------------------------
An analytics assistant over a small SQLite sales database. Real Anthropic Claude agent with query_sqlite + execute_python. The user asks a question whose answer requires *chaining*:
1. SQL query to find the top region by total sales
2. Python computation for the average
3. Natural-language explanation

Asserts that "EU" and "2000" both appear in the final answer, proving the LLM successfully chained two real tool calls end-to-end.
App 3 — Knowledge Base Librarian
--------------------------------
The only simulation that exercises ALL FOUR new document loaders in a single workflow:
- DocumentLoader.from_csv (product catalog)
- DocumentLoader.from_json (release notes)
- DocumentLoader.from_html (about page)

Real OpenAI embeddings, real Qdrant store, real Gemini gemini-2.5-flash agent with a RAGTool. Three asserts, one per source format, each asking for a deliberately unique anchor phrase (THUNDERCAT-7, MOONWALK, VANTA-NORTH) that exists in exactly one of the loaded files. Proves that every loader's output is actually retrievable through the full embed → store → search → LLM pipeline.

Verification
------------
- Solo run of tests/test_e2e_v0_21_0_apps.py: 7 passed in 30.41s
- Full e2e suite including the new app sims: 54 collected → 51 passed, 3 skipped (Azure OpenAI x2 + Langfuse x1, no creds), 0 failed, 50.67s total
- Full non-e2e suite: 4961 passed, 3 skipped, 239 deselected (+7 from the new app file), 0 regressions
…oarding

Cross-reference audit run (via the project /audit and /doc-audit-skill skills with 4 parallel QA sub-agents) found 13 MUST-FIX issues left over after the earlier release-prep commit. This commit fixes all of them.

CHANGELOG.md
------------
- Add the missing ### Fixed section documenting bugs 6, 7, 8 (RAGTool @tool() on methods, Gemini + Anthropic content_parts image drop) and the Qdrant query_points() API migration. These landed in commits f4401f2 and b047c1a after the initial doc commit but never made it into the release notes.
- Add the missing ### Tests section documenting the 345 new e2e tests, 4 integration simulations, and 7 app-shaped simulations.
- Update Stats: 4,960 -> 5,203 tests.

README.md
---------
- Lines 489 and 1111: stale "4960 Tests" -> 5203.
- Line 133: restore the historical "4612 tests total" in the v0.19 What's New section (I had over-corrected it to 4960 earlier).
- Line 460: the "5 LLM Providers" enumeration was missing Azure OpenAI, even though it's claimed in the count. Added.
- Line 467: "4 Vector Stores" -> "7 Vector Stores" with FAISS, Qdrant, pgvector added to the list.
- Install section: added "pip install selectools[observe]" and "pip install selectools[postgres]" extras and updated the [rag] extras comment to mention FAISS, Qdrant, and beautifulsoup4.

CONTRIBUTING.md + docs/CONTRIBUTING.md
--------------------------------------
- Main file was stale: v0.20.1 / 4612 tests. Updated to v0.21.0 / 5203.
- docs/CONTRIBUTING.md was stale by TWO releases (v0.19.2, 61 examples, 24 tools, 100% coverage, different release script examples). Fixed by re-copying from the updated CONTRIBUTING.md.

docs/llms.txt
-------------
- Line 3: "4960 tests at 95% coverage" -> "5203 tests at 95% coverage".

docs/QUICKSTART.md
------------------
- Added a v0.21.0 callout under Step 5 (RAG) linking to the new FAISS.md, QDRANT.md, and PGVECTOR.md module docs and mentioning the new DocumentLoader.from_csv / from_json / from_html / from_url loaders.
Minimal addition — does not rewrite the working example.

docs/index.md
-------------
- RAG Pipeline feature card: "4 vector store backends" -> "7 vector store backends", listed all 7 explicitly, and mentioned the four new document loaders.

landing/index.html
------------------
- All 8 occurrences of "4612" / "4,612" in visible text, schema descriptions, animated counter targets, and FAQ answers -> "5203" / "5,203". Pure text substitution, no visual changes.

Verification
------------
- mkdocs build: clean (only the pre-existing Material "Excluding README.md" template warning, unrelated to this release)
- Full non-e2e suite: 4961 passed, 3 skipped, 239 deselected, 0 regressions
- diff CHANGELOG.md docs/CHANGELOG.md: byte-identical
- diff CONTRIBUTING.md docs/CONTRIBUTING.md: byte-identical
- grep for any remaining 4612 / 4960 in user-facing docs: clean (only the legitimate "up from 4,612" delta reference in the 0.21.0 Stats block remains)
…tores, new extras

Second pass on landing/index.html after the earlier stale-count fix (4612 -> 5203 ×8). This pass catches the v0.21.0-specific content staleness that the test-count edit missed.

Version strings (3 places)
--------------------------
- Schema.org softwareVersion: 0.20.1 -> 0.21.0
- Hero status bar badge: v0.20.1 -> v0.21.0
- Footer comment: v0.20.1 -> v0.21.0

Azure OpenAI added to every provider enumeration (11 places)
------------------------------------------------------------
- <meta name="description"> SEO tag
- <meta name="twitter:description"> social preview
- Schema.org JSON-LD description field
- Schema.org featureList item
- FAQ item "Which LLM providers does selectools support?" — re-worded from "5 LLM providers: OpenAI, Anthropic, Gemini, Ollama, and FallbackProvider" to the correct 5 LLMs (OpenAI, Azure OpenAI, Anthropic, Gemini, Ollama) plus FallbackProvider as a wrapper
- FAQ item "What's the license?" — added Azure to the token billing list
- FAQ intro "What is selectools?"
- Rendered FAQ in the HTML (not just the JSON-LD)
- bento__desc on the fallback provider card
- Five-providers FAQ rendered answer
- Visible <span class="provider"> tags in the hero "Works with" row — added an Azure OpenAI tag between OpenAI and Anthropic

Vector store counts (4 -> 7, 4 places)
--------------------------------------
- FAQ "Does it include RAG?" — "4 vector store backends" -> "7 vector store backends (memory, SQLite, Chroma, Pinecone, FAISS, Qdrant, pgvector)"
- Same FAQ rendered in the HTML below the JSON-LD
- Install FAQ answers updated to mention FAISS + Qdrant
- Both RAG FAQ answers now mention the new CSV / JSON / HTML / URL document loaders

Install extras (missing [observe] + [postgres])
-----------------------------------------------
The Install FAQ JSON-LD and rendered HTML now document:
- pip install selectools[rag] (+ FAISS, Qdrant, beautifulsoup4)
- pip install selectools[observe] (+ OpenTelemetry, Langfuse)
- pip install selectools[postgres] (for pgvector)

Verification
------------
- grep 4612 / 4,612 / 4960 / "4 vector store" / 0.20.1 (excluding the one legitimate self-referential JS comment): clean
- Count of "Azure OpenAI" occurrences: 0 -> 11
- No visual layout changes — text-only substitutions within existing elements. The hero provider row grows from 4 tags to 5, which is the only structural change and fits the existing flex layout.
…note
Two small robustness items surfaced by an "anything even 1% uncertain?"
audit pass before shipping v0.21.0.
1. Qdrant 404 detection was string-based
---------------------------------------
The ``return []`` path in QdrantVectorStore.search that matches
FAISSVectorStore's "search-after-clear returns empty" semantics was
using::
if "404" in str(exc) or "not found" in str(exc).lower():
return []
This works against qdrant-client 1.16.1 (which embeds "404 (Not Found)"
in UnexpectedResponse's string form), but it's fragile — any qdrant-client
release that reformats the error message or wraps the exception would
silently break the fallback. Verified on qdrant-client 1.16.1 that
``UnexpectedResponse`` instances carry a ``status_code`` attribute set
from the constructor, so we can check that first and fall back to the
string match only as a safety net.
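A sketch of the status_code-first check, using a stand-in exception class in place of qdrant-client's UnexpectedResponse:

```python
from typing import List

def is_missing_collection(exc: Exception) -> bool:
    """Prefer the structured status_code attribute; fall back to
    string matching only as a safety net."""
    if getattr(exc, "status_code", None) == 404:
        return True
    return "404" in str(exc) or "not found" in str(exc).lower()

class FakeUnexpectedResponse(Exception):
    """Stand-in for qdrant-client's UnexpectedResponse in this sketch."""
    def __init__(self, status_code: int, reason: str) -> None:
        super().__init__(reason)
        self.status_code = status_code

def results_or_empty(exc: Exception) -> List[dict]:
    # search-after-clear semantics: a missing collection yields []
    # instead of propagating the 404.
    if is_missing_collection(exc):
        return []
    raise exc
```

With this shape, a future qdrant-client release can reword the error message without breaking the fallback, since the attribute check fires first.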
2. image_message(url, …) reachability limitation
------------------------------------------------
Testing exposed that when the URL the user passes to image_message is
an http/https URL, the provider backend (OpenAI, Anthropic, Gemini) is
the one that fetches it — selectools just forwards the URL. Some hosts
(e.g. Wikimedia Commons) block bot User-Agents and return 400/403, which
surfaces as "Unable to download the file" (Anthropic) or "Cannot fetch
content from the provided URL" (Gemini). Not a selectools bug, but worth
warning about in the docs so users don't blame the wrapper. Added a
``!!! warning`` admonition to docs/modules/MULTIMODAL.md recommending
local-file + base64 for host-independent delivery.
Verification
------------
- tests/rag/test_qdrant_store.py (35 mock tests): all pass
- tests/rag/test_e2e_qdrant_store.py: skipped (no Qdrant container
running right now, but the code path is covered by the
test_clear_empties_collection test I verified earlier with a live
Qdrant 1.17.1 server)
- Full non-e2e suite: 4961 passed, 0 regressions
The Bug 7/8 fix for content_parts in the Gemini and Anthropic providers lives in _format_messages, which is shared between sync complete() and async acomplete() / astream(). The existing tier 2 multimodal tests only exercise sync, so a future change to the async-only path could silently regress vision input on agent.arun(). Manually verified that the existing fix already works async (all three providers correctly described a 4x4 red PNG via agent.arun()). This commit adds three regression tests that lock that in:
- TestMultimodalRealProvidersAsync.test_openai_async_accepts_image
- TestMultimodalRealProvidersAsync.test_anthropic_async_accepts_image
- TestMultimodalRealProvidersAsync.test_gemini_async_accepts_image

Each test asserts "red" appears in the response (the same anchor-based assertion as the sync tests, so they catch silent image-drop failures).

Verification: 6 tests passed in 7.88s (3 sync + 3 async, all real LLM calls).
Spec and implementation plan for bringing landing/examples/index.html into the same execution-pointer visual language as the redesigned landing page. Covers six sections: nav dot, terminal-session header, proportional category rail, search row, ls -la card rows, and $ cat card expansion. All implementation work edits scripts/build_examples_gallery.py (the generator) — the HTML at landing/examples/index.html is regenerated from it. Spec: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md Plan: docs/superpowers/plans/2026-04-08-examples-page-overdrive.md
Duplicate the landing page's design tokens, .exec-dot, .exec-caret, .exec-scan, .sr-only, @keyframes exec-pulse/exec-blink/exec-scan-sweep/exec-stamp, and prefers-reduced-motion fallbacks into the examples page generator's inline <style>. No visual change yet — these atoms become the foundation for the §1–§6 redesign in subsequent commits. Spec: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Adds a permanent cyan execution-pointer dot to the left of the selectools wordmark in the examples page nav. Matches the landing page's wordmark variant 1 — a user clicking between / and /examples/ now sees the same pulse in the same place. Respects prefers-reduced-motion (becomes a static glow). Spec §6: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Replaces the bare <h1> + paragraph with a full terminal-window panel that types out 'ls examples/' on page load and live-mirrors the search state into the prompt suffix as ' | grep -i <query>'. Counter format changes from 'N examples' to '# N files match' to match the monospace comment aesthetic. The category --tags suffix wiring lands in Task 4 once the rail exists. Adds typeLine() and syncPrompt() helpers and a bootPrompt() IIFE that respects prefers-reduced-motion. Mobile collapses to '$ ls examples/'. Both helpers write only to .textContent — no HTML rendering paths. Spec §1: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
Removes the 18-pill .cb chip row and replaces it with a single bar of .ex-rail__seg segments sized proportionally to each category's count. Visually shows the shape of the catalog at a glance. On viewport entry an IntersectionObserver triggers a left-to-right stamp sweep (80ms stagger). Clicking a segment filters the list, re-stamps the segment, and rewrites the terminal prompt's --tags suffix. Mobile becomes a horizontal scroll-snap strip. Respects prefers-reduced-motion (no sweep, no on-click stamp). Spec §2: docs/superpowers/specs/2026-04-08-examples-page-overdrive-design.md
…+ fixed

Final thorough audit pass after the user asked "is there anything you feel even 1% not confident about?" with explicit instruction to verify AND fix everything. Nine residual concerns were addressed; two surfaced real shipping blockers that isolated testing had not caught.

Verified as not a regression (no code change needed):
- #12 RAGTool descriptor pickling: function-based @tool() also fails to serialize for the same reason (the decorator replaces the function in the module namespace). Pickling Tools/Agents has never been supported in selectools — only cache_redis.py uses pickle, and only for (Message, UsageStats) tuples. Documented the limitation in RAGTool's class docstring along with a thread-safety note.

Fixes landed:

Bug 9 — Langfuse 3.x rewrite (real shipping blocker)
----------------------------------------------------
mypy caught ``"Langfuse" has no attribute "trace"`` in src/selectools/observe/langfuse.py:65. Langfuse 3.x removed the top-level Langfuse.trace() / trace.generation() / trace.span() / trace.update() API and replaced it with start_span() / start_generation() / update_current_trace() / update_current_span(). The existing selectools LangfuseObserver was written for 2.x and would crash at runtime on every call against Langfuse 3.x (which pyproject.toml's langfuse>=2.0.0 constraint does not exclude). The existing mock-based test_langfuse_observer.py never caught it because mocks accept any method call. The e2e test in tests/test_e2e_langfuse_observer.py skipped due to the missing LANGFUSE_PUBLIC_KEY env var, so the real code path had never executed.
- Rewrote src/selectools/observe/langfuse.py for the Langfuse 3.x API: on_run_start now creates a root span via client.start_span(); child generations and spans use root.start_generation() / root.start_span() (which attach to the same trace); usage info moved from usage= to usage_details=, with the new cost_details= for dollar cost; every span now calls .end() explicitly since Langfuse 3.x is context-manager oriented; root span finalization uses update_trace() + update() + end().
- Updated the 4 affected mock tests in tests/test_langfuse_observer.py to the v3 API (client.start_span, root.start_generation, root.start_span). 19 Langfuse mock tests now pass.

#13 image_url e2e regression coverage
-------------------------------------
Added TestMultimodalRealProvidersImageUrl in tests/test_e2e_multimodal.py with three new tests (one per provider) that send https://github.githubassets.com/favicons/favicon.png through the ContentPart(type="image_url") path. Verified that OpenAI, Anthropic, and Gemini all return "GitHub" in their reply. GitHub's CDN serves bot User-Agents, unlike Wikipedia's CDN, which is documented separately in the MULTIMODAL.md URL-reachability warning.

#14 CHANGELOG clarification
---------------------------
Added a "Note on the three latent bugs below" block before the Fixed section explaining that bugs 6, 7, 8 (RAGTool @tool() on methods and both multimodal content_parts drops) were pre-existing in earlier releases but never surfaced because no test actually exercised them end-to-end. This pre-empts the reasonable reader question "why didn't earlier users report these?".

#15 Pre-existing broken mkdocs anchors
--------------------------------------
- QUICKSTART.md: #code-tools-2--v0210 (double dash) was wrong. mkdocs Material slugifies the em-dash in "Code Tools (2) — v0.21.0" to a single hyphen, producing code-tools-2-v0210. Fixed the link.
- PARSER.md: both the #parsing-strategy and #json-extraction anchors were broken because a stray unbalanced 3-backtick fence at line 124 was greedy-pairing with line 128, shifting every downstream fence pair by one and accidentally wrapping ## Parsing Strategy and ## JSON Extraction inside a code block. Deleting line 124 plus converting one 4-backtick close on line 205 to a 3-backtick close rebalanced all the fences. Both headings now render as real h2 elements and the TOC anchors resolve. mkdocs build: zero broken-anchor warnings.

#16 README relative docs/ links
-------------------------------
README.md is outside docs/ and must use absolute GitHub URLs per docs/CLAUDE.md. Batch-converted all 37 ](docs/*.md) relative links to ](https://github.com/johnnichev/selectools/blob/main/docs/*.md).

#17 Pre-existing mypy errors — all 46 fixed, mypy src/ is now clean
-------------------------------------------------------------------
Success: no issues found in 150 source files.
- 20 no-any-return errors across 13 files: added # type: ignore[no-any-return] with explanatory context. These were all external-library Any leaks (json.loads, dict.get on Any, psycopg2, ollama client, openai SDK returns, etc.) where the runtime type is correct but the type-stub exposure is Any.
- 14 no-untyped-def errors in observer.py SimpleStepObserver graph callbacks (lines 1634-1676): added full type annotations matching the AgentObserver base class signatures (str/int/float/Exception/List[str] per event). Fixed one Liskov substitution violation where my initial annotation used List[str] for new_plan but the base class uses str.
- 8 no-untyped-def errors in serve/app.py BaseHTTPRequestHandler methods (do_GET, do_POST, do_OPTIONS, _json_response, _html_response, log_message, handle_stream, _stream): added -> None returns and Any / str parameter types. Imported Iterator and AsyncIterator from typing.
- pipeline.py:439 astream: added -> AsyncIterator[Any].
- observe/trace_store.py:349 _iter_entries: added -> Iterator[Dict[str, Any]].
- agent/config.py:215 _unpack nested helper: added (Any, type) -> Any.
- trace.py:506: ``dataclasses.asdict`` was rejecting ``DataclassInstance | type[DataclassInstance]`` (too wide). Narrowed with ``not isinstance(obj, type)`` so mypy sees a non-type dataclass.
- providers/_openai_compat.py:560: expanded the existing # type: ignore from [return-value] to [return-value,no-any-return] to cover the second error code.
- serve/_starlette_app.py:105: eval_dashboard was declared to return HTMLResponse but the unauth-redirect branch returns a RedirectResponse. Widened the return type to Response to match the neighbouring handlers (builder, provider_health).

#18 Landing page feature content for v0.21.0
--------------------------------------------
Three text-only bento card updates (no layout changes):
- RAG card: "4 store backends" → "7 store backends" with the full list enumerated plus CSV/JSON/HTML/URL loaders mentioned.
- Toolbox card: added explicit v0.21.0 additions (Python + shell execution, DuckDuckGo search, GitHub REST API, SQLite + Postgres).
- Audit card: retitled to "Audit + observability" and expanded to mention OTelObserver (GenAI semantic conventions) and LangfuseObserver as the new v0.21.0 shipping surfaces for trace export to Datadog / Jaeger / Langfuse Cloud / any OTLP backend.

#19 FAISS variant of App 3 Knowledge Base Librarian
---------------------------------------------------
Added TestApp3b_KnowledgeBaseLibrarianFAISS in tests/test_e2e_v0_21_0_apps.py — the same CSV + JSON + HTML librarian persona but backed by FAISSVectorStore instead of Qdrant. Runnable without Docker, and with different anchor phrases (OSPREY-88, CRESCENT, AURORA-SOUTH) so it doesn't shadow the Qdrant variant when both run. Three tests, all passing against real OpenAI embeddings + real OpenAI gpt-4o-mini.
#20 RAGTool docstring notes
---------------------------
Added a "Notes" block to RAGTool explaining:
- Thread safety: the vector store handles its own locking, but mutating top_k / score_threshold / include_scores after attaching to an Agent is not thread-safe.
- Cross-process serialization: not supported, for the same reason function-based @tool() tools aren't supported.

Verification
------------
- mypy src/: Success: no issues found in 150 source files
- Full non-e2e suite: 4961 passed, 3 skipped, 248 deselected (+9 from the new image_url + async multimodal + FAISS librarian tests), 0 regressions
- Full e2e suite with Qdrant + Postgres running: 70 collected, 64 passed, 6 skipped (Azure x2 + Langfuse x1 credential-dependent + 3 Qdrant tests when the container isn't running), 0 failures
- mkdocs build: zero broken-anchor warnings (QUICKSTART + PARSER both clean now)
- diff CHANGELOG.md docs/CHANGELOG.md: byte-identical
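The Langfuse 2.x to 3.x call-sequence change described in Bug 9 can be sketched with a recording stub in place of the real SDK. Method names follow the commit message (start_span, start_generation, explicit end()); nothing below imports langfuse:

```python
from typing import Any, List

class _Recorder:
    """Stand-in for a Langfuse 3.x span object; records calls instead
    of exporting anything."""
    def __init__(self, log: List[str], label: str) -> None:
        self._log, self._label = log, label

    def start_generation(self, **kwargs: Any) -> "_Recorder":
        self._log.append(f"{self._label}.start_generation")
        return _Recorder(self._log, "gen")

    def end(self, **kwargs: Any) -> None:
        self._log.append(f"{self._label}.end")

class FakeLangfuse:
    """Stand-in client: 3.x has no client.trace(); runs start as spans."""
    def __init__(self, log: List[str]) -> None:
        self._log = log

    def start_span(self, **kwargs: Any) -> _Recorder:
        self._log.append("client.start_span")
        return _Recorder(self._log, "root")

def observe_run(client: Any) -> None:
    root = client.start_span(name="agent.run")      # 3.x: root span, not trace()
    gen = root.start_generation(
        name="llm.call",
        usage_details={"input": 12, "output": 5})   # 3.x: usage_details=, not usage=
    gen.end()                                       # 3.x is context-manager oriented:
    root.end()                                      # every span ends explicitly

log: List[str] = []
observe_run(FakeLangfuse(log))
```

The explicit .end() calls are the key behavioural difference from the 2.x trace()/generation() style the old observer used.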
Updated the v0.21.0 section header from 🟡 to ✅ in both the timeline summary at the top of the file and the full section header below. Added a "Shipped" paragraph with the final stats (5215 tests, 88 examples, 5 LLM providers, 7 vector stores, 152 models) so readers can see what actually landed vs. the original planning matrix below. Fixed a stale path reference from .private/plans/07-... to the actual location .private/07-... The original planning matrices (new loaders, new vector stores, new toolbox modules) are preserved as-is so the v0.21.0 section remains a useful record of what was planned vs. what shipped — the CHANGELOG is the authoritative "what actually shipped" source.
Summary
v0.21.0 is the Connector Expansion release: three new vector stores, the Azure OpenAI provider, OpenTelemetry + Langfuse observers, multimodal image support across all four LLM providers, four new document loaders, and nine new toolbox tools.
Vector stores (3 new)
- `FAISSVectorStore` — in-process FAISS with save/load persistence, thread-safe
- `QdrantVectorStore` — REST + gRPC connector, auto-collection management
- `PgVectorStore` — PostgreSQL pgvector extension, JSONB metadata, auto-table creation
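All three connectors share the same embed-first search contract: `search()` takes a query embedding (`List[float]`), not a raw string. A minimal in-memory sketch of that contract — the class and method names here are illustrative, not the package's actual API:

```python
import math
from typing import Dict, List, Tuple

class InMemoryVectorStore:
    """Toy stand-in for a vector store: add embeddings, search by embedding."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], dict]] = {}

    def add(self, doc_id: str, embedding: List[float], metadata: dict) -> None:
        self._rows[doc_id] = (embedding, metadata)

    def search(self, query_embedding: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Cosine similarity against every stored embedding, highest first.
        def cos(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        scored = [(doc_id, cos(query_embedding, emb))
                  for doc_id, (emb, _) in self._rows.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

store = InMemoryVectorStore()
store.add("a", [1.0, 0.0], {"title": "first"})
store.add("b", [0.0, 1.0], {"title": "second"})
results = store.search([0.9, 0.1], top_k=1)
```

The real stores delegate the similarity computation to FAISS, Qdrant, or pgvector; only the interface shape is the point here.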
Document loaders (4 new)

`DocumentLoader.from_csv`, `from_json`, `from_html`, `from_url` — stdlib-only by default, optional `beautifulsoup4` for CSS selectors.
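"Stdlib-only by default" means the CSV and JSON paths need nothing beyond `csv` and `json`. A hedged sketch of what such loaders can look like — function names and the returned row shape are assumptions, not the library's signatures:

```python
import csv
import io
import json
from typing import List

def load_csv(text: str) -> List[dict]:
    """One record per row; column names become the record's keys (illustrative)."""
    return list(csv.DictReader(io.StringIO(text)))

def load_json(text: str) -> List[dict]:
    """Accept either a JSON array of objects or a single object."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]

rows = load_csv("name,role\nada,engineer\ngrace,admiral\n")
docs = load_json('[{"body": "hello"}]')
```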
Toolbox (9 new tools)

- `execute_python`, `execute_shell` (subprocess-isolated, 10KB output cap, shell metacharacter blocklist)
- `web_search` (DuckDuckGo, no API key), `scrape_url` (SSRF guards)
- `github_search_repos`, `github_get_file`, `github_list_issues` (`GITHUB_TOKEN` optional)
- `query_sqlite` (`PRAGMA query_only = ON`), `query_postgres`
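The three guardrails on the execute tools (no shell, output cap, metacharacter blocklist) compose in a few lines. A minimal sketch, assuming stdlib `subprocess` — the blocklist contents and helper name here are illustrative, not the tool's actual implementation:

```python
import shlex
import subprocess

BLOCKED = set(";&|`$><")  # illustrative metacharacter blocklist
OUTPUT_CAP = 10_000       # 10KB output cap, as in the release notes

def run_shell(command: str, timeout: float = 10.0) -> str:
    """Run a command without a shell, refuse metacharacters, cap the output."""
    if any(ch in BLOCKED for ch in command):
        raise ValueError("shell metacharacters are not allowed")
    result = subprocess.run(
        shlex.split(command),  # argv list, so no shell interpretation happens
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout[:OUTPUT_CAP]

out = run_shell("echo hello")

# Chained commands are rejected before anything executes.
blocked = True
try:
    run_shell("echo hi; ls /")
    blocked = False
except ValueError:
    pass
```

Because `shlex.split` produces an argv list and no shell is involved, the blocklist is defense in depth rather than the only barrier.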
Multimodal messages

New `ContentPart` dataclass + `image_message(image, prompt)` helper. `Message.content` now accepts `list[ContentPart]` in addition to `str`. Works on OpenAI GPT-4o, Azure OpenAI, Anthropic Claude, Gemini 2.5 Flash, and Ollama vision models.
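Under the hood, a part list has to be translated to each provider's wire format. A sketch of the OpenAI-style translation — the `Part` dataclass and converter below are stand-ins, not the package's actual `ContentPart`, though the output dict shape matches the OpenAI chat content-array format:

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class Part:
    """Illustrative content part: either text or an image URL, not both."""
    text: Optional[str] = None
    image_url: Optional[str] = None

def to_openai_content(content: Union[str, List[Part]]) -> object:
    """Plain strings pass through; part lists become the content-array shape."""
    if isinstance(content, str):
        return content
    out = []
    for part in content:
        if part.text is not None:
            out.append({"type": "text", "text": part.text})
        elif part.image_url is not None:
            out.append({"type": "image_url", "image_url": {"url": part.image_url}})
    return out

payload = to_openai_content([
    Part(text="What is in this image?"),
    Part(image_url="https://example.com/cat.png"),
])
```

Anthropic and Gemini use different block shapes, which is why each provider needs its own `_format_messages()` loop.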
Providers (1 new)

`AzureOpenAIProvider` — extends `OpenAIProvider` with Azure deployment-name routing, `AZURE_OPENAI_*` env-var fallback, and AAD token auth.
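The env-var fallback is a simple precedence rule: an explicit constructor argument wins, otherwise the matching `AZURE_OPENAI_*` variable is read. A minimal sketch under that assumption — the `resolve` helper is hypothetical, not the provider's actual code:

```python
import os
from typing import Optional

def resolve(explicit: Optional[str], env_var: str,
            default: Optional[str] = None) -> Optional[str]:
    """Explicit argument wins; otherwise fall back to the environment, then a default."""
    if explicit is not None:
        return explicit
    return os.environ.get(env_var, default)

# Endpoint comes from the environment when not passed explicitly.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://example.openai.azure.com"
endpoint = resolve(None, "AZURE_OPENAI_ENDPOINT")

# An explicit key always overrides whatever the environment holds.
key = resolve("explicit-key", "AZURE_OPENAI_API_KEY")
```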
Observers (2 new)

- `OTelObserver` — emits GenAI semantic-convention spans to Jaeger / Tempo / Datadog / Honeycomb / any OTLP backend
- `LangfuseObserver` — ships traces + generations + spans to Langfuse Cloud or self-hosted (rewritten for the Langfuse 3.x API — `start_span` / `start_generation` / `update_current_trace`)
Bug fixes (all pre-existing, surfaced by real-call simulations during release prep)

- `QdrantVectorStore.search()` called the removed `qdrant-client` `client.search()` API — `qdrant-client >= 1.13` would have hit `AttributeError` on the first query
- `@tool()` on class methods was fundamentally broken (`RAGTool` / `SemanticSearchTool` / `HybridSearchTool` raised `TypeError`)
- `content_parts` images were dropped by both cloud providers' `_format_messages()`
- `LangfuseObserver` was written for Langfuse 2.x, broken on Langfuse 3.x

Fixes:
`_BoundMethodTool` descriptor in `tools/decorators.py`, `query_points()` migration in `rag/stores/qdrant.py`, `content_parts` loop in both provider `_format_messages()`, full Langfuse 3.x rewrite with updated mock tests.
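The class-method bug is the classic one: a decorator that replaces a method with a plain object loses Python's bound-method machinery, so `self` never arrives. The descriptor protocol fixes it. A generic sketch of the pattern — names here are illustrative, not the actual `_BoundMethodTool`:

```python
import functools

class BoundTool:
    """Illustrative descriptor: lets a decorator wrap a method yet still
    deliver `self` when the wrapper is accessed through an instance."""

    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        # Per-instance callable with `self` already bound.
        return functools.partial(self.func, instance)

def tool(func):
    return BoundTool(func)

class Searcher:
    def __init__(self, corpus):
        self.corpus = corpus

    @tool
    def search(self, term):
        return [doc for doc in self.corpus if term in doc]

hits = Searcher(["alpha beta", "gamma"]).search("beta")
```

Without `__get__`, `Searcher(...).search("beta")` would pass `"beta"` where `self` belongs — the shape of the `TypeError` described above.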
Tests

5215 tests (+603 since v0.20.1).
- E2E coverage against real `faiss-cpu` bindings, real `subprocess.run`, real `sqlite3`, real HTTP, real `opentelemetry-sdk`, a real Qdrant Docker container, a real pgvector-enabled Postgres, real OpenAI + Anthropic + Gemini vision calls, and the real DuckDuckGo + GitHub REST API
- New examples: … (`ConversationMemory`); sales data analyst bot (SQL → Python chaining); knowledge base librarian (Qdrant + FAISS variants covering CSV + JSON + HTML loaders)
- `image_url` URL-path regression tests for all 3 cloud providers (9 tests)
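The "real `sqlite3`" tier is the easiest to illustrate, since the real backend ships with Python: no mocks, an actual in-memory database, and the same `PRAGMA query_only = ON` read-only guard the `query_sqlite` tool uses. A self-contained illustration of the approach, not the suite's actual test code:

```python
import sqlite3

# Real in-memory SQLite database — no mocked backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('ada')")
conn.commit()

# Flip the connection read-only, as query_sqlite does.
conn.execute("PRAGMA query_only = ON")
rows = conn.execute("SELECT name FROM users").fetchall()

# Writes must now fail against the real engine, not a mock's assumption.
write_blocked = True
try:
    conn.execute("INSERT INTO users VALUES ('eve')")
    write_blocked = False
except sqlite3.OperationalError:
    pass
```

This is exactly the failure mode mocked tests can't catch: a mock would happily accept the `INSERT`, while the real engine rejects it.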
Quality gate

- `mypy src/`: zero errors across 150 files (cleaned up 46 pre-existing issues in `serve/`, `mcp/`, `observer.py`, `trace.py`, `_starlette_app.py`, etc.)
- `black` + `isort` + `flake8`: clean
- `bandit -r src/`: clean
- `mkdocs build`: clean (fixed pre-existing broken anchors in `QUICKSTART.md` and `PARSER.md`)
Documentation

- New module docs: `FAISS.md`, `QDRANT.md`, `PGVECTOR.md`, `MULTIMODAL.md`, `OTEL.md`, `AZURE_OPENAI.md`, `LANGFUSE.md`
- New examples `77_faiss_vector_store.py` through `88_langfuse_observer.py`
- `README.md`, `CHANGELOG.md`, `CONTRIBUTING.md`, `docs/index.md`, `docs/QUICKSTART.md`, `ROADMAP.md`, `docs/llms.txt`, `docs/llms-full.txt`, `landing/index.html` all updated with v0.21.0 stats, features, and Azure OpenAI enumeration
- Relative `docs/*.md` links in `README.md` converted to absolute GitHub URLs per project convention
Test plan

- `observe/langfuse.py` for the Langfuse 3.x migration
- `_BoundMethodTool` descriptor in `tools/decorators.py`
- `v0.21.0` to trigger the PyPI publish workflow
- `pip install selectools==0.21.0` in a clean venv