# CHANGELOG.md

All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-04-27

This release migrates Search and Extract to Parallel's v1 GA endpoints, surfaces citations + structured output on the chat model, and bumps the SDK to `0.5.1`.

### Added

- **Canonical naming**: new aliases `ChatParallel` and `ParallelSearchTool` are the recommended names going forward; the previous `ChatParallelWeb` and `ParallelWebSearchTool` continue to work indefinitely as aliases (same class objects).
- **Search/Extract GA endpoints**: `ParallelSearchTool` and `ParallelExtractTool` now call `client.search` / `client.extract` (the `/v1` GA paths). New parameters surfaced from the GA contract on both tools: `max_chars_total`, `client_model`, `session_id`, `location` (Search). The `advanced_settings` envelope is built automatically from the existing flat fields.
- **`ChatParallel.with_structured_output()`**: returns a `Runnable` producing a typed object (pydantic model or dict) via Parallel's `response_format` JSON-schema on the research models (`lite`, `base`, `core`). `method="json_schema"` (default), `method="json_mode"`, and `method="function_calling"` (routed to `json_schema` for cross-provider compatibility) are supported. Raises a clear `ValueError` on `model="speed"` since that model silently ignores structured-output requests. `include_raw=True` returns `{"raw", "parsed", "parsing_error"}` and properly captures parser failures.
- **Citations on chat responses**: for the research models, `AIMessage.response_metadata["basis"]` carries the API's per-field citations / reasoning / confidence list. `response_metadata["interaction_id"]` is surfaced for multi-turn context chaining; `system_fingerprint` is forwarded when present.
- **`SourcePolicy` pydantic model** in `langchain_parallel._types` mirroring the API's `include_domains` / `exclude_domains` / `after_date`. Both `SourcePolicy(...)` and a raw dict are accepted on `ParallelSearchTool`.

### Deprecated

- **Search without `search_queries`**: calls passing only `objective` route to the deprecated `/v1beta` endpoint with a `DeprecationWarning`. The fallback will be **removed in 0.4.0**; the Parallel API itself sunsets `/v1beta` no earlier than June 2026. Pair `objective` with `search_queries=[...]` (1-5 keyword strings, 3-6 words each) to use the GA `/v1` endpoint.
- **Legacy `mode` values**: `"fast"`, `"one-shot"`, and `"agentic"` continue to call the API correctly with a `DeprecationWarning` mapping them to the GA values (`"fast"` / `"one-shot"` → `"basic"`, `"agentic"` → `"advanced"`). The GA values `"basic"` and `"advanced"` are now the canonical set.
- **`Extract.excerpts=False`**: the GA Extract API always returns excerpts and has no flag to disable them; passing `False` is accepted with a `DeprecationWarning` and ignored. Use `ExcerptSettings(max_chars_per_result=…)` to control per-result size.
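The legacy-mode shim described above can be pictured with a short sketch. The helper and its names here are illustrative, not the package's actual internals:

```python
import warnings

# Hypothetical mirror of the deprecation shim: legacy Search modes map onto
# the GA values, with a DeprecationWarning on legacy input.
_LEGACY_MODES = {"fast": "basic", "one-shot": "basic", "agentic": "advanced"}

def normalize_mode(mode):
    """Map legacy Search modes onto the GA values, warning on legacy input."""
    if mode in _LEGACY_MODES:
        ga = _LEGACY_MODES[mode]
        warnings.warn(
            f"mode={mode!r} is deprecated; use mode={ga!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return ga
    return mode

print(normalize_mode("basic"))  # GA values pass through unchanged: basic
```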

### Changed

- **`response_metadata["model_name"]`**: chat completions now emit the LangChain 1.x standard key `model_name` (was `model`). Tracing systems and `langchain-tests`' standard suite check for this name.
- **`parallel-web` SDK bumped** from `^0.3.3` to `^0.5.1`. Brings in the v1 GA Search/Extract types, `AdvancedSearchSettingsParam` / `AdvancedExtractSettingsParam`, and the FindAll / Task Group / Monitor surfaces (not yet exposed by this integration — see `IMPROVEMENT_PLAN.md` Phase 2).
- **Slimmed `_client.py`**: the four hand-rolled `ParallelSearchClient` / `AsyncParallelSearchClient` / `ParallelExtractClient` / `AsyncParallelExtractClient` wrapper classes have been removed in favor of using `parallel.Parallel` / `parallel.AsyncParallel` directly. Internal change; no public surface impact.
- **`_run`/`_arun` deduped**: extracted `_finalize_response`, `_start_text`, and `_completion_text` helpers on both tools so the sync and async bodies are now ~25 lines each instead of ~50.
- `ParallelExtractTool.full_content` precedence is now explicit: an explicit `FullContentSettings` (or dict) on the call always wins over the tool-level `max_chars_per_extract`; the latter only applies when `full_content=True` is passed as a plain bool.
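The `full_content` precedence rule reads, in sketch form. This is an illustrative resolver, not the package's code, and the key name inside the settings dict is an assumption:

```python
# Hypothetical resolver for the full_content precedence described above:
# an explicit settings dict always wins; the tool-level cap applies only
# when the caller passes a plain `full_content=True`.
def resolve_full_content(call_value, tool_max_chars=None):
    if isinstance(call_value, dict):
        return call_value  # explicit FullContentSettings-style dict wins
    if call_value is True and tool_max_chars is not None:
        return {"max_chars": tool_max_chars}  # illustrative key name
    return call_value

print(resolve_full_content(True, tool_max_chars=5000))  # → {'max_chars': 5000}
```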

### Fixed

- `ChatParallel(model="lite")` now actually selects the `lite` model. Pre-0.3.0 the `Field(alias="model_name")` on the `model` field silently swallowed the `model=` kwarg and forced callers into the default `"speed"`. Both `ChatParallel(model="lite")` and `ChatParallel(model_name="lite")` work in 0.3.0 — the latter via a `model_validator` that maps `model_name=` to `model=` for back-compat. `lc_attributes` still serializes the field as `model_name` for tracing parity.
- `py.typed` is now bundled into the wheel via the `[tool.poetry] include` directive, so downstream `mypy` runs see the package's type information.
- `with_structured_output(include_raw=True)` correctly populates `parsing_error` on parse failure (previously always `None`).

### Migration

For most users, **no code changes are required**. The following optional updates silence the deprecation warnings:

- **Search**: add `search_queries=[…]` (1-5 keyword strings, 3-6 words each) to use the GA `/v1` endpoint.
```python
# 0.2.x (still works in 0.3.x with a DeprecationWarning; will break in 0.4.0)
tool.invoke({"objective": "What are the latest AI breakthroughs?"})

# 0.3.x preferred (GA /v1 endpoint)
tool.invoke({
    "search_queries": ["latest AI breakthroughs", "AI advances 2026"],
    "objective": "What are the latest AI breakthroughs?",
})
```
- **Search mode**: rename `mode="one-shot"`/`"fast"` → `mode="basic"` and `mode="agentic"` → `mode="advanced"`.
- **Chat**: prefer `ChatParallel(model="lite")` (or `"base"` / `"core"`) over `model_name="..."`. Read citations from `response.response_metadata["basis"]` and structured outputs via `chat.with_structured_output(MyPydanticModel)`. The old class name `ChatParallelWeb` continues to work.

## [0.2.0] - 2025-12-01

### Changed
Expand Down
# README.md

This package provides LangChain integrations for [Parallel](https://docs.parallel.ai).

## Features

- **Chat Models**: `ChatParallel` (formerly `ChatParallelWeb`) — real-time web research chat completions, with citations and structured output on the research models.
- **Search Tool**: `ParallelSearchTool` (formerly `ParallelWebSearchTool`) — direct access to Parallel's GA `/v1/search` endpoint.
- **Extract Tool**: `ParallelExtractTool` — clean content extraction from web pages via `/v1/extract`.
- **Streaming Support**: Real-time response streaming on chat.
- **Async/Await**: Full asynchronous operation support.
- **LangChain Integration**: Pydantic input schemas, `bind`-able tools, `with_structured_output()`, `lc_serializable`.

> Note: the older names (`ChatParallelWeb`, `ParallelWebSearchTool`) continue to work as aliases.

## Installation

```bash
export PARALLEL_API_KEY="your-api-key-here"
```

The `ChatParallelWeb` class provides access to Parallel's Chat API, which combines language models with real-time web research capabilities.

#### Picking a model

| Model | Latency | Citations (`response_metadata["basis"]`) | Structured output |
|-------|---------|------------------------------------------|-------------------|
| `speed` (default) | ~3s | none | not supported |
| `lite` | seconds | yes | `with_structured_output()` |
| `base` | seconds–minutes | yes | `with_structured_output()` |
| `core` | minutes | yes (most thorough) | `with_structured_output()` |

#### Basic Usage

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_parallel.chat_models import ChatParallelWeb

chat = ChatParallelWeb(model="speed")

# Create messages
messages = [
    SystemMessage(content="You are a helpful assistant with access to real-time web information."),
    HumanMessage(content="What are the latest developments in artificial intelligence?"),
]

# Get response
response = chat.invoke(messages)
print(response.content)
# Citations on the research models (lite/base/core):
print(response.response_metadata.get("basis"))
```

#### Structured output (research models)

```python
from pydantic import BaseModel, Field
from langchain_parallel import ChatParallelWeb

class Founder(BaseModel):
name: str = Field(description="Full name of the founder")
company: str = Field(description="Company they founded")

structured = ChatParallelWeb(model="lite").with_structured_output(Founder)
result = structured.invoke([("human", "Who founded SpaceX?")])
print(result)
# Founder(name='Elon Musk', company='SpaceX')
```

#### Streaming Responses
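A minimal streaming sketch. The chunk-joining helper is pure Python and illustrative; the API call itself runs only when `PARALLEL_API_KEY` is set:

```python
import os

def accumulate(chunks, sink=print):
    """Forward each streamed chunk to `sink` as it arrives and return the full text."""
    parts = []
    for c in chunks:
        text = getattr(c, "content", c)  # AIMessageChunk.content or a plain str
        sink(text)
        parts.append(text)
    return "".join(parts)

if os.environ.get("PARALLEL_API_KEY"):
    from langchain_parallel.chat_models import ChatParallelWeb

    chat = ChatParallelWeb(model="speed")
    full = accumulate(
        chat.stream("What changed in AI this week?"),
        sink=lambda t: print(t, end="", flush=True),
    )
```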

### Agents

Parallel's Chat API does not support tool calling, so `ChatParallelWeb` cannot be the LLM that drives an agent. Use it as a research assistant inside a chain (above), or use Parallel's tools (`ParallelWebSearchTool`, `ParallelExtractTool`) with a tool-calling chat model (Anthropic, OpenAI, etc.) — see the **Tool Usage in Agents** section below.

## Search API

The search tool provides direct access to Parallel's Search API:
```python
from langchain_parallel import ParallelWebSearchTool

search_tool = ParallelWebSearchTool()

result = search_tool.invoke({
    "objective": "What are the latest developments in renewable energy?",
    "search_queries": ["renewable energy 2026", "solar power developments"],
    "max_results": 5,
})

print(result["search_id"], len(result["results"]))
for r in result["results"]:
    print(r["title"], "-", r["url"])
```



| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `objective` | `Optional[str]` | `None` | Natural-language description of research goal (≤5000 chars). |
| `search_queries` | `Optional[List[str]]` | `None` | 1-5 keyword queries (3-6 words each, ≤200 chars). Required by the GA `/v1` endpoint; if omitted, the call routes to the deprecated `/v1beta` endpoint with a `DeprecationWarning` (slated for removal in 0.4.0). Pair with an optional `objective` for best results. |
| `max_results` | `int` | `10` | Maximum results to return (1–40). |
| `excerpts` | `Optional[ExcerptSettings]` | `None` | Per-result excerpt-size cap. |
| `max_chars_total` | `Optional[int]` | `None` | Cap on total excerpt characters across all results. |
| `mode` | `Optional[Literal["basic", "advanced"]]` | `None` (API default `advanced`) | `basic` is lower-latency; `advanced` is higher quality with more retrieval and compression. Legacy values `fast`, `one-shot` (→ `basic`) and `agentic` (→ `advanced`) are accepted with a `DeprecationWarning`. |
| `source_policy` | `Optional[SourcePolicy]` | `None` | Domain include/exclude lists and freshness floor (`after_date`). |
| `fetch_policy` | `Optional[FetchPolicy]` | `None` | Cache vs live-fetch policy (e.g. `FetchPolicy(max_age_seconds=86400, timeout_seconds=60)`). |
| `location` | `Optional[str]` | `None` | ISO 3166-1 alpha-2 country code (e.g. `"us"`, `"gb"`). |
| `client_model` | `Optional[str]` | `None` | Identifier of the calling LLM, used for model-specific result optimizations. |
| `session_id` | `Optional[str]` | `None` | Shared id grouping related Search/Extract calls in one task. |
| `api_key` | `Optional[SecretStr]` | `None` | API key (uses `PARALLEL_API_KEY` env var if not provided). |
| `base_url` | `str` | `"https://api.parallel.ai"` | API base URL. |

### Search with Specific Queries

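A sketch for this section, using only parameters documented in the table above. The `top_urls` helper is illustrative, and the API call runs only when `PARALLEL_API_KEY` is set:

```python
import os

def top_urls(results, n=3):
    # Pull the first n result URLs out of a Search response.
    return [r["url"] for r in results[:n]]

if os.environ.get("PARALLEL_API_KEY"):
    from langchain_parallel import ParallelWebSearchTool

    search_tool = ParallelWebSearchTool()
    result = search_tool.invoke({
        "objective": "Recent peer-reviewed work on battery chemistry",
        "search_queries": ["solid state battery research", "lithium sulfur batteries 2026"],
        "max_results": 5,
        "mode": "basic",
        "source_policy": {"include_domains": ["nature.com", "science.org"]},
    })
    print(top_urls(result["results"]))
```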

### Tool Usage in Agents

Use the search tool with a tool-calling chat model (e.g. Anthropic Claude or OpenAI) and `create_agent`. Note that Parallel's own Chat API does not currently support tool calling, so use a different model class for the agent's LLM and use Parallel as a tool.

```python
from langchain.agents import create_agent
from langchain_parallel import ParallelWebSearchTool, ParallelExtractTool

agent = create_agent(
    "anthropic:claude-haiku-4-5",
    tools=[ParallelWebSearchTool(), ParallelExtractTool()],
    system_prompt=(
        "You are a research assistant. Use parallel_web_search to find "
        "current information and parallel_extract to read specific pages."
    ),
)

result = agent.invoke({"messages": [("human", "Latest AI breakthroughs?")]})
print(result["messages"][-1].content)
```

See `docs/demo_agent.ipynb` for a full walkthrough.

## Extract API

The Extract API provides clean content extraction from web pages, returning structured markdown-formatted content optimized for LLM consumption.
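A basic-usage sketch. The response is assumed to be a list of page dicts with a `content` field; the `total_chars` helper is illustrative, and the API call runs only when `PARALLEL_API_KEY` is set:

```python
import os

def total_chars(pages):
    # Sum extracted content length across pages; `content` may be None.
    return sum(len(p.get("content") or "") for p in pages)

if os.environ.get("PARALLEL_API_KEY"):
    from langchain_parallel import ParallelExtractTool

    extract_tool = ParallelExtractTool()
    pages = extract_tool.invoke({
        "urls": ["https://en.wikipedia.org/wiki/Web_scraping"],
        "search_objective": "How does content extraction work?",
    })
    print(f"Extracted {total_chars(pages)} characters from {len(pages)} pages")
```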

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `urls` | `List[str]` | Required | List of URLs to extract content from (up to 20 per request). |
| `search_objective` | `Optional[str]` | `None` | Natural language objective to focus extraction (≤5000 chars). |
| `search_queries` | `Optional[List[str]]` | `None` | Specific keyword queries to focus extraction. |
| `excerpts` | `Union[bool, ExcerptSettings]` | `True` | In v1 GA, excerpts are always returned; the bool is kept for backward compatibility, and `ExcerptSettings(max_chars_per_result=…)` controls per-result size. |
| `full_content` | `Union[bool, FullContentSettings]` | `False` | Include full page content in addition to excerpts. |
| `max_chars_total` | `Optional[int]` | `None` | Cap on total excerpt characters across all results. Does not affect `full_content`. |
| `fetch_policy` | `Optional[FetchPolicy]` | `None` | Cache vs live content policy. |
| `client_model` | `Optional[str]` | `None` | Identifier of the calling LLM, used for model-specific result optimizations. |
| `session_id` | `Optional[str]` | `None` | Shared id grouping related Search/Extract calls in one task. |
| `max_chars_per_extract` | `Optional[int]` | `None` | Tool-level default cap on `full_content` size; only applied when `full_content=True`. |
| `api_key` | `Optional[SecretStr]` | `None` | API key (uses `PARALLEL_API_KEY` env var if not provided). |
| `base_url` | `str` | `"https://api.parallel.ai"` | API base URL. |

### Error Handling

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Changelog

See [`CHANGELOG.md`](./CHANGELOG.md) for the full version history.