diff --git a/docs/open-api-docs.yaml b/docs/open-api-docs.yaml
index 2130ed03..cd12d948 100644
--- a/docs/open-api-docs.yaml
+++ b/docs/open-api-docs.yaml
@@ -2,7 +2,7 @@ openapi: 3.0.3
 info:
   title: The Agent's user-facing API
   description: The user-facing parts of The Agent's API service (excluding system-level endpoints, chat completion, maintenance endpoints, etc.)
-  version: 5.18.0
+  version: 5.19.0
   license:
     name: MIT
     url: https://opensource.org/licenses/MIT
diff --git a/openspec/changes/archive/2026-06-09-add-grok-search-tools/.openspec.yaml b/openspec/changes/archive/2026-06-09-add-grok-search-tools/.openspec.yaml
new file mode 100644
index 00000000..57354460
--- /dev/null
+++ b/openspec/changes/archive/2026-06-09-add-grok-search-tools/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-06-09
diff --git a/openspec/changes/archive/2026-06-09-add-grok-search-tools/design.md b/openspec/changes/archive/2026-06-09-add-grok-search-tools/design.md
new file mode 100644
index 00000000..4ac03395
--- /dev/null
+++ b/openspec/changes/archive/2026-06-09-add-grok-search-tools/design.md
@@ -0,0 +1,61 @@
+## Context
+
+The search tool currently routes configured search requests through `AIWebSearch`. That executor supports Perplexity through LangChain and Google through the Gemini SDK with `GoogleSearch` grounding. Grok/xAI models are present in the external tool library, and `xai-sdk` is already installed, but Grok models are not selectable as `ToolType.search` and `AIWebSearch` has no xAI provider branch.
+
+xAI supports non-streaming Grok calls with server-side `web_search` and `x_search` tools enabled in the same request. The SDK exposes these tools as descriptors via `xai_sdk.tools.web_search()` and `xai_sdk.tools.x_search()`. The actual search work happens remotely during the Grok request.
+
+xAI also returns exact provider cost through `response.usage.cost_in_usd_ticks`, where `10_000_000_000` ticks equals one US dollar.
Those ticks are monetary precision units, not runtime units. The returned cost includes token charges, prompt caching effects, and server-side tool invocation costs for that request.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Make selected Grok chat models usable as configured search tools.
+- Run Grok search as a single non-streaming xAI request with both web and X search tools enabled.
+- Preserve the existing `AIWebSearch` call contract for callers.
+- Track xAI search cost from provider-reported cost ticks instead of estimating token and tool-call costs.
+- Keep pre-flight credit checks conservative using existing `CostEstimate` fields.
+
+**Non-Goals:**
+- Do not add streaming search behavior.
+- Do not expose separate user-facing "web search" and "X search" modes.
+- Do not add a new database column unless implementation proves existing usage record fields are insufficient.
+- Do not change Google or Perplexity search behavior.
+- Do not replace the separate X/Twitter API post reader.
+
+## Decisions
+
+1. Use one xAI branch in `AIWebSearch`.
+
+   Add `XAI` to the provider dispatch in `AIWebSearch.execute()` and implement an xAI-specific path. The path should create an xAI chat with the configured Grok model, append the existing search prompt/query, and call `sample()` rather than `stream()`.
+
+   Alternative considered: use the OpenAI-compatible Responses API. The installed xAI SDK already exists in the project and exposes the required tools, so using it avoids adding another client path.
+
+2. Enable both `web_search` and `x_search` for Grok search.
+
+   From the product perspective, the app exposes one search tool. The xAI request should therefore provide both remote search tools and let Grok decide which to call.
+
+   Alternative considered: only enable `x_search`. That would make Grok search narrower than Google/Perplexity search and would not match the user's expectation that X search is part of the same search experience.
+
+3.
Track exact xAI cost through a separate usage tracking method.
+
+   Add a method to `UsageTrackingService` for provider-reported request costs. It should create one usage record using the exact converted xAI cost, plus the normal maintenance fee. With the current DB shape, store the provider-reported request cost in `model_cost_credits` and keep `api_call_cost_credits` and `remote_runtime_cost_credits` at zero.
+
+   Alternative considered: reconstruct cost from input tokens, output tokens, and server-side tool usage. xAI's own response already returns the actual billed cost after discounts and all server-side tool calls, so reconstruction would be less accurate and could double-count.
+
+4. Keep `server_side_tool_usage` out of billing.
+
+   The implementation may log server-side tool usage for diagnostics, but billing should use `cost_in_usd_ticks` when present.
+
+   Alternative considered: emit separate usage records per xAI server-side tool call. This matches Google's current query-count approach but is not appropriate when xAI returns an all-inclusive provider cost.
+
+5. Preserve existing pre-flight validation.
+
+   Since exact xAI cost is only known after the response, Grok search models should still have approximate `CostEstimate` values sufficient for pre-flight credit validation. The estimate does not need to exactly match the final provider-reported charge.
+
+## Risks / Trade-offs
+
+- Provider cost metadata missing -> Treat as an external empty/unexpected response for platform-billed calls, or fall back to estimate only if the implementation deliberately accepts estimate drift.
+- Exact xAI cost exceeds pre-flight estimate -> Existing spending deduction can make the payer balance negative and logs a warning; keep estimates conservative.
+- Source formatting may differ from Google/Perplexity -> Prefer response citations or inline citation metadata if available; otherwise return the answer without a sources section rather than failing an otherwise valid answer.
+- Existing `XAIUsageTrackingDecorator` only wraps image calls -> Extending it for chat search must avoid changing image generation accounting unintentionally.
+- xAI SDK response shape may vary by model/tool output -> Add focused unit tests around cost extraction and response validation using lightweight fakes.
diff --git a/openspec/changes/archive/2026-06-09-add-grok-search-tools/proposal.md b/openspec/changes/archive/2026-06-09-add-grok-search-tools/proposal.md
new file mode 100644
index 00000000..2da3320c
--- /dev/null
+++ b/openspec/changes/archive/2026-06-09-add-grok-search-tools/proposal.md
@@ -0,0 +1,31 @@
+## Why
+
+Users can already choose Google Gemini Flash and Perplexity models for the app's search tool, but Grok models are only available for chat/vision/image workflows. xAI now supports non-streaming server-side `web_search` and `x_search` tools in the same Grok request, which lets Grok provide search behavior through the existing user-facing search abstraction.
+
+## What Changes
+
+- Allow selected Grok/xAI chat models to be configured as `ToolType.search` tools.
+- Add an xAI search execution path that performs one non-streaming Grok call with both `web_search` and `x_search` enabled.
+- Return the Grok answer through the same `AIWebSearch` interface used by Google and Perplexity search.
+- Track xAI search usage using xAI's provider-reported `cost_in_usd_ticks` value instead of estimating separate token/tool invocation costs.
+- Keep rough xAI cost estimates only for pre-flight credit validation.
+
+## Capabilities
+
+### New Capabilities
+- `grok-search-tools`: Covers using Grok/xAI models as configured search tools with server-side web and X search, including exact provider-reported cost tracking.
+
+### Modified Capabilities
+
+None.
+
+## Impact
+
+- Affected code:
+  - `src/features/external_tools/external_tool_library.py`
+  - `src/features/web_browsing/ai_web_search.py`
+  - `src/features/accounting/usage/usage_tracking_service.py`
+  - `src/features/accounting/usage/decorators/x_ai_usage_tracking_decorator.py`
+  - `src/di/di.py`
+- No database migration is expected if provider-reported xAI cost is stored in existing usage record cost fields.
+- No new external dependency is expected; `xai-sdk` is already installed and exposes `web_search()` and `x_search()`.
diff --git a/openspec/changes/archive/2026-06-09-add-grok-search-tools/specs/grok-search-tools/spec.md b/openspec/changes/archive/2026-06-09-add-grok-search-tools/specs/grok-search-tools/spec.md
new file mode 100644
index 00000000..d32b2f3c
--- /dev/null
+++ b/openspec/changes/archive/2026-06-09-add-grok-search-tools/specs/grok-search-tools/spec.md
@@ -0,0 +1,53 @@
+## ADDED Requirements
+
+### Requirement: Grok models are selectable for search
+The system SHALL allow supported Grok/xAI chat models to be configured as search tools through the existing search tool selection mechanism.
+
+#### Scenario: Supported Grok model appears as search-capable
+- **WHEN** the system builds the list of available search tools
+- **THEN** supported Grok/xAI chat models are included with `ToolType.search`
+
+#### Scenario: Unsupported xAI tools remain unavailable for search
+- **WHEN** the system builds the list of available search tools
+- **THEN** xAI image-generation tools are not included as search tools
+
+### Requirement: Grok search uses web and X search in one request
+The system SHALL execute Grok-backed search with one non-streaming xAI request that enables both server-side web search and X search tools.
+
+#### Scenario: Grok search execution
+- **WHEN** a user invokes the search tool with a Grok/xAI model selected
+- **THEN** the system sends one non-streaming Grok request with both `web_search` and `x_search` enabled
+
+#### Scenario: Grok search returns an answer
+- **WHEN** xAI returns a non-empty search response
+- **THEN** the system returns the answer through the existing AI web search result interface
+
+#### Scenario: Empty Grok search response
+- **WHEN** xAI returns no answer content for a Grok search request
+- **THEN** the system raises a structured external service error
+
+### Requirement: Grok search uses provider-reported cost
+The system SHALL track Grok search usage using xAI's provider-reported per-request cost when `cost_in_usd_ticks` is available.
+
+#### Scenario: Provider-reported cost is recorded
+- **WHEN** a Grok search request succeeds and includes `cost_in_usd_ticks`
+- **THEN** the system records one usage entry using the converted provider-reported cost
+
+#### Scenario: Server-side tool usage is not double-counted
+- **WHEN** a Grok search response includes server-side web or X tool usage counts
+- **THEN** the system does not create additional billable usage entries for those internal tool calls
+
+#### Scenario: Cost metadata is missing
+- **WHEN** a Grok search response is missing provider-reported cost metadata
+- **THEN** the system handles the response according to structured external-service error handling
+
+### Requirement: Existing search providers are unchanged
+The system SHALL preserve current Google and Perplexity search behavior while adding Grok-backed search.
+
+#### Scenario: Google search still uses Google grounding
+- **WHEN** a user invokes search with a Google search model selected
+- **THEN** the system uses the existing Google search execution path
+
+#### Scenario: Perplexity search still uses Perplexity
+- **WHEN** a user invokes search with a Perplexity search model selected
+- **THEN** the system uses the existing Perplexity search execution path
diff --git a/openspec/changes/archive/2026-06-09-add-grok-search-tools/tasks.md b/openspec/changes/archive/2026-06-09-add-grok-search-tools/tasks.md
new file mode 100644
index 00000000..3f760ad0
--- /dev/null
+++ b/openspec/changes/archive/2026-06-09-add-grok-search-tools/tasks.md
@@ -0,0 +1,28 @@
+## 1. Tool Catalog
+
+- [x] 1.1 Add `ToolType.search` to supported Grok/xAI chat model definitions in `external_tool_library.py`.
+- [x] 1.2 Add conservative xAI search cost estimates for pre-flight credit checks without using them for final billing.
+- [x] 1.3 Verify xAI image tools remain excluded from search-capable tool lists.
+
+## 2. xAI Search Execution
+
+- [x] 2.1 Add `XAI` provider dispatch to `AIWebSearch.execute()`.
+- [x] 2.2 Implement non-streaming Grok search using `xai_sdk.tools.web_search()` and `xai_sdk.tools.x_search()` in one request.
+- [x] 2.3 Validate Grok search responses and raise structured `ExternalServiceError` errors for empty or unexpected responses.
+- [x] 2.4 Add source formatting for xAI citations or inline citation metadata when available, while allowing answers without sources if xAI provides no source data.
+
+## 3. Provider-Reported Cost Tracking
+
+- [x] 3.1 Add a separate `UsageTrackingService` method for provider-reported request costs.
+- [x] 3.2 Convert xAI `cost_in_usd_ticks` to credits using the existing project credit scale.
+- [x] 3.3 Store the provider-reported request cost in existing usage record cost fields without introducing a migration.
+- [x] 3.4 Extend `XAIUsageTrackingDecorator` to wrap the chat search call while preserving existing image tracking behavior.
+- [x] 3.5 Log `server_side_tool_usage` for diagnostics without creating additional billable usage records.
+
+## 4. Tests and Verification
+
+- [x] 4.1 Update `test/features/web_browsing/test_ai_web_search.py` for xAI provider routing, both enabled tools, non-streaming execution, and empty-response handling.
+- [x] 4.2 Update `test/features/accounting/usage/test_usage_tracking_service.py` for provider-reported cost records and maintenance fee behavior.
+- [x] 4.3 Update `test/features/accounting/usage/decorators/test_x_ai_usage_tracking_decorator.py` for chat search tracking and image tracking regression coverage.
+- [x] 4.4 Verify Grok search availability through xAI search provider routing coverage without adding catalog-constant assertions.
+- [x] 4.5 Run the focused test files with `pipenv run`, then run `pipenv run pre-commit run --all-files --show-diff-on-failure`.
diff --git a/openspec/specs/grok-search-tools/spec.md b/openspec/specs/grok-search-tools/spec.md
new file mode 100644
index 00000000..669c1bf7
--- /dev/null
+++ b/openspec/specs/grok-search-tools/spec.md
@@ -0,0 +1,57 @@
+# grok-search-tools Specification
+
+## Purpose
+Define how Grok/xAI models participate in the existing search tool abstraction, including server-side web/X search execution and provider-reported cost tracking.
+
+## Requirements
+### Requirement: Grok models are selectable for search
+The system SHALL allow supported Grok/xAI chat models to be configured as search tools through the existing search tool selection mechanism.
+
+#### Scenario: Supported Grok model appears as search-capable
+- **WHEN** the system builds the list of available search tools
+- **THEN** supported Grok/xAI chat models are included with `ToolType.search`
+
+#### Scenario: Unsupported xAI tools remain unavailable for search
+- **WHEN** the system builds the list of available search tools
+- **THEN** xAI image-generation tools are not included as search tools
+
+### Requirement: Grok search uses web and X search in one request
+The system SHALL execute Grok-backed search with one non-streaming xAI request that enables both server-side web search and X search tools.
+
+#### Scenario: Grok search execution
+- **WHEN** a user invokes the search tool with a Grok/xAI model selected
+- **THEN** the system sends one non-streaming Grok request with both `web_search` and `x_search` enabled
+
+#### Scenario: Grok search returns an answer
+- **WHEN** xAI returns a non-empty search response
+- **THEN** the system returns the answer through the existing AI web search result interface
+
+#### Scenario: Empty Grok search response
+- **WHEN** xAI returns no answer content for a Grok search request
+- **THEN** the system raises a structured external service error
+
+### Requirement: Grok search uses provider-reported cost
+The system SHALL track Grok search usage using xAI's provider-reported per-request cost when `cost_in_usd_ticks` is available.
+
+#### Scenario: Provider-reported cost is recorded
+- **WHEN** a Grok search request succeeds and includes `cost_in_usd_ticks`
+- **THEN** the system records one usage entry using the converted provider-reported cost
+
+#### Scenario: Server-side tool usage is not double-counted
+- **WHEN** a Grok search response includes server-side web or X tool usage counts
+- **THEN** the system does not create additional billable usage entries for those internal tool calls
+
+#### Scenario: Cost metadata is missing
+- **WHEN** a Grok search response is missing provider-reported cost metadata
+- **THEN** the system handles the response according to structured external-service error handling
+
+### Requirement: Existing search providers are unchanged
+The system SHALL preserve current Google and Perplexity search behavior while adding Grok-backed search.
+
+#### Scenario: Google search still uses Google grounding
+- **WHEN** a user invokes search with a Google search model selected
+- **THEN** the system uses the existing Google search execution path
+
+#### Scenario: Perplexity search still uses Perplexity
+- **WHEN** a user invokes search with a Perplexity search model selected
+- **THEN** the system uses the existing Perplexity search execution path
diff --git a/pyproject.toml b/pyproject.toml
index 31e2a878..4e30741c 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "the-agent"
-version = "5.18.0"
+version = "5.19.0"

 [tool.setuptools]
 package-dir = {"" = "src"}
diff --git a/src/features/accounting/usage/decorators/x_ai_usage_tracking_decorator.py b/src/features/accounting/usage/decorators/x_ai_usage_tracking_decorator.py
index 04d57664..38f99535 100644
--- a/src/features/accounting/usage/decorators/x_ai_usage_tracking_decorator.py
+++ b/src/features/accounting/usage/decorators/x_ai_usage_tracking_decorator.py
@@ -8,6 +8,10 @@
 from features.accounting.usage.usage_tracking_service import
UsageTrackingService
 from features.external_tools.configured_tool import ConfiguredTool
 from util import log
+from util.error_codes import LLM_UNEXPECTED_RESPONSE
+from util.errors import ExternalServiceError
+
+X_AI_USD_TICKS_PER_CREDIT = 100_000_000


 class XAIUsageTrackingDecorator:
@@ -42,12 +46,24 @@ def image(self) -> Any:
             self.__intercept_image_call,
         )

+    @property
+    def chat(self) -> Any:
+        return NamespaceProxy(
+            self.__wrapped_client.chat,
+            self.__intercept_chat_call,
+        )
+
     def __intercept_image_call(self, name: str, attr: Any) -> Any:
         if name == "sample":
-            return self.__wrap_sample(attr)
+            return self.__wrap_image_sample(attr)
+        return attr
+
+    def __intercept_chat_call(self, name: str, attr: Any) -> Any:
+        if name == "create":
+            return self.__wrap_chat_create(attr)
         return attr

-    def __wrap_sample(self, original_method: Callable[..., Any]) -> Callable[..., Any]:
+    def __wrap_image_sample(self, original_method: Callable[..., Any]) -> Callable[..., Any]:
         def wrapper(*args: Any, **kwargs: Any) -> Any:
             self.__spending_service.validate_pre_flight(
                 self.__configured_tool,
@@ -58,15 +74,41 @@ def wrapper(*args: Any, **kwargs: Any) -> Any:
             try:
                 response = original_method(*args, **kwargs)
                 runtime_seconds = time() - start_time
-                self.__track_usage(runtime_seconds)
+                self.__track_image_usage(runtime_seconds)
                 return response
             except Exception:
                 runtime_seconds = time() - start_time
-                self.__track_failed_usage(runtime_seconds)
+                self.__track_failed_image_usage(runtime_seconds)
                 raise
         return wrapper

-    def __track_usage(self, runtime_seconds: float) -> None:
+    def __wrap_chat_create(self, original_method: Callable[..., Any]) -> Callable[..., Any]:
+        def wrapper(*args: Any, **kwargs: Any) -> Any:
+            chat = original_method(*args, **kwargs)
+            return NamespaceProxy(chat, self.__intercept_chat_instance_call)
+        return wrapper
+
+    def __intercept_chat_instance_call(self, name: str, attr: Any) -> Any:
+        if name == "sample":
+            return self.__wrap_chat_sample(attr)
+        return attr
+
def __wrap_chat_sample(self, original_method: Callable[..., Any]) -> Callable[..., Any]:
+        def wrapper(*args: Any, **kwargs: Any) -> Any:
+            self.__spending_service.validate_pre_flight(self.__configured_tool)
+            start_time = time()
+            try:
+                response = original_method(*args, **kwargs)
+                runtime_seconds = time() - start_time
+                self.__track_chat_usage(response, runtime_seconds)
+                return response
+            except Exception:
+                runtime_seconds = time() - start_time
+                self.__track_failed_chat_usage(runtime_seconds)
+                raise
+        return wrapper
+
+    def __track_image_usage(self, runtime_seconds: float) -> None:
         record = self.__tracking_service.track_image_model(
             tool = self.__configured_tool.definition,
             tool_purpose = self.__configured_tool.purpose,
@@ -78,7 +120,33 @@ def __track_usage(self, runtime_seconds: float) -> None:
         )
         self.__spending_service.deduct(self.__configured_tool, record.total_cost_credits)

-    def __track_failed_usage(self, runtime_seconds: float) -> None:
+    def __track_chat_usage(self, response: Any, runtime_seconds: float) -> None:
+        usage = getattr(response, "usage", None)
+        cost_ticks = getattr(usage, "cost_in_usd_ticks", None)
+        if cost_ticks is None:
+            raise ExternalServiceError(
+                "xAI response did not include provider-reported cost",
+                LLM_UNEXPECTED_RESPONSE,
+            )
+
+        server_side_tool_usage = getattr(response, "server_side_tool_usage", None)
+        if server_side_tool_usage:
+            log.d(f"xAI server-side tools used: {server_side_tool_usage}")
+
+        record = self.__tracking_service.track_provider_reported_cost(
+            tool = self.__configured_tool.definition,
+            tool_purpose = self.__configured_tool.purpose,
+            runtime_seconds = runtime_seconds,
+            payer_id = self.__configured_tool.payer_id,
+            uses_credits = self.__configured_tool.uses_credits,
+            provider_cost_credits = float(cost_ticks) / X_AI_USD_TICKS_PER_CREDIT,
+            input_tokens = self.__get_usage_value(usage, "input_tokens", "prompt_tokens"),
+            output_tokens = self.__get_usage_value(usage, "output_tokens", "completion_tokens"),
+
total_tokens = self.__get_usage_value(usage, "total_tokens"),
+        )
+        self.__spending_service.deduct(self.__configured_tool, record.total_cost_credits)
+
+    def __track_failed_image_usage(self, runtime_seconds: float) -> None:
         log.w(f"Tool call failed for {self.__configured_tool.definition.id}, tracking without deduction")
         self.__tracking_service.track_image_model(
             tool = self.__configured_tool.definition,
@@ -91,5 +159,23 @@ def __track_failed_usage(self, runtime_seconds: float) -> None:
             is_failed = True,
         )

+    def __track_failed_chat_usage(self, runtime_seconds: float) -> None:
+        log.w(f"Tool call failed for {self.__configured_tool.definition.id}, tracking without deduction")
+        self.__tracking_service.track_text_model(
+            tool = self.__configured_tool.definition,
+            tool_purpose = self.__configured_tool.purpose,
+            runtime_seconds = runtime_seconds,
+            payer_id = self.__configured_tool.payer_id,
+            uses_credits = self.__configured_tool.uses_credits,
+            is_failed = True,
+        )
+
+    def __get_usage_value(self, usage: Any, *names: str) -> int | None:
+        for name in names:
+            value = getattr(usage, name, None)
+            if isinstance(value, int):
+                return value
+        return None
+
     def __getattr__(self, name: str) -> Any:
         return getattr(self.__wrapped_client, name)
diff --git a/src/features/accounting/usage/usage_tracking_service.py b/src/features/accounting/usage/usage_tracking_service.py
index f1f7000a..76f97902 100644
--- a/src/features/accounting/usage/usage_tracking_service.py
+++ b/src/features/accounting/usage/usage_tracking_service.py
@@ -156,6 +156,44 @@ def track_web_search_query(
             records.append(self.__di.usage_record_repo.create(record))
         return records

+    def track_provider_reported_cost(
+        self,
+        tool: ExternalTool,
+        tool_purpose: ToolType,
+        runtime_seconds: float,
+        payer_id: UUID,
+        uses_credits: bool,
+        provider_cost_credits: float,
+        input_tokens: int | None = None,
+        output_tokens: int | None = None,
+        total_tokens: int | None = None,
+        is_failed: bool = False,
+    ) ->
UsageRecord:
+        maintenance_fee_credits: float = config.usage_maintenance_fee_credits
+        total_cost_credits: float = provider_cost_credits + maintenance_fee_credits
+
+        record = UsageRecord(
+            user_id = self.__di.invoker.id,
+            payer_id = payer_id,
+            uses_credits = uses_credits,
+            is_failed = is_failed,
+            chat_id = self.__di.invoker_chat.chat_id if self.__di.invoker_chat else None,
+            tool = tool,
+            tool_purpose = tool_purpose,
+            timestamp = datetime.now(timezone.utc),
+            model_cost_credits = provider_cost_credits,
+            remote_runtime_cost_credits = 0.0,
+            api_call_cost_credits = 0.0,
+            maintenance_fee_credits = maintenance_fee_credits,
+            total_cost_credits = total_cost_credits,
+            runtime_seconds = runtime_seconds,
+            input_tokens = input_tokens,
+            output_tokens = output_tokens,
+            total_tokens = total_tokens,
+            participant_details = self.__build_participant_details(payer_id),
+        )
+        return self.__di.usage_record_repo.create(record)
+
     def track_api_call(
         self,
         tool: ExternalTool,
diff --git a/src/features/external_tools/external_tool_library.py b/src/features/external_tools/external_tool_library.py
index 6831a1f7..10e8600c 100644
--- a/src/features/external_tools/external_tool_library.py
+++ b/src/features/external_tools/external_tool_library.py
@@ -355,10 +355,11 @@
     id = "grok-4.20-non-reasoning",
     name = "Grok 4.20",
     provider = XAI,
-    types = [ToolType.chat, ToolType.copywriting, ToolType.vision],
+    types = [ToolType.chat, ToolType.copywriting, ToolType.vision, ToolType.search],
     cost_estimate = CostEstimate(
         input_1m_tokens = 200,
         output_1m_tokens = 600,
+        web_search_query = 5,
     ),
 )

@@ -366,10 +367,11 @@
     id = "grok-4.20-reasoning",
     name = "Grok 4.20 (Reasoning)",
     provider = XAI,
-    types = [ToolType.chat, ToolType.reasoning, ToolType.copywriting, ToolType.vision],
+    types = [ToolType.chat, ToolType.reasoning, ToolType.copywriting, ToolType.vision, ToolType.search],
     cost_estimate = CostEstimate(
         input_1m_tokens = 200,
         output_1m_tokens = 600,
+        web_search_query = 5,
     ),
 )

@@
-377,10 +379,11 @@
     id = "grok-4.3",
     name = "Grok 4.3",
     provider = XAI,
-    types = [ToolType.chat, ToolType.reasoning, ToolType.copywriting, ToolType.vision],
+    types = [ToolType.chat, ToolType.reasoning, ToolType.copywriting, ToolType.vision, ToolType.search],
     cost_estimate = CostEstimate(
         input_1m_tokens = 200,
         output_1m_tokens = 600,
+        web_search_query = 5,
     ),
 )

diff --git a/src/features/web_browsing/ai_web_search.py b/src/features/web_browsing/ai_web_search.py
index 96f34f8d..6a224cce 100644
--- a/src/features/web_browsing/ai_web_search.py
+++ b/src/features/web_browsing/ai_web_search.py
@@ -1,16 +1,20 @@
 from google.genai.types import GenerateContentConfig, GoogleSearch, Tool
 from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage
+from xai_sdk.chat import system, user
+from xai_sdk.tools import web_search, x_search

 from di.di import DI
 from features.external_tools.configured_tool import ConfiguredTool
 from features.external_tools.external_tool import ToolType
-from features.external_tools.external_tool_provider_library import GOOGLE_AI, PERPLEXITY
+from features.external_tools.external_tool_provider_library import GOOGLE_AI, PERPLEXITY, XAI
 from features.integrations import prompt_resolvers
 from features.web_browsing.search_source_formatter import (
     format_sources_from_google,
     format_sources_from_perplexity,
+    format_sources_from_xai,
 )
 from util import log
+from util.config import config
 from util.error_codes import EXTERNAL_EMPTY_RESPONSE, LLM_UNEXPECTED_RESPONSE, UNSUPPORTED_PROVIDER
 from util.errors import ConfigurationError, ExternalServiceError

@@ -35,6 +39,8 @@ def execute(self) -> AIMessage:
             return self.__search_with_perplexity()
         elif provider == GOOGLE_AI:
             return self.__search_with_google()
+        elif provider == XAI:
+            return self.__search_with_xai()
         else:
             raise ConfigurationError(f"Unsupported search provider: '{provider.name}'", UNSUPPORTED_PROVIDER)

@@ -76,3 +82,23 @@ def __search_with_google(self) -> AIMessage:
         content
= answer_text + sources
         log.d(f"Finished Google web search, result size is {len(content)} characters")
         return AIMessage(content = content)
+
+    def __search_with_xai(self) -> AIMessage:
+        system_prompt = prompt_resolvers.sentient_web_search(self.__di.invoker_chat)
+        client = self.__di.x_ai_client(self.__configured_tool, config.web_timeout_s * 6)
+        chat = client.chat.create(
+            model = self.__configured_tool.definition.id,
+            messages = [system(system_prompt), user(self.__search_query)],
+            tools = [web_search(), x_search()],
+            include = ["inline_citations"],
+        )
+        response = chat.sample()
+
+        answer_text = getattr(response, "content", None) or ""
+        if not answer_text:
+            raise ExternalServiceError("xAI search returned empty answer", EXTERNAL_EMPTY_RESPONSE)
+
+        sources = format_sources_from_xai(response, self.__di)
+        content = answer_text + sources
+        log.d(f"Finished xAI web search, result size is {len(content)} characters")
+        return AIMessage(content = content)
diff --git a/src/features/web_browsing/search_source_formatter.py b/src/features/web_browsing/search_source_formatter.py
index 84f9ca4c..4aafe7fb 100644
--- a/src/features/web_browsing/search_source_formatter.py
+++ b/src/features/web_browsing/search_source_formatter.py
@@ -44,6 +44,66 @@ def format_sources_from_google(grounding_chunks: list, di: DI) -> str:
     return __render_sources(raw_sources, di)


+def format_sources_from_xai(response: object, di: DI) -> str:
+    raw_sources: list[tuple[str, str]] = []
+
+    citations = getattr(response, "citations", None) or []
+    if __is_iterable(citations):
+        for url in citations:
+            if url:
+                url = str(url)
+                domain = urlparse(url).netloc or simplify_url(url)
+                raw_sources.append((domain, simplify_url(url, strip_subdomains = False)))
+
+    inline_citations = getattr(response, "inline_citations", None) or []
+    if __is_iterable(inline_citations):
+        for citation in inline_citations:
+            source = __extract_xai_inline_source(citation)
+            if source:
+                raw_sources.append(source)
+
+    return
__render_sources(raw_sources, di)
+
+
+def __is_iterable(value: object) -> bool:
+    return not isinstance(value, str | bytes) and hasattr(value, "__iter__")
+
+
+def __extract_xai_inline_source(citation: object) -> tuple[str, str] | None:
+    for field_name in ["web_citation", "x_citation"]:
+        if not __has_xai_citation_field(citation, field_name):
+            continue
+        source = getattr(citation, field_name, None)
+        if not source:
+            continue
+        url = None
+        for url_attr in ["url", "uri", "post_url", "tweet_url"]:
+            url = getattr(source, url_attr, None)
+            if url:
+                break
+        if not url:
+            continue
+        url = str(url)
+        title = None
+        for title_attr in ["title", "name", "handle", "username"]:
+            title = getattr(source, title_attr, None)
+            if title:
+                break
+        domain = str(title) if title else (urlparse(url).netloc or simplify_url(url))
+        return domain, simplify_url(url, strip_subdomains = False)
+    return None
+
+
+def __has_xai_citation_field(citation: object, field_name: str) -> bool:
+    has_field = getattr(citation, "HasField", None)
+    if callable(has_field):
+        try:
+            return bool(has_field(field_name))
+        except ValueError:
+            return False
+    return bool(getattr(citation, field_name, None))
+
+
 def __render_sources(raw_sources: list[tuple[str, str]], di: DI) -> str:
     if not raw_sources:
         return ""
diff --git a/test/features/accounting/usage/decorators/test_x_ai_usage_tracking_decorator.py b/test/features/accounting/usage/decorators/test_x_ai_usage_tracking_decorator.py
index 7a894386..6a2d8722 100644
--- a/test/features/accounting/usage/decorators/test_x_ai_usage_tracking_decorator.py
+++ b/test/features/accounting/usage/decorators/test_x_ai_usage_tracking_decorator.py
@@ -9,6 +9,7 @@
 from features.accounting.usage.usage_tracking_service import UsageTrackingService
 from features.external_tools.configured_tool import ConfiguredTool
 from features.external_tools.external_tool import ExternalTool, ToolType
+from util.errors import ExternalServiceError


 class
XAIUsageTrackingDecoratorTest(unittest.TestCase):
@@ -19,6 +20,12 @@ def setUp(self):
         self.mock_tracking_service.track_image_model = Mock(
             return_value = Mock(spec = UsageRecord, total_cost_credits = 2.0),
         )
+        self.mock_tracking_service.track_provider_reported_cost = Mock(
+            return_value = Mock(spec = UsageRecord, total_cost_credits = 3.5),
+        )
+        self.mock_tracking_service.track_text_model = Mock(
+            return_value = Mock(spec = UsageRecord, total_cost_credits = 0.0),
+        )
         self.mock_spending_service = Mock(spec = SpendingService)
         self.tool_purpose = ToolType.images_gen
         self.external_tool = Mock(spec = ExternalTool)
@@ -66,6 +73,67 @@ def test_sample_tracks_usage_by_image_size(self):
         self.assertGreater(call_args.kwargs["runtime_seconds"], 0)
         self.assertEqual(call_args.kwargs["uses_credits"], False)

+    def test_chat_property_returns_proxy(self):
+        chat = self.decorator.chat
+
+        self.assertIsNotNone(chat)
+
+    def test_chat_sample_tracks_provider_reported_cost(self):
+        usage = Mock()
+        usage.cost_in_usd_ticks = 25_000_000
+        usage.input_tokens = 10
+        usage.output_tokens = 20
+        usage.total_tokens = 30
+        response = Mock()
+        response.usage = usage
+        response.server_side_tool_usage = {"WEB_SEARCH": 1, "X_SEARCH": 1}
+        mock_chat = Mock()
+        mock_chat.sample.return_value = response
+        self.mock_client.chat.create.return_value = mock_chat
+
+        result = self.decorator.chat.create(model = "grok-4.3").sample()
+
+        self.assertEqual(result, response)
+        self.mock_tracking_service.track_provider_reported_cost.assert_called_once()
+        call_kwargs = self.mock_tracking_service.track_provider_reported_cost.call_args.kwargs
+        self.assertEqual(call_kwargs["tool"], self.external_tool)
+        self.assertEqual(call_kwargs["provider_cost_credits"], 0.25)
+        self.assertEqual(call_kwargs["input_tokens"], 10)
+        self.assertEqual(call_kwargs["output_tokens"], 20)
+        self.assertEqual(call_kwargs["total_tokens"], 30)
+        self.mock_spending_service.deduct.assert_called_once_with(self.mock_configured_tool, 3.5)
+
+    def
test_chat_sample_calls_validate_pre_flight(self):
+        usage = Mock()
+        usage.cost_in_usd_ticks = 1
+        response = Mock()
+        response.usage = usage
+        response.server_side_tool_usage = {}
+        mock_chat = Mock()
+        mock_chat.sample.return_value = response
+        self.mock_client.chat.create.return_value = mock_chat
+
+        self.decorator.chat.create(model = "grok-4.3").sample()
+
+        self.mock_spending_service.validate_pre_flight.assert_called_once_with(self.mock_configured_tool)
+
+    def test_chat_sample_missing_cost_tracks_failure_without_deduction(self):
+        usage = Mock()
+        usage.cost_in_usd_ticks = None
+        response = Mock()
+        response.usage = usage
+        mock_chat = Mock()
+        mock_chat.sample.return_value = response
+        self.mock_client.chat.create.return_value = mock_chat
+
+        with self.assertRaises(ExternalServiceError):
+            self.decorator.chat.create(model = "grok-4.3").sample()
+
+        self.mock_tracking_service.track_provider_reported_cost.assert_not_called()
+        self.mock_tracking_service.track_text_model.assert_called_once()
+        self.assertTrue(self.mock_tracking_service.track_text_model.call_args.kwargs["is_failed"])
+        self.mock_spending_service.deduct.assert_not_called()
+
     def test_sample_deducts_credits(self):
         mock_response = Mock()
         self.mock_client.image.sample = Mock(return_value = mock_response)
diff --git a/test/features/accounting/usage/test_usage_tracking_service.py b/test/features/accounting/usage/test_usage_tracking_service.py
index e17ba08b..97349221 100644
--- a/test/features/accounting/usage/test_usage_tracking_service.py
+++ b/test/features/accounting/usage/test_usage_tracking_service.py
@@ -750,3 +750,41 @@ def test_track_web_search_query_payer_id_stored(self):
         )
         self.assertEqual(records[0].payer_id, self.payer_id)
         self.assertTrue(records[0].uses_credits)
+
+    def test_track_provider_reported_cost_uses_exact_cost_as_model_cost(self):
+        tool = self._create_search_tool()
+        record = self.service.track_provider_reported_cost(
+            tool = tool,
+            tool_purpose = ToolType.search,
+            runtime_seconds = 2.0,
+            payer_id = self.payer_id,
+            uses_credits = True,
+            provider_cost_credits = 3.75,
+            input_tokens = 10,
+            output_tokens = 20,
+            total_tokens = 30,
+        )
+
+        self.assertEqual(record.model_cost_credits, 3.75)
+        self.assertEqual(record.api_call_cost_credits, 0.0)
+        self.assertEqual(record.remote_runtime_cost_credits, 0.0)
+        self.assertEqual(record.maintenance_fee_credits, 1.0)
+        self.assertEqual(record.total_cost_credits, 4.75)
+        self.assertEqual(record.input_tokens, 10)
+        self.assertEqual(record.output_tokens, 20)
+        self.assertEqual(record.total_tokens, 30)
+        self.assertTrue(record.uses_credits)
+
+    def test_track_provider_reported_cost_persists_to_repo(self):
+        tool = self._create_search_tool()
+
+        self.service.track_provider_reported_cost(
+            tool = tool,
+            tool_purpose = ToolType.search,
+            runtime_seconds = 1.0,
+            payer_id = self.payer_id,
+            uses_credits = False,
+            provider_cost_credits = 1.25,
+        )
+
+        self.mock_di.usage_record_repo.create.assert_called_once()
diff --git a/test/features/web_browsing/test_ai_web_search.py b/test/features/web_browsing/test_ai_web_search.py
index fa12699a..99a08ca6 100644
--- a/test/features/web_browsing/test_ai_web_search.py
+++ b/test/features/web_browsing/test_ai_web_search.py
@@ -6,7 +6,7 @@
 from di.di import DI
 from features.external_tools.configured_tool import ConfiguredTool
 from features.external_tools.external_tool import CostEstimate, ExternalTool, ExternalToolProvider, ToolType
-from features.external_tools.external_tool_provider_library import GOOGLE_AI, PERPLEXITY
+from features.external_tools.external_tool_provider_library import GOOGLE_AI, PERPLEXITY, XAI
 from features.web_browsing.ai_web_search import AIWebSearch
 from util.errors import ConfigurationError, ExternalServiceError

@@ -136,6 +136,74 @@ def test_google_raises_on_empty_answer(self, mock_sources):
             AIWebSearch("query", self.configured_tool, self.di).execute()


+class AIWebSearchXAITest(unittest.TestCase):
+
+    def setUp(self):
+        self.di =
_make_di()
+        self.configured_tool = _make_provider(XAI)
+
+    def _make_client(self, content: str = "xai answer") -> Mock:
+        response = Mock()
+        response.content = content
+        response.citations = []
+        response.inline_citations = []
+        mock_chat = Mock()
+        mock_chat.sample.return_value = response
+        mock_client = Mock()
+        mock_client.chat.create.return_value = mock_chat
+        self.di.x_ai_client.return_value = mock_client
+        return mock_client
+
+    @patch("features.web_browsing.ai_web_search.format_sources_from_xai", return_value = "\n\nSources:\n- [x](http://s)")
+    @patch("features.web_browsing.ai_web_search.x_search", return_value = "x-search-tool")
+    @patch("features.web_browsing.ai_web_search.web_search", return_value = "web-search-tool")
+    @patch("features.web_browsing.ai_web_search.user", return_value = "user-message")
+    @patch("features.web_browsing.ai_web_search.system", return_value = "system-message")
+    @patch("features.web_browsing.ai_web_search.prompt_resolvers")
+    def test_xai_path_uses_both_search_tools(
+        self,
+        mock_resolvers,
+        mock_system,
+        mock_user,
+        mock_web_search,
+        mock_x_search,
+        mock_sources,
+    ):
+        mock_resolvers.sentient_web_search.return_value = "system prompt"
+        mock_client = self._make_client()
+
+        result = AIWebSearch("query", self.configured_tool, self.di).execute()
+
+        self.assertIsInstance(result, AIMessage)
+        self.assertIn("xai answer", result.content)
+        mock_client.chat.create.assert_called_once()
+        call_kwargs = mock_client.chat.create.call_args.kwargs
+        self.assertEqual(call_kwargs["model"], self.configured_tool.definition.id)
+        self.assertEqual(call_kwargs["messages"], ["system-message", "user-message"])
+        self.assertEqual(call_kwargs["tools"], ["web-search-tool", "x-search-tool"])
+        self.assertEqual(call_kwargs["include"], ["inline_citations"])
+
+    @patch("features.web_browsing.ai_web_search.format_sources_from_xai", return_value = "")
+    @patch("features.web_browsing.ai_web_search.prompt_resolvers")
+    def
test_xai_uses_xai_client(self, mock_resolvers, mock_sources):
+        mock_resolvers.sentient_web_search.return_value = "system prompt"
+        self._make_client()
+
+        AIWebSearch("query", self.configured_tool, self.di).execute()
+
+        self.di.x_ai_client.assert_called_once()
+        self.assertEqual(self.di.x_ai_client.call_args.args[0], self.configured_tool)
+
+    @patch("features.web_browsing.ai_web_search.format_sources_from_xai", return_value = "")
+    @patch("features.web_browsing.ai_web_search.prompt_resolvers")
+    def test_xai_raises_on_empty_answer(self, mock_resolvers, mock_sources):
+        mock_resolvers.sentient_web_search.return_value = "system prompt"
+        self._make_client(content = "")
+
+        with self.assertRaises(ExternalServiceError):
+            AIWebSearch("query", self.configured_tool, self.di).execute()
+
+
 class AIWebSearchProviderBranchingTest(unittest.TestCase):

     def test_unsupported_provider_raises_configuration_error(self):
diff --git a/test/features/web_browsing/test_search_source_formatter.py b/test/features/web_browsing/test_search_source_formatter.py
index 50a257fc..79146be1 100644
--- a/test/features/web_browsing/test_search_source_formatter.py
+++ b/test/features/web_browsing/test_search_source_formatter.py
@@ -5,6 +5,7 @@
 from features.web_browsing.search_source_formatter import (
     format_sources_from_google,
     format_sources_from_perplexity,
+    format_sources_from_xai,
 )
 from util.errors import ExternalServiceError

@@ -131,3 +132,25 @@ def test_google_empty_chunks_returns_empty(self):
         di = self._make_di()
         output = format_sources_from_google([], di)
         self.assertEqual(output, "")
+
+    def test_xai_sources_from_citations(self):
+        di = self._make_di()
+        response = Mock()
+        response.citations = ["https://example.com/page"]
+        response.inline_citations = []
+
+        output = format_sources_from_xai(response, di)
+
+        self.assertIn("Sources:", output)
+        self.assertIn("example.com", output)
+        self.assertIn("https://short.ly/abc", output)
+
+    def test_xai_empty_sources_returns_empty(self):
+        di =
self._make_di()
+        response = Mock()
+        response.citations = []
+        response.inline_citations = []
+
+        output = format_sources_from_xai(response, di)
+
+        self.assertEqual(output, "")
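
Reviewer note: the decorator test `test_chat_sample_tracks_provider_reported_cost` expects `cost_in_usd_ticks = 25_000_000` to arrive as `provider_cost_credits = 0.25`. The sketch below shows one conversion that would produce that number. It assumes the tick definition from the design notes (`10_000_000_000` ticks per US dollar) and a linear rate of 100 credits per dollar, which is only inferred from that single test value. `ticks_to_credits` and `CREDITS_PER_USD` are hypothetical names, not identifiers from this change, and `ValueError` stands in for the `ExternalServiceError` the real decorator is expected to raise.

```python
from typing import Optional

# From the design notes in this change: 10_000_000_000 ticks equal one US dollar.
TICKS_PER_USD = 10_000_000_000

# Assumption inferred from the test (25_000_000 ticks -> 0.25 credits): 100 credits per dollar.
CREDITS_PER_USD = 100

# Integer divisor, so the conversion is a single division.
TICKS_PER_CREDIT = TICKS_PER_USD // CREDITS_PER_USD


def ticks_to_credits(cost_in_usd_ticks: Optional[int]) -> float:
    """Convert provider-reported cost ticks to credits (hypothetical helper)."""
    if cost_in_usd_ticks is None:
        # Stand-in for ExternalServiceError: a missing cost must not be billed as zero.
        raise ValueError("provider did not report cost_in_usd_ticks")
    return cost_in_usd_ticks / TICKS_PER_CREDIT


print(ticks_to_credits(25_000_000))  # 0.25
```

Rejecting a missing cost instead of defaulting it to zero matches `test_chat_sample_missing_cost_tracks_failure_without_deduction`, where the call fails and nothing is deducted.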