8 changes: 7 additions & 1 deletion docs/_advanced/upgrading.md
Expand Up @@ -28,10 +28,12 @@ This guide focuses on upgrade-impacting changes: migrations, token semantics, de

## How to Upgrade

No generator is required for the token and cost API changes in 1.15.
No generator is required for the token, cost, and tool-search API changes in 1.15.

If you use the Rails integration and already ran the v1.9 migration, no new columns are needed. The new `cache_read_tokens` and `cache_write_tokens` helpers use the existing `cached_tokens` and `cache_creation_tokens` columns.

Tool search is fully additive — if you don't use `defer:` or `deferred`, nothing changes.

## Token Semantics Changed

RubyLLM now normalizes prompt cache usage before exposing token counts. From 1.15 onward, `response.tokens.input` means standard input tokens. When a provider includes cache reads or cache writes in its raw prompt token total, RubyLLM subtracts those cache buckets and exposes them separately.
Expand Down Expand Up @@ -79,6 +81,10 @@ Cost helpers are available from 1.15 onward. They return `nil` for any cost buck

See [Tracking Token Usage]({% link _core_features/chat.md %}#tracking-token-usage) for the provider comparison table and the exact normalized token semantics RubyLLM exposes.

## Tool Search (Anthropic)

1.15 adds tool search in a fully additive way. `RubyLLM::Chat#with_tool` / `#with_tools` accept a new `defer:` keyword argument, and `RubyLLM::Tool` exposes a class-level `deferred` DSL. On Anthropic this translates to the native `defer_loading: true` flag plus the `tool_search_tool_bm25_20251119` primitive: deferred tools stay out of the system-prompt prefix and Claude loads the ones it actually needs server-side. On other providers `defer:` is ignored with a one-time warning. If you don't use `defer:` or `deferred`, nothing changes. See [Tool Search]({% link _core_features/tool-search.md %}).

# Upgrade to 1.14

## How to Upgrade
Expand Down
113 changes: 113 additions & 0 deletions docs/_core_features/tool-search.md
@@ -0,0 +1,113 @@
---
layout: default
title: Tool Search
nav_order: 9
description: Keep large tool catalogs out of Claude's prompt prefix. Mark tools as deferred and let Anthropic's server-side tool-search primitive load them on demand.
redirect_from:
- /guides/tool-search
---

# {{ page.title }}
{: .d-inline-block .no_toc }

New in 1.15
{: .label .label-green }

{{ page.description }}
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* When deferred tool loading helps.
* How to mark tools as deferred.
* How Anthropic loads deferred tools at runtime.
* How to observe which tools the model loaded.

## When to use it

When a `RubyLLM::Chat` is wired to many tools — especially across one or more MCP servers — every tool's full JSON Schema ships in the system-prompt prefix on every turn. Three real costs follow:

1. **Token bloat.** Hundreds of tools can add tens of thousands of tokens per request.
2. **Prompt-cache eviction.** Adding or removing tools changes the prefix and invalidates the cache.
3. **Selection accuracy.** Models choose worse tools when the menu is long.

This feature builds on Anthropic's [tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool): mark tools as `deferred` and RubyLLM forwards `defer_loading: true` to Anthropic's API, which hides the schemas from Claude until a server-side BM25 search loads the tools the conversation actually needs.

**This feature currently supports only Anthropic.** On other providers, `defer: true` falls back to regular registration and a warning is logged once per chat.

## Marking tools as deferred

### Per-class DSL

```ruby
class DeepResearchTool < RubyLLM::Tool
  description "Runs a multi-step web search..."
  deferred # class-level DSL

  param :query, desc: "..."

  def execute(query:)
    # ... run the search and return its result ...
  end
end
```

### Per-call, for bulk registration (MCP case)

```ruby
chat = RubyLLM.chat(model: "claude-sonnet-4-6")
chat.with_tools(*mcp_client.tools, defer: true)
```

Per-call `defer: true` overrides a non-deferred class; `defer: false` overrides a `deferred` class.
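
The precedence rule can be sketched in isolation. This is a minimal sketch, not RubyLLM's code: `resolve_defer` is a hypothetical helper that mirrors the first check in the `defer_allowed?` method from this change, where a `nil` per-call value falls back to the class-level setting. Provider and configuration checks are left out here.

```ruby
# Hypothetical helper (not part of RubyLLM): decides whether a tool is
# *requested* as deferred, before any provider or config checks run.
#   class_deferred - true if the tool class declared `deferred`
#   explicit       - the per-call `defer:` value (nil means "not given")
def resolve_defer(class_deferred, explicit)
  explicit.nil? ? class_deferred : explicit == true
end

resolve_defer(true,  nil)   # => true  (class-level setting applies)
resolve_defer(false, true)  # => true  (per-call defer: true wins)
resolve_defer(true,  false) # => false (per-call defer: false wins)
```

Only an explicit `true` or `false` overrides the class; omitting `defer:` leaves the class-level declaration in charge.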

## How Claude loads deferred tools

On Anthropic, `defer: true` translates to two things in the request payload:

1. `defer_loading: true` on each deferred tool's definition.
2. A `tool_search_tool_bm25_20251119` primitive appended to the tools array.

Claude then runs the search server-side, loads the matching tools via a `tool_reference` mechanism, and calls them directly. RubyLLM parses the `tool_search_tool_result` blocks and moves the referenced tools from `chat.tool_catalog.deferred_tools` into the active `chat.tools`, so the tool calls that follow can be dispatched normally.
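
To make the two payload changes concrete, here is a rough sketch of how the `tools` array might be assembled. It is not RubyLLM's serializer: `build_tools_payload` and the input hash shape are hypothetical, the `name` given to the search primitive is an assumption based on Anthropic's documentation, and appending the primitive only when at least one tool is deferred is also an assumption.

```ruby
require 'json'

# Assumed shape of the search primitive entry; check Anthropic's docs.
SEARCH_TOOL = { type: 'tool_search_tool_bm25_20251119', name: 'tool_search_tool_bm25' }.freeze

# Hypothetical helper: turns simple tool descriptions into the `tools`
# array of a request body, flagging deferred ones with defer_loading.
def build_tools_payload(tools)
  entries = tools.map do |tool|
    entry = {
      name: tool[:name],
      description: tool[:description],
      input_schema: tool[:input_schema]
    }
    entry[:defer_loading] = true if tool[:deferred]
    entry
  end

  # Assumption: the primitive is only added when something is deferred.
  entries << SEARCH_TOOL if tools.any? { |tool| tool[:deferred] }
  entries
end

example_tools = [
  { name: 'deep_research', description: 'Multi-step web search',
    input_schema: { type: 'object', properties: { query: { type: 'string' } } },
    deferred: true },
  { name: 'clock', description: 'Returns the current time',
    input_schema: { type: 'object', properties: {} },
    deferred: false }
]

puts JSON.pretty_generate(build_tools_payload(example_tools))
```

In this sketch the non-deferred `clock` tool is sent as usual, while `deep_research` carries `defer_loading: true` and the search primitive is appended last.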

## Observing what was loaded

```ruby
chat.after_tool_search do |event|
  # event.query   # nil on the Anthropic-native path, since Claude runs the search server-side
  # event.results # Array of promoted tool name Symbols
  Rails.logger.info("tool_search loaded: #{event.results}")
end
```
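
The callback flow can be sketched without the library. Everything below is a stand-in for illustration: `SearchEvent` is modeled as a plain `Struct` and `CallbackRegistry` is a hypothetical class, loosely following the `Hash.new { ... }` callback store and the `SearchEvent.new(nil, promoted)` call visible in this change.

```ruby
# Stand-ins for illustration only; RubyLLM's real classes may differ.
SearchEvent = Struct.new(:query, :results)

class CallbackRegistry
  def initialize
    # Each callback name lazily gets its own list of blocks.
    @callbacks = Hash.new { |hash, name| hash[name] = [] }
  end

  def after_tool_search(&block)
    @callbacks[:after_tool_search] << block
    self
  end

  # Fires every registered block, but only when something was promoted.
  def notify_promoted(names)
    return if names.empty?

    event = SearchEvent.new(nil, names) # query stays nil on the native path
    @callbacks[:after_tool_search].each { |callback| callback.call(event) }
  end
end

seen = []
registry = CallbackRegistry.new
registry.after_tool_search { |event| seen << event.results }
registry.notify_promoted(%i[deep_research])
registry.notify_promoted([]) # nothing promoted, so no callback runs
```

Because `after_tool_search` appends rather than replaces, several observers can subscribe to the same event.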

Inspect state:

```ruby
chat.tool_catalog # => #<RubyLLM::ToolCatalog deferred=42 loaded=3>
chat.tool_catalog.deferred_tools # Hash of deferred tool name => Tool
chat.tool_catalog.loaded_tools # Set of promoted tool name symbols
```
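
For intuition about how these two collections relate, here is a minimal sketch of a catalog. `SketchCatalog` and `SketchTool` are hypothetical names; the method names mirror the ones the chat calls in this change (`add`, `promote`, `empty?`), but the bodies are assumptions, including the choice that promotion removes the tool from the deferred set.

```ruby
require 'set'

# Hypothetical stand-ins, not the real RubyLLM::ToolCatalog.
SketchTool = Struct.new(:name)

class SketchCatalog
  attr_reader :deferred_tools, :loaded_tools

  def initialize
    @deferred_tools = {}      # tool name (Symbol) => tool
    @loaded_tools = Set.new   # names that have been promoted
  end

  def add(tool)
    @deferred_tools[tool.name.to_sym] = tool
  end

  # Returns the tool and records it as loaded, or nil if it is unknown.
  def promote(name)
    tool = @deferred_tools.delete(name.to_sym)
    @loaded_tools << name.to_sym if tool
    tool
  end

  def empty?
    @deferred_tools.empty?
  end
end

catalog = SketchCatalog.new
catalog.add(SketchTool.new('search_docs'))
catalog.add(SketchTool.new('send_email'))
catalog.promote('search_docs')
catalog.deferred_tools.keys # => [:send_email]
catalog.loaded_tools.to_a   # => [:search_docs]
```

Promoting an unknown name returns `nil` and changes nothing, which matches how the chat skips references it cannot resolve.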

## Kill switch

```ruby
RubyLLM.configure do |c|
  c.tool_search_enabled = false # default true
end
```

When `tool_search_enabled` is false, `defer: true` is coerced to regular registration and a warning is logged once per chat.
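
The fallback and its once-only warning can be sketched as follows. `DeferGate` is a hypothetical class written for this illustration; it follows the order of checks in `defer_allowed?` from this change (configuration switch first, then provider support) and collects warnings in an array instead of writing to a logger.

```ruby
require 'set'

# Hypothetical stand-in for the gating logic; not RubyLLM's code.
class DeferGate
  attr_reader :warnings

  def initialize(enabled:, provider_supports:)
    @enabled = enabled
    @provider_supports = provider_supports
    @seen = Set.new   # reasons already warned about
    @warnings = []
  end

  # Returns true only when deferral was requested and is permitted.
  def allow?(requested)
    return false unless requested
    return refuse('tool_search_enabled is false') unless @enabled
    return refuse('provider does not support deferred tool loading') unless @provider_supports

    true
  end

  private

  # Set#add? returns nil for a repeated reason, so each reason warns once.
  def refuse(reason)
    @warnings << "Ignoring defer: true (#{reason})" if @seen.add?(reason)
    false
  end
end

gate = DeferGate.new(enabled: false, provider_supports: true)
gate.allow?(true) # => false, first warning recorded
gate.allow?(true) # => false, no second warning
gate.warnings.size # => 1
```

Deduplicating by reason means registering dozens of deferred tools in one call produces a single log line per cause rather than one per tool.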

## Non-Anthropic providers

On OpenAI, Gemini, and Bedrock, `defer: true` is ignored and a warning is logged once — the tool registers normally. A follow-up release may add client-side emulation for these providers.

## Further reading

* [Anthropic tool search tool](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool)
* [Tools guide]({% link _core_features/tools.md %})
2 changes: 2 additions & 0 deletions docs/_core_features/tools.md
Expand Up @@ -527,6 +527,8 @@ end

For MCP server integration, check out the community-maintained [`ruby_llm-mcp`](https://github.com/patvice/ruby_llm-mcp) gem.

When a chat is wired to many tools — especially across MCP servers — see [Tool Search]({% link _core_features/tool-search.md %}) for how to defer tool schemas and let the model load only the ones it needs.

## Debugging Tools

Set the `RUBYLLM_DEBUG` environment variable to see detailed logging, including tool calls and results.
Expand Down
96 changes: 85 additions & 11 deletions lib/ruby_llm/chat.rb
@@ -1,11 +1,13 @@
# frozen_string_literal: true

require 'set'

module RubyLLM
# Represents a conversation with an AI model
class Chat
include Enumerable

attr_reader :model, :messages, :tools, :tool_prefs, :params, :headers, :schema
attr_reader :model, :messages, :tools, :tool_prefs, :params, :headers, :schema, :tool_catalog

def initialize(model: nil, provider: nil, assume_model_exists: false, context: nil)
if assume_model_exists && !provider
Expand All @@ -19,6 +21,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
@temperature = nil
@messages = []
@tools = {}
@tool_catalog = ToolCatalog.new
@tool_prefs = { choice: nil, calls: nil }
@params = {}
@headers = {}
Expand All @@ -28,7 +31,8 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
new_message: nil,
end_message: nil,
tool_call: nil,
tool_result: nil
tool_result: nil,
tool_search: nil
}
@callbacks = Hash.new { |callbacks, name| callbacks[name] = [] }
end
Expand All @@ -52,18 +56,18 @@ def with_instructions(instructions, append: false, replace: nil)
self
end

def with_tool(tool, choice: nil, calls: nil)
unless tool.nil?
tool_instance = tool.is_a?(Class) ? tool.new : tool
@tools[tool_instance.name.to_sym] = tool_instance
end
def with_tool(tool, defer: nil, choice: nil, calls: nil)
register_tool(tool, defer: defer) unless tool.nil?
update_tool_options(choice:, calls:)
self
end

def with_tools(*tools, replace: false, choice: nil, calls: nil)
@tools.clear if replace
tools.compact.each { |tool| with_tool tool }
def with_tools(*tools, replace: false, defer: nil, choice: nil, calls: nil)
if replace
@tools.clear
@tool_catalog = ToolCatalog.new
end
tools.compact.each { |tool| with_tool tool, defer: defer }
update_tool_options(choice:, calls:)
self
end
Expand Down Expand Up @@ -145,6 +149,14 @@ def after_tool_result(&)
add_callback(:after_tool_result, &)
end

def on_tool_search(&)
set_legacy_callback(:tool_search, :on_tool_search, :after_tool_search, &)
end

def after_tool_search(&)
add_callback(:after_tool_search, &)
end

def each(&)
messages.each(&)
end
Expand All @@ -156,7 +168,7 @@ def cost
def complete(&)
response = @provider.complete(
messages,
tools: @tools,
tools: effective_tools,
tool_prefs: @tool_prefs,
temperature: @temperature,
model: @model,
Expand All @@ -178,6 +190,7 @@ def complete(&)
end

add_message response
promote_from_tool_references(response)
run_callbacks(:after_message, :end_message, response)

if response.tool_call?
Expand All @@ -203,6 +216,25 @@ def instance_variables

    private

    # Promotes deferred tools that a provider's native tool-search primitive
    # loaded via +message.tool_references+. The resulting +SearchEvent+
    # carries +query: nil+ to signal the native path.
    def promote_from_tool_references(message)
      names = Array(message.tool_references)
      return self if names.empty? || @tool_catalog.empty?

      promoted = names.filter_map do |name|
        tool = @tool_catalog.promote(name)
        next unless tool

        @tools[tool.name.to_sym] = tool
        tool.name.to_sym
      end

      run_callbacks(:after_tool_search, :tool_search, Tool::SearchEvent.new(nil, promoted)) unless promoted.empty?
      self
    end

    def normalize_schema_payload(raw_schema)
      return nil if raw_schema.nil?
      return raw_schema unless raw_schema.is_a?(Hash)
Expand Down Expand Up @@ -370,6 +402,48 @@ def content_like?(object)
      object.is_a?(Content) || object.is_a?(Content::Raw)
    end

    def effective_tools
      active = @tools.transform_values { |t| Tool::Registration.new(t, deferred: false) }
      return active if @tool_catalog.empty?

      deferred = @tool_catalog.available.transform_values { |t| Tool::Registration.new(t, deferred: true) }
      deferred.merge(active)
    end

    def register_tool(tool, defer:)
      tool_instance = tool.is_a?(Class) ? tool.new : tool

      if defer_allowed?(tool_instance, defer)
        @tool_catalog.add(tool_instance)
      else
        @tools[tool_instance.name.to_sym] = tool_instance
      end
    end

    def defer_allowed?(tool, explicit)
      return false unless explicit.nil? ? tool.deferred? : explicit == true

      unless @config.tool_search_enabled
        warn_deferred_ignored('tool_search_enabled is false')
        return false
      end

      unless @provider.supports_deferred_loading?
        warn_deferred_ignored("provider #{@provider.slug} does not support deferred tool loading")
        return false
      end

      true
    end

    def warn_deferred_ignored(reason)
      @deferred_warnings ||= Set.new
      return if @deferred_warnings.include?(reason)

      @deferred_warnings << reason
      RubyLLM.logger.warn("Ignoring defer: true — #{reason}")
    end

    def append_system_instruction(instructions)
      system_messages, non_system_messages = @messages.partition { |msg| msg.role == :system }
      system_messages << Message.new(role: :system, content: instructions)
Expand Down
2 changes: 2 additions & 0 deletions lib/ruby_llm/configuration.rb
Expand Up @@ -56,6 +56,8 @@ def defaults = @defaults ||= {}
option :log_stream_debug, -> { ENV['RUBYLLM_STREAM_DEBUG'] == 'true' }
option :log_regexp_timeout, -> { Regexp.respond_to?(:timeout) ? (Regexp.timeout || 1.0) : nil }

option :tool_search_enabled, true

def initialize
self.class.send(:defaults).each do |key, default|
value = default.respond_to?(:call) ? instance_exec(&default) : default
Expand Down
3 changes: 2 additions & 1 deletion lib/ruby_llm/message.rb
Expand Up @@ -5,7 +5,7 @@ module RubyLLM
class Message
ROLES = %i[system user assistant tool].freeze

attr_reader :role, :model_id, :tool_calls, :tool_call_id, :raw, :thinking, :tokens
attr_reader :role, :model_id, :tool_calls, :tool_call_id, :raw, :thinking, :tokens, :tool_references
attr_writer :content

def initialize(options = {})
Expand All @@ -24,6 +24,7 @@ def initialize(options = {})
)
@raw = options[:raw]
@thinking = options[:thinking]
@tool_references = Array(options[:tool_references])

ensure_valid_role
end
Expand Down
4 changes: 4 additions & 0 deletions lib/ruby_llm/provider.rb
Expand Up @@ -111,6 +111,10 @@ def assume_models_exist?
self.class.assume_models_exist?
end

def supports_deferred_loading?
false
end

def parse_error(response)
return if response.body.empty?

Expand Down
4 changes: 4 additions & 0 deletions lib/ruby_llm/providers/anthropic.rb
Expand Up @@ -22,6 +22,10 @@ def headers
}
end

def supports_deferred_loading?
true
end

class << self
def capabilities
Anthropic::Capabilities
Expand Down