Summary
Add support for returning images (and other binary content) from tool results, enabling a fetch_screenshot tool that returns page screenshots for Claude to interpret visually.
Background
While implementing Chrome profile support for fetch_html, we discovered that:
- The readability algorithm strips content from feed-style pages (Twitter/X, etc.)
- Screenshots would be ideal for visual content Claude can interpret directly
- chromiumoxide supports full-page screenshots via page.screenshot()
Current Limitation
The genai library's ToolResponse only supports string content:
// lib/genai/src/chat/tool/tool_response.rs
pub struct ToolResponse {
    pub call_id: String,
    pub content: String, // <-- String only
}
And the Anthropic adapter serializes it as a simple string:
// lib/genai/src/adapter/adapters/anthropic/adapter_impl.rs:570-574
values.push(json!({
    "type": "tool_result",
    "content": tool_response.content, // <-- Just a string
    "tool_use_id": tool_response.call_id,
}));
What Anthropic API Actually Supports
Anthropic's API accepts rich content in tool_result, including images:
{
    "type": "tool_result",
    "tool_use_id": "toolu_...",
    "content": [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": "iVBORw0KGgo..."
            }
        },
        {
            "type": "text",
            "text": "Screenshot of the page"
        }
    ]
}
Proposed Changes
1. Modify genai's ToolResponse
Change content from String to a type that can carry rich content:
pub struct ToolResponse {
    pub call_id: String,
    pub content: ToolResponseContent,
}
pub enum ToolResponseContent {
    Text(String),
    Parts(Vec<ContentPart>),
}
Or simply use MessageContent:
pub struct ToolResponse {
    pub call_id: String,
    pub content: MessageContent,
}
2. Update Anthropic Adapter
Serialize images properly when present in tool results.
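As a rough illustration of the serialization described above, here is a std-only sketch that turns either a plain string or a list of parts into the tool_result JSON shape Anthropic accepts. The `Part` type, `json_string`, `part_json`, and `tool_result_json` are hypothetical stand-ins invented for this example; the real adapter would presumably work with genai's own ContentPart and serde_json's `json!` macro rather than hand-built strings.

```rust
// Hypothetical stand-ins for the proposed types; genai's real definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Part {
    Text(String),
    // Base64-encoded image bytes plus a media type such as "image/png".
    Image { media_type: String, data_base64: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum ToolResponseContent {
    Text(String),
    Parts(Vec<Part>),
}

// Existing call sites that pass strings keep working through these conversions.
impl From<String> for ToolResponseContent {
    fn from(text: String) -> Self {
        ToolResponseContent::Text(text)
    }
}

impl From<&str> for ToolResponseContent {
    fn from(text: &str) -> Self {
        ToolResponseContent::Text(text.to_string())
    }
}

// Quote and escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

// Serialize one part as an Anthropic content block.
fn part_json(part: &Part) -> String {
    match part {
        Part::Text(text) => format!(r#"{{"type":"text","text":{}}}"#, json_string(text)),
        Part::Image { media_type, data_base64 } => format!(
            r#"{{"type":"image","source":{{"type":"base64","media_type":{},"data":{}}}}}"#,
            json_string(media_type),
            json_string(data_base64)
        ),
    }
}

// Text stays a bare JSON string (today's behavior); parts become an array of blocks.
pub fn tool_result_json(call_id: &str, content: &ToolResponseContent) -> String {
    let content_json = match content {
        ToolResponseContent::Text(text) => json_string(text),
        ToolResponseContent::Parts(parts) => {
            let items: Vec<String> = parts.iter().map(part_json).collect();
            format!("[{}]", items.join(","))
        }
    };
    format!(
        r#"{{"type":"tool_result","tool_use_id":{},"content":{}}}"#,
        json_string(call_id),
        content_json
    )
}
```

Keeping the text case as a bare string means tool results that never carry images serialize exactly as they do now, so only image-bearing results exercise the new path.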
3. Update Codey's Agent
Change submit_tool_result signature:
// Current
pub fn submit_tool_result(&mut self, call_id: &str, content: String)
// New
pub fn submit_tool_result(&mut self, call_id: &str, content: impl Into<MessageContent>)
4. Add fetch_screenshot Tool
pub async fn fetch_screenshot(url: &str) -> Result<Vec<u8>, String> {
    // Uses same browser infrastructure as fetch_html
    // Returns PNG bytes
    todo!("capture the page with the shared browser and return the PNG bytes")
}
Use Cases
- Twitter/X feeds - Readability strips most content; screenshots preserve full context
- Dashboards - Visual layouts don't convert well to markdown
- Charts/graphs - Better interpreted visually
- Any SPA - Complex rendered content
Related
- PR #45: Add Chrome profile support to fetch_html tool
- Issue #46: Refactor tool config: replace global state with proper dependency injection
References
- Anthropic Tool Use Docs
- genai ContentPart already supports Binary: ContentPart::from_binary_base64("image/png", data, None)
- chromiumoxide screenshot: page.screenshot(ScreenshotParams::builder().full_page(true).build())
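Since fetch_screenshot is proposed to return raw PNG bytes while from_binary_base64 takes base64 data, some conversion step sits between the two. The sketch below shows one way that step might look using only the standard library; `base64_encode` and `png_to_base64` are hypothetical names for this example, and a real implementation would more likely reach for the `base64` crate than a hand-rolled encoder.

```rust
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

// Standard base64 alphabet (RFC 4648), with '=' padding handled below.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode bytes as padded standard base64, three input bytes per four output chars.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Missing input bytes in the final chunk become '=' padding.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

// Reject anything that does not start with the PNG signature, then encode it.
pub fn png_to_base64(bytes: &[u8]) -> Result<String, String> {
    if !bytes.starts_with(&PNG_SIGNATURE) {
        return Err("not a PNG: missing 8-byte signature".to_string());
    }
    Ok(base64_encode(bytes))
}
```

The signature check is a cheap guard against labeling non-PNG data as image/png; the "iVBORw0KGgo" prefix in the JSON example earlier in this issue is exactly what the PNG signature encodes to.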