Support images in tool results (for fetch_screenshot) #47

@tcdent

Summary

Add support for returning images (and other binary content) from tool results, enabling a fetch_screenshot tool that returns page screenshots for Claude to interpret visually.

Background

While implementing Chrome profile support for fetch_html, we discovered that:

  1. The readability algorithm strips content from feed-style pages (Twitter/X, etc.)
  2. Screenshots would be ideal for visual content Claude can interpret directly
  3. chromiumoxide supports full-page screenshots via page.screenshot()

Current Limitation

The genai library's ToolResponse only supports string content:

// lib/genai/src/chat/tool/tool_response.rs
pub struct ToolResponse {
    pub call_id: String,
    pub content: String,  // <-- String only
}

And the Anthropic adapter serializes it as a simple string:

// lib/genai/src/adapter/adapters/anthropic/adapter_impl.rs:570-574
values.push(json!({
    "type": "tool_result",
    "content": tool_response.content,  // <-- Just a string
    "tool_use_id": tool_response.call_id,
}));

What the Anthropic API Actually Supports

Anthropic's API accepts rich content in tool_result, including images:

{
  "type": "tool_result",
  "tool_use_id": "toolu_...",
  "content": [
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/png",
        "data": "iVBORw0KGgo..."
      }
    },
    {
      "type": "text",
      "text": "Screenshot of the page"
    }
  ]
}

Proposed Changes

1. Modify genai's ToolResponse

Change content from String to support rich content:

pub struct ToolResponse {
    pub call_id: String,
    pub content: ToolResponseContent,
}

pub enum ToolResponseContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

Or simply use MessageContent:

pub struct ToolResponse {
    pub call_id: String,
    pub content: MessageContent,
}
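
A minimal sketch of the first option (the dedicated enum), showing how From impls could keep existing string-based call sites compiling. The ContentPart here is a simplified stand-in, not genai's actual type, and ToolResponse::new is an assumed constructor used only for illustration.

```rust
/// Hypothetical, simplified stand-in for genai's `ContentPart`.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text(String),
    /// Base64-encoded payload plus its media type, e.g. "image/png".
    Binary { media_type: String, base64_data: String },
}

#[derive(Debug, Clone, PartialEq)]
pub enum ToolResponseContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

// Plain strings still convert, so text-only tools need no changes.
impl From<String> for ToolResponseContent {
    fn from(s: String) -> Self {
        ToolResponseContent::Text(s)
    }
}

impl From<&str> for ToolResponseContent {
    fn from(s: &str) -> Self {
        ToolResponseContent::Text(s.to_owned())
    }
}

impl From<Vec<ContentPart>> for ToolResponseContent {
    fn from(parts: Vec<ContentPart>) -> Self {
        ToolResponseContent::Parts(parts)
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolResponse {
    pub call_id: String,
    pub content: ToolResponseContent,
}

impl ToolResponse {
    /// Assumed constructor: accepts anything convertible into the content enum.
    pub fn new(call_id: impl Into<String>, content: impl Into<ToolResponseContent>) -> Self {
        Self { call_id: call_id.into(), content: content.into() }
    }
}
```

With this shape, ToolResponse::new("call_1", "ok") and ToolResponse::new("call_2", vec![...]) both type-check, which limits the blast radius of the change to the adapters that read content.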

2. Update Anthropic Adapter

Serialize images properly when present in tool results.
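
A rough sketch of what that serialization could produce, matching the tool_result shape shown earlier. To stay dependency-free it builds the JSON by hand; the real adapter would presumably keep using serde_json's json! macro. Part, json_escape, and tool_result_to_json are hypothetical names, not genai APIs.

```rust
/// Hypothetical stand-in for one piece of a tool result.
pub enum Part {
    Text(String),
    Image { media_type: String, base64_data: String },
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one part as an Anthropic content block.
fn part_to_json(part: &Part) -> String {
    match part {
        Part::Text(text) => format!(r#"{{"type":"text","text":"{}"}}"#, json_escape(text)),
        Part::Image { media_type, base64_data } => format!(
            r#"{{"type":"image","source":{{"type":"base64","media_type":"{}","data":"{}"}}}}"#,
            json_escape(media_type),
            json_escape(base64_data)
        ),
    }
}

/// Render a full `tool_result` block whose content is an array of parts.
pub fn tool_result_to_json(call_id: &str, parts: &[Part]) -> String {
    let content: Vec<String> = parts.iter().map(part_to_json).collect();
    format!(
        r#"{{"type":"tool_result","tool_use_id":"{}","content":[{}]}}"#,
        json_escape(call_id),
        content.join(",")
    )
}
```

Text-only results could keep emitting a bare string for content, so only results that actually carry images take the array form.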

3. Update Codey's Agent

Change submit_tool_result signature:

// Current
pub fn submit_tool_result(&mut self, call_id: &str, content: String)

// New
pub fn submit_tool_result(&mut self, call_id: &str, content: impl Into<MessageContent>)
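
A small sketch of why the impl Into<MessageContent> signature is backward compatible: callers that pass a String (or &str) keep compiling, while a screenshot tool can pass image content. MessageContent below is a simplified stand-in for genai's type, and the pending_results field is an assumption about how the agent might queue results.

```rust
/// Hypothetical, simplified stand-in for genai's `MessageContent`.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text(String),
    Image { media_type: String, base64_data: String },
}

impl From<String> for MessageContent {
    fn from(s: String) -> Self {
        MessageContent::Text(s)
    }
}

impl From<&str> for MessageContent {
    fn from(s: &str) -> Self {
        MessageContent::Text(s.to_owned())
    }
}

/// Minimal agent that only records submitted tool results.
#[derive(Default)]
pub struct Agent {
    pending_results: Vec<(String, MessageContent)>,
}

impl Agent {
    /// Accepts text or rich content; conversion happens at the boundary.
    pub fn submit_tool_result(&mut self, call_id: &str, content: impl Into<MessageContent>) {
        self.pending_results.push((call_id.to_owned(), content.into()));
    }

    /// Read-only view of queued results, in submission order.
    pub fn pending_results(&self) -> &[(String, MessageContent)] {
        &self.pending_results
    }
}
```

Existing calls such as agent.submit_tool_result(id, output_string) would not need to change under this signature.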

4. Add fetch_screenshot Tool

pub async fn fetch_screenshot(url: &str) -> Result<Vec<u8>, String> {
    // Uses same browser infrastructure as fetch_html
    // Returns PNG bytes
    todo!("navigate to {url} and capture a full-page screenshot")
}
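
The PNG bytes returned by fetch_screenshot still have to be base64-encoded before they can go into the image block's data field. A dependency-free sketch of that step follows; in practice the base64 crate would be the natural choice, and is_png, base64_encode, and png_to_image_data are hypothetical helper names.

```rust
/// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Standard base64 alphabet (RFC 4648).
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Cheap sanity check that the bytes look like a PNG.
pub fn is_png(bytes: &[u8]) -> bool {
    bytes.starts_with(&PNG_SIGNATURE)
}

/// Standard base64 with `=` padding.
pub fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::with_capacity((bytes.len() + 2) / 3 * 4);
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Validate screenshot bytes and return the string for the `data` field.
pub fn png_to_image_data(bytes: &[u8]) -> Result<String, String> {
    if !is_png(bytes) {
        return Err("screenshot bytes do not start with the PNG signature".to_string());
    }
    Ok(base64_encode(bytes))
}
```

Encoding just the PNG signature yields iVBORw0KGgo=, which is why base64 PNG payloads like the one in the API example above begin with iVBORw0KGgo.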

Use Cases

  • Twitter/X feeds - Readability strips most content; screenshots preserve full context
  • Dashboards - Visual layouts don't convert well to markdown
  • Charts/graphs - Better interpreted visually
  • Any SPA - Client-rendered content that text extraction captures poorly

References

  • Anthropic Tool Use Docs
  • genai ContentPart already supports Binary: ContentPart::from_binary_base64("image/png", data, None)
  • chromiumoxide screenshot: page.screenshot(ScreenshotParams::builder().full_page(true).build())
