feat: BlockSender interface for multi-modal input (images via content blocks) #48

@dmora

Context

agentrun's Process.Send(ctx, string) limits input to text. Modern LLMs support images, audio, and structured tool results. The Claude Code CLI already accepts multi-modal content blocks via stdin today — verified in production.

Tracked upstream in dmora/dors-orchestrator#81. This issue is the agentrun-specific implementation.

Verified wire format

Tested against Claude Code v2.1.72 with --input-format stream-json. The content field accepts either a string (text-only, current behavior) or an array of content blocks (Anthropic API format):

{"type":"user","message":{"role":"user","content":[
  {"type":"text","text":"What color is this single pixel image?"},
  {"type":"image","source":{"type":"base64","media_type":"image/png","data":"iVBOR..."}}
]}}

Result: Claude correctly identified a 1x1 red pixel PNG. No special flags needed beyond --input-format stream-json.

Prior art

The Rust claude-cli-sdk (v0.5.1) ships production multi-modal support via UserContent::Image(ImageBlock), with Base64ImageSource{media_type, data} and UrlImageSource. It caps the base64 payload at 15 MiB and accepts the MIME types image/jpeg, image/png, image/gif, and image/webp.

Proposal

Add an optional BlockSender interface using the existing type-assertion extensibility pattern (same as Streamer, Resumer, InputFormatter).

Root package additions

// ContentBlock is a single content element in a prompt.
type ContentBlock struct {
    Type     string          `json:"type"`
    Text     string          `json:"text,omitempty"`
    Source   json.RawMessage `json:"source,omitempty"` // image source (base64/URL) — format follows Anthropic API
    MimeType string          `json:"mime_type,omitempty"`
}

// TextBlock creates a text-only ContentBlock.
func TextBlock(s string) ContentBlock

// ImageBase64Block creates an image ContentBlock from base64 data.
func ImageBase64Block(mimeType, data string) ContentBlock

// TextFromBlocks extracts concatenated text from blocks.
// Backends that only support text use this for graceful degradation.
func TextFromBlocks(blocks []ContentBlock) string

// BlockSender is an optional interface for processes that support structured content.
// Discovered via type assertion on Process:
//
//     if bs, ok := proc.(BlockSender); ok {
//         err := bs.SendBlocks(ctx, TextBlock("describe this"), ImageBase64Block("image/png", data))
//         // handle err
//     }
type BlockSender interface {
    SendBlocks(ctx context.Context, blocks ...ContentBlock) error
}
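
The declarations above leave the constructor bodies open. A minimal sketch of how they might be filled in follows, assuming that the image source is serialized in the Anthropic-style shape shown under "Verified wire format", that MimeType is left unset for Claude-bound blocks, and that TextFromBlocks joins text blocks with a newline. The base64Source helper type and the "AAAA" payload are placeholders for illustration, not part of the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ContentBlock mirrors the struct proposed above.
type ContentBlock struct {
	Type     string          `json:"type"`
	Text     string          `json:"text,omitempty"`
	Source   json.RawMessage `json:"source,omitempty"`
	MimeType string          `json:"mime_type,omitempty"`
}

// base64Source is a hypothetical helper type that fixes the field order
// of the serialized image source.
type base64Source struct {
	Type      string `json:"type"`
	MediaType string `json:"media_type"`
	Data      string `json:"data"`
}

// TextBlock creates a text-only ContentBlock.
func TextBlock(s string) ContentBlock {
	return ContentBlock{Type: "text", Text: s}
}

// ImageBase64Block creates an image ContentBlock from base64 data.
// Marshaling a struct of plain strings does not fail, so the error is dropped.
func ImageBase64Block(mimeType, data string) ContentBlock {
	src, _ := json.Marshal(base64Source{Type: "base64", MediaType: mimeType, Data: data})
	return ContentBlock{Type: "image", Source: src}
}

// TextFromBlocks keeps only non-empty text blocks.
// Assumption: blocks are joined with a newline; the issue does not specify a separator.
func TextFromBlocks(blocks []ContentBlock) string {
	var parts []string
	for _, b := range blocks {
		if b.Type == "text" && b.Text != "" {
			parts = append(parts, b.Text)
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	blocks := []ContentBlock{
		TextBlock("describe this"),
		ImageBase64Block("image/png", "AAAA"), // placeholder payload
	}
	out, err := json.Marshal(blocks)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(TextFromBlocks(blocks))
}
```

Keeping Source as pre-serialized JSON means the root package never needs an exported image type, which matches the json.RawMessage choice in the struct.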

Backend implementation

| Backend | Support | How |
|---|---|---|
| Claude CLI (streaming) | Full | FormatInput encodes content as array of content blocks instead of string |
| Claude CLI (spawn-per-turn) | Text-only | TextFromBlocks() fallback; prompt is a CLI arg, not stdin |
| ACP | Full | ACP spec already supports content blocks in wire format |
| Codex/OpenCode | Text-only | TextFromBlocks() graceful degradation |
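
The split between "Full" and "Text-only" backends can be hidden from callers. The sketch below shows one possible caller-side helper, SendContent (a hypothetical name, not part of the proposal), that uses SendBlocks when the process implements BlockSender and otherwise degrades to Send with TextFromBlocks. The one-method Process interface, the trimmed ContentBlock, the newline join, and the two stub processes are simplifying assumptions for the demo.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// ContentBlock is trimmed to the fields this demo needs.
type ContentBlock struct {
	Type string
	Text string
}

// Process is reduced to the single method discussed in the issue (assumed signature).
type Process interface {
	Send(ctx context.Context, message string) error
}

// BlockSender is the optional interface proposed above.
type BlockSender interface {
	SendBlocks(ctx context.Context, blocks ...ContentBlock) error
}

// TextFromBlocks joins the text blocks with a newline (separator is an assumption).
func TextFromBlocks(blocks []ContentBlock) string {
	var parts []string
	for _, b := range blocks {
		if b.Type == "text" && b.Text != "" {
			parts = append(parts, b.Text)
		}
	}
	return strings.Join(parts, "\n")
}

// SendContent is a hypothetical helper: structured send when available,
// text-only fallback otherwise.
func SendContent(ctx context.Context, proc Process, blocks ...ContentBlock) error {
	if bs, ok := proc.(BlockSender); ok {
		return bs.SendBlocks(ctx, blocks...)
	}
	return proc.Send(ctx, TextFromBlocks(blocks))
}

// textOnlyProc stands in for a spawn-per-turn or Codex/OpenCode process.
type textOnlyProc struct{ lastText string }

func (p *textOnlyProc) Send(_ context.Context, message string) error {
	p.lastText = message
	return nil
}

// blockProc stands in for a streaming Claude CLI or ACP process.
type blockProc struct {
	textOnlyProc
	lastCount int
}

func (p *blockProc) SendBlocks(_ context.Context, blocks ...ContentBlock) error {
	p.lastCount = len(blocks)
	return nil
}

// runDemo sends the same blocks to both stubs and reports what each received.
func runDemo() (fallbackText string, blockCount int, err error) {
	blocks := []ContentBlock{{Type: "text", Text: "describe this"}, {Type: "image"}}
	plain := &textOnlyProc{}
	if err = SendContent(context.Background(), plain, blocks...); err != nil {
		return "", 0, err
	}
	rich := &blockProc{}
	if err = SendContent(context.Background(), rich, blocks...); err != nil {
		return "", 0, err
	}
	return plain.lastText, rich.lastCount, nil
}

func main() {
	text, n, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("fallback text: %q, blocks delivered: %d\n", text, n)
}
```

With this shape, a text-only backend silently drops the image block; a caller that needs to know about the loss would have to perform the type assertion itself.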

Claude FormatInput change

Current:

stdinMsg := map[string]any{
    "type": "user",
    "message": map[string]any{
        "role":    "user",
        "content": message, // string
    },
}

With BlockSender:

stdinMsg := map[string]any{
    "type": "user",
    "message": map[string]any{
        "role":    "user",
        "content": blocks, // []ContentBlock — Anthropic API format
    },
}
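
Both variants can share one encoder, since only the type of the content value differs. The following is a rough sketch, assuming the stdin message is written as a single JSON object terminated by a newline. formatUserMessage is a hypothetical helper name, and the real FormatInput signature in agentrun may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContentBlock mirrors the struct from the root package additions.
type ContentBlock struct {
	Type     string          `json:"type"`
	Text     string          `json:"text,omitempty"`
	Source   json.RawMessage `json:"source,omitempty"`
	MimeType string          `json:"mime_type,omitempty"`
}

// formatUserMessage (hypothetical) encodes a user turn whose content is
// either a plain string or a slice of content blocks.
func formatUserMessage(content any) ([]byte, error) {
	switch content.(type) {
	case string, []ContentBlock:
		// Both shapes are accepted in the content field.
	default:
		return nil, fmt.Errorf("unsupported content type %T", content)
	}
	stdinMsg := map[string]any{
		"type": "user",
		"message": map[string]any{
			"role":    "user",
			"content": content,
		},
	}
	line, err := json.Marshal(stdinMsg)
	if err != nil {
		return nil, err
	}
	// Assumption: one JSON object per line on stdin.
	return append(line, '\n'), nil
}

func main() {
	text, err := formatUserMessage("hello")
	if err != nil {
		panic(err)
	}
	fmt.Print(string(text))

	blocks, err := formatUserMessage([]ContentBlock{{Type: "text", Text: "hi"}})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(blocks))
}
```

encoding/json sorts map keys when marshaling, so "message" is emitted before "type"; JSON object key order carries no meaning, so the payload is equivalent to the hand-written example.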

Implementation notes

  • BlockSender is optional; Process.Send(ctx, string) continues to work for text-only input
  • Type assertion pattern: if bs, ok := proc.(BlockSender); ok { ... }
  • The CLI engine exposes BlockSender on streaming processes whose InputFormatter supports content blocks
  • Spawn-per-turn processes cannot support blocks (prompt is a CLI positional arg)
  • ContentBlock.Source is json.RawMessage to avoid importing image-specific types into root package
  • Validation: reject blocks with empty Type, enforce 15 MiB limit on base64 data
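
The validation rule in the last bullet could look roughly like the sketch below. It assumes the 15 MiB limit applies to the length of the base64 string in the source's data field, and that an image block without a parseable source is rejected. validateBlocks, imageBlock, and DefaultMaxBase64 are hypothetical names; the limit is a parameter only so that the check can be exercised with small inputs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DefaultMaxBase64 is the 15 MiB limit named in the implementation notes.
const DefaultMaxBase64 = 15 << 20

// ContentBlock is trimmed to the fields validation inspects.
type ContentBlock struct {
	Type   string          `json:"type"`
	Text   string          `json:"text,omitempty"`
	Source json.RawMessage `json:"source,omitempty"`
}

// imageBlock (hypothetical) builds an image block with a base64 source.
func imageBlock(data string) ContentBlock {
	src, _ := json.Marshal(struct {
		Type      string `json:"type"`
		MediaType string `json:"media_type"`
		Data      string `json:"data"`
	}{"base64", "image/png", data})
	return ContentBlock{Type: "image", Source: src}
}

// validateBlocks rejects blocks with an empty Type and image blocks whose
// base64 data exceeds maxBase64 bytes.
func validateBlocks(blocks []ContentBlock, maxBase64 int) error {
	for i, b := range blocks {
		if b.Type == "" {
			return fmt.Errorf("block %d: empty type", i)
		}
		if b.Type != "image" {
			continue
		}
		var src struct {
			Data string `json:"data"`
		}
		if err := json.Unmarshal(b.Source, &src); err != nil {
			return fmt.Errorf("block %d: invalid image source: %w", i, err)
		}
		if len(src.Data) > maxBase64 {
			return fmt.Errorf("block %d: base64 data is %d bytes, limit is %d", i, len(src.Data), maxBase64)
		}
	}
	return nil
}

func main() {
	blocks := []ContentBlock{{Type: "text", Text: "describe this"}, imageBlock("AAAA")}
	fmt.Println(validateBlocks(blocks, DefaultMaxBase64))  // passes validation
	fmt.Println(validateBlocks(blocks, 3))                 // data exceeds the tiny limit
	fmt.Println(validateBlocks([]ContentBlock{{}}, 3))     // empty Type
}
```

Running the check before encoding keeps an oversized payload from ever being written to the process's stdin.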

Out of scope

  • Audio content blocks (no backend supports them yet)
  • resource_link ACP content type (defer until needed)
  • URL image sources (base64 first, URL can follow)
