Context
agentrun's Process.Send(ctx, string) limits input to text. Modern LLMs support images, audio, and structured tool results. The Claude Code CLI already accepts multi-modal content blocks via stdin today — verified in production.
Tracked upstream in dmora/dors-orchestrator#81. This issue is the agentrun-specific implementation.
Verified wire format
Tested against Claude Code v2.1.72 with --input-format stream-json. The content field accepts either a string (text-only, current behavior) or an array of content blocks (Anthropic API format):
{"type":"user","message":{"role":"user","content":[
{"type":"text","text":"What color is this single pixel image?"},
{"type":"image","source":{"type":"base64","media_type":"image/png","data":"iVBOR..."}}
]}}
Result: Claude correctly identified a 1x1 red pixel PNG. No special flags needed beyond --input-format stream-json.
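As a rough illustration of producing the message above from raw image bytes, here is a minimal Go sketch. The type and function names (imageSource, block, userMessage, encodeUserLine) are hypothetical and not part of agentrun, and the trailing newline assumes stream-json input is read as one JSON object per line.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// imageSource mirrors the "source" object in the verified message above.
type imageSource struct {
	Type      string `json:"type"`
	MediaType string `json:"media_type"`
	Data      string `json:"data"`
}

// block is a hypothetical local type covering only the text and image shapes.
type block struct {
	Type   string       `json:"type"`
	Text   string       `json:"text,omitempty"`
	Source *imageSource `json:"source,omitempty"`
}

// userMessage is the envelope with "content" as an array of blocks.
type userMessage struct {
	Type    string `json:"type"`
	Message struct {
		Role    string  `json:"role"`
		Content []block `json:"content"`
	} `json:"message"`
}

// encodeUserLine base64-encodes img and returns one newline-terminated JSON line.
// The newline framing is an assumption about how the CLI reads stdin.
func encodeUserLine(prompt, mediaType string, img []byte) ([]byte, error) {
	var m userMessage
	m.Type = "user"
	m.Message.Role = "user"
	m.Message.Content = []block{
		{Type: "text", Text: prompt},
		{Type: "image", Source: &imageSource{
			Type:      "base64",
			MediaType: mediaType,
			Data:      base64.StdEncoding.EncodeToString(img),
		}},
	}
	line, err := json.Marshal(m)
	if err != nil {
		return nil, err
	}
	return append(line, '\n'), nil
}

func main() {
	// Placeholder bytes only; a real caller would pass the contents of a PNG file.
	line, err := encodeUserLine("What color is this single pixel image?", "image/png", []byte{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Print(string(line))
}
```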
Prior art
The Rust claude-cli-sdk (v0.5.1) ships production multi-modal support via UserContent::Image(ImageBlock) with Base64ImageSource{media_type, data} and UrlImageSource. It caps the base64 payload at 15 MiB and accepts the MIME types image/jpeg, image/png, image/gif, and image/webp.
Proposal
Add an optional BlockSender interface using the existing type-assertion extensibility pattern (same as Streamer, Resumer, InputFormatter).
Root package additions
// ContentBlock is a single content element in a prompt.
type ContentBlock struct {
	Type     string          `json:"type"`
	Text     string          `json:"text,omitempty"`
	Source   json.RawMessage `json:"source,omitempty"` // image source (base64/URL); format follows Anthropic API
	MimeType string          `json:"mime_type,omitempty"`
}

// TextBlock creates a text-only ContentBlock.
func TextBlock(s string) ContentBlock

// ImageBase64Block creates an image ContentBlock from base64 data.
func ImageBase64Block(mimeType, data string) ContentBlock

// TextFromBlocks extracts concatenated text from blocks.
// Backends that only support text use this for graceful degradation.
func TextFromBlocks(blocks []ContentBlock) string

// BlockSender is an optional interface for processes that support structured content.
// Discovered via type assertion on Process:
//
//	if bs, ok := proc.(BlockSender); ok {
//		bs.SendBlocks(ctx, TextBlock("describe this"), ImageBase64Block("image/png", data))
//	}
type BlockSender interface {
	SendBlocks(ctx context.Context, blocks ...ContentBlock) error
}
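The three helpers are declared above without bodies. One possible way to fill them in is sketched below; it is not the final implementation. Two choices here are assumptions: TextFromBlocks joins text blocks with a newline, and ImageBase64Block leaves MimeType unset so that only the fields seen in the verified wire format are serialized.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ContentBlock is copied from the proposal above.
type ContentBlock struct {
	Type     string          `json:"type"`
	Text     string          `json:"text,omitempty"`
	Source   json.RawMessage `json:"source,omitempty"`
	MimeType string          `json:"mime_type,omitempty"`
}

// TextBlock creates a text-only ContentBlock.
func TextBlock(s string) ContentBlock {
	return ContentBlock{Type: "text", Text: s}
}

// ImageBase64Block wraps already-encoded base64 data in a "source" object.
// Marshaling a map[string]string cannot fail, so the error is discarded.
func ImageBase64Block(mimeType, data string) ContentBlock {
	src, _ := json.Marshal(map[string]string{
		"type":       "base64",
		"media_type": mimeType,
		"data":       data,
	})
	return ContentBlock{Type: "image", Source: src}
}

// TextFromBlocks keeps only non-empty text blocks.
// The newline separator is an assumption; the proposal only says "concatenated".
func TextFromBlocks(blocks []ContentBlock) string {
	var parts []string
	for _, b := range blocks {
		if b.Type == "text" && b.Text != "" {
			parts = append(parts, b.Text)
		}
	}
	return strings.Join(parts, "\n")
}

func main() {
	blocks := []ContentBlock{
		TextBlock("describe this"),
		ImageBase64Block("image/png", "iVBOR..."),
	}
	fmt.Println(TextFromBlocks(blocks))
}
```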
Backend implementation
| Backend | Support | How |
| --- | --- | --- |
| Claude CLI (streaming) | Full | FormatInput encodes content as an array of content blocks instead of a string |
| Claude CLI (spawn-per-turn) | Text-only | TextFromBlocks() fallback; the prompt is a CLI arg, not stdin |
| ACP | Full | ACP spec already supports content blocks in wire format |
| Codex/OpenCode | Text-only | TextFromBlocks() graceful degradation |
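The Full versus Text-only split in the table could be handled at the call site roughly as follows. SendContent, textOnlyProc, blockProc, and runDemo are hypothetical names used only for this sketch. The Process interface is a stand-in that models only Send, and its error return is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// ContentBlock is trimmed to the fields this sketch needs.
type ContentBlock struct {
	Type string
	Text string
}

// Process stands in for agentrun's Process; only Send is modelled.
type Process interface {
	Send(ctx context.Context, message string) error
}

// BlockSender matches the optional interface proposed above.
type BlockSender interface {
	SendBlocks(ctx context.Context, blocks ...ContentBlock) error
}

// TextFromBlocks keeps only text blocks (newline-joined by assumption).
func TextFromBlocks(blocks []ContentBlock) string {
	var parts []string
	for _, b := range blocks {
		if b.Type == "text" {
			parts = append(parts, b.Text)
		}
	}
	return strings.Join(parts, "\n")
}

// SendContent prefers SendBlocks and degrades to text-only Send otherwise.
func SendContent(ctx context.Context, proc Process, blocks ...ContentBlock) error {
	if bs, ok := proc.(BlockSender); ok {
		return bs.SendBlocks(ctx, blocks...)
	}
	return proc.Send(ctx, TextFromBlocks(blocks))
}

// textOnlyProc fakes a text-only backend and records what it was sent.
type textOnlyProc struct{ got string }

func (p *textOnlyProc) Send(_ context.Context, m string) error { p.got = m; return nil }

// blockProc fakes a backend that also implements BlockSender.
type blockProc struct {
	textOnlyProc
	blocks int
}

func (p *blockProc) SendBlocks(_ context.Context, b ...ContentBlock) error {
	p.blocks = len(b)
	return nil
}

// runDemo sends the same prompt to both fakes and reports what each received.
func runDemo() (text string, blocks int, err error) {
	prompt := []ContentBlock{{Type: "text", Text: "describe this"}, {Type: "image"}}
	t, b := &textOnlyProc{}, &blockProc{}
	if err = SendContent(context.Background(), t, prompt...); err != nil {
		return "", 0, err
	}
	if err = SendContent(context.Background(), b, prompt...); err != nil {
		return "", 0, err
	}
	return t.got, b.blocks, nil
}

func main() {
	text, blocks, err := runDemo()
	fmt.Println(text, blocks, err)
}
```

With a wrapper like this, callers would not need to know which backend they are talking to. The image is silently dropped on text-only backends, which may or may not be the desired behavior.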
Claude FormatInput change
Current:
stdinMsg := map[string]any{
	"type": "user",
	"message": map[string]any{
		"role":    "user",
		"content": message, // string
	},
}
With BlockSender:
stdinMsg := map[string]any{
	"type": "user",
	"message": map[string]any{
		"role":    "user",
		"content": blocks, // []ContentBlock, Anthropic API format
	},
}
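The two variants above could live in a single code path that picks the content shape at encode time. The sketch below assumes a hypothetical formatUserInput helper; the real FormatInput signature in agentrun may differ. ContentBlock is trimmed to the fields that appear in the verified wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContentBlock is trimmed to the wire-visible fields for this sketch.
type ContentBlock struct {
	Type   string          `json:"type"`
	Text   string          `json:"text,omitempty"`
	Source json.RawMessage `json:"source,omitempty"`
}

// formatUserInput emits "content" as an array when blocks are given and
// as a plain string otherwise, so existing text-only callers see no change.
func formatUserInput(message string, blocks []ContentBlock) ([]byte, error) {
	var content any = message
	if len(blocks) > 0 {
		content = blocks
	}
	stdinMsg := map[string]any{
		"type": "user",
		"message": map[string]any{
			"role":    "user",
			"content": content,
		},
	}
	return json.Marshal(stdinMsg)
}

func main() {
	inputs := [][]ContentBlock{
		nil, // text-only path
		{{Type: "text", Text: "describe this"}}, // block path
	}
	for _, blocks := range inputs {
		out, err := formatUserInput("describe this", blocks)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Note that encoding/json sorts map keys, so the key order differs from the hand-written example earlier; JSON object key order should not matter to a conforming parser.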
Implementation notes
- BlockSender is optional; Process.Send(string) continues to work for text-only input
- Type assertion pattern: if bs, ok := proc.(BlockSender); ok { ... }
- CLI engine wraps streaming processes that implement InputFormatter with block support
- Spawn-per-turn processes cannot support blocks (prompt is a CLI positional arg)
- ContentBlock.Source is json.RawMessage to avoid importing image-specific types into root package
- Validation: reject blocks with empty Type, enforce 15 MiB limit on base64 data
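The validation rule in the last note might look like the sketch below. ValidateBlocks, imageBlock, and maxBase64Bytes are hypothetical names. Several choices are assumptions rather than settled design: the limit is measured on the length of the base64 string (not the decoded bytes), the MIME allowlist is borrowed from the Rust SDK limits cited under Prior art, and non-base64 sources are rejected because URL sources are out of scope.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxBase64Bytes is the 15 MiB limit, applied to the encoded string length.
const maxBase64Bytes = 15 << 20

// ContentBlock is trimmed to the wire-visible fields for this sketch.
type ContentBlock struct {
	Type   string          `json:"type"`
	Text   string          `json:"text,omitempty"`
	Source json.RawMessage `json:"source,omitempty"`
}

// allowedImageTypes is an assumption taken from the prior-art MIME list.
var allowedImageTypes = map[string]bool{
	"image/jpeg": true, "image/png": true, "image/gif": true, "image/webp": true,
}

// imageBlock builds a base64 image block for the examples below.
func imageBlock(mediaType, data string) ContentBlock {
	src, _ := json.Marshal(map[string]string{
		"type": "base64", "media_type": mediaType, "data": data,
	})
	return ContentBlock{Type: "image", Source: src}
}

// ValidateBlocks rejects empty types and malformed or oversized image sources.
func ValidateBlocks(blocks []ContentBlock) error {
	for i, b := range blocks {
		if b.Type == "" {
			return fmt.Errorf("block %d: empty type", i)
		}
		if b.Type != "image" {
			continue
		}
		var src struct {
			Type      string `json:"type"`
			MediaType string `json:"media_type"`
			Data      string `json:"data"`
		}
		if err := json.Unmarshal(b.Source, &src); err != nil {
			return fmt.Errorf("block %d: invalid image source: %w", i, err)
		}
		if src.Type != "base64" {
			return fmt.Errorf("block %d: unsupported source type %q", i, src.Type)
		}
		if !allowedImageTypes[src.MediaType] {
			return fmt.Errorf("block %d: unsupported media type %q", i, src.MediaType)
		}
		if len(src.Data) > maxBase64Bytes {
			return fmt.Errorf("block %d: base64 data exceeds %d bytes", i, maxBase64Bytes)
		}
	}
	return nil
}

func main() {
	fmt.Println(ValidateBlocks([]ContentBlock{imageBlock("image/png", "iVBOR...")}))
	fmt.Println(ValidateBlocks([]ContentBlock{{Type: ""}}))
	fmt.Println(ValidateBlocks([]ContentBlock{imageBlock("image/bmp", "AAAA")}))
}
```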
Out of scope
- Audio content blocks (no backend supports them yet)
- resource_link ACP content type (defer until needed)
- URL image sources (base64 first, URL can follow)