fix(adapter): handle array/object content in messagesToPrompt ([object Object] bug) #3

Open
byungsker wants to merge 1 commit into joesobo:main from byungsker:fix/content-parts-array

Problem

messagesToPrompt assumes msg.content is always a string. The OpenAI Chat Completions spec also permits content as an array of content parts for multimodal / structured messages:

{
  "role": "user",
  "content": [
    { "type": "text", "text": "hello" },
    { "type": "image_url", "image_url": { "url": "https://…" } }
  ]
}

When a client sends the array form, the existing code path (`parts.push(msg.content)` / `${msg.content}`) hits `Array#toString` → each element's default `Object#toString` → `"[object Object]"` joined by commas. The Claude CLI then receives `[object Object],[object Object]` as its prompt and the model responds about *"receiving `[object Object]` messages"* instead of answering.
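
The coercion described above can be reproduced in isolation. This is a minimal sketch; the variable names are illustrative and not taken from the proxy's source:

```typescript
// Array-form content, shaped like the request body shown earlier.
const content = [
  { type: "text", text: "hello" },
  { type: "image_url", image_url: { url: "https://…" } },
];

// Template-literal interpolation calls Array.prototype.toString, which joins
// each element's default Object.prototype.toString result with commas.
const interpolated = `${content}`;

console.log(interpolated); // [object Object],[object Object]
```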

Reproduction

curl -s -X POST http://localhost:3456/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer not-needed" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [{
      "role": "user",
      "content": [{ "type": "text", "text": "reply with FIXED" }]
    }],
    "max_tokens": 20
  }'

Before: Claude replies about [object Object].
After: Claude replies FIXED.

I hit this wiring the proxy into OpenClaw — its Discord/Telegram pipeline legitimately emits array-form content for every inbound user message.

Summary

  • Add a contentToString(content) helper that handles the string / array-of-parts / bare-object / null cases the OpenAI spec (and real-world clients) actually emit.
  • Use it in messagesToPrompt for all three roles (system / user / assistant).
  • Widen OpenAIChatMessage.content to string | OpenAIContentPart[] | null so TypeScript consumers stop having to narrow manually.
  • Add an OpenAIContentPart type that covers text and image_url parts, with an open-ended fallback for forward compatibility.

Behavior

| Input shape | Result |
| --- | --- |
| `"hi"` | `"hi"` (unchanged) |
| `[{ type: "text", text: "hi" }]` | `"hi"` |
| `[{ type: "text", text: "a" }, { type: "image_url", ... }]` | `"a\n[image]"` |
| `[{ type: "custom", foo: 1 }]` | `'{"type":"custom","foo":1}'` (no silent drop) |
| `null` / `undefined` | `""` (filtered before join) |

Kept the [image] placeholder rather than dropping image parts entirely — the claude --print path is text-only, but losing the signal that an image was in the original message felt worse than keeping a breadcrumb.
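
For readers who want to see the behavior above in code, here is a rough sketch of what such a helper could look like. It is reconstructed from the behavior listed in this PR body, not copied from the diff: the exact type definitions, the `partToString` helper, and the use of `"\n"` as the join separator (inferred from the `"a\n[image]"` result) are assumptions.

```typescript
// Assumed shape of the content-part type; the PR's real definition may differ.
type OpenAIContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }
  | { type: string; [key: string]: unknown }; // open-ended fallback

// Hypothetical per-part renderer (not named in the PR).
function partToString(part: unknown): string {
  if (part == null) return "";
  if (typeof part === "string") return part;
  if (typeof part === "object") {
    const p = part as Record<string, unknown>;
    if (p.type === "text" && typeof p.text === "string") return p.text;
    if (p.type === "image_url") return "[image]"; // text-only CLI: keep a breadcrumb
  }
  // Unknown shapes are serialized rather than silently dropped.
  return JSON.stringify(part);
}

function contentToString(content: unknown): string {
  if (content == null) return ""; // null / undefined
  if (typeof content === "string") return content; // existing happy path
  if (Array.isArray(content)) {
    return content
      .map(partToString)
      .filter((s) => s.length > 0) // drop empty parts before joining
      .join("\n");
  }
  return JSON.stringify(content); // bare object: defensive fallback
}

const mixed: OpenAIContentPart[] = [
  { type: "text", text: "a" },
  { type: "image_url", image_url: { url: "https://…" } },
];
console.log(JSON.stringify(contentToString(mixed))); // "a\n[image]"
```

Accepting `unknown` rather than the widened union keeps the helper safe against payloads that don't match the declared types at runtime, which is the situation that caused the original bug.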

Test plan

  • npm run build — clean, no type errors.
  • Manual sanity run of the built messagesToPrompt against mixed messages (string content, text-only array, text+image array, null) confirms no [object Object] in the output and each role renders as expected.
  • End-to-end: same curl repro against a locally-running patched proxy returns the intended reply, not the [object Object] complaint.
  • String-content requests (the existing happy path) unchanged — all three role branches still emit the same prefix/suffix wrapping.

No existing tests in the repo; I kept the diff surgical and skipped adding a test harness to avoid scope creep. Happy to follow up with a dedicated test file if you'd like.

Notes

  • Upstream issues are disabled so I couldn't file one to link here — full context lives in this PR body.
  • dist/ is gitignored so only the two source files are touched.

OpenAI Chat Completions `content` can be either a string or a content-parts
array (e.g. `[{ type: "text", text: "..." }, { type: "image_url", ... }]`).
`messagesToPrompt` previously interpolated `msg.content` directly into template
literals, so the array form stringified via `Array#toString` -> each element's
default `Object#toString` -> `"[object Object]"` separated by commas.

Downstream clients that send structured content (e.g. OpenClaw's inbound
message pipeline) end up feeding Claude CLI prompts like
`[object Object],[object Object]`, and the model naturally responds about
"receiving [object Object] messages".

Introduce a `contentToString` helper and thread it through `messagesToPrompt`:

- strings pass through unchanged
- array parts: extract `text` for text parts, `"[image]"` placeholder for
  `image_url` parts (Claude CLI --print mode is text-only), JSON.stringify
  for unknown shapes so no info is silently dropped
- null/undefined -> empty string (and filtered out before join)
- bare objects -> JSON.stringify as a defensive fallback

Also widen `OpenAIChatMessage.content` to the union type permitted by the
OpenAI spec, so TypeScript consumers stop having to narrow manually.
nayrosk added a commit to nayrosk/claude-max-api-proxy that referenced this pull request Apr 21, 2026
Apply upstream PR #2 on top of existing content-parts fix (PR joesobo#3).

Encoder (openai-to-cli):
- Inject Tool-Use Protocol into system prompt when request has tools.
- Map tool_choice (auto/none/required/specific) to explicit policy text.
- Render role=tool as <tool_result id="..."> and assistant tool_calls as
  <tool_call> blocks so multi-turn tool flows re-hydrate.
- Preserve contentToString from PR joesobo#3 and apply it across all branches so
  array/multimodal content keeps working alongside the new flow.

Decoder (cli-to-openai):
- extractToolCalls scans Claude's text for <tool_call> blocks, returns
  OpenAI tool_calls plus cleaned text. Robust to missing id/args,
  malformed JSON, and unterminated blocks.
- cliResultToOpenai / cliToOpenaiChunk accept extractTools flag and flip
  finish_reason to "tool_calls" when any call is parsed.
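
As an illustration of the decoder idea only, the sketch below scans text for `<tool_call>` blocks and returns parsed calls plus the cleaned text. Everything concrete here is an assumption: the block layout (`<tool_call id="...">{"name": ..., "arguments": ...}</tool_call>`), the `ToolCall` and `Extracted` types, the generated fallback id, and the function name `extractToolCallsSketch`. The commit's actual `extractToolCalls` may work differently.

```typescript
// Hypothetical types; the commit's real definitions are not shown in this thread.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface Extracted {
  text: string;
  toolCalls: ToolCall[];
}

// Assumed block shape: <tool_call id="...">{"name": "...", "arguments": {...}}</tool_call>
function extractToolCallsSketch(raw: string): Extracted {
  const toolCalls: ToolCall[] = [];
  // Only terminated blocks match, so an unterminated block stays in the text.
  const blockRe = /<tool_call(?:\s+id="([^"]*)")?\s*>([\s\S]*?)<\/tool_call>/g;
  const text = raw.replace(
    blockRe,
    (whole: string, id: string | undefined, body: string) => {
      try {
        const parsed = JSON.parse(body) as { name?: unknown; arguments?: unknown };
        if (typeof parsed.name !== "string") return whole; // unusable: keep as text
        toolCalls.push({
          id: id || `call_${toolCalls.length + 1}`, // fallback when id is missing
          type: "function",
          function: {
            name: parsed.name,
            arguments: JSON.stringify(parsed.arguments ?? {}), // missing args -> {}
          },
        });
        return ""; // strip the parsed block from the visible text
      } catch {
        return whole; // malformed JSON: leave the block untouched
      }
    },
  );
  return { text: text.trim(), toolCalls };
}
```

A caller could then set `finish_reason` to `"tool_calls"` whenever `toolCalls` is non-empty, as the commit message describes.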

Routes: pass hasTools through, buffer stream content when tools active
(partial JSON unsafe to stream), emit parsed tool_calls in final chunk.
Adds per-request timing logs + opt-in DEBUG=1 prompt dump.

Types: full OpenAI v1 alignment — discriminated message union, tool /
tool_choice / tool_call / delta, nullable assistant content, "tool_calls"
finish reason. OpenAIContentPart kept and widened across system / user /
assistant content.

Tests: 14 new cases in src/adapter/tool-calling.test.ts cover encoder
(hasTools, protocol injection, tool_choice variants, multi-turn) and
decoder (single/multiple blocks, missing id/args, custom id, malformed,
unterminated, plain-text passthrough). All pass.