Summary
The streaming ResponseAccumulator relies on response.output_item.added to create in-flight state before processing deltas. GPT-oss (harmony path) doesn't emit this event for the message item after reasoning — it transitions directly:
response.output_item.added → type: "reasoning"
response.reasoning_text.delta → ...
response.reasoning_text.done
response.output_item.done → reasoning completed
response.output_text.done → "HELLO" ← no preceding output_item.added for message
response.output_item.done → type: "message"
response.completed → full output in payload
Qwen3 emits output_item.added for both reasoning and message items, so it works correctly.
Impact
- Streaming accumulation produces [Reasoning(...)] but drops the message text
- The response.completed payload still contains the full output (both items), so non-streaming and the final response are correct
- Only affects GPT-oss (harmony path); Qwen3 (non-harmony) works correctly
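The drop can be illustrated with a small stand-in. This is only a sketch: StrictAccumulator is a hypothetical class, not the project's ResponseAccumulator, and the event dictionaries are simplified assumptions that keep just the fields needed to replay the GPT-oss sequence shown in the summary.

```python
from typing import Optional


class StrictAccumulator:
    """Hypothetical stand-in: only tracks items announced by output_item.added."""

    def __init__(self) -> None:
        self.in_flight: Optional[dict] = None
        self.output: list = []

    def feed(self, event: dict) -> None:
        kind = event["type"]
        if kind == "response.output_item.added":
            # The only place where in-flight state is created.
            self.in_flight = {"type": event["item"]["type"], "text": ""}
        elif kind in ("response.reasoning_text.delta", "response.output_text.delta"):
            if self.in_flight is not None:
                self.in_flight["text"] += event["delta"]
        elif kind == "response.output_text.done":
            # Without an in-flight message item, the text has nowhere to go.
            if self.in_flight is not None and self.in_flight["type"] == "message":
                self.in_flight["text"] = event["text"]
        elif kind == "response.output_item.done":
            if self.in_flight is not None:
                self.output.append(self.in_flight)
                self.in_flight = None


# Simplified replay of the GPT-oss (harmony path) sequence (payload shapes assumed).
GPT_OSS_EVENTS = [
    {"type": "response.output_item.added", "item": {"type": "reasoning"}},
    {"type": "response.reasoning_text.delta", "delta": "thinking"},
    {"type": "response.reasoning_text.done", "text": "thinking"},
    {"type": "response.output_item.done", "item": {"type": "reasoning"}},
    {"type": "response.output_text.done", "text": "HELLO"},
    {"type": "response.output_item.done", "item": {"type": "message"}},
    {"type": "response.completed"},
]

strict = StrictAccumulator()
for event in GPT_OSS_EVENTS:
    strict.feed(event)

print([item["type"] for item in strict.output])  # ['reasoning']
```

The message text "HELLO" never reaches the output because no output_item.added event created an in-flight message item before output_text.done arrived.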
Possible fixes
1. Handle output_text.done without an active message: auto-create a message item when text arrives with no in-flight message
2. Handle output_item.done with type "message": extract the completed item from the done event payload
3. Extract output from response.completed: use the completed event's full output array as a fallback when streaming accumulation missed items
Option 1 seems most consistent with the existing pattern. Would like to hear thoughts on preferred approach.
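A rough sketch of what Option 1 could look like follows. TolerantAccumulator and its _ensure_message helper are hypothetical names, not the existing ResponseAccumulator API, and the event dictionaries are simplified assumptions. Flushing an unfinished non-message item before creating the message is also an assumption about the desired behavior.

```python
from typing import Optional


class TolerantAccumulator:
    """Hypothetical sketch of Option 1; not the project's ResponseAccumulator."""

    def __init__(self) -> None:
        self.in_flight: Optional[dict] = None
        self.output: list = []

    def _ensure_message(self) -> dict:
        # Assumption: an unfinished non-message item is flushed before the message starts.
        if self.in_flight is not None and self.in_flight["type"] != "message":
            self.output.append(self.in_flight)
            self.in_flight = None
        # Auto-create the message item when no output_item.added announced it.
        if self.in_flight is None:
            self.in_flight = {"type": "message", "text": ""}
        return self.in_flight

    def feed(self, event: dict) -> None:
        kind = event["type"]
        if kind == "response.output_item.added":
            self.in_flight = {"type": event["item"]["type"], "text": ""}
        elif kind == "response.reasoning_text.delta":
            if self.in_flight is not None:
                self.in_flight["text"] += event["delta"]
        elif kind == "response.output_text.delta":
            self._ensure_message()["text"] += event["delta"]
        elif kind == "response.output_text.done":
            self._ensure_message()["text"] = event["text"]
        elif kind == "response.output_item.done":
            if self.in_flight is not None:
                self.output.append(self.in_flight)
                self.in_flight = None


# Simplified GPT-oss sequence: no output_item.added for the message item.
events = [
    {"type": "response.output_item.added", "item": {"type": "reasoning"}},
    {"type": "response.reasoning_text.delta", "delta": "thinking"},
    {"type": "response.reasoning_text.done", "text": "thinking"},
    {"type": "response.output_item.done", "item": {"type": "reasoning"}},
    {"type": "response.output_text.done", "text": "HELLO"},
    {"type": "response.output_item.done", "item": {"type": "message"}},
    {"type": "response.completed"},
]

tolerant = TolerantAccumulator()
for event in events:
    tolerant.feed(event)

print(tolerant.output)
# [{'type': 'reasoning', 'text': 'thinking'}, {'type': 'message', 'text': 'HELLO'}]
```

Because the message item is created lazily, a stream that does emit output_item.added for the message (as Qwen3 does) goes through the same code path and reuses the announced item.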
Discovered in
PR #59 cassette tests — reasoning-single-openai-gpt-oss-20b-streaming.yaml
cc @maralbahari @franciscojavierarceo