test: extend stateful_responses_integration with tool-call scenarios #78

Description

@ashwing

Summary

Extend stateful_responses_integration.rs to cover tool-call scenarios using the multi-turn cassettes from PR #77.

The existing integration tests validate the full executor loop (execute() → store → rehydrate → continue) for text-only responses. With the tool-call cassettes now landed (6 vLLM + 6 OpenAI covering linear, streaming, branching, parallel, and tool-output-only patterns), we can add integration tests that exercise:

  1. Tool dispatch flow: execute() returns a function_call output item → caller provides function_call_output in next input → executor continues
  2. Store/rehydrate with tool state: verify that previous_response_id correctly rehydrates conversation history, including tool call/output pairs
  3. Parallel tool calls: response contains multiple function_call items → multiple function_call_output items in next input
  4. Tool-output-only turn: input contains only function_call_output (no user message) → executor produces text continuation
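
As a rough illustration of flows 1, 3, and 4, here is a minimal sketch of how a test could turn the function_call items of one response into the function_call_output items of the next input. The OutputItem and InputItem enums and the tool_outputs_for helper are hypothetical stand-ins, not the crate's real types, and the closure stands in for whatever actually runs the tool.

```rust
// Hypothetical, simplified item types; the real crate's types will differ.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum OutputItem {
    Message { text: String },
    FunctionCall { call_id: String, name: String, arguments: String },
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum InputItem {
    UserMessage { text: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Builds the next turn's input: one function_call_output per function_call
/// in the previous response, in the same order. Non-tool items are skipped.
/// `run_tool` receives (name, arguments) and returns the tool's output.
fn tool_outputs_for<F>(previous_output: &[OutputItem], mut run_tool: F) -> Vec<InputItem>
where
    F: FnMut(&str, &str) -> String,
{
    previous_output
        .iter()
        .filter_map(|item| match item {
            OutputItem::FunctionCall { call_id, name, arguments } => {
                Some(InputItem::FunctionCallOutput {
                    call_id: call_id.clone(),
                    output: run_tool(name, arguments),
                })
            }
            OutputItem::Message { .. } => None,
        })
        .collect()
}

#[test]
fn parallel_calls_yield_one_output_each() {
    let response = vec![
        OutputItem::FunctionCall {
            call_id: "call_1".into(),
            name: "get_weather".into(),
            arguments: "{}".into(),
        },
        OutputItem::FunctionCall {
            call_id: "call_2".into(),
            name: "get_time".into(),
            arguments: "{}".into(),
        },
    ];
    let next_input = tool_outputs_for(&response, |name, _args| format!("{name}: ok"));
    // Two calls in, two outputs out, and no user message (tool-output-only turn).
    assert_eq!(next_input.len(), 2);
    assert!(next_input
        .iter()
        .all(|item| matches!(item, InputItem::FunctionCallOutput { .. })));
}
```

In the real tests the outputs would presumably come from the cassette rather than a closure; the closure just keeps the sketch self-contained.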

Motivation

  • Validates correctness of the executor when tool calls are involved (not just the accumulator parsing, which PR #77, "test: stateful multi-turn tool-call cassettes with context retention", covers)
  • Proves storage layer correctly persists and rehydrates tool call history
  • Catches regressions in the dispatch → store → rehydrate chain that accumulator-only tests cannot

Proposed Test Cases (initial coverage)

| # | Scenario | Cassette Source | Key Assertion |
|---|----------|-----------------|---------------|
| 1 | 3-turn tool dispatch (non-streaming) | responses_tool_calls_3turn.yaml | Each turn returns function_call; chained via previous_response_id |
| 2 | 3-turn tool dispatch (streaming) | responses_tool_calls_3turn_streaming.yaml | SSE stream yields function_call events correctly |
| 3 | 5-turn full pipeline | responses_tool_calls_5turn.yaml | All 5 tools dispatched in sequence, state retained |
| 4 | Branch divergence | responses_tool_calls_branch.yaml | Turn 3 branches from turn 1, gets different context |
| 5 | Parallel tool calls | openai_responses_tool_calls_parallel.yaml | 2 function_calls in response, 2 outputs in next input |
| 6 | Tool-output-only | responses_tool_calls_tool_output_only.yaml | No user message in turn 2, model produces text |
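
To make the chaining and branch-divergence assertions concrete, here is a minimal sketch of rehydration over a previous_response_id chain. InMemoryStore, StoredResponse, put, and rehydrate are hypothetical names chosen for illustration and say nothing about the real storage layer; items are reduced to plain strings to keep the example short.

```rust
use std::collections::HashMap;

/// Hypothetical stored record: one turn's items plus a link to its parent.
#[derive(Debug, Clone)]
struct StoredResponse {
    previous_response_id: Option<String>,
    items: Vec<String>, // simplified: each item rendered as a string
}

/// Hypothetical in-memory store keyed by response id.
#[derive(Default)]
struct InMemoryStore {
    responses: HashMap<String, StoredResponse>,
}

impl InMemoryStore {
    fn put(&mut self, id: &str, previous: Option<&str>, items: &[&str]) {
        self.responses.insert(
            id.to_string(),
            StoredResponse {
                previous_response_id: previous.map(str::to_string),
                items: items.iter().map(|s| s.to_string()).collect(),
            },
        );
    }

    /// Walks the previous_response_id chain back to the root and returns all
    /// items oldest-first. Returns None if any link in the chain is missing.
    /// No cycle detection: this sketch assumes ids form a tree.
    fn rehydrate(&self, id: &str) -> Option<Vec<String>> {
        let mut chain = Vec::new();
        let mut cursor = Some(id.to_string());
        while let Some(current) = cursor {
            let stored = self.responses.get(&current)?;
            chain.push(stored);
            cursor = stored.previous_response_id.clone();
        }
        Some(
            chain
                .into_iter()
                .rev()
                .flat_map(|stored| stored.items.iter().cloned())
                .collect(),
        )
    }
}

#[test]
fn branch_from_turn_one_does_not_see_turn_two() {
    let mut store = InMemoryStore::default();
    store.put("resp_1", None, &["function_call:call_1", "function_call_output:call_1"]);
    store.put("resp_2", Some("resp_1"), &["function_call:call_2"]);
    store.put("resp_3", Some("resp_1"), &["message:branch"]);

    let branch = store.rehydrate("resp_3").unwrap();
    assert!(!branch.contains(&"function_call:call_2".to_string()));
}
```

The point of the branch test is that turn 3 shares turn 1's tool call/output pair with turn 2 but never sees turn 2's items.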

Future edge cases (expand once tool dispatch stabilizes)

  • Invalid call_id in function_call_output — executor should reject or surface error
  • Missing output for one of N parallel calls — partial completion handling
  • Mixed output types (reasoning + function_call in same response — vLLM emits this)
  • store=false with tool calls — single-turn dispatch must still work without persistence
  • Rehydration fidelity — retrieve stored response, verify tool call/output pairs survive round-trip
  • LoopDecision routing — executor returns "needs tool output" vs "complete" to caller
  • Concurrent branches from same previous_response_id — store handles correctly under contention
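
For the first two edge cases (invalid call_id, missing output for one of N parallel calls), a pairing check along these lines could serve as a test oracle. This is only a sketch under assumptions: PairingError and check_pairing are hypothetical names, and whether the executor should reject such input or surface an error some other way is still open.

```rust
use std::collections::HashSet;

/// Hypothetical error type for mismatched tool call/output pairs.
#[derive(Debug, PartialEq)]
enum PairingError {
    /// An output references a call_id the previous response never issued.
    UnknownCallId(String),
    /// A pending call has no matching output in the next input.
    MissingOutput(String),
}

/// Checks that every output answers a pending call and that every pending
/// call is answered. Unknown ids are reported before missing ones.
/// Duplicate outputs for the same call_id are not detected by this sketch.
fn check_pairing(
    pending_call_ids: &[&str],
    output_call_ids: &[&str],
) -> Result<(), PairingError> {
    let pending: HashSet<&str> = pending_call_ids.iter().copied().collect();
    let answered: HashSet<&str> = output_call_ids.iter().copied().collect();

    if let Some(id) = output_call_ids.iter().find(|id| !pending.contains(*id)) {
        return Err(PairingError::UnknownCallId(id.to_string()));
    }
    if let Some(id) = pending_call_ids.iter().find(|id| !answered.contains(*id)) {
        return Err(PairingError::MissingOutput(id.to_string()));
    }
    Ok(())
}

#[test]
fn missing_parallel_output_is_reported() {
    let result = check_pairing(&["call_1", "call_2"], &["call_1"]);
    assert_eq!(result, Err(PairingError::MissingOutput("call_2".to_string())));
}
```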

Implementation Notes

  • Follow the existing TestFixture pattern in tests/support/mod.rs
  • Will need to extend make_request (or add a variant) to accept function_call_output input items
  • Mock HTTP server already supports non-streaming and streaming responses via cassettes
  • May be blocked on executor actually handling function_call output items in the dispatch path (depends on MCP/tool execution story)
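
One possible shape for the make_request variant is sketched below. The existing helper's real signature is not shown in this issue, so TestRequest, InputItem, the make_request signature, and make_request_with_items are all assumptions made for illustration; only the idea (keep the text-only helper, delegate to a variant that takes arbitrary input items) is the point.

```rust
// Hypothetical request types; the real TestFixture helpers may look different.
#[derive(Debug, Clone, PartialEq)]
enum InputItem {
    UserMessage { text: String },
    FunctionCallOutput { call_id: String, output: String },
}

#[derive(Debug, Clone, PartialEq)]
struct TestRequest {
    model: String,
    previous_response_id: Option<String>,
    store: bool,
    input: Vec<InputItem>,
}

/// Assumed shape of the existing text-only helper: a single user message.
fn make_request(model: &str, text: &str, previous_response_id: Option<&str>) -> TestRequest {
    let input = vec![InputItem::UserMessage { text: text.to_string() }];
    make_request_with_items(model, input, previous_response_id)
}

/// Hypothetical variant that accepts arbitrary input items, including
/// function_call_output items with no accompanying user message.
fn make_request_with_items(
    model: &str,
    input: Vec<InputItem>,
    previous_response_id: Option<&str>,
) -> TestRequest {
    TestRequest {
        model: model.to_string(),
        previous_response_id: previous_response_id.map(str::to_string),
        store: true, // stateful tests persist by default in this sketch
        input,
    }
}

#[test]
fn tool_output_only_request_has_no_user_message() {
    let request = make_request_with_items(
        "test-model",
        vec![InputItem::FunctionCallOutput {
            call_id: "call_1".into(),
            output: "42".into(),
        }],
        Some("resp_1"),
    );
    assert!(request
        .input
        .iter()
        .all(|item| matches!(item, InputItem::FunctionCallOutput { .. })));
}
```

Delegating the text-only helper to the variant would leave existing tests unchanged while letting the new tool-call tests share the same request construction.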

Dependencies
