
Make exit_plan_mode E2E snapshot tolerant of reworded CLI tool result #1639

Merged
stephentoub merged 1 commit into main from stephentoub/fix-csharp-sdk-tests on Jun 12, 2026


@stephentoub

Collaborator

Why

The Copilot SDK C# tests (non-blocking) leg in copilot-agent-runtime started failing on ModeHandlersE2ETests.Should_Invoke_Exit_Plan_Mode_Handler_When_Model_Uses_Tool (example: run 27386789669). The replay proxy returned 500 (CAPIError: 500 Proxy error) because the recorded snapshot no longer matched the request the CLI sends.

Root cause: the runtime reworded the exit_plan_mode post-approval tool result for interactive mode (copilot-agent-runtime src/runtime/src/tools/exit_plan_mode.rs, commit a0ed9a510e):

  • old: ...interactive mode (edits require manual approval). Proceed with implementing the plan.
  • new: ...interactive mode. Start implementing the plan now, in this same response. ...

That leg builds the CLI from source, so it feeds the new wording back into the conversation. The proxy matches each request against the stored conversation, so it could not find a match for the new tool-result text.

The constraint

Simply swapping to the new text would break this repo's own CI: the published @github/copilot@1.0.61 pinned by the E2E harness (and used by every language leg) still emits the old wording. I verified this by inspecting the published package binary. So the snapshot has to satisfy both CLI versions during the transition.

Approach

Added a second conversation variant to the shared snapshot so the replay proxy matches either wording. The proxy matches a request as a strict prefix of any stored conversation and returns the next assistant message, so the old CLI resolves against conv0 and the new runtime against conv1. This follows the existing convention in the repo (64 snapshots already use multiple conversations, including version-keyed variants with explanatory comments). A comment documents why both variants exist.

This is a single shared fixture, so it covers the C# leg plus the Node, Python, Go, Java, and Rust legs that exercise the same scenario. The SDK never asserts on this CLI-internal text; it only checks handler invocation, events, and a non-null response.

Validation

Drove the real replay proxy at the HTTP level (where matching happens): loaded the updated snapshot and posted all three request shapes. Turn 1 returns the exit_plan_mode tool call; turn 2 with the old text and turn 2 with the new text both return 200 with the final assistant message.

Once a CLI version carrying the new wording is published and the harness pin is bumped, the old conv0 variant can be pruned.

The runtime reworded the exit_plan_mode post-approval tool result for interactive mode. The copilot-agent-runtime "C# SDK tests" leg builds the CLI from source (new wording) and started failing because the recorded snapshot only contained the old wording, so the replay proxy could not match the request.

Published @github/copilot 1.0.61 (pinned by this repo's own E2E harness and all language legs) still emits the old wording, so the snapshot must satisfy both. Add a second conversation variant covering the new wording; the replay proxy matches a request as a strict prefix of any stored conversation, so old and new CLI versions both resolve. The SDK never asserts on this CLI-internal text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 12, 2026 02:17
@stephentoub stephentoub requested a review from a team as a code owner June 12, 2026 02:17

Copilot AI left a comment

Contributor


Pull request overview

This PR updates a shared E2E replay snapshot to tolerate a runtime rewording of the exit_plan_mode post-approval tool result, ensuring the replay proxy can match requests from both the currently pinned CLI package and newer runtime-built CLIs.

Changes:

  • Added an explanatory comment documenting why two conversation variants are required for a transition period.
  • Added a second stored conversation (conv1) with the updated interactive-mode tool-result wording so the replay proxy can match either request history.
File: test/snapshots/mode_handlers/should_invoke_exit_plan_mode_handler_when_model_uses_tool.yaml
Description: Adds a second conversation variant to match either the old or new exit_plan_mode tool-result text, plus documentation explaining the dual-variant intent.

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 0

@stephentoub stephentoub merged commit 1600b57 into main Jun 12, 2026
31 checks passed
@stephentoub stephentoub deleted the stephentoub/fix-csharp-sdk-tests branch June 12, 2026 02:54