Make exit_plan_mode E2E snapshot tolerant of reworded CLI tool result#1639
Merged
Conversation
The runtime reworded the exit_plan_mode post-approval tool result for interactive mode. The copilot-agent-runtime `C# SDK tests'' leg builds the CLI from source (new wording) and started failing because the recorded snapshot only contained the old wording, so the replay proxy could not match the request. Published @github/copilot 1.0.61 (pinned by this repo's own E2E harness and all language legs) still emits the old wording, so the snapshot must satisfy both. Add a second conversation variant covering the new wording; the replay proxy matches a request as a strict prefix of any stored conversation, so old and new CLI versions both resolve. The SDK never asserts on this CLI-internal text. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates a shared E2E replay snapshot to tolerate a runtime rewording of the exit_plan_mode post-approval tool result, ensuring the replay proxy can match requests from both the currently pinned CLI package and newer runtime-built CLIs.
Changes:
- Added an explanatory comment documenting why two conversation variants are required for a transition period.
- Added a second stored conversation (
conv1) with the updated interactive-mode tool-result wording so the replay proxy can match either request history.
Show a summary per file
| File | Description |
|---|---|
test/snapshots/mode_handlers/should_invoke_exit_plan_mode_handler_when_model_uses_tool.yaml |
Adds a second conversation variant to match either old or new exit_plan_mode tool-result text, plus documentation explaining the dual-variant intent. |
Copilot's findings
- Files reviewed: 1/1 changed files
- Comments generated: 0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The
Copilot SDK C# tests (non-blocking)leg in copilot-agent-runtime started failing onModeHandlersE2ETests.Should_Invoke_Exit_Plan_Mode_Handler_When_Model_Uses_Tool(example: run 27386789669). The replay proxy returned500(CAPIError: 500 Proxy error) because the recorded snapshot no longer matched the request the CLI sends.Root cause: the runtime reworded the
exit_plan_modepost-approval tool result for interactive mode (copilot-agent-runtimesrc/runtime/src/tools/exit_plan_mode.rs, commita0ed9a510e):...interactive mode (edits require manual approval). Proceed with implementing the plan....interactive mode. Start implementing the plan now, in this same response. ...That leg builds the CLI from source, so it feeds the new wording back into the conversation. The proxy matches the request against the stored conversation and could not find the new tool-result text.
The constraint
Simply swapping to the new text would break this repo's own CI: the published
@github/copilot@1.0.61pinned by the E2E harness (and used by every language leg) still emits the old wording. I verified this by inspecting the published package binary. So the snapshot has to satisfy both CLI versions during the transition.Approach
Added a second conversation variant to the shared snapshot so the replay proxy matches either wording. The proxy matches a request as a strict prefix of any stored conversation and returns the next assistant message, so the old CLI resolves against
conv0and the new runtime againstconv1. This follows the existing convention in the repo (64 snapshots already use multiple conversations, including version-keyed variants with explanatory comments). A comment documents why both variants exist.This is a single shared fixture, so it covers the C# leg plus the Node, Python, Go, Java, and Rust legs that exercise the same scenario. The SDK never asserts on this CLI-internal text; it only checks handler invocation, events, and a non-null response.
Validation
Drove the real replay proxy at the HTTP level (where matching happens): loaded the updated snapshot and posted all three request shapes. Turn 1 returns the
exit_plan_modetool call; turn 2 with the old text and turn 2 with the new text both return200with the final assistant message.Once a CLI version carrying the new wording is published and the harness pin is bumped, the old
conv0variant can be pruned.