Add E2E coverage for newly added RPC methods across all SDKs #1610
Adds meaningful end-to-end tests for RPC surface area that was previously uncovered: server plugins/marketplaces, server remote-control, MCP server lifecycle, session-state extras, user-requested shell exec/cancel, the UI ephemeral query, and miscellaneous server methods (settings reload, attachments gating, agent-registry spawn gate, sessions.open, runtime.shutdown).

Authored first in C# (35 tests / 7 files) so the assertions and isolation patterns could be reviewed, then ported faithfully to Python, Go, Rust, and TypeScript. All five suites share the same 35 recorded snapshots under test/snapshots so they exercise identical runtime exchanges.

The tests are written to be deterministic in CI: shell commands pass the bare script body (the runtime wraps it in the platform shell itself, so no nested shell can be orphaned on cancel and lock the work dir), synchronization is condition-based polling rather than fixed sleeps, sessions are disposed deterministically, and error-path tests assert on the specific domain error rather than only a generic "unhandled method". Plugin/marketplace and session-spawning tests isolate per-test home directories.

The Python harness gains a small wait_for_condition helper and skips snapshot writes under GITHUB_ACTIONS to match the other suites.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds cross-SDK end-to-end coverage for recently added JSON-RPC methods, using the shared replaying proxy/snapshot harness so each language suite exercises the same recorded runtime exchanges.
Changes:
- Added new E2E test suites across Rust, Python, Go, Node.js, and .NET for UI ephemeral query, user-requested shell execute/cancel, session-state “extras”, server remote-control, server plugins/marketplaces/misc, and MCP lifecycle.
- Added/updated shared replay snapshots under `test/snapshots/` for the new RPC coverage categories.
- Extended the Python E2E harness with a polling helper (`wait_for_condition`) and adjusted teardown to skip writing snapshot cache in GitHub Actions.
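To make the polling idea concrete, here is a minimal sketch of what a condition-based wait could look like. It is not the helper added in `python/e2e/testharness/helper.py`: the synchronous signature, the default `timeout` and `interval`, and the choice of `TimeoutError` are all assumptions made for illustration.

```python
import time
from typing import Callable


def wait_for_condition(
    predicate: Callable[[], bool],
    timeout: float = 5.0,    # assumed default, not taken from the PR
    interval: float = 0.05,  # assumed default, not taken from the PR
) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    # A monotonic clock keeps the deadline stable if the wall clock changes.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout:.2f}s")
        time.sleep(interval)
```

A test would call it with a cheap check such as `wait_for_condition(lambda: output_file.exists())`, where `output_file` is a hypothetical path. Unlike a fixed sleep, the wait returns as soon as the condition holds and fails loudly when it never does.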
Show a summary per file
| File | Description |
|---|---|
| test/snapshots/rpc_ui_ephemeral_query/should_answer_ephemeral_query.yaml | Snapshot for UI ephemeral query replay. |
| test/snapshots/rpc_shell_user_requested/should_execute_user_requested_shell_command.yaml | Snapshot for user-requested shell execute replay (no model conversation). |
| test/snapshots/rpc_shell_user_requested/should_cancel_user_requested_shell_command.yaml | Snapshot for user-requested shell cancel replay (no model conversation). |
| test/snapshots/rpc_session_state_extras/should_report_session_activity_when_idle.yaml | Snapshot for session activity RPC replay (no model conversation). |
| test/snapshots/rpc_session_state_extras/should_reload_session_plugins.yaml | Snapshot for session plugin reload replay (no model conversation). |
| test/snapshots/rpc_session_state_extras/should_read_empty_sql_todos_for_fresh_session.yaml | Snapshot for SQL todos RPC replay (no model conversation). |
| test/snapshots/rpc_session_state_extras/should_list_models_for_session.yaml | Snapshot for session model list replay (no model conversation). |
| test/snapshots/rpc_session_state_extras/should_get_telemetry_engagement_id.yaml | Snapshot for telemetry engagement id replay (no model conversation). |
| test/snapshots/rpc_session_state_extras/should_get_current_tool_metadata_after_initialization.yaml | Snapshot for initializing a turn prior to tool metadata query. |
| test/snapshots/rpc_session_state_extras/should_get_and_set_allowall_permissions.yaml | Snapshot for allow-all permissions replay (no model conversation). |
| test/snapshots/rpc_server_remote_control/should_treat_set_steering_as_no_op_when_off.yaml | Snapshot for remote-control steering replay (no model conversation). |
| test/snapshots/rpc_server_remote_control/should_report_remote_control_status_as_off.yaml | Snapshot for remote-control status replay (no model conversation). |
| test/snapshots/rpc_server_remote_control/should_report_not_stopped_when_remote_control_is_off.yaml | Snapshot for remote-control stop replay (no model conversation). |
| test/snapshots/rpc_server_remote_control/should_reject_transfer_when_off_with_compare_and_swap.yaml | Snapshot for remote-control transfer replay (no model conversation). |
| test/snapshots/rpc_server_remote_control/should_reach_runtime_when_starting_remote_control_for_unknown_session.yaml | Snapshot for remote-control start unknown-session failure-path replay. |
| test/snapshots/rpc_server_plugins/should_update_single_marketplace_plugin.yaml | Snapshot for plugin update replay (no model conversation). |
| test/snapshots/rpc_server_plugins/should_update_all_installed_plugins.yaml | Snapshot for plugin update-all replay (no model conversation). |
| test/snapshots/rpc_server_plugins/should_reload_mcp_config_cache.yaml | Snapshot for MCP config cache reload replay (no model conversation). |
| test/snapshots/rpc_server_plugins/should_list_browse_refresh_and_remove_local_marketplace.yaml | Snapshot for marketplace list/browse/refresh/remove replay (no model conversation). |
| test/snapshots/rpc_server_plugins/should_install_list_and_uninstall_plugin_from_local_marketplace.yaml | Snapshot for marketplace plugin install/list/uninstall replay (no model conversation). |
| test/snapshots/rpc_server_plugins/should_install_direct_local_plugin_with_deprecation_warning.yaml | Snapshot for direct local plugin install replay (no model conversation). |
| test/snapshots/rpc_server_plugins/should_enable_and_disable_marketplace_plugin.yaml | Snapshot for plugin enable/disable replay (no model conversation). |
| test/snapshots/rpc_server_misc/should_shut_down_owned_runtime.yaml | Snapshot for runtime shutdown replay (no model conversation). |
| test/snapshots/rpc_server_misc/should_report_not_found_when_opening_session_without_context.yaml | Snapshot for sessions.open not_found replay (no model conversation). |
| test/snapshots/rpc_server_misc/should_report_agent_registry_spawn_gate_closed.yaml | Snapshot for agent registry spawn gate replay (no model conversation). |
| test/snapshots/rpc_server_misc/should_reload_user_settings.yaml | Snapshot for user settings reload replay (no model conversation). |
| test/snapshots/rpc_server_misc/should_reject_send_attachments_from_non_extension_connection.yaml | Snapshot for extensions attachment guard replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_throw_when_listing_tools_for_unconnected_server.yaml | Snapshot for MCP list-tools error-path replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_stop_running_mcp_server.yaml | Snapshot for MCP stop-server replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_start_and_restart_mcp_server.yaml | Snapshot for MCP start/restart replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_respond_to_mcp_oauth_request_without_pending_request.yaml | Snapshot for MCP oauth.respond no-op replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_reload_mcp_servers_with_config.yaml | Snapshot for MCP reload-with-config replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_register_and_unregister_external_mcp_client.yaml | Snapshot for MCP external client register/unregister replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_list_tools_and_report_running_status_for_connected_server.yaml | Snapshot for MCP list-tools + isServerRunning replay (no model conversation). |
| test/snapshots/rpc_mcp_lifecycle/should_configure_github_mcp_server.yaml | Snapshot for MCP configureGitHub replay (no model conversation). |
| rust/tests/e2e/rpc_ui_ephemeral_query.rs | Rust E2E for session UI ephemeral query. |
| rust/tests/e2e/rpc_shell_user_requested.rs | Rust E2E for shell execute/cancel user-requested flows. |
| rust/tests/e2e/rpc_session_state_extras.rs | Rust E2E for additional session-scoped RPCs (models/activity/permissions/etc). |
| rust/tests/e2e/rpc_server_remote_control.rs | Rust E2E for server-scoped remote-control RPCs. |
| rust/tests/e2e/rpc_server_plugins.rs | Rust E2E for plugin + marketplace server RPCs and MCP config reload. |
| rust/tests/e2e/rpc_server_misc.rs | Rust E2E for misc server RPCs (settings reload, shutdown, sessions.open, attachments guard, agent registry gate). |
| rust/tests/e2e/rpc_mcp_lifecycle.rs | Rust E2E for MCP lifecycle RPCs (list tools, running, stop/start/restart, reloadWithConfig, configureGitHub, oauth.respond). |
| rust/tests/e2e.rs | Wires new Rust E2E modules into the Rust test suite. |
| python/e2e/testharness/helper.py | Adds wait_for_condition polling helper for Python E2E tests. |
| python/e2e/testharness/__init__.py | Exports wait_for_condition from the Python E2E harness package. |
| python/e2e/test_rpc_ui_ephemeral_query_e2e.py | Python E2E for session UI ephemeral query. |
| python/e2e/test_rpc_shell_user_requested_e2e.py | Python E2E for shell execute/cancel user-requested flows. |
| python/e2e/test_rpc_session_state_extras_e2e.py | Python E2E for additional session-scoped RPCs. |
| python/e2e/test_rpc_server_remote_control_e2e.py | Python E2E for server-scoped remote-control RPCs. |
| python/e2e/test_rpc_server_plugins_e2e.py | Python E2E for plugin + marketplace server RPCs and MCP config reload. |
| python/e2e/test_rpc_server_misc_e2e.py | Python E2E for misc server RPCs. |
| python/e2e/test_rpc_mcp_lifecycle_e2e.py | Python E2E for MCP lifecycle RPCs. |
| python/e2e/conftest.py | Skips writing snapshot cache on CI (and when failures occur) to avoid corruption. |
| nodejs/test/e2e/rpc_ui_ephemeral_query.e2e.test.ts | Node E2E for session UI ephemeral query. |
| nodejs/test/e2e/rpc_shell_user_requested.e2e.test.ts | Node E2E for user-requested shell execute/cancel with polling/cleanup. |
| nodejs/test/e2e/rpc_session_state_extras.e2e.test.ts | Node E2E for additional session-scoped RPCs. |
| nodejs/test/e2e/rpc_server_remote_control.e2e.test.ts | Node E2E for server-scoped remote-control RPCs. |
| nodejs/test/e2e/rpc_server_plugins.e2e.test.ts | Node E2E for plugin + marketplace server RPCs and MCP config reload. |
| nodejs/test/e2e/rpc_server_misc.e2e.test.ts | Node E2E for misc server RPCs. |
| nodejs/test/e2e/rpc_mcp_lifecycle.e2e.test.ts | Node E2E for MCP lifecycle RPCs. |
| go/internal/e2e/rpc_ui_ephemeral_query_e2e_test.go | Go E2E for session UI ephemeral query. |
| go/internal/e2e/rpc_shell_user_requested_e2e_test.go | Go E2E for user-requested shell execute/cancel with polling/cleanup. |
| go/internal/e2e/rpc_session_state_extras_e2e_test.go | Go E2E for additional session-scoped RPCs. |
| go/internal/e2e/rpc_server_remote_control_e2e_test.go | Go E2E for server-scoped remote-control RPCs. |
| go/internal/e2e/rpc_server_plugins_e2e_test.go | Go E2E for plugin + marketplace server RPCs and MCP config reload. |
| go/internal/e2e/rpc_server_misc_e2e_test.go | Go E2E for misc server RPCs. |
| go/internal/e2e/rpc_mcp_lifecycle_e2e_test.go | Go E2E for MCP lifecycle RPCs. |
| dotnet/test/E2E/RpcUiEphemeralQueryE2ETests.cs | .NET E2E for session UI ephemeral query. |
| dotnet/test/E2E/RpcShellUserRequestedE2ETests.cs | .NET E2E for user-requested shell execute/cancel. |
| dotnet/test/E2E/RpcSessionStateExtrasE2ETests.cs | .NET E2E for additional session-scoped RPCs. |
| dotnet/test/E2E/RpcServerRemoteControlE2ETests.cs | .NET E2E for server-scoped remote-control RPCs. |
| dotnet/test/E2E/RpcServerPluginsE2ETests.cs | .NET E2E for plugin + marketplace server RPCs and MCP config reload. |
| dotnet/test/E2E/RpcServerMiscE2ETests.cs | .NET E2E for misc server RPCs. |
| dotnet/test/E2E/RpcMcpLifecycleE2ETests.cs | .NET E2E for MCP lifecycle RPCs. |
Copilot's findings
- Files reviewed: 74/74 changed files
- Comments generated: 3
In the user-requested shell cancel test, the spawned execute_user_requested JoinHandle was moved into tokio::time::timeout and dropped on the timeout path. Dropping a JoinHandle detaches the task rather than cancelling it, so a timed-out shell command would keep running in the background and could hold file handles open, destabilizing later tests. Await the handle by mutable reference and abort() it before panicking so the failure path cleans up after itself. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
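The same failure-path cleanup can be sketched in Python's asyncio, purely as an analogy: this is not the Rust change in this commit, and `run_with_cleanup` is a hypothetical name. `asyncio.wait` with a timeout, like a dropped tokio `JoinHandle`, leaves the task running, so the sketch cancels it explicitly and waits for the cancellation to complete before reporting the timeout.

```python
import asyncio


async def run_with_cleanup(coro, timeout: float):
    """Await `coro` for up to `timeout` seconds; on timeout, cancel it and
    wait for the cancellation to finish before raising TimeoutError."""
    task = asyncio.ensure_future(coro)
    done, _pending = await asyncio.wait({task}, timeout=timeout)
    if task in done:
        return task.result()
    # asyncio.wait does not cancel on timeout: the task is still running here.
    task.cancel()
    try:
        await task  # let the task run its cleanup (finally blocks) first
    except asyncio.CancelledError:
        pass
    raise TimeoutError(f"task did not finish within {timeout}s")
```

The design point is the ordering: cancel, wait for the task to unwind, and only then fail the test, so nothing is left holding file handles when the next test starts.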
Cross-SDK Consistency Review
This PR adds 35 E2E tests per language (175 total) across .NET, Go, Node.js/TypeScript, Python, and Rust, which is great coverage breadth. However, Java is the one SDK that was not ported.
What's missing
Java is the 6th SDK in this repo and already has a complete E2E test infrastructure (
Since all five other languages share the same 35 recorded snapshots under
Suggestion
Consider adding the 7 missing Java test files in
Why
We aim for 100% of the JSON-RPC surface to be exercised by meaningful E2E tests, so the SDKs stay provably wired up to the runtime and the tests act as a behavioral backstop. A batch of RPC methods was recently added without E2E coverage. This fills that gap.
What
Adds real-ish E2E tests (asserting meaningful values, not just "no error") for the previously uncovered surface:
Approach
Authored first in C# (35 tests / 7 files) so the assertions and isolation patterns could be reviewed, then ported faithfully to Python, Go, Rust, and TypeScript. All five suites share the same 35 recorded snapshots under `test/snapshots/`, so every language replays identical runtime exchanges. 35 tests per language, 175 total.
Designed against CI flakiness
These were written and reviewed specifically to avoid nondeterminism:
- Shell commands pass the bare script body; the runtime wraps it in the platform shell (`pwsh -Command` / `sh -c`) itself, so there is no nested shell that could be orphaned on cancel and keep the work dir locked (this was a real Windows fixture-cleanup failure that the fix eliminates).
Notes for reviewers
- The Python harness gains a small `wait_for_condition` helper and skips snapshot writes under `GITHUB_ACTIONS`, matching the other suites' replay behavior.
- `dotnet format`, `ruff`, `gofmt` + `go vet`, `cargo fmt`, `prettier`.
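The snapshot-write gating mentioned in the reviewer notes can be sketched as a small predicate. This is an illustration under stated assumptions, not the code in `python/e2e/conftest.py`: the function name `should_write_snapshots` and its parameters are hypothetical. The one external fact relied on is that GitHub Actions sets `GITHUB_ACTIONS=true` in the job environment.

```python
import os
from typing import Mapping, Optional


def should_write_snapshots(
    tests_failed: bool,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only for a passing local run (hypothetical helper)."""
    # Accepting `env` as a parameter makes the decision testable without
    # mutating os.environ.
    env = os.environ if env is None else env
    running_in_ci = env.get("GITHUB_ACTIONS") == "true"
    # Skip writes on CI and after failures so a bad run cannot corrupt
    # the shared recorded snapshots.
    return not running_in_ci and not tests_failed
```

A session teardown hook could consult this predicate before flushing the snapshot cache.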