feat: add dry-run mode for execute (Phase 3)#140
Conversation
acmacalister
left a comment
There was a problem hiding this comment.
Good shape overall — clean architecture, DryRunIntegration as an optional interface is the right extension point, and the test coverage is thorough for most paths. Two things need attention before merge: a logic inversion in the circuit-breaker path, and a question about whether dry-run should respect session context the way real execute does.
acmacalister
left a comment
There was a problem hiding this comment.
Good shape overall — the tiered DryRunIntegration interface is clean and the fix commit resolved the two issues from the previous round. One remaining circuit-breaker edge case and a minor silent-ignore to look at inline.
acmacalister
left a comment
There was a problem hiding this comment.
Excellent iteration — all five previous round comments are addressed and the new tests lock down the edge cases well. Two small nits inline, neither blocking.
| assert.Equal(t, "github_create_issue", resp["tool"]) | ||
| assert.Equal(t, "github", resp["integration"]) | ||
| assert.Equal(t, "ok", resp["status"]) | ||
| args := resp["validated_args"].(map[string]any) |
There was a problem hiding this comment.
Direct type assertion on resp["validated_args"] will panic (not fail) if the field is absent or wrong type — making a future regression harder to diagnose. The same pattern appears again at line 353.
validatedArgs, ok := resp["validated_args"].(map[string]any)
require.True(t, ok, "expected validated_args to be a map")
assert.Equal(t, "daltoniam", validatedArgs["owner"])| // returns (result, true), that result is used directly. Otherwise the server | ||
| // falls back to a simulated dry-run (arg validation + preview). | ||
| type DryRunIntegration interface { | ||
| DryRun(ctx context.Context, toolName ToolName, args map[string]any) (*ToolResult, bool) |
There was a problem hiding this comment.
MarkdownIntegration.RenderMarkdown has a method-level doc explaining its return contract; DryRun doesn't. Worth adding a note that (nil, false) is the expected "decline" signal (not an error), so adapter authors don't reach for (*ToolResult{IsError: true}, true) when they just want to fall back:
// DryRun returns a native preview for the given tool call, or (nil, false)
// if this tool doesn't support native dry-run and the server should fall back
// to a simulated preview. Returning (nil, false) is not an error.
DryRun(ctx context.Context, toolName ToolName, args map[string]any) (*ToolResult, bool)| } | ||
|
|
||
| if args.DryRun && args.Script != "" { | ||
| return errorResult("dry_run is not supported with script — use tool_name"), nil |
There was a problem hiding this comment.
not being about to use script is a bit of bummer here, understandable why, but does suck for exercising maximum token use reduction
acmacalister
left a comment
There was a problem hiding this comment.
Solid iteration — previous round of comments are all addressed. Found two more small things inline.
| integration.Name(), | ||
| )), nil | ||
| } | ||
| cb.recordSuccess() |
There was a problem hiding this comment.
recordSuccess() here heals the circuit breaker based on Healthy() returning true — but Healthy() is typically a lightweight credential/ping check, not a full API round-trip. In executeTool, recordSuccess() only fires after a real Execute() call succeeds. The mismatch means a dry-run health ping can close a half-open breaker and let the next real call through against an endpoint that's still broken.
One option: skip recordSuccess() in the dry-run path and let the real execute do the healing:
if !integration.Healthy(ctx) {
cb.recordFailure()
return errorResult(fmt.Sprintf(
"dry-run: integration %q is unhealthy — call would likely fail",
integration.Name(),
)), nil
}
// Don't call cb.recordSuccess() here — only real Execute calls should heal the breaker.If healing on a health check is intentional ("dry-run acts as a probe"), worth a comment saying so.
| }, | ||
| dryRunFn: func(_ context.Context, _ mcp.ToolName, args map[string]any) (*mcp.ToolResult, bool) { | ||
| return &mcp.ToolResult{ | ||
| Data: `{"dry_run":true,"native":true,"would_invoke":"` + args["function_name"].(string) + `"}`, |
There was a problem hiding this comment.
Unsafe type assertion here — if function_name is absent from args or the wrong type, this panics rather than failing the test cleanly. The validated_args assertions were updated to use the two-value form, worth doing the same here:
fn, ok := args["function_name"].(string)
require.True(t, ok, "expected function_name to be a string")
return &mcp.ToolResult{
Data: `{"dry_run":true,"native":true,"would_invoke":"` + fn + `"}`,
}, trueAdd a dry_run flag to the execute meta-tool that validates arguments and checks integration health without actually performing the operation. This makes LLMs more willing to call mutation tools (create, update, delete) since they can preview what would happen first. Tiered approach: - Native: integrations implementing DryRunIntegration can return a custom preview (e.g. AWS Lambda DryRun invocation type) - Simulated fallback: validates args, checks circuit breaker and health, returns a preview with validated_args and status - The tool is never actually executed in either path New DryRunIntegration optional interface in mcp.go — integrations can adopt it incrementally. The execute tool schema now includes a dry_run boolean parameter. 8 new tests: simulated fallback, validation failure, native integration, native-declines-falls-back, unhealthy integration, tool not found, does-not-execute verification, no-pinning verification. 🐘 Generated with Crush Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
f2ad0d5 to
6b1c06f
Compare
acmacalister
left a comment
There was a problem hiding this comment.
Build, tests, and lint all pass — the dry-run design is solid. Two things from earlier review rounds look like they were lost in the squash; flagging them inline.
| } | ||
|
|
||
| if args.DryRun { | ||
| return s.handleDryRun(ctx, args.ToolName, args.Arguments) |
There was a problem hiding this comment.
The args.Script check above this returns early before dry_run is ever evaluated — so { script: "api.call('github_delete_repo', ...)", dry_run: true } executes the script for real. The tool description says "Works with tool_name only (not scripts)" but there's no enforcement.
This guard was in an intermediate commit but appears to have been dropped in the squash:
if args.DryRun && args.Script != "" {
return errorResult("dry_run is not supported with script — use tool_name"), nil
}|
|
||
| cb := s.getBreaker(integration.Name()) | ||
| if !cb.allow() { | ||
| cb.recordSuccess() |
There was a problem hiding this comment.
This cb.recordSuccess() fires on the rejection path — allow() returned false, meaning the breaker is open. Resetting failures to zero here heals the breaker mid-rejection, so the next call passes through as if nothing happened.
This was flagged in the first review round but looks like it was lost in the squash. The fix is to remove this call and only keep the one at line 29:
if !cb.allow() {
return errorResult(fmt.Sprintf(
"dry-run: integration %q temporarily unavailable (circuit breaker open)",
integration.Name(),
)), nil
}
cb.recordSuccess()
Summary
dry_runflag on execute: Passdry_run: trueto validate arguments and check integration health without executing. LLMs can preview mutation operations (create issue, merge PR, post comment) before committing.DryRunIntegrationinterface return native previews. All others get a simulated dry-run that validates args, checks the circuit breaker, and confirms health.Execute, never pins results, never records breadcrumbs — purely a validation + preview path.This is Phase 3 of the stateful features plan. Independent of Phases 1-2 but most useful alongside session context.
Test plan
-racegolangci-lint run— 0 issuesgo test ./...— all existing tests still pass🐘 Generated with Crush