Skip to content

feat: add dry-run mode for execute (Phase 3)#140

Open
acmacalister wants to merge 1 commit into
mainfrom
dry-run-mode
Open

feat: add dry-run mode for execute (Phase 3)#140
acmacalister wants to merge 1 commit into
mainfrom
dry-run-mode

Conversation

@acmacalister
Copy link
Copy Markdown
Collaborator

Summary

  • dry_run flag on execute: Pass dry_run: true to validate arguments and check integration health without executing. LLMs can preview mutation operations (create issue, merge PR, post comment) before committing.
  • Tiered approach: Integrations implementing the new DryRunIntegration interface return native previews. All others get a simulated dry-run that validates args, checks the circuit breaker, and confirms health.
  • No side effects: Dry-run never calls Execute, never pins results, never records breadcrumbs — purely a validation + preview path.

This is Phase 3 of the stateful features plan. Independent of Phases 1-2 but most useful alongside session context.

Test plan

  • 8 tests: simulated fallback, validation failure, native integration, native-declines-falls-back, unhealthy integration, tool not found, does-not-execute, no-pinning
  • All tests pass with -race
  • golangci-lint run — 0 issues
  • Full go test ./... — all existing tests still pass

🐘 Generated with Crush

Copy link
Copy Markdown
Collaborator Author

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shape overall — clean architecture, DryRunIntegration as an optional interface is the right extension point, and the test coverage is thorough for most paths. Two things need attention before merge: a logic inversion in the circuit-breaker path, and a question about whether dry-run should respect session context the way real execute does.

Comment thread server/dryrun.go
Comment thread server/dryrun.go
Comment thread server/dryrun_test.go
Copy link
Copy Markdown
Collaborator Author

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shape overall — the tiered DryRunIntegration interface is clean and the fix commit resolved the two issues from the previous round. One remaining circuit-breaker edge case and a minor silent-ignore to look at inline.

Comment thread server/dryrun.go
Comment thread server/server.go Outdated
Copy link
Copy Markdown
Collaborator Author

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent iteration — all five previous round comments are addressed and the new tests lock down the edge cases well. Two small nits inline, neither blocking.

Comment thread server/dryrun_test.go
assert.Equal(t, "github_create_issue", resp["tool"])
assert.Equal(t, "github", resp["integration"])
assert.Equal(t, "ok", resp["status"])
args := resp["validated_args"].(map[string]any)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct type assertion on resp["validated_args"] will panic (not fail) if the field is absent or wrong type — making a future regression harder to diagnose. The same pattern appears again at line 353.

validatedArgs, ok := resp["validated_args"].(map[string]any)
require.True(t, ok, "expected validated_args to be a map")
assert.Equal(t, "daltoniam", validatedArgs["owner"])

Comment thread mcp.go
// returns (result, true), that result is used directly. Otherwise the server
// falls back to a simulated dry-run (arg validation + preview).
type DryRunIntegration interface {
DryRun(ctx context.Context, toolName ToolName, args map[string]any) (*ToolResult, bool)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MarkdownIntegration.RenderMarkdown has a method-level doc explaining its return contract; DryRun doesn't. Worth adding a note that (nil, false) is the expected "decline" signal (not an error), so adapter authors don't reach for (*ToolResult{IsError: true}, true) when they just want to fall back:

// DryRun returns a native preview for the given tool call, or (nil, false)
// if this tool doesn't support native dry-run and the server should fall back
// to a simulated preview. Returning (nil, false) is not an error.
DryRun(ctx context.Context, toolName ToolName, args map[string]any) (*ToolResult, bool)

Comment thread server/server.go Outdated
}

if args.DryRun && args.Script != "" {
return errorResult("dry_run is not supported with script — use tool_name"), nil
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not being about to use script is a bit of bummer here, understandable why, but does suck for exercising maximum token use reduction

Copy link
Copy Markdown
Collaborator Author

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid iteration — previous round of comments are all addressed. Found two more small things inline.

Comment thread server/dryrun.go Outdated
integration.Name(),
)), nil
}
cb.recordSuccess()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recordSuccess() here heals the circuit breaker based on Healthy() returning true — but Healthy() is typically a lightweight credential/ping check, not a full API round-trip. In executeTool, recordSuccess() only fires after a real Execute() call succeeds. The mismatch means a dry-run health ping can close a half-open breaker and let the next real call through against an endpoint that's still broken.

One option: skip recordSuccess() in the dry-run path and let the real execute do the healing:

if !integration.Healthy(ctx) {
    cb.recordFailure()
    return errorResult(fmt.Sprintf(
        "dry-run: integration %q is unhealthy — call would likely fail",
        integration.Name(),
    )), nil
}
// Don't call cb.recordSuccess() here — only real Execute calls should heal the breaker.

If healing on a health check is intentional ("dry-run acts as a probe"), worth a comment saying so.

Comment thread server/dryrun_test.go
},
dryRunFn: func(_ context.Context, _ mcp.ToolName, args map[string]any) (*mcp.ToolResult, bool) {
return &mcp.ToolResult{
Data: `{"dry_run":true,"native":true,"would_invoke":"` + args["function_name"].(string) + `"}`,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsafe type assertion here — if function_name is absent from args or the wrong type, this panics rather than failing the test cleanly. The validated_args assertions were updated to use the two-value form, worth doing the same here:

fn, ok := args["function_name"].(string)
require.True(t, ok, "expected function_name to be a string")
return &mcp.ToolResult{
    Data: `{"dry_run":true,"native":true,"would_invoke":"` + fn + `"}`,
}, true

Add a dry_run flag to the execute meta-tool that validates arguments
and checks integration health without actually performing the operation.
This makes LLMs more willing to call mutation tools (create, update,
delete) since they can preview what would happen first.

Tiered approach:
- Native: integrations implementing DryRunIntegration can return a
  custom preview (e.g. AWS Lambda DryRun invocation type)
- Simulated fallback: validates args, checks circuit breaker and
  health, returns a preview with validated_args and status
- The tool is never actually executed in either path

New DryRunIntegration optional interface in mcp.go — integrations
can adopt it incrementally. The execute tool schema now includes
a dry_run boolean parameter.

8 new tests: simulated fallback, validation failure, native integration,
native-declines-falls-back, unhealthy integration, tool not found,
does-not-execute verification, no-pinning verification.

🐘 Generated with Crush

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator Author

@acmacalister acmacalister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build, tests, and lint all pass — the dry-run design is solid. Two things from earlier review rounds look like they were lost in the squash; flagging them inline.

Comment thread server/server.go
}

if args.DryRun {
return s.handleDryRun(ctx, args.ToolName, args.Arguments)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The args.Script check above this returns early before dry_run is ever evaluated — so { script: "api.call('github_delete_repo', ...)", dry_run: true } executes the script for real. The tool description says "Works with tool_name only (not scripts)" but there's no enforcement.

This guard was in an intermediate commit but appears to have been dropped in the squash:

if args.DryRun && args.Script != "" {
    return errorResult("dry_run is not supported with script — use tool_name"), nil
}

Comment thread server/dryrun.go

cb := s.getBreaker(integration.Name())
if !cb.allow() {
cb.recordSuccess()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cb.recordSuccess() fires on the rejection path — allow() returned false, meaning the breaker is open. Resetting failures to zero here heals the breaker mid-rejection, so the next call passes through as if nothing happened.

This was flagged in the first review round but looks like it was lost in the squash. The fix is to remove this call and only keep the one at line 29:

if !cb.allow() {
    return errorResult(fmt.Sprintf(
        "dry-run: integration %q temporarily unavailable (circuit breaker open)",
        integration.Name(),
    )), nil
}
cb.recordSuccess()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants