feat(harness): detect redundant tool-call loops (3 consecutive identical outputs) by khaliqgant · Pull Request #63 · AgentWorkforce/agent-assistant

khaliqgant · 2026-04-26T06:52:33Z

Summary

Adds redundant-tool-loop detection to the harness. After every successful tool call, hashes the output and checks the last 3 entries in a 5-slot ring buffer. If 3 consecutive entries are from the same toolName and have the same outputHash, the harness exits with outcome: failed, stopReason: redundant_tool_loop.

Failed (not deferred) because more budget will not unstick a model dead-looping on the same response.
5-slot ring is per-turn; resets on new turn.
djb2 hash for output content (non-crypto; collision-tolerant for this use case).
User-facing copy added via stopReasonToUserMessage so sage/specialist surfaces inherit it for free.
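
The detection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `djb2Hash` is named in the diff, but its body here is the textbook djb2 variant, and `ToolSignature`, `RedundantLoopDetector`, `RING_SIZE`, and `LOOP_THRESHOLD` are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: only the 5-slot ring, the 3-match rule, and the use of
// djb2 come from the PR description. All type and class names are assumptions.
interface ToolSignature {
  toolName: string;
  outputHash: number;
}

const RING_SIZE = 5;
const LOOP_THRESHOLD = 3;

// Textbook djb2: hash = hash * 33 + charCode, truncated to unsigned 32 bits.
function djb2Hash(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

class RedundantLoopDetector {
  private ring: ToolSignature[] = [];

  // Record one successful tool result. Returns true when the most recent
  // LOOP_THRESHOLD entries share both toolName and outputHash.
  record(toolName: string, output: string): boolean {
    this.ring.push({ toolName, outputHash: djb2Hash(output) });
    if (this.ring.length > RING_SIZE) this.ring.shift();
    if (this.ring.length < LOOP_THRESHOLD) return false;

    const tail = this.ring.slice(-LOOP_THRESHOLD);
    const first = tail[0];
    return tail.every(
      (entry) =>
        entry.toolName === first.toolName &&
        entry.outputHash === first.outputHash,
    );
  }

  // Call at the start of each turn so state never leaks across turns.
  reset(): void {
    this.ring = [];
  }
}

const detector = new RedundantLoopDetector();
const tripped = ['listing', 'listing', 'listing'].map((out) =>
  detector.record('workspace_list', out),
);
console.log(tripped); // [false, false, true]
```

Because only the last 3 entries are compared, any intervening call to a different tool, or a different output from the same tool, breaks the streak.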

Production failure this catches

Slack DM "explore the codebase structure deeply" → sage harness called workspace_list with the same path 7 times in a row (each returning byte-identical 12,604-char output) before hitting max_iterations_reached. With this PR, the harness exits at the 3rd identical response with a clear stopReason instead of burning the iteration budget.

Validation

harness workspace tests green (163/163, incl. 6 new cases in harness.redundant-loop.test.ts)
stop-reason-message tests green
tsc clean

Downstream

After merge + republish, sage bumps @agent-assistant/harness (workflow F: bump-sage-after-harness-release.ts) and the next sage release picks up the new stopReason copy automatically.

🤖 Generated with Claude Code

…cal outputs)

Production trace 2026-04-26: Slack DM "explore the codebase structure deeply" hit max_iterations_reached after sage's harness called workspace_list with the same path 7 times in a row, getting back byte-identical 12,604-char responses every time. The harness had no way to detect "same tool + same output again, make a decision", so it burned the budget and produced no useful reply.

This adds a 5-entry ring buffer of (toolName, outputHash) per turn. When the last 3 entries match on BOTH toolName and outputHash, the harness exits with outcome=failed, stopReason=redundant_tool_loop. The outcome is failed (not deferred) because more budget will not help; the model is dead-stuck on the same tool call and needs a different prompt or tool surface to make progress.

User-facing copy added to stopReasonToUserMessage so consumers (sage, specialist surfaces) automatically pick up a useful error message instead of the generic "could not complete" fallback.

Self-reviewed and peer-reviewed. Tests + typecheck green (163/163).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6b44038280


chatgpt-codex-connector · 2026-04-26T06:56:33Z

              }
            } else {
              await emit(config, input, state, { type: 'tool_finished', result });
+              const hash = djb2Hash(result.output ?? JSON.stringify(result.structuredOutput ?? {}));


Skip loop hashing when tool result has no payload

This hashes JSON.stringify({}) whenever both result.output and result.structuredOutput are absent, and HarnessToolResult allows that shape. As a result, any three successful calls from the same tool that intentionally return no payload (common for side-effect tools) are treated as identical and the turn is terminated with redundant_tool_loop, even if each call had different inputs and made progress. Only run redundant-loop comparison when there is an actual deterministic payload, or include call-specific data in the signature.


chatgpt-codex-connector · 2026-04-26T06:56:33Z

              }
            } else {
              await emit(config, input, state, { type: 'tool_finished', result });
+              const hash = djb2Hash(result.output ?? JSON.stringify(result.structuredOutput ?? {}));


Guard structuredOutput hashing against stringify failures

When result.output is undefined, the detector hashes JSON.stringify(result.structuredOutput ?? {}), but structuredOutput is typed as Record<string, unknown> and may contain non-JSON-safe values (for example BigInt or circular references). In that case JSON.stringify throws, the outer catch converts the turn to runtime_error, and a successful tool execution can fail the whole run. The loop detector should use a safe serialization fallback so hashing cannot throw.
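
The failure mode is easy to reproduce in isolation. The snippet below is only a demonstration of `JSON.stringify` behaviour, not code from the harness; `stringifyThrows` is a hypothetical helper written for this example.

```typescript
// Hypothetical helper: reports whether JSON.stringify throws for a value.
function stringifyThrows(value: unknown): boolean {
  try {
    JSON.stringify(value);
    return false;
  } catch (err) {
    // Circular structures raise a TypeError; BigInt values do as well.
    return err instanceof TypeError;
  }
}

// A structuredOutput-like record that references itself.
const circular: Record<string, unknown> = { name: 'node' };
circular.self = circular;

console.log(stringifyThrows({ ok: true })); // false
console.log(stringifyThrows(circular)); // true
```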


devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

…stringify

Two codex review issues on PR #63:

P1: false positive on side-effect tools. Before the fix, the detector hashed `JSON.stringify({})` whenever both `result.output` and `result.structuredOutput` were absent. Three successful calls from a side-effect tool (writes/notifications/etc) all hashed identically and falsely tripped redundant_tool_loop after the third call, even though each call had different inputs and made real progress. Fix: skip the detector entirely when the result has no payload to compare.

P2: JSON.stringify can throw on non-serializable values. structuredOutput is typed as Record<string, unknown> and may carry BigInt or circular refs. A throw inside the detector bubbled to the outer harness catch and converted a successful tool execution into a runtime_error that failed the whole turn. Fix: wrap stringify in try/catch and treat serialization failures as "no comparable signature", so the detector skips and the turn continues.

Both behaviors live in a new `computeToolResultSignature()` helper that returns null when there's no comparable payload (or the payload can't be serialized) and a number otherwise. The redundant-loop check is gated on the helper returning a non-null signature.

Two new regression tests cover each case. 165/165 harness suite passes; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
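
A helper with the contract described in this commit message might look roughly like the sketch below. The name `computeToolResultSignature` and its null-or-number return come from the commit message; the body, the trimmed `HarnessToolResult` shape, and the decision to treat an empty-string `output` as a real payload are assumptions made for illustration.

```typescript
// Trimmed, assumed shape; the real HarnessToolResult likely has more fields.
interface HarnessToolResult {
  output?: string;
  structuredOutput?: Record<string, unknown>;
}

// Textbook djb2, truncated to unsigned 32 bits (assumed implementation).
function djb2Hash(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Returns a numeric signature, or null when there is nothing comparable:
// either no payload at all, or a payload that cannot be serialized.
function computeToolResultSignature(result: HarnessToolResult): number | null {
  if (result.output !== undefined) {
    return djb2Hash(result.output);
  }
  if (result.structuredOutput === undefined) {
    return null; // side-effect tool with no payload: skip the detector
  }
  try {
    return djb2Hash(JSON.stringify(result.structuredOutput));
  } catch {
    return null; // BigInt, circular refs, etc.: skip instead of failing the turn
  }
}

const selfRef: Record<string, unknown> = {};
selfRef.self = selfRef;

console.log(computeToolResultSignature({})); // null
console.log(computeToolResultSignature({ structuredOutput: selfRef })); // null
console.log(typeof computeToolResultSignature({ output: 'listing' })); // "number"
```

A caller would run the redundant-loop comparison only when the returned signature is non-null, which is the gating this commit describes.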


khaliqgant merged commit 9b052a4 into main Apr 26, 2026
1 check passed

khaliqgant deleted the feat/harness-redundant-tool-loop branch April 26, 2026 07:11

khaliqgant merged 2 commits into main from feat/harness-redundant-tool-loop
