fix(doom_loop): normalize tool-call args before hashing #119

Open
voidborne-d wants to merge 1 commit into huggingface:main from voidborne-d:fix/doom-loop-arg-normalization

Conversation

@voidborne-d

Problem

agent/core/doom_loop.py:_hash_args hashes the raw function.arguments JSON string. LLMs can emit semantically-identical tool calls with different key orderings ({"a":1,"b":2} vs {"b":2,"a":1}) or whitespace ({"a":1} vs {"a": 1}), and these hash to different values. That silently breaks detect_identical_consecutive and detect_repeating_sequence: the agent could be calling the same tool with the same args repeatedly and the detector sees three distinct signatures and stays quiet.

Issue #61 P1 explicitly calls this out:

Add semantic-similarity or normalized-task matching for research.

This is the smallest version of that — pure key/whitespace normalisation, no semantic similarity yet.

Reproduction

from agent.core.doom_loop import _hash_args
a = '{"path": "/foo", "query": "bar"}'
b = '{"query": "bar", "path": "/foo"}'  # same call, keys reordered
print(_hash_args(a) == _hash_args(b))   # before this PR: False

A research subagent that hits a 529 retry can re-emit the same call with shuffled keys; pre-fix, three such retries get logged as three distinct signatures and the detector stays quiet.

Fix

Parse-and-redump the args string with json.dumps(..., sort_keys=True, separators=(",", ":")) before hashing. Falls back to the raw string when input isn't valid JSON so non-JSON arguments strings (rare edge for some providers / partial streaming residue) keep the legacy behaviour and never raise.

agent/core/doom_loop.py: 31 inserts / 2 deletes — adds _normalize_args helper, threads it through _hash_args. No call-site changes.

Tests

23 new cases in tests/unit/test_doom_loop.py covering:

  • _normalize_args — key reordering, whitespace, nested structure, array-order is significant, invalid-JSON fallback, empty string
  • _hash_args — semantic-equality regression + value-difference negative
  • extract_recent_tool_signatures — three reordered-key calls collapse to one signature, skips non-assistant / no-tool-calls messages
  • detect_identical_consecutive — fires at threshold, silent below, resets on break, catches reordered-args run (the headline regression)
  • detect_repeating_sequence — alternating pair, broken pattern silent, normalises args inside cycle
  • check_for_doom_loop — full entry point: silent below 3 signatures, returns corrective prompt for identical run AND for cycle, quiet when args meaningfully differ

Local gates

$ uv run python -m pytest tests/unit/test_doom_loop.py tests/unit/test_redact.py -v
============================== 32 passed in 2.41s ==============================

The 9 pre-existing test_user_quotas.py failures are missing-pytest-asyncio issues on main and are unrelated to this PR.

Scope discipline

Deliberately narrow: only key-order + whitespace normalisation. Did not touch:

  • detect_repeating_sequence cycle-detection algorithm
  • lookback=30 window (currently counts messages; arguably should be tool-call count — separate issue)
  • Cross-turn pattern matching, which #61 P1 also asks for
  • Semantic-similarity matching for research-style calls (would need an embedding hop)

Each of those is its own discrete PR; piling them in here would dilute the regression coverage.

Commit message

The doom-loop detector hashed raw `function.arguments` strings, so
semantically-identical tool calls hashed differently when the LLM emitted
them with different key orderings (`{"a":1,"b":2}` vs `{"b":2,"a":1}`)
or whitespace (`{"a":1}` vs `{"a": 1}`). This silently broke
`detect_identical_consecutive` and `detect_repeating_sequence`: the
agent could be calling the same tool with the same args repeatedly and
the detector would see three distinct signatures and stay quiet.

Issue huggingface#61 P1 explicitly calls this out:
> Add semantic-similarity or normalized-task matching for `research`.

Fix: parse-and-redump JSON via `json.dumps(..., sort_keys=True,
separators=(",", ":"))` before hashing. Falls back to the raw string
when the input isn't valid JSON so non-JSON `arguments` strings (rare
edge for some providers) keep the legacy behaviour and never raise.

Tests: 23 new cases in `tests/unit/test_doom_loop.py` covering
`_normalize_args`, `_hash_args`, `extract_recent_tool_signatures`,
`detect_identical_consecutive`, `detect_repeating_sequence`, and the
`check_for_doom_loop` entry point. Includes the headline regression —
three reordered-key calls collapsing to one signature — plus negative
cases (different values, different array orderings, sub-threshold
counts, broken pattern).