fix(doom_loop): normalize tool-call args before hashing by voidborne-d · Pull Request #119 · huggingface/ml-intern

voidborne-d · 2026-04-25T16:38:00Z

Problem

agent/core/doom_loop.py:_hash_args hashes the raw function.arguments JSON string. LLMs can emit semantically-identical tool calls with different key orderings ({"a":1,"b":2} vs {"b":2,"a":1}) or whitespace ({"a":1} vs {"a": 1}), and these hash to different values. That silently breaks detect_identical_consecutive and detect_repeating_sequence: the agent could be calling the same tool with the same args repeatedly and the detector sees three distinct signatures and stays quiet.

Issue #61 P1 explicitly calls this out:

Add semantic-similarity or normalized-task matching for research.

This is the smallest version of that — pure key/whitespace normalisation, no semantic similarity yet.

Reproduction

from agent.core.doom_loop import _hash_args
a = '{"path": "/foo", "query": "bar"}'
b = '{"query": "bar", "path": "/foo"}'  # same call, keys reordered
print(_hash_args(a) == _hash_args(b))   # before this PR: False

A research subagent that hits a 529 retry can re-emit the same call with shuffled keys; pre-fix that gets logged as three separate signatures.

Fix

Parse-and-redump the args string with json.dumps(..., sort_keys=True, separators=(",", ":")) before hashing. Falls back to the raw string when input isn't valid JSON so non-JSON arguments strings (rare edge for some providers / partial streaming residue) keep the legacy behaviour and never raise.

agent/core/doom_loop.py: 31 inserts / 2 deletes — adds _normalize_args helper, threads it through _hash_args. No call-site changes.

Tests

23 new cases in tests/unit/test_doom_loop.py covering:

_normalize_args — key reordering, whitespace, nested structure, array-order is significant, invalid-JSON fallback, empty string
_hash_args — semantic-equality regression + value-difference negative
extract_recent_tool_signatures — three reordered-key calls collapse to one signature, skips non-assistant / no-tool-calls messages
detect_identical_consecutive — fires at threshold, silent below, resets on break, catches reordered-args run (the headline regression)
detect_repeating_sequence — alternating pair, broken pattern silent, normalises args inside cycle
check_for_doom_loop — full entry point: silent below 3 signatures, returns corrective prompt for identical run AND for cycle, quiet when args meaningfully differ

Local gates

$ uv run python -m pytest tests/unit/test_doom_loop.py tests/unit/test_redact.py -v
============================== 32 passed in 2.41s ==============================

The 9 pre-existing test_user_quotas.py failures are missing-pytest-asyncio issues on main and are unrelated to this PR.

Scope discipline

Deliberately narrow: only key-order + whitespace normalisation. Did not touch:

detect_repeating_sequence cycle-detection algorithm
lookback=30 window (currently messages, ~~should~~ could be tool-call count — separate issue)
Cross-turn pattern matching that Cost, stability, and loop-control improvements #61 P1 also asks for
Semantic-similarity matching for research-style calls (would need an embedding hop)

Each of those is its own discrete PR; piling them in here would dilute the regression coverage.

The doom-loop detector hashed raw `function.arguments` strings, so semantically-identical tool calls hashed differently when the LLM emitted them with different key orderings (`{"a":1,"b":2}` vs `{"b":2,"a":1}`) or whitespace (`{"a":1}` vs `{"a": 1}`). This silently broke `detect_identical_consecutive` and `detect_repeating_sequence`: the agent could be calling the same tool with the same args repeatedly and the detector would see three distinct signatures and stay quiet. Issue huggingface#61 P1 explicitly calls this out: > Add semantic-similarity or normalized-task matching for `research`. Fix: parse-and-redump JSON via `json.dumps(..., sort_keys=True, separators=(",", ":"))` before hashing. Falls back to the raw string when the input isn't valid JSON so non-JSON `arguments` strings (rare edge for some providers) keep the legacy behaviour and never raise. Tests: 23 new cases in `tests/unit/test_doom_loop.py` covering `_normalize_args`, `_hash_args`, `extract_recent_tool_signatures`, `detect_identical_consecutive`, `detect_repeating_sequence`, and the `check_for_doom_loop` entry point. Includes the headline regression — three reordered-key calls collapsing to one signature — plus negative cases (different values, different array orderings, sub-threshold counts, broken pattern).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(doom_loop): normalize tool-call args before hashing#119

fix(doom_loop): normalize tool-call args before hashing#119
voidborne-d wants to merge 1 commit intohuggingface:mainfrom
voidborne-d:fix/doom-loop-arg-normalization

voidborne-d commented Apr 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

voidborne-d commented Apr 25, 2026

Problem

Reproduction

Fix

Tests

Local gates

Scope discipline

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant