fix: correct regex replacement logic in _clean_content() #168

Open
Dabao21 wants to merge 1 commit into lsdefine:main from Dabao21:fix/clean-content-regex

Dabao21 commented Apr 25, 2026

Problem

In agent_loop.py, the _clean_content() function used a heuristic '\\n' in p to decide whether to replace with '\n\n' or '':

for p in [r'<file_content>[\s\S]*?</file_content>', r'<tool_(?:use|call)>[\s\S]*?</tool_(?:use|call)>', r'(\r?\n){3,}']:
    text = re.sub(p, '\n\n' if '\\n' in p else '', text)

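In a Python raw string, \n is stored as the two literal characters \ and n, so '\\n' in r'(\r?\n){3,}' evaluates to True and that pattern is replaced with '\n\n'. The <file_content> and <tool_(?:use|call)> patterns contain \s and \S but no literal \n, so the same check evaluates to False for them and they are replaced with ''. The heuristic therefore picks the intended replacement for the three current patterns, but only by coincidence: it ties the replacement to the spelling of the pattern, so a tag-stripping pattern that happened to contain \n, or a newline-collapsing pattern written without it, would silently get the wrong replacement.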
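The membership test can be run directly against the three patterns to see which replacement each one receives. This is a minimal sketch: the patterns are copied from the snippet above, while the names PATTERNS and results are illustrative and not taken from agent_loop.py.

```python
# The three patterns from _clean_content(), copied as raw strings.
PATTERNS = [
    r'<file_content>[\s\S]*?</file_content>',
    r'<tool_(?:use|call)>[\s\S]*?</tool_(?:use|call)>',
    r'(\r?\n){3,}',
]

# '\\n' is the two-character string backslash + n, so this is a plain
# substring test on the pattern source, not a regex match.
results = [('\\n' in p) for p in PATTERNS]

for p, has_backslash_n in zip(PATTERNS, results):
    repl = '\n\n' if has_backslash_n else ''
    print(f'{p!r}: heuristic={has_backslash_n}, replacement={repl!r}')
```

Printing the chosen replacement next to each pattern makes it easy to see that the outcome depends on how the pattern is spelled rather than on what it is meant to do.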

Fix

Replace the heuristic with explicit pattern-replacement pairs:

for p, repl in [(r'<file_content>[\s\S]*?</file_content>', ''), (r'<tool_(?:use|call)>[\s\S]*?</tool_(?:use|call)>', ''), (r'(\r?\n){3,}', '\n\n')]:
    text = re.sub(p, repl, text)
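The explicit pairs can be exercised in isolation with a small standalone function. This is a sketch under assumptions: clean_content, _REPLACEMENTS, and the sample string are hypothetical names for illustration, and the real _clean_content() in agent_loop.py may do more than these three substitutions.

```python
import re

# Hypothetical standalone version of the loop shown above: each pattern
# carries its own replacement instead of deriving it from the pattern text.
_REPLACEMENTS = [
    (r'<file_content>[\s\S]*?</file_content>', ''),             # strip file dumps
    (r'<tool_(?:use|call)>[\s\S]*?</tool_(?:use|call)>', ''),   # strip tool blocks
    (r'(\r?\n){3,}', '\n\n'),                                    # collapse blank runs
]


def clean_content(text: str) -> str:
    """Apply every (pattern, replacement) pair in order."""
    for pattern, repl in _REPLACEMENTS:
        text = re.sub(pattern, repl, text)
    return text


sample = 'before<file_content>line1\nline2</file_content>after\n\n\n\nend'
cleaned = clean_content(sample)
print(repr(cleaned))
```

Keeping the pairs in one list means a new pattern has to be added together with its replacement, so the choice is visible at the point of definition.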

Impact

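  • <file_content> and <tool_use>/<tool_call> blocks are stripped (replaced with an empty string) and runs of three or more newlines collapse to '\n\n'; for the three current patterns the output is the same as before
  • Each replacement is written next to its pattern, so adding or editing a pattern cannot silently change which replacement it gets
  • Compressed history messages stay free of these blocks and extra newlines, which avoids unnecessary token consumption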

The previous implementation used '\\n' in p to choose the replacement string. That test is a substring check on the pattern source and is true only for r'(\r?\n){3,}', so the <file_content> and <tool_use> patterns received '' only because they happen not to contain \n. Each pattern is now explicitly paired with its replacement.