
fix: guard against empty regex matches in WordAug (IndexError #20) #21

Open
temrjan wants to merge 1 commit into ai-forever:main from temrjan:fix/empty-word-index-error

@temrjan temrjan commented Mar 25, 2026

Summary

Fixes #20. WordAug.augment(action='replace') crashes with IndexError: list index out of range when a token contains only characters that the regex pattern does not match (e.g. a lone parenthesis or an em-dash).

Root cause: re.findall() returns an empty list for unmatchable tokens, then word[0] raises IndexError.
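
The failure mode can be reproduced in isolation. This is a minimal sketch, not the WordAug source: the pattern and the helper name first_match_unguarded are assumptions chosen for illustration, and the pattern WordAug actually uses may differ.

```python
import re

# Hypothetical pattern for illustration; the pattern used by WordAug may differ.
pattern = r"[A-Za-z0-9]+"


def first_match_unguarded(token):
    """Mimic the buggy shape: index the findall() result without checking it."""
    matches = re.findall(pattern, token)
    return matches[0]  # raises IndexError when matches is empty


# A token made only of unmatched characters yields an empty list.
print(re.findall(pattern, "("))

try:
    first_match_unguarded("(")
except IndexError as exc:
    print("IndexError:", exc)
```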

Fix: Check if the regex result is empty before indexing. If empty, return the original token unchanged. Applied to both affected methods:

  • __replace() (line 134)
  • __text2emoji() (line 106) — same bug pattern

Also renamed the shadowed word variable to tokens for clarity — the original code reused word for both the input string and the regex result list.
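
The guard and the rename can be sketched as follows. This is an illustration under stated assumptions rather than the code in this PR: replace_token, TOKEN_PATTERN, and the vocab dictionary are hypothetical stand-ins, and the real __replace() picks its replacement differently.

```python
import re

# Hypothetical pattern; the pattern used by WordAug may differ.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9']+")


def replace_token(token, vocab):
    """Replace the first matched word inside token, or return token unchanged."""
    # Named `tokens` (not `token`) so the input string is not shadowed by the list.
    tokens = TOKEN_PATTERN.findall(token)
    if not tokens:
        # Nothing matched (e.g. "(" or an em-dash): keep the original token.
        return token
    core = tokens[0]
    replacement = vocab.get(core, core)  # hypothetical lookup, for illustration only
    return token.replace(core, replacement, 1)


print(replace_token("(", {"hello": "hi"}))
print(replace_token("hello,", {"hello": "hi"}))
```

Returning the input unchanged keeps punctuation-only tokens in the output text, so the augmented sentence has the same number of tokens as the original.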

Test plan

Added tests/test_word_aug.py with 5 regression tests:

tests/test_word_aug.py    5 passed in 2.94s

🤖 Generated with Claude Code

__replace() and __text2emoji() crash with IndexError when a token
contains only characters not matched by the regex (e.g. parentheses,
em-dashes). Return the original token unchanged when re.findall()
produces an empty list.

Added regression tests reproducing the exact scenario from issue ai-forever#20.

Closes ai-forever#20

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Reproducible "list out of index" error in WordAug
