Summary
The error message raised by `_normalize_token()` in `gemma/gm/text/_sampler.py` is missing the `f` prefix, so `{token!r}` is printed literally instead of the actual token value. The error therefore does not say which entry in `stop_tokens` / `forbidden_tokens` is misconfigured, which makes these misconfigurations hard to debug.
Affected Code
File: `gemma/gm/text/_sampler.py`, lines 579–581

```python
# Current (broken: not an f-string):
raise ValueError(
    'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)
```
Root Cause
The string literal contains `{token!r}` but has no `f` prefix, so Python treats it as a plain string and the braces are never interpolated. The user sees this unhelpful message:

```
ValueError: Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.
```

Instead of the intended:

```
ValueError: Invalid token: 'hello world'. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.
```
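A minimal, self-contained illustration of the root cause in plain Python (no gemma import). The `f` prefix applies to each string literal separately, so in an implicitly concatenated message only the literals that carry the prefix are interpolated. That is also why the fix only needs the prefix on the first literal.

```python
token = 'hello world'

# Without the f prefix, the braces are ordinary characters.
plain = 'Invalid token: {token!r}.'
# With the f prefix, {token!r} is replaced by repr(token).
formatted = f'Invalid token: {token!r}.'

# Adjacent literals are concatenated, but each one keeps its own prefix:
# only the second piece here is interpolated.
partial = (
    'Invalid token: {token!r}.'
    f' Got {token!r}.'
)

print(plain)      # Invalid token: {token!r}.
print(formatted)  # Invalid token: 'hello world'.
print(partial)    # Invalid token: {token!r}. Got 'hello world'.
```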
Reproduction
```python
from gemma.gm.text._sampler import _normalize_token


class FakeTokenizer:
    def encode(self, text):
        return [1, 2]  # multi-token, will trigger the error


_normalize_token(FakeTokenizer(), 'hello world')
# => ValueError: Invalid token: {token!r}. ...  <-- literal, not interpolated
```
Fix
```python
# Before:
raise ValueError(
    'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

# After:
raise ValueError(
    f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)
```
A one-character fix: add `f` before the opening quote of the first literal. The second literal needs no prefix because it contains no placeholders.
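A regression test could pin the fix by asserting that the message contains `repr(token)` and not the raw placeholder. The pytest-style sketch below runs without gemma installed by calling a hypothetical stand-in, `normalize_token_sketch`, which only mirrors the multi-id check described in this issue. It is not gemma's code; a real test would import `_normalize_token` from `gemma.gm.text._sampler` instead, and that function's exact signature and other branches are not confirmed here.

```python
class FakeTokenizer:
    """Hypothetical tokenizer whose encode() always returns two ids."""

    def encode(self, text):
        return [1, 2]


def normalize_token_sketch(tokenizer, token):
    """Hypothetical stand-in for _normalize_token (not gemma's real code).

    Assumption: a string that encodes to more than one id is rejected.
    The f prefix on the first literal is the fix proposed in this issue.
    """
    ids = tokenizer.encode(token)
    if len(ids) != 1:
        raise ValueError(
            f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
            ' map to single token ids in the vocab.'
        )
    return ids[0]


def test_error_message_includes_token_repr():
    try:
        normalize_token_sketch(FakeTokenizer(), 'hello world')
    except ValueError as e:
        message = str(e)
    else:
        raise AssertionError('expected ValueError for a multi-token string')
    # The offending token should appear as its repr...
    assert "'hello world'" in message
    # ...and the raw placeholder must not leak into the message.
    assert '{token!r}' not in message
```

Against the unpatched string, the first assertion fails, which is exactly the regression the test is meant to catch.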
Impact
- Severity: low for correctness (the `ValueError` is still raised), but high UX impact: users who misconfigure `stop_tokens` or `forbidden_tokens` get an error message that does not identify which token is invalid.
- All platforms are affected.
Environment
| | |
|---|---|
| Python | 3.12.13 |
| Platform | macOS Apple Silicon (arm64) |
| gemma | 4.0.1 (editable install from HEAD) |