Bug: Missing f-prefix in error message in _normalize_token() makes stop_tokens debugging impossible #658

@prince-shakyaa

Summary

The error message in _normalize_token() inside gemma/gm/text/_sampler.py is missing the f prefix, so {token!r} is printed literally instead of interpolating the actual token value. This makes debugging stop_tokens / forbidden_tokens misconfigurations nearly impossible.

Affected Code

File: gemma/gm/text/_sampler.py, lines 579–581

# Current (broken — NOT an f-string):
raise ValueError(
    'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

Root Cause

The string literal uses {token!r} but there is no f prefix, so Python treats it as a plain string. The user sees the unhelpful message:

ValueError: Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.

Instead of the intended:

ValueError: Invalid token: 'hello world'. `stop_token`s and `forbidden_token`s must map to single token ids in the vocab.
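The difference can be seen in isolation, without importing gemma. The following is a minimal sketch; the variable names `plain` and `formatted` are illustrative only and do not appear in `_sampler.py`. It also shows that with implicit concatenation of adjacent literals, only the literal containing the placeholder needs the f prefix.

```python
token = 'hello world'

# Plain literal: the braces are ordinary characters, so nothing is interpolated.
plain = (
    'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

# f-string: {token!r} is replaced with repr(token). The second literal has no
# placeholder, so it can stay a plain string; adjacent literals are joined.
formatted = (
    f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

print(plain)
print(formatted)
```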

Reproduction

from gemma.gm.text._sampler import _normalize_token

class FakeTokenizer:
    def encode(self, text):
        return [1, 2]  # multi-token, will trigger the error

_normalize_token(FakeTokenizer(), 'hello world')
# => ValueError: Invalid token: {token!r}. ...   <-- literal, not interpolated

Fix

# Before:
raise ValueError(
    'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

# After:
raise ValueError(
    f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
    ' map to single token ids in the vocab.'
)

A one-character fix: add f before the opening quote of the first string literal. The second literal contains no placeholders, so it can stay a plain string.
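A regression check for the fix could look roughly like the sketch below. It is self-contained and does not call the real function: `normalize_token_sketch` and `error_message_for` are hypothetical stand-ins written for illustration, and the single-id check inside the stand-in is an assumption about how `_normalize_token` behaves, based only on the error text and the reproduction above.

```python
class FakeTokenizer:
    """Tokenizer stub, mirroring the one in the reproduction."""

    def encode(self, text):
        return [1, 2]  # multi-token, will trigger the error


def normalize_token_sketch(tokenizer, token):
    """Hypothetical stand-in for `_normalize_token`; NOT the library's code."""
    token_ids = tokenizer.encode(token)
    if len(token_ids) != 1:  # assumed condition: token must map to one id
        raise ValueError(
            f'Invalid token: {token!r}. `stop_token`s and `forbidden_token`s must'
            ' map to single token ids in the vocab.'
        )
    return token_ids[0]


def error_message_for(token):
    """Returns the ValueError text raised for `token`, or None if none is raised."""
    try:
        normalize_token_sketch(FakeTokenizer(), token)
    except ValueError as e:
        return str(e)
    return None


message = error_message_for('hello world')
# The message should name the offending token and contain no raw placeholder.
assert "'hello world'" in message
assert '{token!r}' not in message
```

In the real test suite, the same two assertions would be made against the message raised by `_normalize_token` itself.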

Impact

  • Severity: low for correctness (the error is still raised), but high for usability. Users who misconfigure stop_tokens or forbidden_tokens get an error message that does not say which token was invalid.
  • All platforms affected.

Environment

Python: 3.12.13
Platform: macOS Apple Silicon (arm64)
gemma: 4.0.1 (editable install from HEAD)
