Improve testTokenizationUnicode to cover 2-byte and 3-byte UTF-8 sequences; add reserve() to str_to_bytes#73

Merged
bernardladenthin merged 1 commit into master from claude/improve-llama-test-utf8-O5iWQ on Apr 5, 2026
@bernardladenthin
Owner

  • Expands the test to explicitly verify 2-byte Latin extended (ü, ö, é), 3-byte CJK (日本語), and mixed inputs through the full encode/decode path.
  • Adds bytes.reserve(str.size()) in server.hpp str_to_bytes() to avoid repeated allocations when building the byte vector.
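
A minimal sketch of the two points above, not the code from this PR: `str_to_bytes_sketch` and `bytes_to_str_sketch` are hypothetical stand-ins, and the real signature and element type of `str_to_bytes()` in server.hpp are assumed rather than quoted. It shows the `reserve()` call and how the 2-byte and 3-byte UTF-8 sequences named in the test appear at the byte level.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for str_to_bytes() in server.hpp.
// Assumption: the function copies each byte of the string into a vector.
std::vector<std::uint8_t> str_to_bytes_sketch(const std::string &str) {
    std::vector<std::uint8_t> bytes;
    // Size the buffer once so push_back does not trigger repeated reallocations.
    bytes.reserve(str.size());
    for (unsigned char c : str) {
        bytes.push_back(c);
    }
    return bytes;
}

// Hypothetical inverse, used only to illustrate a byte-level round trip.
std::string bytes_to_str_sketch(const std::vector<std::uint8_t> &bytes) {
    return std::string(bytes.begin(), bytes.end());
}

// Checks the byte counts for the kinds of input the test description lists.
// String literals use explicit byte escapes so the result does not depend
// on the source file encoding.
void check_utf8_byte_lengths() {
    const std::string u_umlaut = "\xC3\xBC";      // "ü", U+00FC, 2 bytes
    const std::string nichi    = "\xE6\x97\xA5";  // "日", U+65E5, 3 bytes
    const std::string mixed    = "a" + u_umlaut + nichi;

    assert(str_to_bytes_sketch(u_umlaut).size() == 2);
    assert(str_to_bytes_sketch(nichi).size() == 3);
    assert(str_to_bytes_sketch(mixed).size() == 6);
    assert(bytes_to_str_sketch(str_to_bytes_sketch(mixed)) == mixed);
}
```

Because `str.size()` counts bytes rather than code points, the reserved capacity already matches the final vector length for multi-byte input, so no further growth is needed during the loop.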

https://claude.ai/code/session_01SjXecefeVUEdmj1VnmYB9a

…ences; add reserve() to str_to_bytes

- Expands the test to explicitly verify 2-byte Latin extended (ü, ö, é),
  3-byte CJK (日本語), and mixed inputs through the full encode/decode path.
- Adds bytes.reserve(str.size()) in server.hpp str_to_bytes() to avoid
  repeated allocations when building the byte vector.

https://claude.ai/code/session_01SjXecefeVUEdmj1VnmYB9a
@bernardladenthin bernardladenthin merged commit 10bc3ed into master Apr 5, 2026
13 of 14 checks passed
@bernardladenthin bernardladenthin deleted the claude/improve-llama-test-utf8-O5iWQ branch April 5, 2026 07:35
2 participants