perf: reduce JNI calls in parse_jstring and add Unicode test by bernardladenthin · Pull Request #72 · bernardladenthin/java-llama.cpp

bernardladenthin · 2026-04-04T21:16:39Z

Replace the 6-JNI-call + Java byte[] allocation path in parse_jstring with GetStringUTFLength/GetStringUTFRegion (3 calls, no heap object). This removes the need for c_standard_charsets, c_string, m_get_bytes, f_utf_8, and o_utf_8 global references, simplifying JNI_OnLoad and JNI_OnUnload. The optimisation applies to every JNI entry point (generate, encode, embed, rerank, applyTemplate, …).

Also add bytes.reserve(str.size()) in server.hpp str_to_bytes to eliminate repeated vector reallocations when serialising token probability data.

New test testTokenizationUnicode covers 2-byte (Latin extended) and 3-byte (CJK) UTF-8 sequences through the full model encode/decode path, proving the parse_jstring change handles multi-byte input correctly.

https://claude.ai/code/session_01VcEh4bQMPEUzqZDkgeAxXM

Replace the 6-JNI-call + Java byte[] allocation path in parse_jstring with GetStringUTFLength/GetStringUTFRegion (3 calls, no heap object). This removes the need for c_standard_charsets, c_string, m_get_bytes, f_utf_8, and o_utf_8 global references, simplifying JNI_OnLoad and JNI_OnUnload. The optimisation applies to every JNI entry point (generate, encode, embed, rerank, applyTemplate, …). Also add bytes.reserve(str.size()) in server.hpp str_to_bytes to eliminate repeated vector reallocations when serialising token probability data. New test testTokenizationUnicode covers 2-byte (Latin extended) and 3-byte (CJK) UTF-8 sequences through the full model encode/decode path, proving the parse_jstring change handles multi-byte input correctly. https://claude.ai/code/session_01VcEh4bQMPEUzqZDkgeAxXM

Replace the per-token JSON round-trip in receiveCompletionJson with a new receiveCompletionBytes JNI function that uses dynamic_cast to access the content field directly on the result struct. Old per-token path (called for every generated token): to_json() - build JSON DOM with 7+ fields, heap allocations response.dump() - serialize JSON DOM to std::string NewStringUTF() - convert UTF-8 to UTF-16 Java String LlamaOutput.fromJson() - char-by-char JSON scan in Java (StringBuilder) json.contains("stop") - string scan for stop flag New per-token path: dynamic_cast<> - O(1) RTTI lookup (already used in server.hpp) content field - O(1) direct field access SetByteArrayRegion - single memcpy into pre-allocated buffer bytes[0] - stop flag check (array index) new String(bytes) - single UTF-8 decode LlamaIterator and LlamaModel#complete() now use receiveCompletionBytes. LlamaOutput gains fromBytes() as the canonical fast constructor. receiveCompletionJson is kept for binary compatibility but deprecated. New unit tests in LlamaOutputTest cover all fromBytes() cases including empty content, stop flag, 2-byte UTF-8 (Latin), 3-byte UTF-8 (CJK), mixed content, raw escape characters, and regression equivalence vs fromJson. New integration test testStreamingMatchesComplete verifies the byte path produces the same output length as the non-streaming path. https://claude.ai/code/session_01VcEh4bQMPEUzqZDkgeAxXM

The method was added twice in LlamaModelTest – once in commit 426734b and again in c957210. The second (simpler) copy caused a compilation error. Removed the duplicate; the more comprehensive first version (covering 2-byte Latin, 3-byte CJK, and mixed strings) is kept. https://claude.ai/code/session_01VcEh4bQMPEUzqZDkgeAxXM

@deprecated

receiveCompletionJson is fully replaced by receiveCompletionBytes. Removed: - JNI function Java_de_kherud_llama_LlamaModel_receiveCompletionJson from jllama.cpp - Declaration in jllama.h (replaced with receiveCompletionBytes entry) - Native method declaration in LlamaModel.java - @deprecated on fromJson in LlamaOutput.java Updated all callers in tests (ChatAdvancedTest, ChatScenarioTest) to use receiveCompletionBytes + LlamaOutput.fromBytes() directly. https://claude.ai/code/session_01VcEh4bQMPEUzqZDkgeAxXM

1. jllama.cpp: streaming final result must return empty content server_task_result_cmpl_final::to_json_non_oaicompat() uses stream ? "" : content (line 727 of server.hpp). The dynamic_cast path was bypassing this and returning the full accumulated text, which doubled the output in testGenerateGrammar and caused testStreamingMatchesComplete to fail. Added the same stream check. 2. LlamaModelTest.testStreamingMatchesComplete: assertFalse was logically inverted — assertFalse(length > 0) asserted the output IS empty. Changed to assertTrue. 3. ChatAdvancedTest: fromBytes() intentionally does not surface probability data, so testSetNProbsStreamingJsonHasProbabilities always failed. Renamed to testSetNProbsStreamingCompletesNormally and replaced the foundProbabilities assertion with a check that streaming completes cleanly when nProbs is configured. https://claude.ai/code/session_01VcEh4bQMPEUzqZDkgeAxXM

claude added 5 commits April 4, 2026 21:15

bernardladenthin closed this Apr 5, 2026

bernardladenthin deleted the claude/optimize-cpp-performance-fOzmO branch April 5, 2026 07:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: reduce JNI calls in parse_jstring and add Unicode test#72

perf: reduce JNI calls in parse_jstring and add Unicode test#72
bernardladenthin wants to merge 5 commits intomasterfrom
claude/optimize-cpp-performance-fOzmO

bernardladenthin commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bernardladenthin commented Apr 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants