
Upgrade llama.cpp from b8854 to b8887#94

Open
bernardladenthin wants to merge 16 commits into master from claude/update-openvino-b8887-zijoE

Conversation

@bernardladenthin
Owner

Breaking API changes addressed in server.hpp:

  • common_chat_msg_diff_to_json_oaicompat removed from common/chat.h; define equivalent server_chat_msg_diff_to_json_oaicompat locally
  • params_base.reasoning_budget → params_base.sampling.reasoning_budget_tokens

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B

claude added 16 commits April 22, 2026 20:54
Breaking API changes addressed in server.hpp:
- common_chat_msg_diff_to_json_oaicompat removed from common/chat.h;
  define equivalent server_chat_msg_diff_to_json_oaicompat locally
- params_base.reasoning_budget → params_base.sampling.reasoning_budget_tokens

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Instead of a local copy, compile tools/server/server-chat.cpp into jllama
and include tools/server/server-chat.h directly — the same pattern used
for tools/mtmd.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…nflicts

server-chat.h transitively pulls in server-common.h which redefines macros
and types already present in utils.hpp (SLT_* macros, json_value, etc.).

Instead of including the header, forward-declare server_chat_msg_diff_to_json_oaicompat
after the json using-declaration in server.hpp. The upstream server-chat.cpp
is still compiled directly into both jllama and jllama_test (with tools/server
in their include paths), so the linker resolves the symbol from upstream code.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Importing server-chat.cpp causes a linker error: convert_transcriptions_to_chatcmpl
(unused by jllama) calls get_media_marker() which is defined in server-common.cpp.
Compiling server-common.cpp would cascade further.

The local static definition in server.hpp is self-contained and avoids the
entire server-common.h dependency chain.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…/cpp

utils.hpp: add #include "server-common.h" which provides JSON_ASSERT,
json, raw_buffer, json_value<T>, server_grammar_trigger, server_tokens,
error_type, SRV_* macros, and 14 identical utility functions.  All
identical static definitions are deleted; the SLT_* macros are redefined
after the include to keep our simpler (slot).id_task access; the
tokenize_input_prompts / tokens_to_str overloads with different
signatures are retained.

server.hpp: remove now-redundant `using json` and `enum error_type`
(both provided by server-common.h); update process_chunk call to the
upstream signature (idx + pos + n_tokens_out) and find_chunk call to
size_t index.

CMakeLists.txt: compile server-common.cpp into both jllama and
jllama_test targets; add tools/server to their include paths.

net change: -668 lines (utils.hpp 1367 → 699)

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
utils.hpp: restore base64_chars/is_base64/base64_decode — they are
static-internal in server-common.cpp and not visible from outside that
translation unit, so the local copies are still required.

server.hpp: remove local static format_error_response — it is now
declared in server-common.h (non-static) and defined in
server-common.cpp; the local static definition created an ambiguous
overload. The implementations are identical.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Upstream oaicompat_completion_params_parse (b8887) removed the n != 1
check; the value is now forwarded as-is. Rename the test and flip the
expectation from EXPECT_THROW to EXPECT_NO_THROW.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…_oaicompat, format_response_rerank

Remove the local definitions of these three functions from utils.hpp now that
server-common.h/.cpp provide them. Update all call sites:
- tokenize_input_prompts: add nullptr mctx arg, change return to vector<server_tokens>,
  use .get_tokens() where llama_tokens is needed
- format_embeddings_response_oaicompat: add model_name param extracted from request
- format_response_rerank: add model_name param extracted from request

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…f versions

server-common.h declares tokens_to_str(ctx, const llama_tokens&) and
tokens_to_str(vocab, const llama_tokens&); the two call sites in jllama.cpp
were the only users of the local template overloads (Iter begin, Iter end).
Update both call sites to pass the vector directly and remove the
now-redundant templates from utils.hpp.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
….hpp

base64_chars/is_base64/base64_decode are static in server-common.cpp (internal
linkage) so cannot be called from other TUs even though they're in the same .so.
Mark the block with BEGIN/END fences and explain exactly why removal requires
upstream to change static to inline in a header.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…ver-chat.cpp

server-chat.cpp only depends on server-common.h symbols (json_value,
string_format, get_media_marker), all of which are already provided by the
server-common.cpp added to the build earlier. Adding server-chat.cpp to both
jllama and jllama_test targets unblocks the include of server-chat.h, so the
30-line local copy in server.hpp can be deleted.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
server_chat_params is the superset struct from server-common.h; upstream
oaicompat_chat_params_parse (in server-common.cpp) is already linked into
the build. Adopt them directly:
- utils.hpp: drop the ~260-line local struct + function copy
- server.hpp: change field type, convert aggregate init to named-field
  assignments (tmpls now a shared_ptr, no .get())
- test_utils.cpp: rename struct, tighten EXPECT_THROW to std::exception
  (upstream throws std::invalid_argument for some validation errors that
  the local copy threw as std::runtime_error)

New behaviour inherited from upstream: media_marker + handle_media() for
image handling, reasoning_budget pass-through, chat_parser propagation,
content_parts support in prefill_assistant, grammar_type=tool_calls,
chat_template_kwargs from CLI merged with body.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
common_chat_templates_ptr is a unique_ptr, so it cannot be copy-assigned.
Remove the separate chat_templates field and assign oai_parser_opt.tmpls
directly during load_model(), eliminating the broken copy in init().

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Two LOG_INF call sites still used ctx_server->chat_templates which was
removed in the previous commit. Update them to ctx_server->oai_parser_opt.tmpls.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
enable_thinking was always true because reasoning_budget_tokens defaults to
-1 and the != 0 check fires. Mirror the upstream server-context.cpp logic:
enable only when enable_reasoning != 0 AND the chat template actually
supports thinking. This prevents the "prefill incompatible with thinking"
error for models like CodeLlama that have no thinking support in their template.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Replace manual JSON iteration with a call to upstream's parse_lora_request(json)
(from server-common.cpp) which returns a map<int,float>. The local overload still
owns the lora_base copy and ID bounds check; the JSON extraction is no longer duplicated.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B