Upgrade llama.cpp from b8854 to b8887 #94
Open
bernardladenthin wants to merge 16 commits into master from
Conversation
Breaking API changes addressed in server.hpp:
- common_chat_msg_diff_to_json_oaicompat removed from common/chat.h; define an equivalent server_chat_msg_diff_to_json_oaicompat locally
- params_base.reasoning_budget → params_base.sampling.reasoning_budget_tokens

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Instead of keeping a local copy, compile tools/server/server-chat.cpp into jllama and include tools/server/server-chat.h directly, following the same pattern used for tools/mtmd. https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…nflicts

server-chat.h transitively pulls in server-common.h, which redefines macros and types already present in utils.hpp (SLT_* macros, json_value, etc.). Instead of including the header, forward-declare server_chat_msg_diff_to_json_oaicompat after the json using-declaration in server.hpp. The upstream server-chat.cpp is still compiled directly into both jllama and jllama_test (with tools/server in their include paths), so the linker resolves the symbol from upstream code.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Importing server-chat.cpp causes a linker error: convert_transcriptions_to_chatcmpl (unused by jllama) calls get_media_marker(), which is defined in server-common.cpp. Compiling server-common.cpp would cascade further. The local static definition in server.hpp is self-contained and avoids the entire server-common.h dependency chain. https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…/cpp

utils.hpp: add #include "server-common.h", which provides JSON_ASSERT, json, raw_buffer, json_value<T>, server_grammar_trigger, server_tokens, error_type, the SRV_* macros, and 14 identical utility functions. All identical static definitions are deleted; the SLT_* macros are redefined after the include to keep our simpler (slot).id_task access; the tokenize_input_prompts / tokens_to_str overloads with different signatures are retained.

server.hpp: remove the now-redundant `using json` and `enum error_type` (both provided by server-common.h); update the process_chunk call to the upstream signature (idx + pos + n_tokens_out) and the find_chunk call to a size_t index.

CMakeLists.txt: compile server-common.cpp into both jllama and jllama_test targets; add tools/server to their include paths.

Net change: -668 lines (utils.hpp 1367 → 699)

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
utils.hpp: restore base64_chars/is_base64/base64_decode; they are file-static in server-common.cpp and not visible outside that translation unit, so the local copies are still required.

server.hpp: remove the local static format_error_response; it is now declared in server-common.h (non-static) and defined in server-common.cpp, and the local static definition created an ambiguous overload. The implementations are identical.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Upstream oaicompat_completion_params_parse (b8887) removed the n != 1 check; the value is now forwarded as-is. Rename the test and flip the expectation from EXPECT_THROW to EXPECT_NO_THROW. https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…_oaicompat, format_response_rerank

Remove the local definitions of these three functions from utils.hpp now that server-common.h/.cpp provides them. Update all call sites:
- tokenize_input_prompts: add a nullptr mctx argument, change the return type to vector<server_tokens>, and use .get_tokens() where llama_tokens is needed
- format_embeddings_response_oaicompat: add a model_name parameter extracted from the request
- format_response_rerank: add a model_name parameter extracted from the request

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…f versions

server-common.h declares tokens_to_str(ctx, const llama_tokens&) and tokens_to_str(vocab, const llama_tokens&); the local templates (Iter begin, end) were the only callers. Update both call sites in jllama.cpp to pass the vector directly and remove the now-redundant templates from utils.hpp.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
….hpp

base64_chars/is_base64/base64_decode are static in server-common.cpp (internal linkage), so they cannot be called from other TUs even though they live in the same .so. Mark the block with BEGIN/END fences and explain exactly why removal requires upstream to change static to inline in a header.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
…ver-chat.cpp

server-chat.cpp only depends on server-common.h symbols (json_value, string_format, get_media_marker), all of which are already provided by the server-common.cpp added to the build earlier. Adding server-chat.cpp to both jllama and jllama_test targets unblocks the include of server-chat.h, so the 30-line local copy in server.hpp can be deleted.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
server_chat_params is the superset struct from server-common.h; upstream oaicompat_chat_params_parse (in server-common.cpp) is already linked into the build. Adopt them directly:
- utils.hpp: drop the ~260-line local struct + function copy
- server.hpp: change the field type; convert aggregate init to named-field assignments (tmpls is now a shared_ptr, so no .get())
- test_utils.cpp: rename the struct; tighten EXPECT_THROW to std::exception (upstream throws std::invalid_argument for some validation errors that the local copy threw as std::runtime_error)

New behaviour inherited from upstream: media_marker + handle_media() for image handling, reasoning_budget pass-through, chat_parser propagation, content_parts support in prefill_assistant, grammar_type=tool_calls, and chat_template_kwargs from the CLI merged with the request body.

https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
common_chat_templates_ptr is a unique_ptr, so it cannot be copy-assigned. Remove the separate chat_templates field and assign oai_parser_opt.tmpls directly during load_model(), eliminating the broken copy in init(). https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Two LOG_INF call sites still used ctx_server->chat_templates which was removed in the previous commit. Update them to ctx_server->oai_parser_opt.tmpls. https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
enable_thinking was always true because reasoning_budget_tokens defaults to -1 and the != 0 check fires. Mirror the upstream server-context.cpp logic: enable only when enable_reasoning != 0 AND the chat template actually supports thinking. This prevents the "prefill incompatible with thinking" error for models like CodeLlama that have no thinking support in their template. https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B
Replace manual JSON iteration with a call to upstream's parse_lora_request(json) (from server-common.cpp) which returns a map<int,float>. The local overload still owns the lora_base copy and ID bounds check; the JSON extraction is no longer duplicated. https://claude.ai/code/session_01CzAdkHfyKwKGu7GzkoYK2B