Releases: spacemit-com/llama.cpp
v0.1.3
LingBot-MAP 3D Reconstruction Support
This release adds end-to-end LingBot-MAP 3D reconstruction pipeline support in llama-server through the SMT media backend.
Feature
- Added LingBot-MAP 3D reconstruction pipeline support
  - Added a `/reconstruct` endpoint for multi-image 3D reconstruction
  - Added reconstruction post-processing and point cloud output
Usage
Prepare the LingBot-MAP MTMD model directory with config.json, GGUF, and ONNX model files, then start llama-server with:
./llama-server \
--media-backend smt \
--smt-config-dir /path/to/lingbot_map_model_dir \
-t 8 \
--host 0.0.0.0 \
--port 8080 \
--warmup
What's Changed
Full Changelog: spacemit-llama.cpp.riscv64.0.1.2...v0.1.3
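Before launching the server, it can help to confirm that the directory passed to `--smt-config-dir` actually contains the three kinds of files the usage notes mention. The following is a minimal POSIX shell sketch. The function name `check_lingbot_map_dir` is hypothetical, and because these notes do not list the exact file names the SMT backend expects, it only checks for a `config.json` plus at least one `*.gguf` and one `*.onnx` file at the top level of the directory (an assumption about the layout).

```shell
# Hypothetical pre-flight check for a LingBot-MAP MTMD model directory.
# Assumption: all model files sit directly in the directory passed to
# --smt-config-dir; the exact file names required by the SMT backend
# are not documented in these release notes.
check_lingbot_map_dir() {
  dir="$1"
  rc=0

  if [ ! -d "$dir" ]; then
    echo "error: $dir is not a directory" >&2
    return 1
  fi

  if [ ! -f "$dir/config.json" ]; then
    echo "missing: $dir/config.json" >&2
    rc=1
  fi

  # Require at least one file with each extension (GGUF and ONNX models).
  for ext in gguf onnx; do
    found=0
    for f in "$dir"/*."$ext"; do
      # An unmatched glob stays literal, so test that the file really exists.
      if [ -f "$f" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "missing: at least one *.$ext file in $dir" >&2
      rc=1
    fi
  done

  return "$rc"
}
```

Running `check_lingbot_map_dir /path/to/lingbot_map_model_dir` prints each missing item to stderr and returns non-zero, so it can be chained with `&&` in front of the `./llama-server` command. The function writes to shell globals (`dir`, `rc`, `found`, `f`, `ext`) because plain POSIX `sh` has no `local`; that is acceptable for a one-off check.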
v0.1.2
This release primarily syncs the spacemit-mtmd branch with the latest upstream llama.cpp community progress.
Feature:
- Synced upstream llama.cpp and ggml runtime updates, including ggml 0.13.1, KV cache improvements, speculative decoding updates, reasoning budget support, graph/runtime fixes, and general stability improvements.
- Updated the model conversion toolchain. `convert_hf_to_gguf.py` now follows the upstream modular `conversion/` structure, with broader model architecture support such as DeepSeek, EXAONE, Gemma, Qwen/Qwen-VL, MiniCPM, LFM, Granite, RWKV, and others.
- Updated llama-server functionality, including OpenAI/Anthropic/Responses API compatibility improvements, HTTP ETag support, SSE behavior fixes, reasoning interruption control, API key file support, timeout updates, and logging refinements.
- Synced the new upstream UI/WebUI structure. The community WebUI has moved from `server/webui` to the standalone `tools/ui` layout, with updated frontend build, test, asset embedding, and release workflows.
- Synced backend updates and optimizations across CUDA, Vulkan, OpenCL, SYCL, WebGPU, Metal, Hexagon, CPU, Arm SVE, LoongArch, ZenDNN, and related runtime paths.
- Updated MTMD multimodal support, including upstream DeepSeekOCR 2 support, Gemma 4 projector/audio fixes, and MTMD debug improvements, while keeping the SpacemiT SMT Vision integration.
- Updated CI, Docker, release, and packaging workflows. Upstream CI has been split into more granular CPU, CUDA, Vulkan, WebGPU, UI, server, and platform-specific workflows.
- Updated vendor dependencies and security-related configuration, including cpp-httplib updates, security disclosure configuration changes, documentation updates, and refreshed build instructions.
Overall, this release brings spacemit-mtmd closer to the current upstream llama.cpp baseline, providing newer model support, backend improvements, and server/API compatibility updates.
v0.1.1
Bugfix
Fixed llama-server slot erase behavior for SMT multimodal models.
This update allows multimodal SMT backends to use the `/slots/{id}?action=erase` API to clear slot context correctly. The change is intended for long-running service scenarios where prompt/KV state must be reset between requests. Slot save and restore restrictions for multimodal mode remain unchanged.
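With this fix, a client can reset a slot between requests by calling the erase action. A minimal `curl` sketch, assuming the action is sent as a `POST` request (as in upstream llama.cpp's server) and that the server listens on `http://localhost:8080`, matching the port in the v0.1.3 usage example. The helper name `erase_slot` and the `LLAMA_SERVER_URL` variable are hypothetical.

```shell
# Hypothetical helper: clear the prompt/KV context of one llama-server slot.
# Assumptions: the erase action is a POST request (as in upstream llama.cpp),
# and the server URL defaults to http://localhost:8080 unless LLAMA_SERVER_URL is set.
erase_slot() {
  slot_id="$1"
  server="${LLAMA_SERVER_URL:-http://localhost:8080}"
  # -f: exit non-zero on HTTP errors; -sS: no progress meter, but still show errors.
  curl -fsS -X POST "${server}/slots/${slot_id}?action=erase"
}
```

For example, `erase_slot 0` clears slot 0 between two unrelated requests. Because of `-f`, a rejected call (for example, an invalid slot id) surfaces as a non-zero exit status that a service script can check.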
What's Changed
- feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
- Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
- server: allow slot erase for SMT multimodal requests by @co-seven in #3
- version: upgrading version number by @co-seven in #4
New Contributors
Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.1
v0.1.0
Feature
This release adds SMT backend support for llama-mtmd, enabling multimodal inference on SpacemiT platforms through a unified llama.cpp integration.
Functional Support
- Added SMT backend integration for `llama-mtmd`
- Added multimodal inference support in `llama-server`
- Added end-to-end support for vision-language model inference
- Added end-to-end support for speech recognition model inference
- Added end-to-end support for OCR and document understanding model inference
- Unified model serving flow for SMT-backed multimodal models within the `llama.cpp` runtime
- Added slot context erase support for SMT multimodal server workloads to improve long-running service management
Model Support
- FastVLM-0.5B
- Qwen3-VL-30B-A3B
- Qwen3.5-VL-0.8B
- Qwen3.5-VL-2B
- Qwen3.5-VL-4B
- Qwen3-ASR-0.6B
- PaddleOCR-VL-0.9B
What's Changed
- feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
- Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.0
b6783
SYCL SET operator optimized for F32 tensors (#16350)
* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes
* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups
* move SET op to standalone file, GPU-only implementation
* Update SYCL SET operator for F32
* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)
* fixed ggml-sycl.cpp
---------
Co-authored-by: Gitty Burstein <gitty@example.com>
b6556
zdnn: refactor codebase + add docs (#16178)
* zdnn: initial matmul refactor
* ggml-zdnn: rm static from funcs
* ggml-zdnn: update ggml-zdnn.h
* ggml-zdnn: change header files to hpp
* ggml-zdnn: switch to common.hpp
* ggml-zdnn: move mulmat forward around
* ggml-zdnn: rm inline from utils
* ggml-zdnn: code cleanup
* docs: add zDNN docs
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
b6517
ggml-amx : fix ggml_amx_init() on generic Linux (#16049)
Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.
Without this commit, the code compiles (with warnings) but fails:
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
....
print_info: n_ctx_orig_yarn = 262144
print_info: rope_finetuned = unknown
print_info: model type = 4B
Illegal instruction (core dumped)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
b6503
CANN: Remove print (#16044)
Signed-off-by: noemotiovon <757486878@qq.com>
b6192
Fix broken build: require updated pip to support --break-system-packa…
b6141
ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …