Releases: spacemit-com/llama.cpp

v0.1.3

08 Jun 12:04
285620f

LingBot-MAP 3D Reconstruction Support

This release adds end-to-end LingBot-MAP 3D reconstruction pipeline support in llama-server through the SMT media backend.

Feature

  • Added LingBot-MAP 3D reconstruction pipeline support
  • Added a /reconstruct endpoint for multi-image 3D reconstruction
  • Added reconstruction post-processing and point cloud output

Usage

Prepare the LingBot-MAP MTMD model directory with config.json, GGUF, and ONNX model files, then start llama-server with:

./llama-server \
  --media-backend smt \
  --smt-config-dir /path/to/lingbot_map_model_dir \
  -t 8 \
  --host 0.0.0.0 \
  --port 8080 \
  --warmup
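Before launching, it can help to confirm that the directory passed to --smt-config-dir holds the files listed above. The following is a minimal sketch that is not part of llama-server or this release. The function name `check_lingbot_map_dir` is hypothetical, and it assumes config.json and the .gguf/.onnx files sit directly in that directory. The release notes do not specify file names or the directory layout.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of llama-server or this release):
# confirm that a LingBot-MAP model directory holds config.json plus at least
# one *.gguf and one *.onnx file. Assumes the files sit directly in the
# directory; adjust if your layout differs.
check_lingbot_map_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  if [ ! -f "$dir/config.json" ]; then
    echo "missing $dir/config.json" >&2
    return 1
  fi
  for ext in gguf onnx; do
    found=no
    for f in "$dir"/*."$ext"; do
      # An unmatched glob is left as a literal pattern, so test that it exists.
      if [ -f "$f" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      echo "no *.$ext file in $dir" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `check_lingbot_map_dir /path/to/lingbot_map_model_dir && ./llama-server --media-backend smt --smt-config-dir /path/to/lingbot_map_model_dir` starts the server only when the check passes. Plain POSIX globbing is used instead of `find -maxdepth` so the sketch also runs under minimal shells.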

What's Changed

  • server: add LingBot-MAP SMT reconstruction pipeline. by @co-seven in #7

Full Changelog: spacemit-llama.cpp.riscv64.0.1.2...v0.1.3

v0.1.2

05 Jun 02:19
1592cc4

This release primarily syncs the spacemit-mtmd branch with recent upstream llama.cpp development.

Feature

  • Synced upstream llama.cpp and ggml runtime updates, including ggml 0.13.1, KV cache improvements, speculative decoding
    updates, reasoning budget support, graph/runtime fixes, and general stability improvements.

  • Updated the model conversion toolchain. convert_hf_to_gguf.py now follows the upstream modular conversion/ layout and
    supports a broader range of model architectures, such as DeepSeek, EXAONE, Gemma, Qwen/Qwen-VL, MiniCPM, LFM, Granite,
    RWKV, and others.

  • Updated llama-server functionality, including OpenAI/Anthropic/Responses API compatibility improvements, HTTP ETag
    support, SSE behavior fixes, reasoning interruption control, API key file support, timeout updates, and logging
    refinements.

  • Synced the new upstream UI/WebUI structure. The community WebUI has moved from server/webui to the standalone tools/ui
    layout, with updated frontend build, test, asset embedding, and release workflows.

  • Synced backend updates and optimizations across CUDA, Vulkan, OpenCL, SYCL, WebGPU, Metal, Hexagon, CPU, Arm SVE,
    LoongArch, ZenDNN, and related runtime paths.

  • Updated MTMD multimodal support, including upstream DeepSeekOCR 2 support, Gemma 4 projector/audio fixes, and MTMD debug
    improvements, while retaining the SpacemiT SMT Vision integration.

  • Updated CI, Docker, release, and packaging workflows. Upstream CI has been split into more granular CPU, CUDA, Vulkan,
    WebGPU, UI, server, and platform-specific workflows.

  • Updated vendor dependencies and security-related configuration, including cpp-httplib updates, security disclosure
    configuration changes, documentation updates, and build instruction refreshes.

Overall, this release brings spacemit-mtmd closer to the current upstream llama.cpp baseline, providing newer model
support, backend improvements, and server/API compatibility updates.

v0.1.1

22 May 06:33
17ce6aa

Bugfix

Fixed llama-server slot erase behavior for SMT multimodal models.

This update allows SMT multimodal backends to clear slot context correctly through the /slots/{id}?action=erase API. It is intended for long-running service scenarios in which prompt/KV state must be reset between requests. The existing restrictions on slot save and restore in multimodal mode are unchanged.
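For reference, a slot can be erased from a shell with curl by sending a POST to that route. This is a minimal sketch: the helper names `slot_erase_url` and `erase_slot` are hypothetical, and the host, port, and slot id are placeholders. Upstream llama-server versions may refuse slot actions unless the server was started with --slot-save-path; these notes do not say whether this fork does the same, so check the server's error response if the request is rejected.

```shell
#!/bin/sh
# Minimal sketch: clear a slot's prompt/KV state through llama-server's
# POST /slots/{id}?action=erase route. Helper names are hypothetical;
# host, port, and slot id are placeholders supplied by the caller.

# Build the erase URL for a given host ($1), port ($2), and slot id ($3).
slot_erase_url() {
  printf 'http://%s:%s/slots/%s?action=erase\n' "$1" "$2" "$3"
}

# Send the erase request; curl -f makes HTTP error statuses return non-zero,
# and -sS hides the progress bar while still printing errors.
erase_slot() {
  curl -sS -f -X POST "$(slot_erase_url "$1" "$2" "$3")"
}
```

For example, `erase_slot localhost 8080 0` clears slot 0 on a server listening on port 8080. Keeping URL construction in its own function makes it easy to check without a running server.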

What's Changed

  • feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
  • Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
  • server: allow slot erase for SMT multimodal requests by @co-seven in #3
  • version: upgrading version number by @co-seven in #4

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.1

v0.1.0

20 May 06:26
f798b42

Feature

This release adds SMT backend support for llama-mtmd, enabling multimodal inference on SpacemiT platforms through a unified llama.cpp integration.

Functional Support

  • Added SMT backend integration for llama-mtmd
  • Added multimodal inference support in llama-server
  • Added end-to-end support for vision-language model inference
  • Added end-to-end support for speech recognition model inference
  • Added end-to-end support for OCR and document understanding model inference
  • Unified model serving flow for SMT-backed multimodal models within the llama.cpp runtime
  • Added slot context erase support for SMT multimodal server workloads to improve long-running service management

Model Support

  • FastVLM-0.5B
  • Qwen3-VL-30B-A3B
  • Qwen3.5-VL-0.8B
  • Qwen3.5-VL-2B
  • Qwen3.5-VL-4B
  • Qwen3-ASR-0.6B
  • PaddleOCR-VL-0.9B

What's Changed

  • feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
  • Add SpacemiT MTMD build workflow and documentation by @co-seven in #2

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.0

b6783

17 Oct 04:07
ceff6bb

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <gitty@example.com>

b6556

23 Sep 08:59
264f1b5

zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor

* ggml-zdnn: rm static from funcs

* ggml-zdnn: update ggml-zdnn.h

* ggml-zdnn: change header files to hpp

* ggml-zdnn: switch to common.hpp

* ggml-zdnn: move mulmat forward around

* ggml-zdnn: rm inline from utils

* ggml-zdnn: code cleanup

* docs: add zDNN docs

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

b6517

19 Sep 04:49
69ffd89

ggml-amx : fix ggml_amx_init() on generic Linux (#16049)

Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.

Without this commit, the code compiles (with warnings) but fails:

    register_backend: registered backend CPU (1 devices)
    register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
    build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
    ....
    print_info: n_ctx_orig_yarn  = 262144
    print_info: rope_finetuned   = unknown
    print_info: model type       = 4B
    Illegal instruction (core dumped)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

b6503

18 Sep 03:31
62c3b64

CANN: Remove print (#16044)

Signed-off-by: noemotiovon <757486878@qq.com>

b6192

18 Aug 12:13
618575c

Fix broken build: require updated pip to support --break-system-packa…

b6141

13 Aug 07:23
e71d48e

ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …