Releases · spacemit-com/llama.cpp

08 Jun 12:04

v0.1.3

285620f

v0.1.3 Latest

Latest

LingBot-MAP 3D Reconstruction Support

This release adds end-to-end LingBot-MAP 3D reconstruction pipeline support in llama-server through the SMT media backend.

Feature

Added LingBot-MAP 3D reconstruction pipeline support
Adds /reconstruct endpoint for multi-image 3D reconstruction
Supports reconstruction post-processing and point cloud output

Usage

Prepare the LingBot-MAP MTMD model directory with config.json, GGUF, and ONNX model files, then start llama-server with:

./llama-server \
  --media-backend smt \
  --smt-config-dir /path/to/lingbot_map_model_dir \
  -t 8 \
  --host 0.0.0.0 \
  --port 8080 \
  --warmup

What's Changed

server: add LingBot-MAP SMT reconstruction pipeline. by @co-seven in #7

Full Changelog: spacemit-llama.cpp.riscv64.0.1.2...v0.1.3

Contributors

co-seven

Assets 3

05 Jun 02:19

github-actions

spacemit-llama.cpp.riscv64.0.1.2

1592cc4

v0.1.2

This release primarily syncs the spacemit-mtmd branch with the latest upstream llama.cpp community progress.

Feature:

Synced upstream llama.cpp and ggml runtime updates, including ggml 0.13.1, KV cache improvements, speculative decoding
updates, reasoning budget support, graph/runtime fixes, and general stability improvements.
Updated the model conversion toolchain. convert_hf_to_gguf.py now follows the upstream modular conversion/ structure,
with broader model architecture support such as DeepSeek, EXAONE, Gemma, Qwen/Qwen-VL, MiniCPM, LFM, Granite, RWKV, and
others.
Updated llama-server functionality, including OpenAI/Anthropic/Responses API compatibility improvements, HTTP ETag
support, SSE behavior fixes, reasoning interruption control, API key file support, timeout updates, and logging
refinements.
Synced the new upstream UI/WebUI structure. The community WebUI has moved from server/webui to the standalone tools/ui
layout, with updated frontend build, test, asset embedding, and release workflows.
Synced backend updates and optimizations across CUDA, Vulkan, OpenCL, SYCL, WebGPU, Metal, Hexagon, CPU, Arm SVE,
LoongArch, ZenDNN, and related runtime paths.
Updated MTMD multimodal support, including upstream DeepSeekOCR 2 support, Gemma 4 projector/audio fixes, MTMD debug
improvements, while keeping Spacemit SMT Vision integration.
Updated CI, Docker, release, and packaging workflows. Upstream CI has been split into more granular CPU, CUDA, Vulkan,
WebGPU, UI, server, and platform-specific workflows.
Updated vendor dependencies and security-related configuration, including cpp-httplib updates, security disclosure
configuration changes, documentation updates, and build instruction refreshes.

Overall, this release brings spacemit-mtmd closer to the current upstream llama.cpp baseline, providing newer model
support, backend improvements, server/API compatibility updates.

Assets 3

22 May 06:33

github-actions

spacemit-llama.cpp.riscv64.0.1.1

17ce6aa

v0.1.1

Bugfix

Fixed llama-server slot erase behavior for SMT multimodal models.

This update allows multimodal SMT backends to use the /slots/{id}?action=erase API to clear slot context correctly. The change is intended for long-running service scenarios where prompt/KV state must be reset between requests. Slot save and restore
restrictions for multimodal mode remain unchanged.

What's Changed

feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
Add SpacemiT MTMD build workflow and documentation by @co-seven in #2
server: allow slot erase for SMT multimodal requests by @co-seven in #3
version: upgrading version number by @co-seven in #4

New Contributors

@co-seven made their first contribution in #1

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.1

Contributors

co-seven

Assets 3

20 May 06:26

github-actions

spacemit-llama.cpp.riscv64.0.1.0

f798b42

v0.1.0

Feature

This release adds SMT backend support for llama-mtmd, enabling multimodal inference on SpacemiT platforms through a unified llama.cpp integration.

Functional Support

Added SMT backend integration for llama-mtmd
Added multimodal inference support in llama-server
Added end-to-end support for vision-language model inference
Added end-to-end support for speech recognition model inference
Added end-to-end support for OCR and document understanding model inference
Unified model serving flow for SMT-backed multimodal models within the llama.cpp runtime
Added slot context erase support for SMT multimodal server workloads to improve long-running service management

Model Support

FastVLM-0.5B
Qwen3-VL-30B-A3B
Qwen3.5-VL-0.8B
Qwen3.5-VL-2B
Qwen3.5-VL-4B
Qwen3-ASR-0.6B
PaddleOCR-VL-0.9B

What's Changed

feat(mtmd): add SMT backend multimodal inference support for llama-server and llama-mtmd-cli by @co-seven in #1
Add SpacemiT MTMD build workflow and documentation by @co-seven in #2

Full Changelog: b6783...spacemit-llama.cpp.riscv64.0.1.0

Contributors

co-seven

Assets 3

17 Oct 04:07

github-actions

b6783

ceff6bb

b6783

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein <gitty@example.com>

Assets 15

23 Sep 08:59

github-actions

b6556

264f1b5

b6556

zdnn: refactor codebase + add docs (#16178)

* zdnn: initial matmul refactor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm static from funcs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update ggml-zdnn.h

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: change header files to hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch to common.hpp

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move mulmat forward around

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rm inline from utils

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: add zDNN docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

Assets 15

19 Sep 04:49

github-actions

b6517

69ffd89

b6517

ggml-amx : fix ggml_amx_init() on generic Linux (#16049)

Generalize Linux check to `__linux__` to support non-glibc systems (like musl).
Also, return `false` on unknown/untested OS.

Without this commit, the code compiles (with warnings) but fails:

    register_backend: registered backend CPU (1 devices)
    register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C)
    build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug)
    system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
    ....
    print_info: n_ctx_orig_yarn  = 262144
    print_info: rope_finetuned   = unknown
    print_info: model type       = 4B
    Illegal instruction (core dumped)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

Assets 15

18 Sep 03:31

github-actions

b6503

62c3b64

b6503

CANN: Remove print (#16044)

Signed-off-by: noemotiovon <757486878@qq.com>

Assets 15

18 Aug 12:13

github-actions

b6192

618575c

b6192

Fix broken build: require updated pip to support --break-system-packa…

Assets 15

13 Aug 07:23

github-actions

b6141

e71d48e

b6141

ggml-rpc: chunk send()/recv() to avoid EINVAL for very large tensors …

Assets 15

Releases: spacemit-com/llama.cpp

v0.1.3

LingBot-MAP 3D Reconstruction Support

Feature

Usage

What's Changed

Contributors

Uh oh!

v0.1.2

Feature:

Uh oh!

v0.1.1

Bugfix

What's Changed

New Contributors

Contributors

Uh oh!

v0.1.0

Feature

Functional Support

Model Support

What's Changed

Contributors

Uh oh!

b6783

Uh oh!

b6556

Uh oh!

b6517

Uh oh!

b6503

Uh oh!

b6192

Uh oh!

b6141

Uh oh!