Pull requests: ggml-org/llama.cpp
Modality conditional adapters
examples
server
testing
Everything test related
#22184
opened Apr 20, 2026 by
gabe-l-hart
Collaborator
Optimize reduction stage of dot product of q4_K/q5_K to q8_K on AVX2
ggml
changes relating to the ggml tensor library for machine learning
#22181
opened Apr 20, 2026 by
nariox
vulkan: Support F16 OP_FILL
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#22177
opened Apr 20, 2026 by
jeffbolznv
Contributor
cuda: disable MMQ stream-k by default for tensor-split MoE
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#22174
opened Apr 20, 2026 by
nisparks
Contributor
fit-params : refactor + add option to output estimated memory per device
breaking change
Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility.
examples
server
#22171
opened Apr 20, 2026 by
ggerganov
Member
ngram-mod: Reset i_last when low acceptance streak occurs
#22168
opened Apr 20, 2026 by
treo
Fix incorrect assertion
ggml
changes relating to the ggml tensor library for machine learning
#22167
opened Apr 20, 2026 by
fiesh
server: Allow continue in thinking (reasoning prefill)
examples
server
#22162
opened Apr 20, 2026 by
roj234
Contributor
sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22156
opened Apr 20, 2026 by
aicss-genai
server: Support chat_template_kwargs for /v1/messages
examples
server
#22154
opened Apr 20, 2026 by
Soreepeong
sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22153
opened Apr 20, 2026 by
aicss-genai
sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22152
opened Apr 20, 2026 by
aicss-genai
cmake : skip project() when consumed as a subdirectory (#20415)
build
Compilation issues
ggml
changes relating to the ggml tensor library for machine learning
#22151
opened Apr 20, 2026 by
jinweihan-ai
Draft
4 tasks done
sycl: route small f32 matmuls to oneMKL, bypass oneDNN
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22150
opened Apr 20, 2026 by
aicss-genai
sycl: add FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, GATED_DELTA_NET
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22149
opened Apr 20, 2026 by
aicss-genai
sycl: support non-contiguous input in PAD op
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22148
opened Apr 20, 2026 by
aicss-genai
sycl: Battlemage AOT build via spir64_gen + MMQ subgroup annotations
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#22147
opened Apr 20, 2026 by
aicss-genai
server : add missing return after validation errors in /infill endpoint
examples
python
python script changes
server
#22146
opened Apr 20, 2026 by
JoongHyuk-Shin
server : do not cap slot context to training context (#22140)
examples
server
#22145
opened Apr 20, 2026 by
jinweihan-ai
3 tasks done
vendor : update cpp-httplib to 0.43.1
python
python script changes
script
Script related
#22143
opened Apr 20, 2026 by
cabelo
Contributor
CUDA FA: run KV_max mask scan for all Q batch sizes
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#22137
opened Apr 20, 2026 by
ssam18
Contributor
ggml : vectorize Q6_K unpack on WASM SIMD128 (strict, deterministic)
ggml
changes relating to the ggml tensor library for machine learning
#22134
opened Apr 19, 2026 by
Simlowker
8 tasks done