Merged
29 commits
b1b478b
Classify tool-call tokens via SampledTokenClassifier
mcharytoniuk May 5, 2026
5d74356
Diff-based tool-call marker detection works around brittle autoparser…
mcharytoniuk May 5, 2026
47e0242
Cover the SampledTokenClassifier markers getter
mcharytoniuk May 5, 2026
820e476
Address clippy warnings on bindings test arms
mcharytoniuk May 5, 2026
bbfa321
Expose common_chat_parse via opaque-handle FFI
mcharytoniuk May 5, 2026
d14777c
Extract llama-cpp-bindings-types crate; ToolCallArguments enum at FFI…
mcharytoniuk May 5, 2026
a6d7374
Cache per-model toktrie env and classifier markers; consolidate ffi s…
mcharytoniuk May 6, 2026
925c94f
Rewrite sampled token classifier with prompt-token replay; add per-mo…
mcharytoniuk May 6, 2026
f17d508
Tool-call template overrides registry with ToolCallArgsShape variants…
mcharytoniuk May 6, 2026
fc88f15
Merge branch 'main' into classify-tool-call-tokens
malzag May 7, 2026
bd2844a
Pre-merge quality pass: dedup C++ helpers, port marker extraction to …
mcharytoniuk May 7, 2026
fd96d22
Merge remote-tracking branch 'origin/classify-tool-call-tokens' into …
mcharytoniuk May 7, 2026
6084fdf
Restore llama.cpp submodule to 846262d (May 4) after merge accidental…
mcharytoniuk May 7, 2026
97f5f1b
Fix coverage gate: combine library unit tests with LLM integration te…
mcharytoniuk May 7, 2026
b4b8fe4
Make llguidance unconditional and add tests pushing line coverage to …
mcharytoniuk May 7, 2026
93d09e1
Fold template-override fallback parsers (nom-based) and tool-call id …
mcharytoniuk May 7, 2026
8575266
Add GLM-4.7 key-value XML tool-call parser and per-model classifier c…
mcharytoniuk May 8, 2026
9c81fab
Replay multimodal text-chunk tokens through marker state machine so r…
mcharytoniuk May 8, 2026
cff3a77
Process multimodal chunks in a single pass with split start/final pos…
mcharytoniuk May 8, 2026
01c9912
Recover tool calls via wrapper parser when C++ chat autoparser throws
malzag May 9, 2026
98f9fe8
Detect markerless JSON tool calls via streaming probe in classifier a…
mcharytoniuk May 10, 2026
01f20aa
clean up makefile
mcharytoniuk May 11, 2026
8778138
clean up makefile
mcharytoniuk May 11, 2026
09f81a9
fix metal shutdown errors
malzag May 11, 2026
e39dbd1
Refuse oversized image chunks in eval_single with typed error instead…
malzag May 11, 2026
6e4614c
Silence Darwin ar -D warnings by overriding cmake archive recipes; ac…
malzag May 12, 2026
3136323
add claude rules
mcharytoniuk May 12, 2026
b8dcecf
Pin workspace dependencies to exact versions and consolidate via inhe…
mcharytoniuk May 12, 2026
8f9a636
Apply rule-compliance sweep and break context↔model cycle
mcharytoniuk May 12, 2026
23 changes: 23 additions & 0 deletions .claude/rules/code-style.md
@@ -0,0 +1,23 @@
# Coding Standards

- Keep at most a single public struct per module.
- Keep at most a single public function per module (multiple public struct methods are OK).
- Keep module names elegant and clearly readable. The name of the module, or any file, should be enough to determine its contents unambiguously.
- Keep the module structure as flat as possible; avoid logical grouping of modules and keep the naming consistent instead.
- Keep standalone, private functions and structs above the public struct or function that is exported.
- Group the modules by name prefix. For example, `client_foo`, `client_bar`, etc., wherever it makes sense to do so.
- Decide to group the modules based on software architecture, messaging hierarchy, or inheritance. Do not group modules just for the sake of it.
- Maintain a tree-like structure of modules, avoid circular dependencies at all costs. Extract common functions or structs into separate modules, or separate subprojects in the workspace.
- Name files the same way as the struct or function they contain.
- Be explicit: do not use wildcard import statements that involve "*"; import everything explicitly.
- Do not use copy-pasted or copied code in any capacity. If you have issues extracting something into a module, discuss the steps first.
- Keeping slightly different message types, or other kinds of structs that are only slightly different, because of the context they are used in, is fine.
- Each function or method should do just a single thing. The single responsibility principle is really important.
- Always use descriptive and explicit variable names, even in anonymous functions. Never use single-letter variable names.
- Instead of writing comments that explain what the code does, make the code self-documenting.
- Handle all the errors; never ignore them. Make sure the application does not panic.
- Use object-oriented style and composition. Avoid functions that take a struct as a parameter; move it to the struct implementation instead.
- Avoid unnecessary abstractions.
- Before using vendor crates or modules, make sure they are well-maintained, secure, and documented.
- Always make sure there is only one valid way to do a specific task in the codebase. Make sure everything has a single source of truth.
- Prefer using data/value objects instead of inline types.
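
Two of the rules above ("move it to the struct implementation" and "prefer data/value objects instead of inline types") can be sketched as follows. This is a minimal illustration only; `TokenUsage` and its fields are hypothetical names and do not come from the PR.

```rust
/// Hypothetical value object that replaces an inline `(usize, usize)` tuple.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TokenUsage {
    pub prompt_token_count: usize,
    pub generated_token_count: usize,
}

impl TokenUsage {
    /// Behaviour lives on the struct instead of in a free function such as
    /// `fn total_token_count(token_usage: &TokenUsage) -> usize`.
    /// Saturating addition avoids an overflow panic in debug builds.
    pub fn total_token_count(&self) -> usize {
        self.prompt_token_count
            .saturating_add(self.generated_token_count)
    }

    /// Reports whether the combined token count fits into the given window.
    pub fn fits_within(&self, context_window_size: usize) -> bool {
        self.total_token_count() <= context_window_size
    }
}
```

A caller would build `TokenUsage { prompt_token_count: 12, generated_token_count: 30 }` and call `fits_within(context_window_size)` on it, so the call site carries named fields rather than an anonymous tuple.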
5 changes: 5 additions & 0 deletions .claude/rules/commits.md
@@ -0,0 +1,5 @@
# Committing Changes

- Always keep the commit messages short, human-readable, and descriptive. Keep commit messages as one-liners.
- Do not add any metadata to commits.
- Describe what the changes actually do instead of listing the changed files.
19 changes: 19 additions & 0 deletions .claude/rules/rust.md
@@ -0,0 +1,19 @@
---
paths:
- "**/*.rs"
- "**/Cargo.toml"
---

# Rust Standards

- Do not inline import paths unless necessary. Prefer to use `use` statements in Rust files instead of inline paths to imported modules. The exception would be `error.rs` type modules that handle lib-level error structs.
- Always use explicit lifetime variable names (do not use `'a` and such; use descriptive names like `'message` or similar).
- Always use explicit generic parameter names (never use single-letter names like `T` for generics; do prefix all of them with `T`, however). For example, use `TMessage` instead of `T`.
- Do not use `pub(crate)` in Rust; in case of doubt, just make things public.
- In Rust, never ignore errors with `Err(_)`; always make sure you are matching an expected error variant instead.
- Never use `.expect` or `.unwrap`. In Rust, if a function can fail, use a matching Result (can be from the anyhow crate) instead. In case of doubt on this, ask. Allow `.expect` in mutex lock poison checks, or when integrating CPP libraries into Rust.
- Always make sure mutex locks are held for the shortest possible time.
- Always specify Rust dependencies in root Cargo.toml, then use workspace versions of packages in workspace members.
- In Rust, when implementing a `new` method in a struct, prefer to use a struct with a parameter list instead of multiple function arguments. It should be easier to maintain.
- Always check the project with Clippy.
- Always format the code with `cargo fmt`.
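
As a rough sketch of how several of these Rust rules combine (descriptive lifetime and generic names, a parameter struct for `new`, and `Result` with precise error variants instead of `.unwrap`), consider the snippet below. All names (`MessageQueue`, `MessageQueueParams`, `MessageQueueError`) are hypothetical and not taken from the PR. The items are condensed into one snippet for readability; under the one-public-struct-per-module rule they would presumably live in separate modules.

```rust
use std::fmt;

/// Hypothetical error type: each failure has its own variant, so callers
/// can match the exact case instead of falling back to `Err(_)`.
#[derive(Debug, PartialEq)]
pub enum MessageQueueError {
    CapacityIsZero,
    QueueIsFull { capacity: usize },
}

impl fmt::Display for MessageQueueError {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::CapacityIsZero => write!(formatter, "queue capacity must be greater than zero"),
            Self::QueueIsFull { capacity } => {
                write!(formatter, "queue already holds {capacity} messages")
            }
        }
    }
}

impl std::error::Error for MessageQueueError {}

/// Parameter struct handed to `new` instead of positional arguments.
pub struct MessageQueueParams<'label> {
    pub label: &'label str,
    pub capacity: usize,
}

/// `'label` and `TMessage` follow the descriptive-name rules.
pub struct MessageQueue<'label, TMessage> {
    label: &'label str,
    capacity: usize,
    messages: Vec<TMessage>,
}

impl<'label, TMessage> MessageQueue<'label, TMessage> {
    /// Fails with a precise variant rather than panicking on bad input.
    pub fn new(
        MessageQueueParams { label, capacity }: MessageQueueParams<'label>,
    ) -> Result<Self, MessageQueueError> {
        if capacity == 0 {
            return Err(MessageQueueError::CapacityIsZero);
        }

        Ok(Self {
            label,
            capacity,
            messages: Vec::with_capacity(capacity),
        })
    }

    /// Appends a message, or reports that the queue is already full.
    pub fn push(&mut self, message: TMessage) -> Result<(), MessageQueueError> {
        if self.messages.len() >= self.capacity {
            return Err(MessageQueueError::QueueIsFull {
                capacity: self.capacity,
            });
        }

        self.messages.push(message);

        Ok(())
    }

    pub fn label(&self) -> &'label str {
        self.label
    }

    pub fn message_count(&self) -> usize {
        self.messages.len()
    }
}
```

Call sites then read as `MessageQueue::new(MessageQueueParams { label: "outbox", capacity: 8 })?`, and adding a new construction parameter later changes only the parameter struct, not every positional call.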
7 changes: 7 additions & 0 deletions .claude/rules/teamwork.md
@@ -0,0 +1,7 @@
# Teamwork and Project Organization

Team members own one module each. The project needs to be organized around small self-contained modules.

Each class, struct, function, interface, trait, and the like needs to be named after its functionality in self-descriptive English. The goal is to name things in a way that allows anyone to understand the project's organization and goals just by listing the directory of files.

Developers need to be able to own their own modules without stepping on another's work.
14 changes: 14 additions & 0 deletions .claude/rules/testing.md
@@ -0,0 +1,14 @@
# Unit Tests and Quality Control

- Always check that the unit tests pass.
- Always test the code; make sure the tests still pass after the changes.
- Always write tests that check the algorithms, or meaningful edge cases. Never write tests that check things that can be handled by types instead.
- If some piece of code can be handled by proper types, use types instead. Write tests as a last resort.
- In unit tests, make sure there is always just a single correct way to do a specific thing. Never accept fuzzy inputs from end users.
- When working on tests, if you notice that the tested code can be better, you can suggest changes.
- Maintain 100% test coverage across the codebase. No file, branch, or line may be excluded from coverage reports.
- Reach 100% coverage with the minimum number of tests. Each test must cover a unique code path, behavior, or edge case that no other test already covers.
- If two tests cover overlapping paths, remove the weaker one. Redundant tests waste maintenance effort without improving correctness signal.
- Tests must exercise actual functionality and observable behavior. Never write a test purely to hit lines for the sake of coverage.
- Design tests deliberately before writing them. Identify the feature or branch under test, then write the smallest test that verifies it.
- Coverage gaps signal missing tests, never permission to exclude files. Write the test instead of suppressing the gap.
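
The "use types instead of tests" rule can be illustrated with a small sketch. Here `batch_count` is a hypothetical helper, not a function from the PR: taking `NonZeroUsize` makes a zero batch size unrepresentable, so no "rejects zero" test is needed, and the single test targets a real algorithmic edge case (a partial final batch).

```rust
use std::num::NonZeroUsize;

/// Number of batches needed to process `token_count` tokens.
/// A zero batch size cannot be constructed, so no runtime check
/// (and no test for that check) is required.
pub fn batch_count(token_count: usize, batch_size: NonZeroUsize) -> usize {
    token_count.div_ceil(batch_size.get())
}

#[cfg(test)]
mod batch_count_tests {
    use super::batch_count;
    use std::num::NonZeroUsize;

    /// Covers the rounding-up path: nine tokens in batches of four
    /// need a third, partially filled batch.
    #[test]
    fn partial_final_batch_is_counted() -> Result<(), String> {
        let batch_size = NonZeroUsize::new(4).ok_or("batch size must be non-zero")?;

        assert_eq!(batch_count(9, batch_size), 3);

        Ok(())
    }
}
```

The test returns a `Result` so that it does not need `.unwrap`, which keeps it in line with the Rust rules in this PR.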
4 changes: 2 additions & 2 deletions .github/workflows/unit-tests.yml
@@ -17,7 +17,7 @@ jobs:
with:
submodules: recursive

- uses: dtolnay/rust-toolchain@stable
- uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable

- uses: Swatinem/rust-cache@v2

@@ -34,7 +34,7 @@ jobs:
- name: install system dependencies
run: sudo apt-get update && sudo apt-get install -y cmake libclang-dev

- uses: dtolnay/rust-toolchain@stable
- uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable

- uses: Swatinem/rust-cache@v2

59 changes: 2 additions & 57 deletions CLAUDE.md
@@ -6,61 +6,6 @@ Keep it simple, be opinionated, follow best practices. Avoid using configurable

Keep the code beautiful. Always optimize the code for a great developer experience.

Be proactive and fix preexisting issues if you encounter them.
Codebase needs to be architected in a way to make it easy for multiple team members to work in parallel on multiple modules, so the concerns always need clear separation.

Be uncompromising when it comes to the code quality and architecture. Any compromises, coverage gaps, or quality gaps are not acceptable.

Never make assumptions or guesses about code behavior; always investigate. Always make sure everything works.

## Coding Standards

- Do not inline import paths unless necessary. Prefer to use `use` statements in Rust files instead of inline paths to imported modules. The exception would be `error.rs` type modules that handle lib-level error structs.
- Keep at most a single public struct per Rust module.
- Keep at most a single public function per Rust module (multiple public struct methods are OK).
- Keep module names elegant and clearly readable. The name of the module, or any file, should be enough to determine its contents unambiguously.
- Keep modules structure as flat as possible, avoid logical grouping of modules, instead keep the naming consistent.
- Keep standalone, private functions and structs above the public struct or function that is exported.
- Group the modules by name prefix. For example, `client_foo`, `client_bar`, etc., wherever it makes sense to do so.
- Decide to group the modules based on software architecture, messaging hierarchy, or inheritance. Do not group modules just for the sake of it.
- Maintain a tree-like structure of modules, avoid circular dependencies at all costs. Extract common functions or structs into separate modules, or separate subprojects in the workspace.
- Name files the same way as the struct or function they contain.
- Be explicit, do not use general import statements that involve "*", prefer to import everything explicitly.
- Do not use copy-pasted or copied code in any capacity. If you have issues extracting something into a module, discuss the steps first.
- Keeping slightly different message types, or other kinds of structs that are only slightly different, because of the context they are used in, is fine.
- Each function or method should do just a single thing. The single responsibility principle is really important.
- Always use explicit lifetime variable names (do not use `'a` and such, use descriptive names like `'message` or similar)
- Always use explicit generic parameter names (never use single letter names like `T` for generics, prefix all of them with `T`, however). For example, use `TMessage` instead of `T`, etc.
- Always use descriptive and explicit variable names, even in anonymous functions. Never use single-letter variable names.
- Instead of writing comments that explain what the code does, make the code self-documenting.
- Do not use `pub(crate)` in Rust; in case of doubt, just make things public.
- Add an empty line before return statements that end the function or a method.
- Add an empty line between loops and preceding statements from the same scope.
- Handle all the errors; never ignore them. Make sure the application does not panic.
- In Rust, never ignore errors with `Err(_)`; always make sure you are matching an expected error variant instead.
- Never use `.expect`, or `.unwrap`. In Rust, if a function can fail, use a matching Result (can be from the anyhow crate) instead. In case of doubt on this, ask. Allow `.expect` in mutex lock poison checks, unit tests, or when integrating CPP libraries into Rust, and there is no way to use Result instead.
- Use object-oriented style and composition. Avoid functions that take a struct as a parameter; move it to the struct implementation instead.
- Always make sure mutex locks are held for the shortest possible time.
- Always specify Rust dependencies in root Cargo.toml, then use workspace versions of packages in workspace members.
- Avoid unnecessary abstractions.
- Before using vendor crates or modules, make sure they are well-maintained, secure, and documented.
- Always make sure there is only one valid way to do a specific task in the codebase. Make sure everything has a single source of truth.
- In Rust, when implementing `new` method in a struct, prefer to use a struct with parameters list instead of multiple function arguments. It should be easier to maintain.
- Use only the most precise error variants to cover a Result error case. If nothing suitable is available, add a new error variant.

## Unit Tests and Quality Control

- Always check the project with Clippy.
- Always format the code with `cargo fmt`.
- Always check that the unit tests pass.
- Always test the code, make sure tests work after the changes.
- Always write tests that check the algorithms, or meaningful edge cases. Never write tests that check things that can be handled by types instead.
- If some piece of code can be handled by proper types, use types instead. Write tests as a last resort.
- In unit tests, make sure there is always just a single correct way to do a specific thing. Never accept fuzzy inputs from end users.
- When working on tests, if you notice that the tested code can be better, you can suggest changes.
- When running tests, always save output to a temporary file, so you won't need to re-run them to analyze it.

## Committing Changes

- Always keep the commit messages short, human readable, descriptive. Keep commit messages as one-liners.
- Do not add any metadata to commits.
- Describe what the changes actually do instead of listing the changed files.
Be proactive and fix any preexisting issues you encounter.
24 changes: 23 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

31 changes: 25 additions & 6 deletions Cargo.toml
@@ -3,6 +3,7 @@ resolver = "2"
members = [
"llama-cpp-bindings-build",
"llama-cpp-bindings-sys",
"llama-cpp-bindings-types",
"llama-cpp-bindings",
"llama-cpp-bindings-tests",
]
@@ -11,9 +12,27 @@ members = [
edition = "2024"

[workspace.dependencies]
encoding_rs = "0.8.35"
llama-cpp-bindings = { path = "llama-cpp-bindings", version = "0.5.0" }
llama-cpp-bindings-build = { path = "llama-cpp-bindings-build", version = "0.5.0" }
llama-cpp-bindings-sys = { path = "llama-cpp-bindings-sys", version = "0.5.0" }
tracing = "0.1"

anyhow = "=1.0.102"
bindgen = "=0.72.1"
cc = { version = "=1.2.58", features = ["parallel"] }
cmake = "=0.1.58"
encoding_rs = "=0.8.35"
enumflags2 = "=0.7.12"
find_cuda_helper = "=0.2.0"
glob = "=0.3.3"
hf-hub = "=0.5.0"
llama-cpp-bindings = { path = "llama-cpp-bindings", version = "=0.5.0" }
llama-cpp-bindings-build = { path = "llama-cpp-bindings-build", version = "=0.5.0" }
llama-cpp-bindings-sys = { path = "llama-cpp-bindings-sys", version = "=0.5.0" }
llama-cpp-bindings-types = { path = "llama-cpp-bindings-types", version = "=0.5.0" }
llguidance = "=1.7.0"
nom = "=8.0.0"
serde = { version = "=1.0.228", features = ["derive"] }
serde_json = "=1.0.149"
serial_test = "=3.4.0"
thiserror = "=2.0.18"
toktrie = "=1.7.0"
tracing = "=0.1.44"
tracing-core = "=0.1.36"
tracing-subscriber = { version = "=0.3.23", features = ["json"] }
walkdir = "=2.5.0"
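
The practical effect of switching from default requirements such as `"0.1"` to exact pins such as `"=0.1.44"` can be sketched with a simplified matcher. This is an illustration under stated assumptions, not Cargo's resolver: it models only fully specified `major.minor.patch` versions, ignores pre-release tags, and treats a partial requirement like `"0.1"` as `0.1.0`. The type names are hypothetical.

```rust
/// Simplified `major.minor.patch` version (pre-release tags are out of scope).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SemanticVersion {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VersionRequirement {
    /// `"1.2.3"` in Cargo.toml: the default caret requirement.
    Caret(SemanticVersion),
    /// `"=1.2.3"` in Cargo.toml: only this exact version is accepted.
    Exact(SemanticVersion),
}

impl VersionRequirement {
    pub fn matches(&self, candidate: SemanticVersion) -> bool {
        match *self {
            Self::Exact(required) => candidate == required,
            Self::Caret(required) => {
                candidate >= required && candidate < Self::caret_upper_bound(required)
            }
        }
    }

    /// A caret requirement allows changes that keep the left-most
    /// non-zero component fixed, so the exclusive upper bound bumps it.
    fn caret_upper_bound(required: SemanticVersion) -> SemanticVersion {
        if required.major > 0 {
            SemanticVersion { major: required.major.saturating_add(1), minor: 0, patch: 0 }
        } else if required.minor > 0 {
            SemanticVersion { major: 0, minor: required.minor.saturating_add(1), patch: 0 }
        } else {
            SemanticVersion { major: 0, minor: 0, patch: required.patch.saturating_add(1) }
        }
    }
}
```

Under this model, a caret requirement on `0.1.0` accepts a hypothetical later `0.1.45`, while an exact requirement on `0.1.44` rejects it. That is the trade-off the pinned workspace manifest makes: dependency upgrades become explicit manifest edits instead of something `cargo update` can pick up on its own.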
56 changes: 34 additions & 22 deletions Makefile
@@ -1,7 +1,8 @@
FEATURES = sampler,llguidance
FEATURES = sampler
TEST_FEATURES =
QWEN_CAPABLE_FEATURES = multimodal_capable,mrope_model
CARGO_TEST_LLM_FLAGS = --no-fail-fast -p llama-cpp-bindings-tests $(if $(TEST_FEATURES),--features $(TEST_FEATURES),) -- --test-threads=1
CARGO_COV_LLM_FLAGS = -p llama-cpp-bindings-tests $(if $(TEST_FEATURES),--features $(TEST_FEATURES),)
CARGO_TEST_LLM_FLAGS_QWEN_CAPABLE = --no-fail-fast -p llama-cpp-bindings-tests $(if $(TEST_FEATURES),--features $(TEST_FEATURES),) --features $(QWEN_CAPABLE_FEATURES) -- --test-threads=1

QWEN3_5_0_8B_ENV = \
LLAMA_TEST_HF_REPO=unsloth/Qwen3.5-0.8B-GGUF \
@@ -21,37 +22,48 @@ QWEN3_6_35B_A3B_ENV = \
LLAMA_TEST_HF_ENCODER_REPO=Xiaojian9992024/t5-small-GGUF \
LLAMA_TEST_HF_ENCODER_MODEL=t5-small.bf16.gguf

GLM4_7_FLASH_ENV = \
LLAMA_TEST_HF_REPO=unsloth/GLM-4.7-Flash-GGUF \
LLAMA_TEST_HF_MODEL=GLM-4.7-Flash-Q4_K_M.gguf \
LLAMA_TEST_HF_EMBED_REPO=Qwen/Qwen3-Embedding-0.6B-GGUF \
LLAMA_TEST_HF_EMBED_MODEL=Qwen3-Embedding-0.6B-Q8_0.gguf \
LLAMA_TEST_HF_ENCODER_REPO=Xiaojian9992024/t5-small-GGUF \
LLAMA_TEST_HF_ENCODER_MODEL=t5-small.bf16.gguf

DEEPSEEK_R1_DISTILL_LLAMA_8B_ENV = \
LLAMA_TEST_HF_REPO=unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF \
LLAMA_TEST_HF_MODEL=DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf \
LLAMA_TEST_HF_EMBED_REPO=Qwen/Qwen3-Embedding-0.6B-GGUF \
LLAMA_TEST_HF_EMBED_MODEL=Qwen3-Embedding-0.6B-Q8_0.gguf \
LLAMA_TEST_HF_ENCODER_REPO=Xiaojian9992024/t5-small-GGUF \
LLAMA_TEST_HF_ENCODER_MODEL=t5-small.bf16.gguf

.PHONY: test.unit
test.unit: clippy
cargo test -p llama-cpp-bindings --features $(FEATURES)

.PHONY: test.deepseek_r1_distill_llama_8b
test.deepseek_r1_distill_llama_8b: clippy
$(DEEPSEEK_R1_DISTILL_LLAMA_8B_ENV) cargo test $(CARGO_TEST_LLM_FLAGS)

.PHONY: test.glm4_7_flash
test.glm4_7_flash: clippy
$(GLM4_7_FLASH_ENV) cargo test $(CARGO_TEST_LLM_FLAGS)

.PHONY: test.qwen3.5_0.8B
test.qwen3.5_0.8B: clippy
$(QWEN3_5_0_8B_ENV) cargo test $(CARGO_TEST_LLM_FLAGS)
$(QWEN3_5_0_8B_ENV) cargo test $(CARGO_TEST_LLM_FLAGS_QWEN_CAPABLE)

.PHONY: test.qwen3.6_35b_a3b
test.qwen3.6_35b_a3b: clippy
$(QWEN3_6_35B_A3B_ENV) cargo test $(CARGO_TEST_LLM_FLAGS)

.PHONY: test.qwen3.5_0.8B.coverage.run
test.qwen3.5_0.8B.coverage.run: clippy
$(QWEN3_5_0_8B_ENV) cargo llvm-cov $(CARGO_COV_LLM_FLAGS) -- --test-threads=1

.PHONY: test.qwen3.5_0.8B.coverage

test.qwen3.5_0.8B.coverage: clippy
$(QWEN3_5_0_8B_ENV) cargo llvm-cov $(CARGO_COV_LLM_FLAGS) --fail-under-lines 99.5 -- --test-threads=1

.PHONY: test.qwen3.5_0.8B.coverage.json
test.qwen3.5_0.8B.coverage.json: test.qwen3.5_0.8B.coverage.run
cargo llvm-cov report -p llama-cpp-bindings --json --output-path target/coverage.json

.PHONY: test.qwen3.5_0.8B.coverage.html
test.qwen3.5_0.8B.coverage.html: test.qwen3.5_0.8B.coverage.run
cargo llvm-cov report -p llama-cpp-bindings --html
$(QWEN3_6_35B_A3B_ENV) cargo test $(CARGO_TEST_LLM_FLAGS_QWEN_CAPABLE)

.PHONY: test.llms
test.llms: test.qwen3.5_0.8B
test.llms: \
test.deepseek_r1_distill_llama_8b \
test.glm4_7_flash \
test.qwen3.5_0.8B \
test.qwen3.6_35b_a3b

.PHONY: test
test: test.unit test.llms
13 changes: 7 additions & 6 deletions llama-cpp-bindings-build/Cargo.toml
@@ -7,12 +7,13 @@ license = "Apache-2.0"
repository = "https://github.com/intentee/llama-cpp-bindings"

[dependencies]
bindgen = "0.72.1"
cc = { version = "1.2.58", features = ["parallel"] }
cmake = "0.1"
find_cuda_helper = "0.2.0"
glob = "0.3.3"
walkdir = "2"
bindgen = { workspace = true }
cc = { workspace = true }
cmake = { workspace = true }
find_cuda_helper = { workspace = true }
glob = { workspace = true }
thiserror = { workspace = true }
walkdir = { workspace = true }

[features]
cuda = []