47 commits
d3004e1
Surface token usage and classify reasoning/tool-call/content deltas
mcharytoniuk May 5, 2026
357df27
Diagnostic test for sampled-token classifier marker detection
mcharytoniuk May 5, 2026
9f0cba9
Parse tool-call JSON between structural braces
mcharytoniuk May 5, 2026
32a8e25
Streaming finish_reason reflects whether the turn produced a tool call
mcharytoniuk May 5, 2026
5ee161d
Cover streaming finish_reason routing
mcharytoniuk May 5, 2026
0d6afed
Stable tool_call id across streaming chunks
mcharytoniuk May 5, 2026
8173a47
Address clippy lints from the workspace pedantic profile
mcharytoniuk May 5, 2026
0b3c800
Python client: surface the new GeneratedTokenResult variants
mcharytoniuk May 5, 2026
b52be86
Factor tool-call handling into a shared, single-responsibility pipeline
mcharytoniuk May 5, 2026
33e0f4c
Consume shared bindings-types crate; drop value-object duplication
mcharytoniuk May 5, 2026
b180783
Decompose scheduler iteration into named pipeline phases
mcharytoniuk May 5, 2026
5817399
Drop impl on ClusterHandle SIGTERMs subprocess children to prevent or…
mcharytoniuk May 5, 2026
cc7e26a
make tool-call parsing explicit via parse_tool_calls flag and isolate…
mcharytoniuk May 5, 2026
b000890
introduce @intentee/paddler-client npm workspace and migrate admin pa…
mcharytoniuk May 5, 2026
ca5ba9e
Surface tool-call schema invalidity as soft event and infrastructure …
mcharytoniuk May 6, 2026
1cf5e37
Drop llama-cpp-bindings type re-exports; add per-model thinking and t…
mcharytoniuk May 6, 2026
b492791
Wrapper-layer tool-call parsers for Gemma 4, Mistral 3, and Qwen 3.5/…
mcharytoniuk May 6, 2026
68a35db
Decompose tool_call_pass into module fn; add table-driven unit tests …
mcharytoniuk May 7, 2026
061a656
Inline scheduler phase wrappers, surface fire-and-forget Result disca…
mcharytoniuk May 7, 2026
33971c0
Move tool-call template-override parsers and orchestrator to bindings…
mcharytoniuk May 7, 2026
c646c7d
Fix agent status snapshot dropping field updates after first change a…
mcharytoniuk May 8, 2026
d579ba7
Split chat_template_renderer into module and harden pyjinja_tojson wi…
mcharytoniuk May 8, 2026
46cf957
Add GLM-4.7-Flash and DeepSeek-R1-Distill-Llama-8B integration test c…
mcharytoniuk May 8, 2026
e2871da
Add per-model integration tests covering image-attached requests with…
mcharytoniuk May 8, 2026
2a704c1
add paddler client CLI application
malzag May 8, 2026
14464a3
Fix template-swap drain race with RAII slot guard; consolidate model …
malzag May 9, 2026
b111348
highlight tokens with different colors, reorganize paddler_client_cli
malzag May 9, 2026
72077ae
attribute every response to its producing agent via envelope.generate…
mcharytoniuk May 9, 2026
7ff51d8
Surface UnrecognizedToolCallFormat through wire types and clients; ad…
mcharytoniuk May 10, 2026
019b0dd
Install shutdown signal handlers synchronously before bootstrap so SI…
mcharytoniuk May 11, 2026
7f0ccc3
Surface oversized image inputs as a typed wire variant instead of cra…
malzag May 11, 2026
45aea05
Distribute embedding batches evenly across agents up to a configurabl…
malzag May 12, 2026
57ecf9f
Reject zero agent count and zero per-chunk cap in chunk_evenly_with_c…
malzag May 12, 2026
647c2ae
Resolve preexisting clippy warnings: box large arbiter variant, annot…
malzag May 12, 2026
89c4dc6
Migrate to LlamaContext::from_model after bindings cycle break
mcharytoniuk May 12, 2026
100bc00
update paddler_client_python gitignore
malzag May 12, 2026
c8fb5d6
Scope Claude rules to source-file paths and add python-on-nixos guidance
mcharytoniuk May 12, 2026
429a24a
Bump Python client ruff to 0.15.12 and refactor GeneratedTokenResult …
mcharytoniuk May 12, 2026
d32be7c
Merge remote-tracking branch 'origin/token-usage-and-thinking-classif…
mcharytoniuk May 12, 2026
7a33340
Use nanoid for inference socket requestId
malzag May 12, 2026
ee55691
update claude rules
mcharytoniuk May 12, 2026
dcffc52
add SKILL to run the tests
mcharytoniuk May 12, 2026
d8e0a43
update claude rules
mcharytoniuk May 12, 2026
74f9001
switch to llama.cpp from crates.io
mcharytoniuk May 12, 2026
c38e57b
Share cargo target dir between integration build and tests so CUDA ke…
mcharytoniuk May 13, 2026
a1fb73b
Detect agent disappearance in AgentsStreamWatcher and fail fast inste…
mcharytoniuk May 13, 2026
4d49f78
update claude rules
mcharytoniuk May 13, 2026
11 changes: 11 additions & 0 deletions .claude/rules/github-workflows.md
@@ -0,0 +1,11 @@
---
paths:
- ".github/**/*"
---

# GitHub Workflows Standards

- Always use Makefile targets in the workflow to avoid code duplication.
- Never add tests that use LLMs to GitHub workflows, because the default GitHub runner does not have the capacity to run them.
- Only add unit tests to GitHub workflows.
- Keep each GitHub workflow responsible for only a single concern. For example, run the linter and the tests in parallel.
13 changes: 13 additions & 0 deletions .claude/rules/makefile.md
@@ -0,0 +1,13 @@
---
paths:
- "**/Makefile"
---

# Makefile Standards

- Keep variables at the top of the file. Always.
- Prefer real targets over phony targets. If something can be expressed as a real target, do that.
- If you see that a phony target can be expressed as a real target, you can suggest a fix.
- Keep real targets grouped together and phony targets grouped together. Keep targets alphabetically sorted within each group.
- Keep all the real targets above phony targets.
- Make sure each Makefile target has enough dependencies to be able to run from a clean state.
19 changes: 19 additions & 0 deletions .claude/rules/python-on-nixos.md
@@ -0,0 +1,19 @@
---
paths:
- "paddler_client_python/**/*"
---

# Running Python tooling on NixOS

To run any Python tool that may have ELF / dynamic-linker issues on NixOS — `ruff`, `mypy`, `pyright`, `pytest`, anything installed from a pip wheel with native
bits — first enter `paddler_client_python/shell.nix`, then drive everything through `poetry` from inside that shell.

**Why:**
pip wheels like `ruff` ship a generic-linux binary that expects a dynamic loader at a standard path, which NixOS does not provide.
Running them directly fails with `Could not start dynamically linked executable: ... NixOS cannot run dynamically linked executables intended for generic linux environments`.
`shell.nix` provides the Nix-built loader / replacement tools that make those binaries (or their Nix equivalents) actually launch.
`poetry` is just the dispatcher you use *inside* that prepared shell — never the entry point on its own.

**How to apply:**
- Never invoke `ruff`, `poetry run ...`, `python`, `pytest`, etc. from outside `nix-shell`. If a command starts with one of those, it must be inside `nix-shell --run "..."`.
- If `paddler_client_python/shell.nix` is missing, stop and ask. Adding a `shell.nix` is the fix; running tooling unwrapped is not.
13 changes: 13 additions & 0 deletions .claude/rules/rust-integration-tests.md
@@ -0,0 +1,13 @@
---
paths:
- "**/tests/**/*.rs"
---

# Rust Integration Tests Standards

- Each test needs to be named after the functionality or issue it actually tests.
- Each test file needs to be named after the functionality or issue it actually tests.
- Each test represents a specific scenario that the core project needs to support, or represents an uncovered issue.
- If you uncover a new issue while testing, create another targeted test that covers it.
- Every test must use production code. Never recreate the original code to test something conceptually. Always use production code.
- Tests must be single-purpose.
17 changes: 17 additions & 0 deletions .claude/rules/rust.md
@@ -17,3 +17,20 @@ paths:
- In Rust, when implementing a `new` method in a struct, prefer to use a struct with a parameter list instead of multiple function arguments. It should be easier to maintain.
- Always check the project with Clippy.
- Always format the code with `cargo fmt`.
- Each file must contain at most a single struct or a single enum. For readability, split those into multiple modules. You can still keep multiple private helper functions.
- Never use Result<> as a function argument.
- Never forward Result in enums if you can instead create a targeted error enum. It is always better to signal the specific issue, so it can be handled downstream.
- Always destructure structs in arguments if possible.
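
The last three rules above can be sketched as follows. This is only an illustration under stated assumptions: `ToolCallParseError`, `ToolCallRequest`, and `validate_tool_call` are hypothetical names, not items from the codebase, and the brace check is a crude stand-in for real JSON parsing. The enum and the struct share one block for brevity; under the single-struct-per-file rule they would live in separate modules.

```rust
use std::fmt;

/// Hypothetical targeted error enum: each variant names one specific failure,
/// so downstream code can match on it instead of unpacking a forwarded `Result`.
#[derive(Debug, PartialEq, Eq)]
pub enum ToolCallParseError {
    EmptyName,
    ArgumentsNotAnObject { found: String },
}

impl fmt::Display for ToolCallParseError {
    fn fmt(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EmptyName => write!(formatter, "tool call name is empty"),
            Self::ArgumentsNotAnObject { found } => {
                write!(formatter, "tool call arguments are not an object: {found}")
            }
        }
    }
}

impl std::error::Error for ToolCallParseError {}

/// Hypothetical parameter struct passed instead of several loose arguments.
pub struct ToolCallRequest {
    pub name: String,
    pub raw_arguments: String,
}

/// The struct is destructured directly in the argument list, and the function
/// takes plain values (never a `Result`) as input.
pub fn validate_tool_call(
    ToolCallRequest { name, raw_arguments }: ToolCallRequest,
) -> Result<String, ToolCallParseError> {
    if name.is_empty() {
        return Err(ToolCallParseError::EmptyName);
    }

    let trimmed = raw_arguments.trim();

    // Crude placeholder check: only looks at the outer braces.
    if !(trimmed.starts_with('{') && trimmed.ends_with('}')) {
        return Err(ToolCallParseError::ArgumentsNotAnObject {
            found: trimmed.to_string(),
        });
    }

    Ok(format!("{name}({trimmed})"))
}
```

A caller can then handle `ToolCallParseError::EmptyName` differently from `ToolCallParseError::ArgumentsNotAnObject`, which is the point of signaling the specific issue.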

# Code Style

Imports/uses must not be mixed with other kinds of Rust syntax.

Each file needs to follow this order:
1. `pub mod`/`mod` exports
2. vendor crate `use`
3. project crate `use`
4. local crate `use`
5. private function helpers
6. private struct helpers
7. single public export
6 changes: 6 additions & 0 deletions .claude/rules/testing.md
@@ -6,3 +6,9 @@
- If some piece of code can be handled by proper types, use types instead. Write tests as a last resort.
- In unit tests, make sure there is always just a single correct way to do a specific thing. Never accept fuzzy inputs from end users.
- When working on tests, if you notice that the tested code can be better, you can suggest changes.
- Maintain 100% test coverage across the codebase. No file, branch, or line may be excluded from coverage reports.
- Reach 100% coverage with the minimum number of tests. Each test must cover a unique code path, behavior, or edge case that no other test already covers.
- If two tests cover overlapping paths, remove the weaker one. Redundant tests waste maintenance effort without improving correctness signal.
- Tests must exercise actual functionality and observable behavior. Never write a test purely to hit lines for the sake of coverage.
- Design tests deliberately before writing them. Identify the feature or branch under test, then write the smallest test that verifies it.
- Coverage gaps signal missing tests, never permission to exclude files. Write the test instead of suppressing the gap.
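
The "use types instead" rule near the top of this list can be illustrated with a short sketch. `even_chunk_sizes` is a hypothetical helper written for this example, not a function from the workspace. It assumes the caller wants items split as evenly as possible across agents. Because the agent count is a `NonZeroUsize`, the "zero agents" case cannot be expressed at all, so it needs neither a runtime check nor a test.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper: returns how many items each agent receives when
/// `item_count` items are split as evenly as possible across `agent_count` agents.
pub fn even_chunk_sizes(item_count: usize, agent_count: NonZeroUsize) -> Vec<usize> {
    let agents = agent_count.get();
    let base = item_count / agents;
    let remainder = item_count % agents;

    // The first `remainder` agents each take one extra item.
    (0..agents)
        .map(|index| if index < remainder { base + 1 } else { base })
        .collect()
}
```

The only place a zero can appear is at the boundary where a raw number is converted with `NonZeroUsize::new`, which returns `None` for zero and forces the caller to handle it once.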
56 changes: 56 additions & 0 deletions .claude/skills/running-all-tests/SKILL.md
@@ -0,0 +1,56 @@
---
name: running-all-tests
description: Runs every test suite in the paddler workspace on the fastest available device. Use when the user asks to run the tests, run all the tests, run the full test suite, or check that everything still passes.
---

# Running all tests

Run every test suite in the workspace, picking the fastest compiled device backend for the host.

## Step 1: detect the device

Run this once at the start and echo the chosen device:

```bash
if [[ "$OSTYPE" == "darwin"* ]]; then
DEVICE=metal
elif command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
DEVICE=cuda
else
DEVICE=cpu
fi
echo "Device: $DEVICE"
```

`$DEVICE` selects the Rust integration suite variant in Step 2. The other four suites don't take a device feature.

## Step 2: run the five suites

Copy this checklist and tick each item as the suite completes:

```
- [ ] JS client
- [ ] Python client
- [ ] Rust unit
- [ ] Rust integration
- [ ] paddler_gui
```

| # | Suite | Inner command | Working dir |
|---|------------------|-----------------------------------------------------------------------------------------------------------------------------------|--------------------------|
| 1 | JS client | `make test.client.js` | repo root |
| 2 | Python client    | NixOS: `poetry run pytest`, `ruff`, `poetry run mypy`. Every other OS: `poetry run pytest`, `poetry run ruff`, `poetry run mypy`  | `paddler_client_python/` |
| 3 | Rust unit | `make test.unit` | repo root |
| 4 | Rust integration | `make test.integration` (cpu) / `make test.integration.cuda` / `make test.integration.metal` — pick by `$DEVICE` | repo root |
| 5 | paddler_gui | `cargo test -p paddler_gui --features web_admin_panel` | `paddler_gui/` |

Run them in this order. Cheap suites (1, 3, 4) surface bugs quickly; the heavy GPU-bound suites (2, 5) come last.

## Step 3: rules during the run

- **Serialize GPU suites.** When `$DEVICE` is `cuda` or `metal`, run test suites sequentially to avoid device contention.
- **Per-test 30 s budget.** Flag any individual test that exceeds 30 s wall-clock. That is a real bug — production or test — not flakiness.
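
The per-test budget rule above could be checked mechanically along these lines. This is a minimal sketch, assuming per-test wall-clock timings have already been collected by some means; `TestTiming` and `tests_over_budget` are hypothetical names and not part of the paddler workspace or of any test runner.

```rust
use std::time::Duration;

/// Per-test wall-clock budget taken from the rule above.
pub const PER_TEST_BUDGET: Duration = Duration::from_secs(30);

/// Hypothetical record of one finished test.
pub struct TestTiming {
    pub name: String,
    pub elapsed: Duration,
}

/// Returns the names of tests that strictly exceeded the budget,
/// in the order they were given.
pub fn tests_over_budget(timings: &[TestTiming]) -> Vec<&str> {
    timings
        .iter()
        .filter(|timing| timing.elapsed > PER_TEST_BUDGET)
        .map(|timing| timing.name.as_str())
        .collect()
}
```

A test that takes exactly 30 s is not flagged here, matching the "exceeds 30 s" wording of the rule.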

## Step 4: report

After all suites finish, sum up the results in an actionable report.
1 change: 0 additions & 1 deletion .gitignore
@@ -1,6 +1,5 @@
.DS_Store
/*.db
/.tsimp
/esbuild-meta.json
/godepgraph.png
/models