hf gguf converter by richiejp · Pull Request #1 · localai-org/privacy-filter.cpp

richiejp · 2026-06-15T19:32:23Z

docs: version-controlled model cards + HF publish script
convert: add self-contained HF->GGUF converter, run it in the parity job
fuzz: fail cleanly when PF_GGUF is set but the GGUF is missing
tests: require the model GGUF/fixtures once parity is requested (no silent skips)

Bring the HuggingFace model cards under version control and add a reproducible publish script, so the published cards stop drifting from this repo:

- model-cards/privacy-filter-multilingual.md: refreshed from the live HF card. The "requires a patched llama.cpp" framing is replaced with a Runtimes section centered on privacy-filter.cpp (the patch-free reference engine, CPU/CUDA/Vulkan), with LocalAI (the privacy-filter backend) and llama.cpp (only with carry-patches, or upstream once PR #19725 lands) alongside.
- model-cards/privacy-filter.md: NEW card for the base openai/privacy-filter model (8 categories / 33 labels). Converted to GGUF with the fork converter and validated with pf-cli: arch openai-privacy-filter, 33 labels, correct private_person / private_email spans. Same Runtimes section.
- scripts/publish_hf.py: mirrors parakeet.cpp, with dry-run by default and --upload to push the converted GGUF + the version-controlled card (-> README.md) to the matching LocalAI-io repo; prints the sha256 to pin in the LocalAI gallery.

Assisted-by: Claude Code:claude-opus-4-8
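The dry-run-by-default behaviour described for scripts/publish_hf.py can be sketched roughly as follows. This is not the script from this PR: the positional arguments, the `plan` helper, and the output format are hypothetical, and the actual Hub upload call is deliberately left out. Only the `--upload` flag, the card-to-README.md mapping, and the sha256 printout come from the description above.

```python
import argparse
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large GGUF is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_parser() -> argparse.ArgumentParser:
    # Argument names are assumptions; only --upload is named in the PR text.
    parser = argparse.ArgumentParser(description="publish sketch")
    parser.add_argument("gguf", type=Path, help="converted GGUF file")
    parser.add_argument("card", type=Path, help="version-controlled model card")
    parser.add_argument("--upload", action="store_true",
                        help="push for real; without it nothing is uploaded")
    return parser


def plan(args) -> list:
    """Describe what would be (or is being) published."""
    mode = "upload" if args.upload else "dry-run"
    return [
        f"[{mode}] {args.gguf.name}",
        f"[{mode}] {args.card} -> README.md",
        f"sha256 {sha256_of(args.gguf)}",
    ]


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    for line in plan(args):
        print(line)
    if args.upload:
        # Hypothetical: a real script would call its Hub client here.
        pass
    return 0
```

Calling `main(["model.gguf", "model-cards/privacy-filter.md"])` would only print the plan and the hash; making upload opt-in means a mistyped invocation cannot overwrite a published card.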

The openai-privacy-filter GGUF was produced by an out-of-tree llama.cpp fork (the conversion/ package on the openai-privacy-filter-arch branch) and seeded into the nightly cache by hand, so the parity job depended on a manual step and the model-label tests skipped whenever the seed was missing.

Add scripts/convert.py: a self-contained converter that reads config.json + model.safetensors + tokenizer.json and writes the GGUF directly via the gguf package -- no llama.cpp dependency. It encodes the load-bearing transforms the arch needs (expert gate_up concatenated-halves split, down_proj transpose), the per-tensor f16/f32 precision rule, the YaRN rope KVs, the 217-label TOKEN_CLS head, and the o200k tokenizer.

Verified byte-identical to the published reference: all 156 shared tensors diff to 0.0 vs pf-rope2-f16.gguf and pf-f32.gguf, and test_parity passes on the converted files (the f16 production gate and the tight f32 exact-rotation gate).

Wire it into the parity job: convert f16 + f32 from the cached safetensors on every run (outside the cached paths, so the result always reflects the current script), then ctest -L model gates them -- a converter regression now fails CI instead of silently shipping a wrong model. Drops the manual cache seed.

Also link the pre-converted LocalAI-io GGUF repos and document the converter (README, model card, publish_hf.py).

Assisted-by: Claude Code:claude-opus-4-8
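To make the two tensor transforms named above concrete, here is a dependency-free toy on nested lists. It is only an illustration under stated assumptions: which axis the fused gate_up tensor is split along, which half is the gate, and which axes down_proj swaps are not given in this PR, and the real converter presumably works on arrays rather than lists. The `max_abs_diff` helper mirrors the "diff to 0.0" style of check, not the actual verification code.

```python
def split_concatenated_halves(rows):
    """Split each fused row [g0..g(n-1), u0..u(n-1)] into (gate, up).

    Assumption: gate occupies the first half and up the second. An
    interleaved layout would need row[0::2] / row[1::2] instead, which is
    why picking the wrong split silently produces a wrong model.
    """
    gate, up = [], []
    for row in rows:
        half, rem = divmod(len(row), 2)
        if rem:
            raise ValueError("fused gate_up width must be even")
        gate.append(row[:half])
        up.append(row[half:])
    return gate, up


def transpose(matrix):
    """Swap the two axes of a 2-D matrix (one expert's down_proj slice)."""
    return [list(col) for col in zip(*matrix)]


def max_abs_diff(a, b):
    """Largest element-wise difference between two same-shape matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


fused = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
gate, up = split_concatenated_halves(fused)
# gate holds the first half of every row, up the second half.
```

A round trip such as `max_abs_diff(m, transpose(transpose(m)))` returning 0 is the list-sized analogue of diffing converted tensors against a reference file.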

The nightly fuzz-smoke step runs fuzz_tokenizer with PF_GGUF set to the model file. When that file was absent, gguf_init_from_file failed and the harness called abort() -- a core dump (exit 134) that turned the whole parity job red.

Probe the PF_GGUF path with fopen. Setting PF_GGUF requests full-encode fuzzing, so a missing file is a hard error: print a clear message and exit(1) (clean, no core dump) instead of aborting -- and instead of silently degrading to pretokenize-only, which would drop the encode-path coverage without anyone noticing.

PF_GGUF unset still runs pretokenize-only; a file that exists but won't load still abort()s (a real loader bug). CI generates the GGUF with scripts/convert.py, so in practice it is always present.

Assisted-by: Claude Code:claude-opus-4-8
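The three outcomes described above can be modelled in a few lines. The harness itself is native code (it probes with fopen and calls exit or abort), so this Python function is only a model of the decision, not the harness; the function name, the return values, and the injectable `file_exists` parameter are hypothetical. The "exists but won't load" case is not modelled, since it depends on the loader.

```python
import os


def fuzz_mode(env, file_exists=os.path.isfile):
    """Return (mode, error_message) for a given environment mapping.

    - PF_GGUF unset              -> pretokenize-only fuzzing
    - PF_GGUF set, file missing  -> hard error (caller prints and exits 1)
    - PF_GGUF set, file present  -> full-encode fuzzing
    """
    path = env.get("PF_GGUF")
    if path is None:
        return ("pretokenize-only", None)
    if not file_exists(path):
        # Never degrade silently: that would drop encode-path coverage.
        return ("error", f"PF_GGUF is set but {path!r} does not exist")
    return ("full-encode", None)
```

Passing `file_exists` in makes the decision testable without touching the filesystem; a caller would normally use `fuzz_mode(os.environ)`.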

…ilent skips)

test_parity and test_window_stitch skipped (exit 77) when the GGUF or fixtures were absent. That was a safety valve for the era when the GGUF was a manually-seeded cache artifact -- but it also means a parity job can go green having tested nothing, which is exactly how a missing model slips through.

Now that CI regenerates every asset on each run (scripts/hf_dump.py + scripts/convert.py), make them hard requirements. The one legitimate skip stays: PF_GGUF_DIR / PF_FIXTURES unset means model testing wasn't requested (the fast tier runs -LE model; local "full suite, no assets"). Once they ARE set, a missing fixture, f16 GGUF, or f32 GGUF fails loudly (exit 1) instead of skipping the gate.

Assisted-by: Claude Code:claude-opus-4-8
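The skip-versus-fail rule reads as a small decision table, sketched here in Python. The real tests are native binaries run under ctest, so this is only a model; the function name, the `required_assets` list, and the choice to skip when either variable is unset (rather than only when both are) are assumptions, as the text does not pin that down. Exit code 77 is the skip code quoted in the commit message.

```python
EXIT_RUN, EXIT_FAIL, EXIT_SKIP = 0, 1, 77


def parity_gate(env, required_assets, exists):
    """Decide whether a model test should skip, fail, or run.

    env             -- mapping of environment variables
    required_assets -- paths the test needs (fixture, f16 GGUF, f32 GGUF);
                       how they are derived is left to the caller
    exists          -- callable(path) -> bool
    """
    if env.get("PF_GGUF_DIR") is None or env.get("PF_FIXTURES") is None:
        # Model testing was not requested: the one legitimate skip.
        return EXIT_SKIP, []
    missing = [p for p in required_assets if not exists(p)]
    if missing:
        # Requested but incomplete: fail loudly instead of going green.
        return EXIT_FAIL, missing
    return EXIT_RUN, []
```

Returning the list of missing paths alongside the code lets the caller name exactly which asset was absent in the failure message.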

richiejp added 4 commits June 14, 2026 13:31

richiejp merged commit 63b9f45 into master Jun 16, 2026
6 checks passed

richiejp merged 4 commits into master from hf-gguf-converter
