hf gguf converter #1
Merged
richiejp
commented
Jun 15, 2026
Contributor
- docs: version-controlled model cards + HF publish script
- convert: add self-contained HF->GGUF converter, run it in the parity job
- fuzz: fail cleanly when PF_GGUF is set but the GGUF is missing
- tests: require the model GGUF/fixtures once parity is requested (no silent skips)
Bring the HuggingFace model cards under version control and add a reproducible publish script, so the published cards stop drifting from this repo:

- model-cards/privacy-filter-multilingual.md: refreshed from the live HF card. The "requires a patched llama.cpp" framing is replaced with a Runtimes section centered on privacy-filter.cpp (the patch-free reference engine, CPU/CUDA/Vulkan), with LocalAI (the privacy-filter backend) and llama.cpp (only with carry-patches, or upstream once PR #19725 lands) alongside.
- model-cards/privacy-filter.md: NEW card for the base openai/privacy-filter model (8 categories / 33 labels). Converted to GGUF with the fork converter and validated with pf-cli: arch openai-privacy-filter, 33 labels, correct private_person / private_email spans. Same Runtimes section.
- scripts/publish_hf.py: mirrors parakeet.cpp. Dry-run by default; --upload pushes the converted GGUF + the version-controlled card (-> README.md) to the matching LocalAI-io repo; prints the sha256 to pin in the LocalAI gallery.

Assisted-by: Claude Code:claude-opus-4-8
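The dry-run-by-default behaviour described for scripts/publish_hf.py can be sketched roughly as follows. This is not the script from the PR: the positional argument names, the output format, and the `plan` helper are assumptions made for illustration, and the actual upload call is deliberately left out.

```python
import argparse
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so a large GGUF is never loaded whole."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_parser():
    # Hypothetical CLI shape; only --upload is named in the commit message.
    parser = argparse.ArgumentParser(description="dry-run-by-default publish sketch")
    parser.add_argument("gguf", type=Path, help="converted GGUF file")
    parser.add_argument("card", type=Path, help="version-controlled model card")
    # Without --upload nothing is pushed; the plan is only reported.
    parser.add_argument("--upload", action="store_true", help="actually push")
    return parser


def plan(args):
    """Return the lines a run would print; the upload itself is out of scope here."""
    mode = "UPLOAD" if args.upload else "DRY-RUN"
    return [
        f"[{mode}] {args.gguf.name} -> {args.gguf.name}",
        f"[{mode}] {args.card.name} -> README.md",  # the card becomes the repo README
        f"sha256 {sha256_of(args.gguf)}",           # value to pin in the gallery
    ]
```

Calling `plan(build_parser().parse_args(["model.gguf", "card.md"]))` reports a dry run; appending `"--upload"` to the argument list switches the reported mode. Making the destructive action opt-in means an accidental invocation cannot overwrite a published card.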
The openai-privacy-filter GGUF was produced by an out-of-tree llama.cpp fork (the conversion/ package on the openai-privacy-filter-arch branch) and seeded into the nightly cache by hand, so the parity job depended on a manual step and the model-label tests skipped whenever the seed was missing.

Add scripts/convert.py: a self-contained converter that reads config.json + model.safetensors + tokenizer.json and writes the GGUF directly via the gguf package -- no llama.cpp dependency. It encodes the load-bearing transforms the arch needs (expert gate_up concatenated-halves split, down_proj transpose), the per-tensor f16/f32 precision rule, the YaRN rope KVs, the 217-label TOKEN_CLS head, and the o200k tokenizer.

Verified byte-identical to the published reference: all 156 shared tensors diff to 0.0 vs pf-rope2-f16.gguf and pf-f32.gguf, and test_parity passes on the converted files (the f16 production gate and the tight f32 exact-rotation gate).

Wire it into the parity job: convert f16 + f32 from the cached safetensors on every run (outside the cached paths, so the result always reflects the current script), then ctest -L model gates them -- a converter regression now fails CI instead of silently shipping a wrong model. Drops the manual cache seed.

Also link the pre-converted LocalAI-io GGUF repos and document the converter (README, model card, publish_hf.py).

Assisted-by: Claude Code:claude-opus-4-8
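To make the two tensor transforms and the "diff to 0.0" check concrete, here is a minimal pure-Python sketch on nested lists. It is not taken from scripts/convert.py: the split axis, the order of the halves (gate first, then up), and all function names are assumptions for illustration only.

```python
def split_halves(rows):
    """Split each row of a 2-D tensor into its first and second half.

    Assumption: gate and up are stored back to back along the last axis
    (gate first), not interleaved. The real layout is defined by the converter.
    """
    width = len(rows[0])
    if width % 2:
        raise ValueError("fused width must be even")
    half = width // 2
    gate = [row[:half] for row in rows]
    up = [row[half:] for row in rows]
    return gate, up


def transpose(rows):
    """Swap the two axes of a 2-D tensor given as a list of rows."""
    return [list(col) for col in zip(*rows)]


def max_abs_diff(a, b):
    """Largest element-wise difference between two equally shaped 2-D tensors."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


fused = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0]]
gate, up = split_halves(fused)   # gate == [[1.0, 2.0], [5.0, 6.0]]
down_t = transpose(up)           # down_t == [[3.0, 7.0], [4.0, 8.0]]
```

A converted tensor would then be compared against the reference with `max_abs_diff(converted, reference) == 0.0`, which is the per-tensor form of the check the commit message reports.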
The nightly fuzz-smoke step runs fuzz_tokenizer with PF_GGUF set to the model file. When that file was absent, gguf_init_from_file failed and the harness called abort() -- a core dump (exit 134) that turned the whole parity job red.

Probe the PF_GGUF path with fopen. Setting PF_GGUF requests full-encode fuzzing, so a missing file is a hard error: print a clear message and exit(1) (clean, no core dump) instead of aborting -- and instead of silently degrading to pretokenize-only, which would drop the encode-path coverage without anyone noticing. PF_GGUF unset still runs pretokenize-only; a file that exists but won't load still abort()s (a real loader bug).

CI generates the GGUF with scripts/convert.py, so in practice it is always present.

Assisted-by: Claude Code:claude-opus-4-8
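The harness itself is native code; the Python sketch below only mirrors the decision table described above so the three cases are easy to see. The `FuzzMode` enum, the function name, and the message text are hypothetical, and opening the file for reading stands in for the fopen probe.

```python
from enum import Enum


class FuzzMode(Enum):
    PRETOKENIZE_ONLY = "pretokenize-only"
    FULL_ENCODE = "full-encode"


def select_fuzz_mode(env):
    """Return (mode, exit_code, message) for an environment mapping.

    exit_code is None when the harness should carry on fuzzing.
    """
    path = env.get("PF_GGUF")
    if path is None:
        # PF_GGUF unset: full-encode fuzzing was never requested.
        return FuzzMode.PRETOKENIZE_ONLY, None, ""
    try:
        # Stand-in for the fopen probe: can the file be opened for reading?
        with open(path, "rb"):
            pass
    except OSError:
        # Requested but missing: clean hard error, no silent downgrade.
        return None, 1, f"PF_GGUF is set but {path} cannot be opened"
    # The file exists; a later load failure is a loader bug and still aborts.
    return FuzzMode.FULL_ENCODE, None, ""
```

Keeping the missing-file case distinct from the load-failure case preserves the signal: exit 1 points at the CI environment, while an abort points at the loader.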
test_parity and test_window_stitch skipped (exit 77) when the GGUF or fixtures were absent. That was a safety valve for the era when the GGUF was a manually-seeded cache artifact -- but it also means a parity job can go green having tested nothing, which is exactly how a missing model slips through.

Now that CI regenerates every asset on each run (scripts/hf_dump.py + scripts/convert.py), make them hard requirements. The one legitimate skip stays: PF_GGUF_DIR / PF_FIXTURES unset means model testing wasn't requested (the fast tier runs -LE model; local "full suite, no assets"). Once they ARE set, a missing fixture, f16 GGUF, or f32 GGUF fails loudly (exit 1) instead of skipping the gate.

Assisted-by: Claude Code:claude-opus-4-8
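The skip-versus-fail rule can be sketched as a small pre-check. The real tests are native test binaries, so this Python version is only an illustration: the function name and the file-name parameters are hypothetical, and treating either variable being unset as "not requested" is an assumption the commit message does not spell out.

```python
import os

EXIT_OK, EXIT_FAIL, EXIT_SKIP = 0, 1, 77


def parity_precheck(env, gguf_names, fixture_names):
    """Return (exit_code, missing_paths) before any parity comparison runs."""
    gguf_dir = env.get("PF_GGUF_DIR")
    fixture_dir = env.get("PF_FIXTURES")
    # Assumption: either variable being unset means model testing was not requested.
    if gguf_dir is None or fixture_dir is None:
        return EXIT_SKIP, []
    wanted = [os.path.join(gguf_dir, name) for name in gguf_names]
    wanted += [os.path.join(fixture_dir, name) for name in fixture_names]
    # Once testing is requested, every asset is mandatory: no silent skip.
    missing = [path for path in wanted if not os.path.isfile(path)]
    return (EXIT_FAIL if missing else EXIT_OK), missing
```

Returning the list of missing paths alongside the exit code lets the caller print exactly which asset is absent, so a red job is diagnosable from the log alone.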