hf gguf converter #1

Merged
richiejp merged 4 commits into master from hf-gguf-converter
Jun 16, 2026

@richiejp

Contributor
  • docs: version-controlled model cards + HF publish script
  • convert: add self-contained HF->GGUF converter, run it in the parity job
  • fuzz: fail cleanly when PF_GGUF is set but the GGUF is missing
  • tests: require the model GGUF/fixtures once parity is requested (no silent skips)

richiejp added 4 commits June 14, 2026 13:31
Bring the HuggingFace model cards under version control and add a reproducible
publish script, so the published cards stop drifting from this repo:

- model-cards/privacy-filter-multilingual.md — refreshed from the live HF card.
  The "requires a patched llama.cpp" framing is replaced with a Runtimes
  section centered on privacy-filter.cpp (the patch-free reference engine,
  CPU/CUDA/Vulkan), with LocalAI (the privacy-filter backend) and llama.cpp
  (only with carry-patches, or upstream once PR #19725 lands) alongside.
- model-cards/privacy-filter.md — NEW card for the base openai/privacy-filter
  model (8 categories / 33 labels). Converted to GGUF with the fork converter
  and validated with pf-cli: arch openai-privacy-filter, 33 labels, correct
  private_person / private_email spans. Same Runtimes section.
- scripts/publish_hf.py — mirrors parakeet.cpp: dry-run by default, --upload to
  push the converted GGUF + the version-controlled card (-> README.md) to the
  matching LocalAI-io repo; prints the sha256 to pin in the LocalAI gallery.
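
The dry-run-by-default behaviour described for scripts/publish_hf.py can be sketched roughly as below. This is not the script from this PR: the positional arguments, the `plan_publish` helper, and the placeholder upload step are assumptions made for illustration. Only the `--upload` flag, the card -> README.md mapping, and the sha256 printout come from the commit message.

```python
from __future__ import annotations

import argparse
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large GGUF never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_publish(gguf: Path, card: Path) -> list[tuple[Path, str]]:
    """Map each local file to its name in the target repo (card -> README.md)."""
    for required in (gguf, card):
        if not required.is_file():
            raise FileNotFoundError(required)
    return [(gguf, gguf.name), (card, "README.md")]


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="dry-run-by-default publish sketch")
    parser.add_argument("gguf", type=Path)   # hypothetical positional argument
    parser.add_argument("card", type=Path)   # hypothetical positional argument
    parser.add_argument("repo")              # hypothetical: target repo id
    parser.add_argument("--upload", action="store_true",
                        help="actually push; without it nothing is uploaded")
    args = parser.parse_args(argv)

    verb = "upload" if args.upload else "would upload"
    for local, remote in plan_publish(args.gguf, args.card):
        print(f"{verb}: {local} -> {args.repo}/{remote}")
    # Print the digest in both modes so it can be pinned before or after a push.
    print(f"sha256: {sha256_of(args.gguf)}")
    if args.upload:
        # Assumption: the real push step would go here; it is left out on purpose.
        raise NotImplementedError("upload step is not part of this sketch")
    return 0
```

Calling `main([...])` without `--upload` only prints the plan and the digest, so the default invocation cannot publish anything by accident.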

Assisted-by: Claude Code:claude-opus-4-8

The openai-privacy-filter GGUF was produced by an out-of-tree llama.cpp fork
(the conversion/ package on the openai-privacy-filter-arch branch) and seeded
into the nightly cache by hand, so the parity job depended on a manual step and
the model-label tests skipped whenever the seed was missing.

Add scripts/convert.py: a self-contained converter that reads config.json +
model.safetensors + tokenizer.json and writes the GGUF directly via the gguf
package -- no llama.cpp dependency. It encodes the load-bearing transforms the
arch needs (expert gate_up concatenated-halves split, down_proj transpose), the
per-tensor f16/f32 precision rule, the YaRN rope KVs, the 217-label TOKEN_CLS
head, and the o200k tokenizer. Verified byte-identical to the published
reference: all 156 shared tensors diff to 0.0 vs pf-rope2-f16.gguf and
pf-f32.gguf, and test_parity passes on the converted files (the f16 production
gate and the tight f32 exact-rotation gate).
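
To make the tensor transforms and the "diff to 0.0" check concrete, here is a minimal NumPy sketch. It is not taken from scripts/convert.py. The split axis, the gate-before-up ordering, the precision rule (1-D tensors stay f32), and all function names are assumptions for illustration; the real converter may lay tensors out differently.

```python
from __future__ import annotations

import numpy as np


def split_gate_up(fused: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a fused gate_up tensor into two halves along the last axis.

    Assumption: the first half is gate and the second half is up.
    """
    if fused.shape[-1] % 2 != 0:
        raise ValueError("fused gate_up width must be even")
    half = fused.shape[-1] // 2
    return fused[..., :half], fused[..., half:]


def transpose_down(down: np.ndarray) -> np.ndarray:
    """Swap the last two axes, leaving any leading (expert) axis in place."""
    return np.ascontiguousarray(np.swapaxes(down, -1, -2))


def target_dtype(tensor: np.ndarray, want_f16: bool) -> np.dtype:
    """Hypothetical per-tensor rule: 1-D tensors stay f32, the rest may narrow."""
    if want_f16 and tensor.ndim >= 2:
        return np.dtype(np.float16)
    return np.dtype(np.float32)


def max_abs_diff(a: dict[str, np.ndarray], b: dict[str, np.ndarray]) -> dict[str, float]:
    """Largest absolute difference per tensor name present in both mappings."""
    report = {}
    for name in sorted(a.keys() & b.keys()):
        if a[name].shape != b[name].shape:
            raise ValueError(f"shape mismatch for {name}")
        delta = a[name].astype(np.float64) - b[name].astype(np.float64)
        report[name] = float(np.max(np.abs(delta)))
    return report
```

A converter regression in either transform would show up as a non-zero entry in the `max_abs_diff` report for the affected tensors, which is the kind of signal the comparison against the reference files relies on.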

Wire it into the parity job: convert f16 + f32 from the cached safetensors on
every run (outside the cached paths, so the result always reflects the current
script), then ctest -L model gates them -- a converter regression now fails CI
instead of silently shipping a wrong model. Drops the manual cache seed.

Also link the pre-converted LocalAI-io GGUF repos and document the converter
(README, model card, publish_hf.py).

Assisted-by: Claude Code:claude-opus-4-8

The nightly fuzz-smoke step runs fuzz_tokenizer with PF_GGUF set to the model
file. When that file was absent, gguf_init_from_file failed and the harness
called abort() -- a core dump (exit 134) that turned the whole parity job red.

Probe the PF_GGUF path with fopen. Setting PF_GGUF requests full-encode
fuzzing, so a missing file is a hard error: print a clear message and exit(1)
(clean, no core dump) instead of aborting -- and instead of silently degrading
to pretokenize-only, which would drop the encode-path coverage without anyone
noticing. PF_GGUF unset still runs pretokenize-only; a file that exists but
won't load still abort()s (a real loader bug). CI generates the GGUF with
scripts/convert.py, so in practice it is always present.
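
The decision described above has three outcomes, sketched here in Python for illustration. The real harness is native code that uses fopen and gguf_init_from_file, so the `FuzzMode` enum and `choose_fuzz_mode` function are hypothetical names, and the later "file exists but will not load" abort() case is not modelled.

```python
import sys
from enum import Enum


class FuzzMode(Enum):
    PRETOKENIZE_ONLY = "pretokenize-only"
    FULL_ENCODE = "full-encode"


def choose_fuzz_mode(environ) -> FuzzMode:
    """Pick the fuzzing mode from a mapping such as os.environ."""
    path = environ.get("PF_GGUF")
    if path is None:
        # Nothing requested: stay on the cheap pretokenize-only path.
        return FuzzMode.PRETOKENIZE_ONLY
    try:
        # Probe only; the actual model load happens later in the harness.
        with open(path, "rb"):
            pass
    except OSError as err:
        # Requested but unavailable: fail cleanly rather than degrade silently.
        print(f"PF_GGUF is set but {path!r} cannot be opened: {err}", file=sys.stderr)
        raise SystemExit(1)
    return FuzzMode.FULL_ENCODE
```

Keeping "unset" and "set but missing" as separate branches is what stops a misconfigured job from quietly dropping encode-path coverage.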

Assisted-by: Claude Code:claude-opus-4-8

…ilent skips)

test_parity and test_window_stitch skipped (exit 77) when the GGUF or fixtures
were absent. That was a safety valve for the era when the GGUF was a
manually-seeded cache artifact -- but it also means a parity job can go green
having tested nothing, which is exactly how a missing model slips through.

Now that CI regenerates every asset on each run (scripts/hf_dump.py +
scripts/convert.py), make them hard requirements. The one legitimate skip
stays: PF_GGUF_DIR / PF_FIXTURES unset means model testing wasn't requested
(the fast tier runs ctest -LE model, and a local full-suite run may have no assets). Once they ARE
set, a missing fixture, f16 GGUF, or f32 GGUF fails loudly (exit 1) instead of
skipping the gate.
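
The skip-versus-fail rule can be summarised as a small exit-code function. This is a Python sketch under stated assumptions, not the test code from this PR. The `asset_gate` name is hypothetical, the GGUF file names are placeholders borrowed from the reference files mentioned earlier, treating PF_FIXTURES as a single path is a guess, and treating either variable being unset as "not requested" is also an assumption.

```python
from pathlib import Path

EXIT_OK, EXIT_FAIL, EXIT_SKIP = 0, 1, 77


def asset_gate(environ, gguf_names=("pf-rope2-f16.gguf", "pf-f32.gguf")):
    """Return (exit_code, missing_paths) for a model-gated test.

    Assumption: gguf_names are placeholder file names inside PF_GGUF_DIR.
    """
    gguf_dir = environ.get("PF_GGUF_DIR")
    fixtures = environ.get("PF_FIXTURES")
    if gguf_dir is None or fixtures is None:
        # Model testing was not requested: the one legitimate skip.
        return EXIT_SKIP, []
    required = [Path(fixtures)] + [Path(gguf_dir) / name for name in gguf_names]
    missing = [str(p) for p in required if not p.exists()]
    # Requested but incomplete: fail loudly instead of skipping the gate.
    return (EXIT_FAIL if missing else EXIT_OK), missing
```

Returning the list of missing paths alongside the exit code lets the caller print exactly which asset was absent before exiting.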

Assisted-by: Claude Code:claude-opus-4-8
richiejp merged commit 63b9f45 into master on Jun 16, 2026
6 checks passed