Merged
14 changes: 14 additions & 0 deletions .dockerignore
@@ -0,0 +1,14 @@
# Keep the docker build context small: ship source + the vendored ggml submodule,
# not host build trees, model blobs, or demo media.
build/
**/build/
.git/
.cache/
demo/out/
demo/traces/
*.gguf
*.mp4
*.gif
*.webm
.venv/
__pycache__/
10 changes: 10 additions & 0 deletions README.md
@@ -30,6 +30,16 @@ hits the 16 GiB memory wall and OOMs at ~16k.

Full-quality MP4s: [CPU](demo/out/pii_duel_cpu_final.mp4) · [GPU](demo/out/pii_duel_gpu_final.mp4).

**Raspberry Pi 5 — on-device, real time.** The same engine, no GPU: 1,360 tokens
of mixed PII classified in 3.8 s (360 tok/s) on a Cortex-A76 @ 1.5 GHz with q8
weights. The right pane is the live NER feed — 107 spans across 22 categories,
each with its category and byte range (q8 output is span-for-span identical to
f16 here).

![Raspberry Pi 5 on-device PII scan: 1,360 tokens, 107 PII spans across 22 categories in 3.8 s](demo/out/pii_scan.gif)

Full-quality MP4: [Pi 5 scan](demo/out/pii_scan_final.mp4).

Single forward-pass latency and throughput vs stock HF Transformers (transformers
5.9, eager), Ryzen 9 7900 (12 threads) + RTX 5070 Ti, f16/fp16, matched token
counts ([scripts/bench_torch.py](scripts/bench_torch.py)). `tokens` is the input
75 changes: 75 additions & 0 deletions demo/gen_scan.py
@@ -0,0 +1,75 @@
#!/usr/bin/env python3
"""Build the single-document NER-scan trace for pii_scan.py from REAL pf-cli output.

Unlike gen_corpus.py (which tiles a paragraph and replicates hard-coded spans),
this runs the actual engine on demo/scan_doc.txt:
  * pf-cli --tok-batch  -> exact token count (vocab only)
  * pf-cli --classify   -> real entity spans (entity_group/start/end/score/text)

so every span and category on screen is exactly what the model emits. Writes
demo/traces/scan/{content.json,engines.json}.

  python3 gen_scan.py --cli build/release/pf-cli \
      --model ~/ggufs_perf/pf-q8experts.gguf --ld build/release/ggml/src \
      --tps 366 --device "Raspberry Pi 5 · CPU · q8 @ 1.5 GHz"
"""
import argparse, json, os, struct, subprocess
from pathlib import Path

HERE = Path(__file__).resolve().parent


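def run_cli(cli, ld, args, stdin=None):
    # Only override LD_LIBRARY_PATH when --ld was given, so an empty default
    # does not clobber a value already set in the caller's environment.
    env = dict(os.environ)
    if ld:
        env["LD_LIBRARY_PATH"] = ld
    return subprocess.run([cli, *args], input=stdin, capture_output=True, env=env)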


def token_count(cli, ld, model, doc_bytes):
    inp, outp = "/tmp/scan_tb_in.bin", "/tmp/scan_tb_out.bin"
    # Little-endian input: u32 1, u32 byte length, then the raw document bytes.
    with open(inp, "wb") as f:
        f.write(struct.pack("<I", 1) + struct.pack("<I", len(doc_bytes)) + doc_bytes)
    run_cli(cli, ld, ["--tok-batch", model, inp, outp]).check_returncode()
    # The first u32 of the output file is read as the token count.
    with open(outp, "rb") as f:
        return struct.unpack_from("<I", f.read(), 0)[0]


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--cli", required=True)
    ap.add_argument("--model", required=True)
    ap.add_argument("--ld", default="", help="LD_LIBRARY_PATH for the cli (shared ggml build)")
    ap.add_argument("--doc", default=str(HERE / "scan_doc.txt"))
    ap.add_argument("--scene", default=str(HERE / "traces/scan"))
    ap.add_argument("--threshold", default="0.5")
    ap.add_argument("--tps", type=float, default=0.0, help="measured engine throughput (tok/s)")
    ap.add_argument("--label", default="privacy-filter.cpp")
    ap.add_argument("--device", default="Raspberry Pi 5 · CPU")
    ap.add_argument("--note", default="")
    a = ap.parse_args()

    doc_bytes = Path(a.doc).read_bytes()
    n_tok = token_count(a.cli, a.ld, a.model, doc_bytes)
    r = run_cli(a.cli, a.ld, ["--classify", a.model, a.threshold, "cpu"], stdin=doc_bytes)
    r.check_returncode()  # fail here rather than passing empty stdout to json.loads
    ents_raw = json.loads(r.stdout)
    ents = [{"type": e["entity_group"], "start": e["start"], "end": e["end"],
             "text": e.get("text", ""), "score": e.get("score", 0.0)} for e in ents_raw]

    d = Path(a.scene)
    d.mkdir(parents=True, exist_ok=True)
    content = {"document": doc_bytes.decode("utf-8"), "n_tokens": n_tok,
               "note": a.note, "entities": ents}
    # ensure_ascii=False writes raw non-ASCII, so pin the file encoding to UTF-8.
    with open(d / "content.json", "w", encoding="utf-8") as f:
        json.dump(content, f, ensure_ascii=False)
    with open(d / "engines.json", "w", encoding="utf-8") as f:
        json.dump([{"label": a.label, "device": a.device, "tps": a.tps}],
                  f, indent=2, ensure_ascii=False)

    # Per-category span counts for the summary line.
    cats = {}
    for e in ents:
        cats[e["type"]] = cats.get(e["type"], 0) + 1
    print(f"{len(doc_bytes):,} bytes {n_tok:,} tokens {len(ents)} entities "
          f"{len(cats)} categories")
    if a.tps:
        print(f"engine: {a.tps:g} tok/s -> {n_tok / a.tps:.2f}s run")
    print("categories:", ", ".join(f"{k}:{v}" for k, v in
                                   sorted(cats.items(), key=lambda x: -x[1])))


if __name__ == "__main__":
    main()
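A minimal sketch of how a consumer might sanity-check the `content.json` that gen_scan.py writes. It assumes `start`/`end` are byte offsets into the UTF-8 document (the README describes each span as having a "byte range"), which matters because `document` is stored as a decoded string. The helper name `summarize_trace`, the `EMAIL` category, and the sample data are hypothetical and are not real pf-cli output.

```python
from collections import Counter


def summarize_trace(content: dict) -> dict:
    """Count spans per category and flag spans whose bytes do not match `text`."""
    doc_bytes = content["document"].encode("utf-8")
    per_type = Counter(e["type"] for e in content["entities"])
    mismatched = []
    for e in content["entities"]:
        # Assumption: start/end index bytes, not characters.
        span = doc_bytes[e["start"]:e["end"]].decode("utf-8", errors="replace")
        if e["text"] and span != e["text"]:
            mismatched.append(e)
    return {
        "n_entities": len(content["entities"]),
        "per_type": dict(per_type),
        "mismatched": mismatched,
    }


# Hand-made trace: "ë" is two bytes in UTF-8, so byte and character
# offsets diverge after it.
example = {
    "document": "Zoë: zoe@example.com",
    "n_tokens": 0,
    "note": "",
    "entities": [
        {"type": "EMAIL", "start": 6, "end": 21,
         "text": "zoe@example.com", "score": 0.99},
    ],
}
summary = summarize_trace(example)
print(summary["n_entities"], summary["per_type"], len(summary["mismatched"]))
```

If the offsets turned out to be character-based instead, the same check would report mismatches on any document containing multi-byte characters, which makes the assumption easy to test against a real trace.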
55 changes: 55 additions & 0 deletions demo/make_scan.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash
# Render the single-document PII-scan TUI (pii_scan.py) for the "scan" scene,
# record it with recorder-for-agents, trim the lead-in, append a branding outro.
# The scene + real spans + real tok/s come from gen_scan.py (pf-cli output).
#
# ./make_scan.sh # real time
# DILATE=2 ./make_scan.sh # slowed 2x
#
# env: RECORDER, DILATE, WIDTH/HEIGHT/FPS/FONTSIZE, HOLD, CARD, LINK
set -euo pipefail
HERE=$(cd "$(dirname "$0")" && pwd)
RECORDER=${RECORDER:-/home/rich/python/recorder-for-agents}
SCENE=scan
OUT=${1:-pii_scan.mp4}
DILATE=${DILATE:-1}
W=${WIDTH:-1280}; H=${HEIGHT:-720}; FS=${FONTSIZE:-16}; FPS=${FPS:-30}
HOLD=${HOLD:-1.4}; CARD=${CARD:-3.5}
LINK=${LINK:-github.com/richiejp/privacy-filter.cpp}
SDIR="$HERE/traces/$SCENE"

[ -d "$SDIR" ] || { echo "no scene at $SDIR (run gen_scan.py)"; exit 1; }
[ -x "$RECORDER/record.sh" ] || { echo "recorder not found at $RECORDER"; exit 1; }

# capture length: start delay + scan (scaled) + hold + settle + card + buffer
DUR=$(python3 - "$SDIR" "$DILATE" "$HOLD" "$CARD" <<'PY'
import json, sys, math
from pathlib import Path
d = Path(sys.argv[1]); dil, hold, card = map(float, sys.argv[2:5])
c = json.load(open(d / "content.json")); e = json.load(open(d / "engines.json"))[0]
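# gen_scan.py defaults --tps to 0.0, so stop with a clear message instead of dividing by zero.
if e["tps"] <= 0: sys.exit("engines.json has tps <= 0; rerun gen_scan.py with a measured --tps")
proc = c["n_tokens"] / e["tps"]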
print(int(math.ceil(1.0 + proc * dil + hold + 0.7 + card + 1.0)))
PY
)
echo "[make-scan] ${W}x${H}@${FPS} fs=${FS} dilate=${DILATE} duration=${DUR}s -> out/$OUT"

WORK="$HERE" BG="#0d1117" FG="#d7dde5" FONTSIZE="$FS" DURATION="$DUR" \
  WIDTH="$W" HEIGHT="$H" FPS="$FPS" START_DELAY=1.0 END_HOLD=0.2 \
  "$RECORDER/record.sh" \
    "python3 pii_scan.py --scene traces/$SCENE --dilate $DILATE --hold $HOLD --card $CARD --link '$LINK'" \
    "$OUT"

RAW="$HERE/out/$OUT"; NOEXT="${OUT%.mp4}"
if [ -f "$RECORDER/examples/duel/trim_lead.sh" ]; then
  bash "$RECORDER/examples/duel/trim_lead.sh" "$RAW" "$HERE/out/.trim_$SCENE.mp4" \
    && mv "$HERE/out/.trim_$SCENE.mp4" "$RAW"
fi
if [ -f "$RECORDER/examples/duel/outro.sh" ]; then
  OW="$W" OH="$H" TITLE="privacy-filter.cpp" \
    LINK1="github.com/richiejp/privacy-filter.cpp" \
    LINK2="on-device NER · Raspberry Pi 5 · real PII spans" \
    bash "$RECORDER/examples/duel/outro.sh" "$RAW" "$HERE/out/${NOEXT}_final.mp4"
  echo "-> $HERE/out/${NOEXT}_final.mp4"
else
  echo "-> $RAW"
fi
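The capture-length formula in make_scan.sh can be sketched as a standalone function for checking recording durations without running the recorder. This is an illustrative restatement, not part of the PR: the function name `capture_seconds` is hypothetical, the constants (1.0 s start delay, 0.7 s settle, 1.0 s buffer) are copied from the heredoc, and the defaults mirror the script's `HOLD` and `CARD` defaults. The sample numbers are the 1,360 tokens from the README and the `--tps 366` from the gen_scan.py usage example.

```python
import math


def capture_seconds(n_tokens: int, tps: float, dilate: float = 1.0,
                    hold: float = 1.4, card: float = 3.5) -> int:
    """Start delay + scan (scaled by dilate) + hold + settle + card + buffer."""
    if tps <= 0:
        raise ValueError("tps must be positive (gen_scan.py defaults --tps to 0.0)")
    scan = n_tokens / tps  # seconds the engine needs for the document
    return int(math.ceil(1.0 + scan * dilate + hold + 0.7 + card + 1.0))


real_time = capture_seconds(1360, 366.0)           # 12
slowed_2x = capture_seconds(1360, 366.0, dilate=2)  # 16
print(real_time, slowed_2x)
```

Only the scan term scales with `DILATE`, so slowing playback 2x adds one scan length (about 3.7 s here) rather than doubling the whole capture.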
Binary file added demo/out/pii_scan.gif
Binary file added demo/out/pii_scan.mp4
Binary file not shown.
Binary file added demo/out/pii_scan_final.mp4
Binary file not shown.