Roadmap
What's coming next, in roughly the order it'll land.
In review now. Adds first-class build paths for:
- Apple Silicon (M1–M4) — NEON + Accelerate (AMX) + SME on M4+
- Apple Neural Engine via Core ML bridge
- AArch64 NEON (Linux/Android, Raspberry Pi)
- Cross-platform tools/bench.c for unified benchmarking
- AVFoundation live-camera bench on macOS (make bench-camera)
Status: M2 path is 51/51 tests passing. AArch64 NEON foundation is done. The i.MX 8/93/95 NPU lib (Ethos-U65 / VxDelegate / XNNPACK) and the ESP32-P4 ESP-IDF component remain drafts until they are tested on physical hardware.
The demo already has a basic "turn your head / blink twice" challenge. Next:
- Multi-step challenges ("turn left, then up, then blink")
- Randomized sequence per session to defeat video-replay
- Server-attested challenge tokens (HMAC, time-bounded)
- iBeta-style attack dataset for evaluation
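A minimal sketch of the randomized-sequence and server-attested-token items above, written in Python for brevity rather than in the project's C/JS. The step names, token layout, and the `issue_challenge` / `verify_challenge` helpers are hypothetical and only illustrate the idea: the server picks a random sequence, signs it with HMAC-SHA256 together with an expiry time, and later accepts it only if the tag matches and the token is still fresh.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import List, Optional

# Hypothetical step names; the demo's real challenge vocabulary may differ.
STEPS = ["turn_left", "turn_right", "look_up", "blink"]


def issue_challenge(key: bytes, n_steps: int = 3, ttl_s: int = 30,
                    now: Optional[float] = None) -> str:
    """Pick a random step sequence and sign it together with an expiry time."""
    rng = secrets.SystemRandom()
    issued = time.time() if now is None else now
    payload = {
        "steps": [rng.choice(STEPS) for _ in range(n_steps)],
        "nonce": secrets.token_hex(8),   # makes every token unique
        "exp": int(issued + ttl_s),      # time bound, in Unix seconds
    }
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + tag


def verify_challenge(key: bytes, token: str,
                     now: Optional[float] = None) -> Optional[List[str]]:
    """Return the step list if the tag is valid and the token has not expired."""
    body, _, tag = token.partition(".")
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; the payload is only parsed after the tag checks out.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current > payload["exp"]:
        return None
    return payload["steps"]
```

The client would echo the token back with its liveness result, so the server never has to store per-session state; replay within the validity window would still need a nonce cache, which this sketch leaves out.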
Currently the demo ships a single AES-256 key obfuscated in JS. The production-ready path:
- Per-customer key wrapping (KEYWRAP under a master key)
- Domain-binding via HMAC
- Time-bounded keys with rotation endpoint
- Audit log on each decrypt attempt
See Encrypted Weights for the API shape.
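Domain binding and time-bounded rotation from the list above could be combined in one derivation step. The sketch below is an assumption, not the Encrypted Weights API: `derive_weights_key`, its parameters, and the one-day rotation window are hypothetical. It uses HMAC-SHA256 as a simple key-derivation function so that a key issued for one domain and one window is useless on another domain or after rotation.

```python
import hashlib
import hmac


def derive_weights_key(customer_secret: bytes, domain: str,
                       unix_time: int, window_s: int = 86400) -> bytes:
    """Derive a 32-byte key bound to one domain and one rotation window."""
    window = unix_time // window_s                 # index of the current rotation window
    info = f"{domain.lower()}|{window}".encode()   # domain binding + time bound
    return hmac.new(customer_secret, info, hashlib.sha256).digest()
```

A rotation endpoint would then only need to hand out the key for the current window; the per-customer secret itself never leaves the server.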
Pending: real hardware in hand. Once we have boards:
- INT8 quantize the 4 recognition variants for the NPU
- Wire MiniFASNet ensemble into the i.MX SDK
- Stream MIPI-CSI directly into the engine, no RAM round-trip
Expected: full pipeline (decode → detect → recognize) on a single $10 ESP32-P4 chip.
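For the INT8 quantization step, a minimal sketch of symmetric per-tensor quantization follows. It is a Python illustration of the arithmetic only; the actual NPU toolchains may use per-channel scales, zero points, or calibration data, and the function names here are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0   # avoid division by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

The round-trip error per weight is at most half a quantization step (`scale / 2`), which is the figure to watch when comparing recognition accuracy before and after quantization.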
The 187 KB smile classifier is a template. Adding more tiny classifier heads (binary or few-class) is straightforward:
- age — bucketed 0/18/30/45/60+
- emotion — happy/sad/angry/surprised/neutral (5 classes via softmax)
- glasses — yes/no
- mask — yes/no
- eyewear — sunglasses/clear/none
Each ~50–200 K params, ~200 KB ONNX, ~10 min training on scraped data.
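As a small illustration of the post-processing such heads need, the sketch below maps an age to one of the listed buckets and turns five emotion logits into a label via softmax. It assumes the bucket list 0/18/30/45/60+ gives lower bounds; the label strings and helper names are hypothetical.

```python
import bisect
import math

AGE_EDGES = [18, 30, 45, 60]   # assumed lower bounds of buckets 1..4
AGE_LABELS = ["0-17", "18-29", "30-44", "45-59", "60+"]
EMOTIONS = ["happy", "sad", "angry", "surprised", "neutral"]


def age_bucket(age):
    """Return the label of the bucket that contains `age`."""
    return AGE_LABELS[bisect.bisect_right(AGE_EDGES, age)]


def softmax(logits):
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_emotion(logits):
    """Return (label, probability) for the most likely of the five classes."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]
```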
The current 5-second forehead DFT gets ~80% accuracy on a still face. Improvements queued:
- Adaptive ROI tracking (forehead moves as the user does)
- Bandpass via Butterworth IIR instead of narrow-band scan
- HRV (heart-rate variability) metrics over 30-second window
- Pulse waveform visualization (sparkline in HUD)
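For reference, a narrow-band DFT scan of the kind these improvements would build on can be sketched as follows. This is a Python prototype under stated assumptions, not the demo's code: the input is taken to be the per-frame mean intensity of the forehead ROI, and the 42 to 180 BPM range and 1 BPM step are illustrative choices.

```python
import cmath
import math


def estimate_bpm(samples, fps, lo_bpm=42, hi_bpm=180):
    """Scan candidate heart rates in 1 BPM steps and return the strongest one."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]   # remove the DC (skin-tone) offset
    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        freq_hz = bpm / 60.0
        # Single-frequency DFT, evaluated at an arbitrary (non-bin) frequency.
        acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fps)
                  for k, x in enumerate(centered))
        power = abs(acc) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

With a 5-second window the underlying frequency resolution is only 0.2 Hz (12 BPM), so the 1 BPM scan interpolates the spectrum rather than adding information; a longer window or the band-pass prefilter listed above is what actually sharpens the estimate.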
onnxruntime-web has a WebGPU EP. On supported devices (Chrome 121+ desktop, Safari 18) inference time drops by another 3–5×. We just need to:
- Add executionProviders: ['webgpu', 'wasm'] fallback chain
- Verify all ops have WebGPU implementations
- Bench across recent browsers
Compile the C engine to WASM with SIMD128 paths instead of AVX-512. Skip the AArch64 NEON kernel rewrites; emit a portable WASM SIMD microkernel directly.
Expected: 1.3–2× faster than onnxruntime-web in browsers that don't have WebGPU.
Several people in the frame, each with their own colored bbox + ID + cumulative match score. Useful for kiosk / surveillance demos.
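One way to keep a stable ID and a cumulative match score per person is greedy IoU matching between frames. The sketch below is a Python illustration under that assumption; `FaceTracker`, the `(x1, y1, x2, y2)` box format, and the 0.3 IoU threshold are hypothetical, and stale tracks are never dropped.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class FaceTracker:
    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}     # track id -> {"box": last box, "score": cumulative score}
        self.next_id = 1

    def update(self, detections):
        """detections: list of (box, match_score). Returns one track ID per detection."""
        assigned = []
        free = set(self.tracks)   # tracks not yet claimed in this frame
        for box, score in detections:
            best = max(free, key=lambda t: iou(self.tracks[t]["box"], box), default=None)
            if best is None or iou(self.tracks[best]["box"], box) < self.min_iou:
                # No existing track overlaps enough: start a new one.
                best = self.next_id
                self.next_id += 1
                self.tracks[best] = {"box": box, "score": 0.0}
            else:
                free.discard(best)
            self.tracks[best]["box"] = box
            self.tracks[best]["score"] += score
            assigned.append(best)
        return assigned
```

The track ID can double as an index into a color palette for the bounding boxes, and the cumulative score gives the HUD a per-person value that grows as recognition keeps agreeing.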
- Bitstream-domain detection — pull motion-vector and DCT residual info from H.264 / NXV without full decode, and route ML inference to changed regions only. Expected: 10× speedup at the pipeline level (some of the prior research is in nn2/README.md)
- iBeta-certified passive PAD — collect real attack samples, train a competing model, certify
- Event-camera support — Prophesee GENX320 / Sony IMX636 native pipeline. Bypasses the frame-based ML problem entirely
- Open an issue describing your use case
- Send a PR — small + focused is best (see @navado's #3 for tone)
- Email bauratynov@gmail.com for paid custom work / priority items