Roadmap

Baurzhan Atinov edited this page May 14, 2026 · 1 revision

What's coming next, in roughly the order it'll land.

Shipping next

nn2 ARM / Apple Silicon (PR #3 — @navado)

In review now. Adds first-class build paths for:

  • Apple Silicon (M1–M4) — NEON + Accelerate (AMX) + SME on M4+
  • Apple Neural Engine via Core ML bridge
  • AArch64 NEON (Linux/Android, Raspberry Pi)
  • Cross-platform tools/bench.c for unified benchmarking
  • AVFoundation live-camera bench on macOS (make bench-camera)

Status: the M2 path passes 51/51 tests. The AArch64 NEON foundation is done. The i.MX 8/93/95 NPU library (Ethos-U65 / VxDelegate / XNNPACK) and the ESP32-P4 ESP-IDF component remain drafts until they can be tested on physical hardware.

Active liveness in the browser

The demo already has a basic "turn your head / blink twice" challenge. Next:

  • Multi-step challenges ("turn left, then up, then blink")
  • Randomized sequence per session to defeat video-replay
  • Server-attested challenge tokens (HMAC, time-bounded)
  • iBeta-style attack dataset for evaluation
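The randomized sequence and the server-attested token could fit together roughly as sketched below. This is a minimal illustration using only the Python standard library, not the demo's actual implementation: the step names, the token layout (base64 payload, a dot, then a hex HMAC-SHA256 tag), and the 30-second lifetime are all assumptions.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import List, Optional

# Hypothetical step names; the real challenge vocabulary may differ.
STEPS = ["turn_left", "turn_right", "look_up", "blink"]


def issue_challenge(secret: bytes, n_steps: int = 3, ttl_s: int = 30,
                    now: Optional[float] = None) -> str:
    """Build a random step sequence and sign it with an expiry time."""
    now = time.time() if now is None else now
    rng = secrets.SystemRandom()  # OS-backed randomness, not the seeded PRNG
    payload = {
        "steps": [rng.choice(STEPS) for _ in range(n_steps)],
        "nonce": secrets.token_hex(8),  # makes every token unique
        "exp": int(now) + ttl_s,        # time bound
    }
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    tag = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + tag


def verify_challenge(secret: bytes, token: str,
                     now: Optional[float] = None) -> Optional[List[str]]:
    """Return the step list if the tag is valid and unexpired, else None."""
    now = time.time() if now is None else now
    body, sep, tag = token.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking tag bytes through timing.
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    if now > payload["exp"]:
        return None
    return payload["steps"]
```

The server would call `issue_challenge` when a session starts and `verify_challenge` when the client reports completion, so the browser never chooses its own sequence.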

Per-customer encrypted weight bundles

Currently the demo ships a single AES-256 key obfuscated in JS. The production-ready path:

  • Per-customer key wrapping (KEYWRAP under a master key)
  • Domain-binding via HMAC
  • Time-bounded keys with rotation endpoint
  • Audit log on each decrypt attempt

See Encrypted Weights for the API shape.
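One possible reading of the domain-binding and time-bounded items is a key that is derived from the master key, the customer, the serving domain, and a rotation epoch. The sketch below shows that idea with the standard library only. It is a derivation, not AES key wrapping, and it is not the API described on the Encrypted Weights page: `derive_bundle_key`, `rotation_epoch`, the field layout, and the one-day rotation period are all hypothetical.

```python
import hashlib
import hmac


def rotation_epoch(now_s: float, period_s: int = 86400) -> int:
    """Index of the current rotation window (assumed to be one day long)."""
    return int(now_s // period_s)


def derive_bundle_key(master_key: bytes, customer_id: str,
                      domain: str, epoch: int) -> bytes:
    """Derive a 32-byte key bound to one customer, one domain, one epoch."""
    parts = (customer_id.encode(), domain.lower().encode(), str(epoch).encode())
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    info = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(master_key, info, hashlib.sha256).digest()
```

A bundle encrypted under a key derived for `example.com` would then fail to decrypt if served from any other origin, and keys would stop matching once the epoch advances.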


Bigger items on the horizon

ESP32-P4 + i.MX 95 hardware port

Pending: real hardware in hand. Once we have boards:

  • Quantize the 4 recognition variants to INT8 for the NPU
  • Wire MiniFASNet ensemble into the i.MX SDK
  • Stream MIPI-CSI directly into the engine, no RAM round-trip

Expected: full pipeline (decode → detect → recognize) on a single $10 ESP32-P4 chip.
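For readers unfamiliar with the INT8 step, the sketch below shows symmetric per-tensor quantization on a plain list of weights. It is only an illustration of the general technique: the scheme the NPU toolchains actually require (per-channel scales, zero points, calibration data) may differ, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized values."""
    return [v * scale for v in q]
```

Each recovered weight differs from the original by at most half a quantization step (`scale / 2`), which is the error budget the recognition accuracy has to tolerate.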

Smile + age + emotion + glasses heads

The 187 KB smile classifier is a template. Adding more tiny classification heads (binary or few-class) is straightforward:

  • age — bucketed 0/18/30/45/60+
  • emotion — happy/sad/angry/surprised/neutral (5 classes via softmax)
  • glasses — yes/no
  • mask — yes/no
  • eyewear — sunglasses/clear/none

Each is ~50–200 K params, ~200 KB as ONNX, and takes ~10 min to train on scraped data.
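The post-processing for two of these heads can be sketched as follows. The class order for the emotion head and the reading of "0/18/30/45/60+" as lower bucket edges are assumptions, as are the function names.

```python
import bisect
import math

# Assumed output order of the 5-class emotion head.
EMOTIONS = ["happy", "sad", "angry", "surprised", "neutral"]

# Assumed lower edges of the age buckets, with matching labels.
AGE_EDGES = [0, 18, 30, 45, 60]
AGE_LABELS = ["0-17", "18-29", "30-44", "45-59", "60+"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_emotion(logits):
    """Return (label, probability) for the most likely emotion."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


def age_bucket(age):
    """Map an age in years to its bucket label, e.g. for training targets."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return AGE_LABELS[bisect.bisect_right(AGE_EDGES, age) - 1]
```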

rPPG pulse — phase 2

The current 5-second forehead DFT gets ~80% accuracy on a still face. Improvements queued:

  • Adaptive ROI tracking (forehead moves as the user does)
  • Bandpass via Butterworth IIR instead of narrow-band scan
  • HRV (heart-rate variability) metrics over a 30-second window
  • Pulse waveform visualization (sparkline in HUD)
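As a baseline for these improvements, a narrow-band DFT scan over a 5-second window might look like the sketch below. It is an assumption about what "narrow-band scan" means here, not the demo's code: it evaluates the DFT only at candidate heart rates and picks the strongest one. The 45–180 BPM range, the 1 BPM step, and the function name are hypothetical.

```python
import cmath
import math


def estimate_bpm(samples, fs, lo_bpm=45, hi_bpm=180, step_bpm=1):
    """Return the candidate heart rate (BPM) with the highest DFT power.

    samples: mean forehead intensity per frame; fs: frames per second.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC component

    best_bpm, best_power = None, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1, step_bpm):
        f = bpm / 60.0  # candidate frequency in Hz
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / fs)
                  for k, x in enumerate(centered))
        power = abs(acc) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

With only 5 seconds of data the spectral peak is about 0.2 Hz (12 BPM) wide, which is one reason a longer window or a proper bandpass filter should help on noisy input.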

WebGPU backend

onnxruntime-web has a WebGPU EP. On supported devices (Chrome 121+ desktop, Safari 18) inference time drops by another 3–5×. We just need to:

  • Add an executionProviders: ['webgpu', 'wasm'] fallback chain
  • Verify all ops have WebGPU implementations
  • Bench across recent browsers

nn2 → WASM port

Compile the C engine to WASM with SIMD128 paths instead of AVX-512. Skip the AArch64 NEON kernel rewrites; emit a portable WASM SIMD microkernel directly.

Expected: 1.3–2× faster than onnxruntime-web in browsers that don't have WebGPU.

Multi-face persistent tracking

Track several people in the frame at once, each with their own colored bounding box, ID, and cumulative match score. Useful for kiosk and surveillance demos.
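A very simple way to keep IDs stable between frames is greedy matching on bounding-box overlap (IoU). The sketch below is one possible starting point, not a committed design: the class name, the 0.3 IoU threshold, and the choice to never expire old tracks are assumptions made for brevity.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


class FaceTracker:
    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # track id -> {"box": ..., "score": ...}
        self._next_id = count(1)

    def update(self, detections):
        """detections: list of (box, match_score). Returns one ID per detection."""
        unmatched = set(self.tracks)  # tracks not yet claimed this frame
        ids = []
        for box, score in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid in unmatched:
                overlap = iou(self.tracks[tid]["box"], box)
                if overlap >= best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:
                # No existing track overlaps enough: start a new one.
                best_id = next(self._next_id)
                self.tracks[best_id] = {"box": box, "score": 0.0}
            else:
                unmatched.discard(best_id)
            self.tracks[best_id]["box"] = box
            self.tracks[best_id]["score"] += score  # cumulative match score
            ids.append(best_id)
        return ids
```

A production tracker would also expire tracks that go unmatched for several frames and could use face embeddings instead of box overlap to survive occlusion.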


Long-term

  • Bitstream-domain detection — pull motion-vector and DCT residual info from H.264 / NXV without full decode; route ML inference to changed regions only. 10× speedup at the pipeline level (some of the prior research is in nn2/README.md)
  • iBeta-certified passive PAD — collect real attack samples, train a competing model, certify
  • Event-camera support — Prophesee GENX320 / Sony IMX636 native pipeline. Bypasses the frame-based ML problem entirely

How to influence the roadmap