A native Rust implementation of the SLIDE paper family (Sub-LInear Deep learning Engine), with extensions for LLM inference, transformer sparsity prediction, and private two-party computation.
AGPL-3.0 with additional terms. See LICENSE for details. Commercial use is restricted to the Quilibrium mainnet. Automated reproduction (including LLM-assisted "clean room" reimplementation) for commercial substitutes is expressly prohibited.
Klearu is organized as a Cargo workspace with 11 crates:
| Crate | Description |
|---|---|
| klearu-core | Foundation: LSH hash families, sparse tensors, SLIDE network training |
| klearu-accel | SIMD vectorization (AVX2/NEON/scalar), BF16 quantization, cache-aligned memory |
| klearu-mongoose | Learnable hash functions, adaptive rebuild scheduling with drift detection |
| klearu-bolt | LSH hyperparameter autotuning, sparse inference optimizations |
| klearu-dejavu | Deja Vu transformer sparsity prediction (attention heads + MLP neurons) |
| klearu-vision | Vision transformers: DaViT, ViT, Swin, ConvNeXt, Hiera, EVA-02, Qwen Vision, SigLIP, DINOv2 |
| klearu-llm | LLaMA-compatible LLM inference with optional sparsity and VLM support |
| klearu | Facade crate with feature-gated re-exports |
| klearu-dpf | Distributed Point Functions (AES-based BGI construction) and DCF |
| klearu-mpc | 2PC building blocks: Q16.16/Q32.32 fixed-point, Beaver triples, additive sharing |
| klearu-private | Private LLM inference via 2PC with Ferret OT and Ristretto255 OPRF |
The core crates build standalone. The klearu-private crate depends on the ferret crate from the Quilibrium monorepo via a relative path. To build the full workspace including private inference, clone both repositories as siblings:

```text
your-workspace/
├── klearu/      # this repository
└── monorepo/    # git clone https://github.com/quilibriumnetwork/monorepo
```

Then build from inside klearu/:
```bash
# Full workspace (requires monorepo sibling for klearu-private)
cargo build --release

# With specific features via the facade crate
cargo build --release -p klearu --features full

# LLM inference only (no monorepo needed)
cargo build --release -p klearu-llm

# LLM with sparse inference (no monorepo needed)
cargo build --release -p klearu-llm --features sparse

# Vision models (no monorepo needed)
cargo build --release -p klearu-vision

# Vision with sparse inference
cargo build --release -p klearu-vision --features sparse

# Vision CLI (image classification)
cargo build --release -p klearu-vision --features cli

# Private inference (requires monorepo sibling)
cargo build --release -p klearu-private
```

Run the test suite with:

```bash
cargo test --workspace
```

The foundation crate provides LSH-based sub-linear training and inference.
- Hash families (`HashFamily` trait): SimHash, WtaHash, DwtaHash, MinHash, SparseRandomProjection
- LSH index (`LshIndexTrait`): `query()`, `query_union()`, `query_with_counts()` — with FIFO or reservoir-sampled buckets
- Network: full SLIDE training loop with configurable layers, optimizers, and sampling strategies
```rust
use klearu_core::config::*;
use klearu_core::network::Network;

let config = SlideConfig {
    network: NetworkConfig {
        layers: vec![
            LayerConfig::hidden(784, 1024),
            LayerConfig::output(1024, 10),
        ],
        optimizer: OptimizerType::Adam,
        learning_rate: 0.001,
        batch_size: 128,
        num_threads: 4,
    },
    seed: 42,
    hogwild: true,
};

let mut network = Network::new(config);
```

| Parameter | Default | Description |
|---|---|---|
| `num_tables` (L) | 50 | Number of LSH hash tables |
| `num_hashes` (K) | 6 | Hash bits per table |
| `bucket_capacity` | 128 | Max neurons per bucket |
| `bucket_type` | FIFO | FIFO or Reservoir sampling |
| `hash_function` | SimHash | SimHash, WtaHash, DwtaHash, MinHash, SRP |
| `rebuild_interval_base` | 100 | Steps between LSH rebuilds |
| `rebuild_decay` | 0.1 | Exponential decay for rebuild interval |
| `optimizer` | Adam | Adam or SGD |
| `activation` | ReLU | ReLU, Sigmoid, Tanh, Softmax |
| `sampling` | Vanilla | Vanilla, TopK, Threshold |
| `hogwild` | false | Lock-free parallel training |
Platform-adaptive SIMD (AVX2 on x86, NEON on ARM, scalar fallback) for dot products and scatter-add. BF16 quantization with two modes: full BF16 or BF16-storage/FP32-gradient. ContiguousWeightStore provides cache-line-aligned (64-byte) weight layouts.
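As a rough sketch of what BF16 storage means (illustrative only, not the klearu-accel API): BF16 keeps the top 16 bits of an IEEE-754 f32, so conversion is a shift plus round-to-nearest-even on the dropped half, and dequantization is a shift back.

```rust
// Illustrative BF16 round-trip (not the klearu-accel implementation).
// NaN handling is omitted for brevity.

fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    // round-to-nearest-even: bias by 0x7FFF plus the LSB of the kept half
    let rounded = bits.wrapping_add(0x7FFF + ((bits >> 16) & 1));
    (rounded >> 16) as u16
}

fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

fn main() {
    assert_eq!(f32_to_bf16_bits(1.0), 0x3F80);
    let x = 3.1415927_f32;
    let y = bf16_bits_to_f32(f32_to_bf16_bits(x));
    // BF16 keeps ~8 mantissa bits, so relative error is small but nonzero
    assert!((x - y).abs() < 0.01);
}
```

Halving storage this way trades ~16 bits of mantissa for memory bandwidth, which is why the crate also offers the BF16-storage/FP32-gradient mode for training stability.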
Trainable hash functions that adapt to data distribution, plus an AdaptiveScheduler that monitors hash-bucket drift via EMA and triggers rebuilds only when needed.
| Parameter | Default | Description |
|---|---|---|
| `min_interval` | — | Minimum steps between rebuild checks |
| `max_interval` | — | Forced rebuild interval |
| `sample_fraction` | — | Fraction of neurons to sample for drift |
| `drift_threshold` | — | Drift level that triggers a rebuild |
| `ema_alpha` | 0.3 | Exponential moving average smoothing |
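The scheduling idea can be sketched in a few lines (names are illustrative, not the klearu-mongoose API): smooth each drift measurement with an EMA and rebuild only when the smoothed value crosses the threshold or the forced interval elapses.

```rust
// Hedged sketch of adaptive rebuild scheduling (not the actual API).
struct DriftScheduler {
    ema: f32,
    ema_alpha: f32,
    drift_threshold: f32,
    max_interval: u64,
    steps_since_rebuild: u64,
}

impl DriftScheduler {
    fn should_rebuild(&mut self, drift_sample: f32) -> bool {
        // exponential moving average of the sampled bucket drift
        self.ema = self.ema_alpha * drift_sample + (1.0 - self.ema_alpha) * self.ema;
        self.steps_since_rebuild += 1;
        if self.ema > self.drift_threshold || self.steps_since_rebuild >= self.max_interval {
            self.ema = 0.0;
            self.steps_since_rebuild = 0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut s = DriftScheduler {
        ema: 0.0, ema_alpha: 0.3, drift_threshold: 0.5,
        max_interval: 100, steps_since_rebuild: 0,
    };
    assert!(!s.should_rebuild(0.9)); // EMA = 0.27
    assert!(!s.should_rebuild(0.9)); // EMA ≈ 0.46
    assert!(s.should_rebuild(0.9));  // EMA ≈ 0.59 > threshold: rebuild
}
```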
Automatic LSH hyperparameter search over K and L to hit a target recall while minimizing query cost.
```rust
use klearu_bolt::autotune::LshAutotuner;

let tuner = LshAutotuner::new(0.9) // target 90% recall
    .with_k_range(4, 16)
    .with_l_range(10, 200)
    .with_num_samples(100)
    .with_speedup_ratio(0.1);

let result = tuner.autotune(&neurons, &queries, 42);
// result.best_k, result.best_l, result.recall, result.query_cost
```

Implementation of the Deja Vu paper: lightweight MLP predictors that identify which attention heads and FFN neurons are important for each token, enabling sparse transformer inference.
A LLaMA-compatible inference engine supporting GQA, RoPE, RMSNorm, and SwiGLU. Works with any HuggingFace-format model that uses the LLaMA architecture.
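For context, RMSNorm (one of the LLaMA building blocks named above) can be sketched as follows; this is an illustrative implementation, not the klearu-llm code:

```rust
// Illustrative RMSNorm: y_i = x_i / rms(x) * w_i, with the RMS taken
// over the hidden dimension and a small epsilon for stability.
fn rmsnorm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

fn main() {
    let y = rmsnorm(&[2.0, 2.0, 2.0, 2.0], &[1.0; 4], 0.0);
    // rms = 2, so every element normalizes to 1.0
    assert!(y.iter().all(|&v| (v - 1.0).abs() < 1e-6));
}
```

Unlike LayerNorm, RMSNorm skips mean-centering, which is cheaper and what LLaMA-family models ship with.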
| Parameter | Default | Description |
|---|---|---|
| `temperature` | 0.7 | Sampling temperature (0.0 = greedy) |
| `top_k` | 40 | Top-k filtering (0 = disabled) |
| `top_p` | 0.9 | Nucleus sampling (1.0 = disabled) |
| `repetition_penalty` | 1.1 | Penalize repeated tokens (1.0 = disabled) |
| `max_new_tokens` | 512 | Maximum tokens to generate |
| `template` | auto | Chat template (auto, zephyr, chatml, llama2, llama3, mistral, raw) |
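To make the temperature and top-k knobs concrete, here is a minimal sketch of how they combine (illustrative only, not the klearu-llm sampler): scale logits by 1/temperature, mask everything below the k-th largest logit, then softmax the survivors.

```rust
// Illustrative temperature + top-k filtering over raw logits.
// Requires top_k >= 1; sampling from the returned distribution is omitted.
fn top_k_filter(logits: &[f32], temperature: f32, top_k: usize) -> Vec<f32> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    let cutoff = logits[idx[top_k.min(logits.len()) - 1]];
    // mask below-cutoff logits, scale the rest by 1/temperature
    let scaled: Vec<f32> = logits
        .iter()
        .map(|&l| if l >= cutoff { l / temperature } else { f32::NEG_INFINITY })
        .collect();
    // numerically stable softmax (masked entries become exactly 0)
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let p = top_k_filter(&[1.0, 2.0, 3.0, 0.5], 0.7, 2);
    assert_eq!(p[0], 0.0); // masked
    assert_eq!(p[3], 0.0); // masked
    assert!(p[2] > p[1]);  // highest logit keeps the most mass
    assert!((p.iter().sum::<f32>() - 1.0).abs() < 1e-5);
}
```

Lower temperatures sharpen the surviving distribution toward the argmax, which is why `0.0` degenerates to greedy decoding.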
| Parameter | Default | Description |
|---|---|---|
| `head_sparsity` | 0.5 | Fraction of attention heads to keep |
| `neuron_sparsity` | 0.5 | Fraction of FFN neurons to keep |
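The selection step behind these knobs can be sketched as follows (illustrative, not the klearu-dejavu API): given a predictor score per attention head, keep the top `head_sparsity` fraction and skip the rest for this token.

```rust
// Hedged sketch of Deja Vu-style head selection.
fn select_heads(scores: &[f32], head_sparsity: f32) -> Vec<usize> {
    let keep = ((scores.len() as f32 * head_sparsity).ceil() as usize).max(1);
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // sort head indices by predictor score, descending
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());
    idx.truncate(keep);
    idx.sort(); // restore head order for the attention kernel
    idx
}

fn main() {
    // 4 heads, keep half: the two highest-scoring heads survive
    assert_eq!(select_heads(&[0.1, 0.9, 0.4, 0.8], 0.5), vec![1, 3]);
}
```

FFN neurons are handled the same way with `neuron_sparsity`, just over a much larger index set.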
A pure-Rust vision encoder library supporting multiple architectures. All models load from timm safetensors format with automatic preprocessing config detection.
Supported architectures:
| Architecture | Loader | Notes |
|---|---|---|
| DaViT | `load_davit_model()` | 4-stage dual-attention (spatial window + channel) |
| ViT | `load_vit_model()` | Standard Vision Transformer (CLS/mean pool) |
| Swin | — | Shifted-window attention with relative position bias |
| ConvNeXt | — | Pure-convolution "modernized ResNet" |
| Hiera | — | Hierarchical ViT with masked-unit attention |
| EVA-02 | `load_eva02_model()` | ViT with SwiGLU MLP and RoPE |
| DINOv2 | `load_dinov2_model()` | Self-supervised ViT feature extractor |
| SigLIP | `load_siglip_model()` | Sigmoid-loss contrastive vision encoder |
| Qwen Vision | `load_qwen_vision_from_dir()` | Conv2d patch embed → ViT blocks → PatchMerger |
Features:
- Preprocessing: resize (bicubic/bilinear), center crop, ImageNet normalization — auto-detected from timm `pretrained_cfg`
- INT8 quantization: `QuantizedLinear` (W8A32) with per-channel symmetric quantization
- 2D RoPE for position-aware attention (EVA-02)
- Test-time augmentation: horizontal flip, five-crop, ten-crop
- Sparse inference (feature: `sparse`): per-block Deja Vu sparsity prediction for all architectures
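Per-channel symmetric quantization, as used in the W8A32 scheme above, can be sketched like this (illustrative, not the `QuantizedLinear` implementation): each output channel gets its own scale `max_abs / 127`, weights are rounded to i8, and activations stay f32.

```rust
// Hedged sketch of per-channel symmetric INT8 weight quantization.
fn quantize_channel(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0_f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

// Dequantize-on-the-fly dot product against f32 activations (the "A32" half).
fn qdot(q: &[i8], scale: f32, x: &[f32]) -> f32 {
    q.iter().zip(x).map(|(&qi, &xi)| qi as f32 * scale * xi).sum()
}

fn main() {
    let (q, scale) = quantize_channel(&[1.0, -0.5, 0.25]);
    assert_eq!(q[0], 127); // max-magnitude weight maps to full range
    let approx = qdot(&q, scale, &[1.0, 1.0, 1.0]);
    assert!((approx - 0.75).abs() < 0.01); // exact sum is 0.75
}
```

Per-channel (rather than per-tensor) scales matter because channel magnitudes in vision transformers vary widely; a single scale would crush small channels to zero.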
```bash
# Download a DaViT model from timm
huggingface-cli download timm/davit_tiny.msft_in1k --local-dir davit_tiny

# Run inference on an image (requires `cli` feature)
cargo run --release -p klearu-vision --features cli --bin davit-infer -- \
    ./davit_tiny path/to/image.jpg

# With horizontal-flip TTA
cargo run --release -p klearu-vision --features cli --bin davit-infer -- \
    ./davit_tiny path/to/image.jpg --tta
```

```rust
use klearu_vision::weight::load_vit_model;
use klearu_vision::preprocess::PreprocessConfig;

// Load a ViT model from a timm model directory
let model = load_vit_model("./vit_tiny")?;

// Preprocess: [C, H, W] f32 tensor, ImageNet-normalized
let image: Vec<f32> = /* your preprocessing */;
let logits = model.forward(&image);
```

The klearu-llm crate includes a VlmBridge that connects the Qwen Vision encoder to LLM inference, replacing `<image>` placeholder tokens with vision encoder outputs:
```rust
use klearu_llm::vlm::{VlmBridge, VlmImage};

let bridge = VlmBridge::new(vision_encoder, image_token_id,
    vision_start_token_id, vision_end_token_id);

// Encode image and inject into text embeddings
let image = VlmImage { data: chw_tensor, height: 448, width: 448 };
let merged = bridge.inject_vision_tokens(
    &token_ids, &text_embeddings, &[image], hidden_size,
);
```

```bash
# DaViT tiny (~44 MB)
huggingface-cli download timm/davit_tiny.msft_in1k --local-dir davit_tiny

# ViT tiny (~22 MB)
huggingface-cli download timm/vit_tiny_patch16_224.augreg_in21k_ft_in1k --local-dir vit_tiny

# EVA-02 tiny (~22 MB)
huggingface-cli download timm/eva02_tiny_patch14_336.mim_in22k_ft_in1k --local-dir eva02_tiny

# DINOv2 small (~84 MB)
huggingface-cli download timm/vit_small_patch14_dinov2.lvd142m --local-dir dinov2_small

# SigLIP base (~354 MB)
huggingface-cli download timm/vit_base_patch16_siglip_224.webli --local-dir siglip_base

# Qwen3.5-0.8B VLM (~1.8 GB, includes both vision encoder and LLM)
huggingface-cli download Qwen/Qwen3.5-0.8B --local-dir Qwen3.5-0.8B
```

AES-based DPF using the BGI construction, plus DCF (Distributed Comparison Function) via prefix decomposition into DPFs. Used as a building block for the MPC protocols.
Fixed-point arithmetic in Q16.16 (u32 shares) and Q32.32 (u64 shares), additive secret sharing, Beaver triple multiplication, polynomial SiLU approximation, and reveal-and-correct RMSNorm. Provides a Transport trait for abstracting communication.
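The Q16.16 encoding and additive sharing can be sketched in a few lines (illustrative, not the klearu-mpc API): a value x is encoded as round(x · 2^16) mod 2^32, two shares sum to the encoding mod 2^32, addition of shares is local, and multiplication needs Beaver triples (omitted here).

```rust
// Hedged sketch of Q16.16 fixed-point over u32 additive shares.
const FRAC_BITS: u32 = 16;

fn encode(x: f64) -> u32 {
    (x * (1u64 << FRAC_BITS) as f64).round() as i64 as u32
}

fn decode(v: u32) -> f64 {
    (v as i32) as f64 / (1u64 << FRAC_BITS) as f64
}

// r plays the role of the random mask drawn by one party
fn share(v: u32, r: u32) -> (u32, u32) {
    (r, v.wrapping_sub(r))
}

fn reconstruct(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

fn main() {
    assert_eq!(decode(encode(-0.5)), -0.5);
    // sharing round-trips
    let (a, b) = share(encode(2.25), 0xDEAD_BEEF);
    assert_eq!(decode(reconstruct(a, b)), 2.25);
    // addition is local: add shares component-wise
    let (xa, xb) = share(encode(1.5), 111);
    let (ya, yb) = share(encode(-0.5), 222);
    let sum = reconstruct(xa.wrapping_add(ya), xb.wrapping_add(yb));
    assert_eq!(decode(sum), 1.0);
}
```

Q32.32 over u64 shares works identically with a wider ring; the extra fractional bits reduce truncation error in deep networks at the cost of more communication.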
End-to-end private inference combining Ferret COT (Correlated Oblivious Transfer), Ristretto255 OPRF, and the MPC building blocks. Two security levels:
| Level | Communication | Privacy | Speed |
|---|---|---|---|
| Lower | ~4.6 KB/token | Server learns nothing; client embedding revealed then plaintext forward | Fast |
| High | ~2 MB/token, ~34K triples | Only norms, queries, and gate values revealed | Slower |
Klearu works with any HuggingFace LLaMA-architecture model in safetensors format. SmolLM models are a good starting point for testing:

```bash
# Install the HuggingFace CLI if you don't have it
pip install huggingface-hub

# Download SmolLM-135M-Instruct (~270 MB)
huggingface-cli download HuggingFaceTB/SmolLM-135M-Instruct \
    --local-dir SmolLM-135M-Instruct

# Or a larger model — SmolLM-360M-Instruct (~720 MB)
huggingface-cli download HuggingFaceTB/SmolLM-360M-Instruct \
    --local-dir SmolLM-360M-Instruct

# Or SmolLM-1.7B-Instruct (~3.4 GB)
huggingface-cli download HuggingFaceTB/SmolLM-1.7B-Instruct \
    --local-dir SmolLM-1.7B-Instruct
```

The model directory should contain at minimum:
- `config.json` — HuggingFace model configuration
- `tokenizer.json` — Tokenizer
- `*.safetensors` — Model weights
```bash
# Basic chat (auto-detects chat template)
cargo run --release --bin chat -- ./SmolLM-135M-Instruct

# With custom sampling parameters
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
    --temp 0.8 --top-k 50 --top-p 0.95 --max-tokens 256

# With a system prompt
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
    --system "You are a helpful coding assistant."

# Force a specific chat template
cargo run --release --bin chat -- ./SmolLM-135M-Instruct \
    --template chatml
```

The chat binary starts an interactive loop — type your message and press Enter. Use Ctrl-D to quit.
First calibrate sparsity predictors, then run with `--sparse`:

```bash
# Train predictors (requires sparse feature)
cargo run --release --features sparse --bin calibrate -- ./SmolLM-135M-Instruct \
    --samples 16 --epochs 100

# Chat with sparse inference
cargo run --release --features sparse --bin chat -- ./SmolLM-135M-Instruct \
    --sparse --head-sparsity 0.5 --neuron-sparsity 0.5
```

Validate that a model loads and runs correctly:
```bash
cargo run --release --bin diagnose -- ./SmolLM-135M-Instruct
```

This checks config parsing, weight loading, tokenizer functionality, forward pass sanity, and greedy generation.
Run inference where the server holds the model weights and the client's input tokens remain private:

```bash
# Terminal 1 — start the server
cargo run --release --bin private-server -- ./SmolLM-135M-Instruct \
    --port 9000 --security lower

# Terminal 2 — connect the client
cargo run --release --bin private-client -- ./SmolLM-135M-Instruct \
    --host localhost:9000 --security lower
```

For development and testing, add `--dummy-triples` to both sides to skip Ferret OT setup. For real security, omit this flag to use actual oblivious transfer.
The facade crate (klearu) provides feature-gated access to all functionality:
| Feature | Enables |
|---|---|
| `simd` | SIMD-accelerated dot products and scatter-add |
| `bf16` | BF16 quantization |
| `mongoose` | Learnable hashing and adaptive scheduling |
| `bolt` | LSH autotuning |
| `deja-vu` | Transformer sparsity prediction |
| `llm` | LLM inference engine |
| `full` | All of the above |
The `sparse` feature on klearu-llm enables Deja Vu sparse inference and the `calibrate` binary. The `sparse` feature on klearu-vision enables per-block sparsity prediction for all vision architectures. The `cli` feature on klearu-vision enables the `davit-infer` binary for image classification.