lele is a standalone, dependency-free inference engine for audio intelligence, built from scratch in pure Rust.
It rejects the "general-purpose runtime" approach (wrapping C++ libs like ORT or using heavy Torch ports) in favor of hand-crafted, domain-specific kernels.
lele is designed to run deep learning models (specifically speech models such as SenseVoice, Silero VAD, and Supertonic TTS) with minimal overhead. Rather than relying on a heavyweight runtime such as ONNX Runtime or Burn, it compiles ONNX graphs directly into optimized Rust source code.
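As a purely illustrative sketch of the idea (this is not lele's actual generated output; the node name and shapes are invented), an ONNX `Gemm` node with fixed shapes might be specialized into straight-line Rust that the compiler can inline and vectorize:

```rust
// Illustrative only: a hypothetical ONNX `Gemm` node (y = W·x + b) specialized
// to fixed shapes. With shapes known at compile time, rustc can unroll,
// inline, and auto-vectorize this with no runtime graph interpretation.
pub fn gemm_node_0(x: &[f32; 64], w: &[[f32; 64]; 32], b: &[f32; 32]) -> [f32; 32] {
    let mut y = *b; // start from the bias
    for (yo, row) in y.iter_mut().zip(w.iter()) {
        for (xi, wi) in x.iter().zip(row.iter()) {
            *yo += xi * wi;
        }
    }
    y
}
```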
Why Not ORT/Burn?
- Generic Overhead: General-purpose runtimes carry heavy baggage (graph optimization passes, dynamic shape handling, thousands of unused ops) that adds latency to small-batch audio models.
- FFI Penalties: Binding layers introduce latency and inhibit compiler inlining.
- Black Box Memory: We need absolute control over every byte of allocation for embedded/real-time constraints.
Features
- Zero Runtime Dependencies: Generated models are pure Rust.
- AOT Compilation: Converts ONNX models to specialized Rust code for maximum performance.
- SIMD Optimized: Hand-written kernels using NEON intrinsics on Apple Silicon (aarch64) and AVX/SSE intrinsics on x86_64 (see the sketch after this list).
- Memory Efficient: Static buffer allocation and zero-copy weight loading.
- Speech Optimized: Built-in feature extraction for audio (FFT, Mel-spectrogram, LFR, CMVN).
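To give a flavor of such a kernel, here is a minimal sketch (not lele's actual code; the function name is invented) of a NEON dot product with a scalar fallback for other targets. An AVX version would follow the same pattern with `std::arch::x86_64` intrinsics:

```rust
// Minimal sketch of a hand-written SIMD kernel in the style described above.
// `dot` is an invented name, not part of lele's API.
#[cfg(target_arch = "aarch64")]
fn dot(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::aarch64::*;
    assert_eq!(a.len(), b.len());
    let n = a.len() / 4 * 4; // largest multiple of the 4-lane vector width
    // SAFETY: NEON is mandatory on aarch64, and all pointer offsets stay in bounds.
    unsafe {
        let mut acc = vdupq_n_f32(0.0);
        let mut i = 0;
        while i < n {
            let va = vld1q_f32(a.as_ptr().add(i));
            let vb = vld1q_f32(b.as_ptr().add(i));
            acc = vfmaq_f32(acc, va, vb); // fused multiply-add across 4 lanes
            i += 4;
        }
        let mut sum = vaddvq_f32(acc); // horizontal add of the 4 lanes
        for j in n..a.len() {
            sum += a[j] * b[j]; // scalar tail for leftover elements
        }
        sum
    }
}

#[cfg(not(target_arch = "aarch64"))]
fn dot(a: &[f32], b: &[f32]) -> f32 {
    // Portable fallback; the optimizer will often auto-vectorize this.
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```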
Supported Models
- SenseVoiceSmall: High-accuracy multilingual ASR.
- Silero VAD: Reliable Voice Activity Detection.
- Supertonic: Fast and high-quality Text-to-Speech.
Requirements
- Rust (latest stable)
- cargo
Usage
To compile an ONNX model into Rust code:
```sh
cargo run --release --bin lele_gen -- <model.onnx> <output_path.rs>
```
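The generated file is ordinary Rust that can be included as a module. Purely as a sketch of how such a module might be consumed (the module and function names below are hypothetical, not lele's actual generated API):

```rust
// Hypothetical usage sketch; `sensevoice_generated` and `run` are invented
// names standing in for whatever lele_gen actually emits.
mod sensevoice_generated; // the <output_path.rs> produced above

fn main() {
    // 1 second of silence at 16 kHz, as a stand-in for real audio samples.
    let audio = vec![0.0f32; 16_000];
    // Feature extraction (FFT, mel, LFR, CMVN) and inference would happen
    // inside the generated code, against statically allocated buffers.
    let tokens = sensevoice_generated::run(&audio);
    println!("{tokens:?}");
}
```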
To run the bundled examples:

```sh
# SenseVoice ASR
./run_sensevoice.sh

# Supertonic TTS
./run_supertonic.sh
```

Roadmap
- Performance optimizations (SIMD, multi-threading, etc.), with the goal of outperforming ONNX Runtime.
- Support for more audio models (e.g., Whisper, CosyVoice)
- GPU acceleration backend (wgpu)
- Quantization (INT8/FP16)
- Advanced attention mechanisms (FlashAttention, PagedAttention)
- Voice API server (RESTful service), including ASR/TTS/Denoise endpoints.
License
MIT