Native Apple Silicon implementation of Lightricks LTX-2 video generation models using MLX. Supports both LTX-2.0 (19B) and LTX-2.3 (22B) with automatic version detection.
| Prompt | Output |
|---|---|
| "A golden retriever running through a sunny meadow" | golden_retriever.mp4 |
| "A city street at night with neon lights and rain" | cyberpunk_city.mp4 |
| "A rocket ship launching into space with flames" | rocket_launch.mp4 |
| "Ocean waves crashing on a beach at sunset" | ocean_sunset.mp4 |

768×512, 65 frames (~2.7s at 24fps), 8 steps on Apple Silicon
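
The clip length quoted above is simply the frame count divided by the frame rate. A minimal sketch of that arithmetic, where the helper name clip_seconds is illustrative and not part of this repository:

```python
def clip_seconds(num_frames: int, fps: float = 24.0) -> float:
    """Return the clip duration in seconds for a frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


if __name__ == "__main__":
    # 65 frames at 24 fps, matching the sample settings above
    print(f"{clip_seconds(65):.2f}s")
```

The same helper can be used to estimate how many frames a target duration needs before starting a long run.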
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Download weights
uv run scripts/download_weights.py
# 3. Generate video (auto-resolves cached LTX/Gemma weights from HF_HOME)
uv run python scripts/generate.py "A golden retriever running through a meadow"
# Or specify the 2.3 checkpoint explicitly
uv run python scripts/generate.py --weights weights/ltx-2.3/ltx-2.3-22b-distilled.safetensors \
  "A golden retriever running through a meadow"

Download from Lightricks/LTX-2 on HuggingFace:
| Model | Size | Description |
|---|---|---|
| ltx-2.3-22b-distilled.safetensors | 46GB | Latest - 22B params, 8 steps |

| Model | Size | Description |
|---|---|---|
| ltx-2-19b-distilled.safetensors | 43GB | Fast generation (8 steps) |
| ltx-2-19b-dev.safetensors | 43GB | Higher quality (25-50 steps) |
| ltx-2.3-spatial-upscaler-x2-1.1.safetensors | 950MB | 2x resolution upscaling |
| ltx-2-temporal-upscaler-x2-1.0.safetensors | 262MB | 2x framerate upscaling |
| ltx-2-19b-distilled-lora-384.safetensors | 1.5GB | LoRA for two-stage refinement |

Text Encoder: Gemma 3 12B (~25GB) - Requires accepting license
Or use the interactive downloader:
uv run scripts/download_weights.py --weights all

| Pipeline | Speed | Quality | Best For |
|---|---|---|---|
| text-to-video | Medium | Good | Basic generation |
| distilled | Fast | Good | No-CFG two-stage quick iteration |
| one-stage | Slow | High | Quality priority or single-pass distilled |
| two-stage | Medium | High | High resolution (512p+) |

# Fast two-stage distilled preview
python scripts/generate.py "Your prompt" --pipeline distilled
# Existing single-pass distilled path
python scripts/generate.py "Your prompt" --pipeline one-stage --model-variant distilled
# High quality dev-style sampling
python scripts/generate.py "Your prompt" --pipeline one-stage --steps 20 --cfg 5.0
# High resolution
python scripts/generate.py "Your prompt" --pipeline two-stage --height 768 --width 1024

See Pipelines Guide for all 6 pipelines and options.
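
When scripting several runs, it can help to assemble these command lines programmatically. The sketch below uses only flags that appear in the examples above (--pipeline, --steps, --cfg, --height, --width); the function name build_generate_command is hypothetical and not part of this repository, and the pipeline name is passed through unchecked because the full list of pipelines lives in the Pipelines Guide.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    prompt: str,
    pipeline: str = "distilled",
    steps: Optional[int] = None,
    cfg: Optional[float] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for scripts/generate.py from flags shown in this README."""
    cmd = ["python", "scripts/generate.py", prompt, "--pipeline", pipeline]
    optional_flags = (
        ("--steps", steps),
        ("--cfg", cfg),
        ("--height", height),
        ("--width", width),
    )
    for flag, value in optional_flags:
        # Only emit flags the caller set, so the script's own defaults still apply.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_generate_command("Your prompt", pipeline="one-stage", steps=20, cfg=5.0)
    # Print a shell-safe version; pass the list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems with prompts that contain spaces or quotes.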
- Use default BF16 compute - override with --dtype float16 or --dtype float32 only for experiments
- Audio follows compute dtype where safe - LTX-2.3 Vocoder+BWE keeps a scoped FP32 island matching Lightricks' precision caution
- VAE decode defaults to native Conv3d + zero padding - --vae-tiling auto now picks a RAM-aware native tile plan, so override --vae-decoder, --vae-tiling, or --vae-spatial-padding only for A/B tests
- Default canvas is 512x288 - pass --height/--width only when you want to leave the fast 16:9 preview size
- Default outputs are timestamped - without --output, runs save to DIFFUSERS_OUTPUT_DIR, then OUTPUT_DIR, then outputs/ as ltx_YYYYmmdd_HHMMSS.mp4; use --output-prefix to name a run family
- Pick the encode tier by destination - --encode-tier {web,default,hq,export,reference} (default: default) selects codec/container/audio together. web is libx264 + AAC for universal browser compat; default is hardware HEVC + ALAC for Apple/modern browsers; hq is software HEVC 4:4:4; export and reference are ProRes (.mov) for NLE / mastering workflows
- Converted-weight cache defaults to auto - the first run builds reusable transformer, connector, video VAE, audio VAE, and vocoder cache files; pass --weights-cache off only when you specifically want direct stock-weight loading
- Keep --weights as the bundle path - advanced runs can override individual subsystems with --transformer-weights, --connector-weights, --vae-weights, --audio-vae-weights, --vocoder-weights, and --config-weights
- MLX allocator cache defaults to 1GB - this keeps unified-memory pressure lower without needing a routine --mlx-cache-limit-gb 1
- Same-math video layouts default on - FF project_in/project_out and attention to_out pretranspose are enabled by default; pass --video-ff-layout off --video-attn-layout off for baseline A/Bs
- Use --stream-transformer for the block-streaming preset - it expands to 16 resident blocks, resident-group compile, and 4-block compile groups
- Save latents for decode-only tests - add --save-latents to write an NPZ sidecar next to the requested output; distilled two-stage runs include both stage-1 and stage-2 latents plus the existing final-latent keys
- Save text conditioning for denoise A/Bs - add --save-text-embeddings to write the positive/negative AV text encoder outputs as an _text.npz sidecar that can be reused with --embedding
- Save run metadata for reproducibility - add --save-run-log to write params, argv, outputs, and timings as an _run.json sidecar, starting before the long generation step
- Save the lossless audio next to the encoded video - add --save-audio-sidecar to write the vocoder's raw WAV alongside the output (useful for A/B against the codec-compressed audio inside the container)
- Save all reproducibility sidecars - add --save-all-sidecars to turn on latents, text conditioning, run metadata, and the audio WAV sidecar together
- Use --pipeline distilled - Fast no-CFG two-stage inference (8+3 steps)
- Use --stream-transformer before --low-memory - the streaming preset is the cleaner constrained-memory path for modern distilled runs; --low-memory remains an emergency fallback
- Reduce resolution - Start with --height 256 --width 384 for testing
- Research denoise speed carefully - --video-ff-quantize project_out:mxfp8 can A/B weight-only quantized video FF projections, and --video-ff-quantize-layers 40-47 narrows it to selected layers; this is non-canonical and needs quality checks
- A/B same-math layout baselines - use --video-ff-layout off --video-attn-layout off when you want to compare against untransposed stock weight layout
- Track denoise-speed experiments - see Performance Optimization Notes for MLX runtime optimization ideas and benchmark rules
See Usage Guide for memory requirements and benchmarks.
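
Runs made with --save-run-log leave an _run.json sidecar next to the output. A minimal sketch for listing those sidecars in a directory and showing their top-level keys is given below. It assumes each sidecar is a JSON object and that the filename ends in _run.json; the exact key names are not documented here, so the code only reports whatever keys it finds. The function name load_run_logs is hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_run_logs(output_dir: str) -> Dict[str, List[str]]:
    """Map each *_run.json sidecar in output_dir to its sorted top-level keys."""
    summary: Dict[str, List[str]] = {}
    for path in sorted(Path(output_dir).glob("*_run.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Assumption: the sidecar is a JSON object; anything else yields no keys.
        summary[path.name] = sorted(data) if isinstance(data, dict) else []
    return summary


if __name__ == "__main__":
    # Demo with a stand-in sidecar; real files come from --save-run-log.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "ltx_20250102_030405_run.json"
        sample.write_text(json.dumps({"params": {}, "argv": []}), encoding="utf-8")
        print(load_run_logs(tmp))
```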
Focus on detailed, chronological descriptions. Include movements, appearances, camera angles, and environment details in a flowing paragraph. Keep under 200 words.
Structure your prompts:
- Main action in a single sentence
- Specific movements and gestures
- Character/object appearances
- Background and environment
- Camera angles and movements
- Lighting and colors
See Lightricks prompting guide for more tips.
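
The structure above can be applied mechanically: write one short piece per item, in order, then join them into a single paragraph and check the length. A minimal sketch follows, with a hypothetical helper name build_prompt and a simple whitespace-based word count as assumptions:

```python
from typing import Sequence

MAX_WORDS = 200  # limit suggested in the prompting tips above


def build_prompt(parts: Sequence[str], max_words: int = MAX_WORDS) -> str:
    """Join ordered prompt parts into one paragraph and enforce the word limit."""
    sentences = []
    for part in parts:
        text = " ".join(part.split())  # collapse stray whitespace and newlines
        if not text:
            continue  # skip items left blank
        if text[-1] not in ".!?":
            text += "."
        sentences.append(text)
    prompt = " ".join(sentences)
    count = len(prompt.split())
    if count >= max_words:
        raise ValueError(f"prompt has {count} words; keep it under {max_words}")
    return prompt


if __name__ == "__main__":
    print(build_prompt([
        "A golden retriever runs through a sunny meadow",   # main action
        "Its ears flap as it bounds over tall grass",       # movements
        "The camera tracks alongside at ground level",      # camera
        "Warm late-afternoon light with soft green tones",  # lighting and colors
    ]))
```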
- macOS with Apple Silicon (M1/M2/M3/M4)
- ~25GB RAM (128GB recommended for high resolution)
- ffmpeg: brew install ffmpeg
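
A quick preflight check for the platform and ffmpeg requirements above could look like the sketch below. It uses only the Python standard library; the function name check_requirements is hypothetical, and the check does not cover available RAM.

```python
import platform
import shutil
from typing import List, Optional


def check_requirements(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    ffmpeg_path: Optional[str] = None,
) -> List[str]:
    """Return a list of unmet requirements; an empty list means the basics look fine."""
    # None means "detect on this machine"; explicit values make the function testable.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    ffmpeg_path = shutil.which("ffmpeg") if ffmpeg_path is None else ffmpeg_path

    problems = []
    if system != "Darwin":
        problems.append(f"macOS required, found {system or 'unknown'}")
    # A Python running under Rosetta reports x86_64 even on Apple Silicon hardware.
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required, found {machine or 'unknown'}")
    if not ffmpeg_path:
        problems.append("ffmpeg not found on PATH; try: brew install ffmpeg")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```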
- Usage Guide - Options, examples, troubleshooting
- Pipelines - All 6 pipelines explained
- Architecture - Model architecture details
- Performance Optimization Notes - Denoise-speed benchmark ideas and implementation candidates
- Parity Testing - PyTorch/MLX verification (97%+ correlation)
- Technical Report - Official Lightricks paper
Research and educational use. See LTX-2 for model licensing.
- Lightricks for LTX-2
- Apple MLX Team for MLX
- Google for Gemma 3