ltctech/LTX-2-MLX

 
 

LTX-2-MLX

Native Apple Silicon implementation of Lightricks LTX-2 video generation models using MLX. Supports both LTX-2.0 (19B) and LTX-2.3 (22B) with automatic version detection.

"A golden retriever running through a sunny meadow"

golden_retriever.mp4

"A city street at night with neon lights and rain"

cyberpunk_city.mp4

"A rocket ship launching into space with flames"

rocket_launch.mp4

"Ocean waves crashing on a beach at sunset"

ocean_sunset.mp4

Samples above: 768×512, 65 frames (~2.7 s at 24 fps), 8 steps on Apple Silicon
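The clip length quoted above follows directly from the frame count and frame rate. A minimal sketch of that arithmetic, where the helper name `clip_duration_seconds` is hypothetical and not part of this repository:

```python
from fractions import Fraction


def clip_duration_seconds(num_frames: int, fps: int = 24) -> float:
    """Return the clip length in seconds for a frame count and frame rate."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    # Fraction keeps the division exact until the final float conversion.
    return float(Fraction(num_frames, fps))


# 65 frames at 24 fps, the settings used for the sample clips.
print(f"{clip_duration_seconds(65):.2f} s")  # 2.71 s
```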

Quick Start

# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# 2. Download weights
uv run scripts/download_weights.py

# 3. Generate video (auto-resolves cached LTX/Gemma weights from HF_HOME)
uv run python scripts/generate.py "A golden retriever running through a meadow"

# Or specify the 2.3 checkpoint explicitly
uv run python scripts/generate.py --weights weights/ltx-2.3/ltx-2.3-22b-distilled.safetensors \
  "A golden retriever running through a meadow"

Required Models

LTX-2.3 (Recommended)

Download from Lightricks/LTX-2 on HuggingFace:

| Model | Size | Description |
| --- | --- | --- |
| ltx-2.3-22b-distilled.safetensors | 46GB | Latest - 22B params, 8 steps |

LTX-2.0

| Model | Size | Description |
| --- | --- | --- |
| ltx-2-19b-distilled.safetensors | 43GB | Fast generation (8 steps) |
| ltx-2-19b-dev.safetensors | 43GB | Higher quality (25-50 steps) |
| ltx-2.3-spatial-upscaler-x2-1.1.safetensors | 950MB | 2x resolution upscaling |
| ltx-2-temporal-upscaler-x2-1.0.safetensors | 262MB | 2x framerate upscaling |
| ltx-2-19b-distilled-lora-384.safetensors | 1.5GB | LoRA for two-stage refinement |

Text Encoder: Gemma 3 12B (~25GB) - Requires accepting license

Or use the interactive downloader:

uv run scripts/download_weights.py --weights all
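Before downloading, it can help to estimate the disk space a given selection needs. The sketch below simply adds up the approximate sizes listed in the tables above; the `SIZES_GB` dictionary and `total_gb` helper are illustrative names, not part of the downloader, and the figures are rounded values from this README rather than exact file sizes.

```python
from typing import Dict, Iterable

# Approximate sizes in GB, copied from the tables above.
SIZES_GB: Dict[str, float] = {
    "ltx-2.3-22b-distilled.safetensors": 46.0,
    "ltx-2-19b-distilled.safetensors": 43.0,
    "ltx-2-19b-dev.safetensors": 43.0,
    "ltx-2.3-spatial-upscaler-x2-1.1.safetensors": 0.95,
    "ltx-2-temporal-upscaler-x2-1.0.safetensors": 0.262,
    "ltx-2-19b-distilled-lora-384.safetensors": 1.5,
    "gemma-3-12b-text-encoder": 25.0,  # label chosen here for the Gemma 3 12B entry
}


def total_gb(names: Iterable[str]) -> float:
    """Sum the approximate download size for the selected entries."""
    return sum(SIZES_GB[name] for name in names)


recommended = ["ltx-2.3-22b-distilled.safetensors", "gemma-3-12b-text-encoder"]
print(f"LTX-2.3 distilled + text encoder: ~{total_gb(recommended):.0f} GB")
print(f"Everything listed: ~{total_gb(SIZES_GB):.0f} GB")
```

The converted-weight cache described under Optimization Tips takes additional space that this estimate does not include.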

Available Pipelines

| Pipeline | Speed | Quality | Best For |
| --- | --- | --- | --- |
| text-to-video | Medium | Good | Basic generation |
| distilled | Fast | Good | No-CFG two-stage quick iteration |
| one-stage | Slow | High | Quality priority or single-pass distilled |
| two-stage | Medium | High | High resolution (512p+) |

# Fast two-stage distilled preview
python scripts/generate.py "Your prompt" --pipeline distilled

# Existing single-pass distilled path
python scripts/generate.py "Your prompt" --pipeline one-stage --model-variant distilled

# High quality dev-style sampling
python scripts/generate.py "Your prompt" --pipeline one-stage --steps 20 --cfg 5.0

# High resolution
python scripts/generate.py "Your prompt" --pipeline two-stage --height 768 --width 1024

See Pipelines Guide for all 6 pipelines and options.
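When scripting several runs, the command lines above can be assembled programmatically. This is a minimal sketch, assuming only the flags that appear in the examples above (`--pipeline`, `--model-variant`, `--weights`, `--height`, `--width`, `--steps`, `--cfg`); the `build_generate_command` helper is hypothetical and does not exist in the repository.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    prompt: str,
    pipeline: Optional[str] = None,
    model_variant: Optional[str] = None,
    weights: Optional[str] = None,
    height: Optional[int] = None,
    width: Optional[int] = None,
    steps: Optional[int] = None,
    cfg: Optional[float] = None,
) -> List[str]:
    """Build an argv list for scripts/generate.py; unset options are omitted."""
    cmd = ["uv", "run", "python", "scripts/generate.py", prompt]
    options = [
        ("--pipeline", pipeline),
        ("--model-variant", model_variant),
        ("--weights", weights),
        ("--height", height),
        ("--width", width),
        ("--steps", steps),
        ("--cfg", cfg),
    ]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_generate_command("Your prompt", pipeline="two-stage", height=768, width=1024)
# shlex.join quotes the prompt so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the run without going through a shell, which avoids quoting problems in long prompts.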

Optimization Tips

  • Use default BF16 compute - override with --dtype float16 or --dtype float32 only for experiments
  • Audio follows compute dtype where safe - LTX-2.3 Vocoder+BWE keeps a scoped FP32 island matching Lightricks' precision caution
  • VAE decode defaults to native Conv3d + zero padding - --vae-tiling auto now picks a RAM-aware native tile plan, so override --vae-decoder, --vae-tiling, or --vae-spatial-padding only for A/B tests
  • Default canvas is 512x288 - pass --height/--width only when you want to leave the fast 16:9 preview size
  • Default outputs are timestamped - without --output, runs save to DIFFUSERS_OUTPUT_DIR, then OUTPUT_DIR, then outputs/ as ltx_YYYYmmdd_HHMMSS.mp4; use --output-prefix to name a run family
  • Pick the encode tier by destination - --encode-tier {web,default,hq,export,reference} (default: default) selects codec/container/audio together. web is libx264 + AAC for universal browser compat; default is hardware HEVC + ALAC for Apple/modern browsers; hq is software HEVC 4:4:4; export and reference are ProRes (.mov) for NLE / mastering workflows
  • Converted-weight cache defaults to auto - the first run builds reusable transformer, connector, video VAE, audio VAE, and vocoder cache files; pass --weights-cache off only when you specifically want direct stock-weight loading
  • Keep --weights as the bundle path - advanced runs can override individual subsystems with --transformer-weights, --connector-weights, --vae-weights, --audio-vae-weights, --vocoder-weights, and --config-weights
  • MLX allocator cache defaults to 1GB - this keeps unified-memory pressure lower without needing a routine --mlx-cache-limit-gb 1
  • Same-math video layouts default on - FF project_in/project_out and attention to_out pretranspose are enabled by default; pass --video-ff-layout off --video-attn-layout off for baseline A/Bs
  • Use --stream-transformer for the block-streaming preset - it expands to 16 resident blocks, resident-group compile, and 4-block compile groups
  • Save latents for decode-only tests - add --save-latents to write an NPZ sidecar next to the requested output; distilled two-stage runs include both stage-1 and stage-2 latents plus the existing final-latent keys
  • Save text conditioning for denoise A/Bs - add --save-text-embeddings to write the positive/negative AV text encoder outputs as an _text.npz sidecar that can be reused with --embedding
  • Save run metadata for reproducibility - add --save-run-log to write params, argv, outputs, and timings as an _run.json sidecar, starting before the long generation step
  • Save the lossless audio next to the encoded video - add --save-audio-sidecar to write the vocoder's raw WAV alongside the output (useful for A/B against the codec-compressed audio inside the container)
  • Save all reproducibility sidecars - add --save-all-sidecars to turn on latents, text conditioning, run metadata, and the audio WAV sidecar together
  • Use --pipeline distilled - Fast no-CFG two-stage inference (8+3 steps)
  • Use --stream-transformer before --low-memory - the streaming preset is the cleaner constrained-memory path for modern distilled runs; --low-memory remains an emergency fallback
  • Reduce resolution - Start with --height 256 --width 384 for testing
  • Treat denoise-speed quantization as experimental - --video-ff-quantize project_out:mxfp8 enables weight-only quantized video FF projections for A/B tests, and --video-ff-quantize-layers 40-47 restricts it to the selected layers; this path is non-canonical, so check output quality before relying on it
  • A/B same-math layout baselines - use --video-ff-layout off --video-attn-layout off when you want to compare against untransposed stock weight layout
  • Track denoise-speed experiments - see Performance Optimization Notes for MLX runtime optimization ideas and benchmark rules
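The default output naming described in the list above (DIFFUSERS_OUTPUT_DIR, then OUTPUT_DIR, then outputs/, with an ltx_YYYYmmdd_HHMMSS.mp4 filename) can be sketched as follows. This illustrates the documented lookup order only and is not the repository's actual code; the `default_output_path` name is hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Mapping, Optional


def default_output_path(
    env: Mapping[str, str] = os.environ,
    now: Optional[datetime] = None,
) -> Path:
    """Resolve the output directory in the documented order and add a timestamped name."""
    # Assumption: an empty variable falls through to the next candidate.
    out_dir = env.get("DIFFUSERS_OUTPUT_DIR") or env.get("OUTPUT_DIR") or "outputs"
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"ltx_{stamp}.mp4"


print(default_output_path())
```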

See Usage Guide for memory requirements and benchmarks.
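The --encode-tier choices listed under Optimization Tips can be kept in a small lookup table when scripting exports. The sketch below records only what this README states about each tier and validates a tier name before adding it to a command line; `ENCODE_TIERS` and `encode_tier_args` are illustrative names, not part of the repository.

```python
from typing import Dict, List

# Summaries taken from the bullet list above. Details the README does not
# state (for example the audio codec of "hq") are left out rather than guessed.
ENCODE_TIERS: Dict[str, str] = {
    "web": "libx264 + AAC, universal browser compatibility",
    "default": "hardware HEVC + ALAC, Apple devices and modern browsers",
    "hq": "software HEVC 4:4:4",
    "export": "ProRes (.mov), NLE / mastering workflows",
    "reference": "ProRes (.mov), NLE / mastering workflows",
}


def encode_tier_args(tier: str = "default") -> List[str]:
    """Return the CLI arguments for a tier, rejecting names not listed above."""
    if tier not in ENCODE_TIERS:
        valid = ", ".join(ENCODE_TIERS)
        raise ValueError(f"unknown encode tier {tier!r}; expected one of: {valid}")
    return ["--encode-tier", tier]


for name, summary in ENCODE_TIERS.items():
    print(f"{name:<10} {summary}")
```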

Prompting Tips

Write detailed descriptions in chronological order. Cover movements, appearances, camera angles, and environment details in a single flowing paragraph, and keep the prompt under 200 words.

Structure your prompts:

  1. Main action in a single sentence
  2. Specific movements and gestures
  3. Character/object appearances
  4. Background and environment
  5. Camera angles and movements
  6. Lighting and colors
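The six-part structure above can be turned into a small helper that joins the parts into one paragraph and enforces the 200-word limit. This is a sketch for illustration; `build_prompt` is a hypothetical function, and counting words by splitting on whitespace is an assumption about how the limit is measured.

```python
from typing import Sequence

MAX_WORDS = 200  # the prompt should stay under this count


def build_prompt(parts: Sequence[str], max_words: int = MAX_WORDS) -> str:
    """Join prompt parts, given in the order listed above, into one paragraph."""
    sentences = []
    for part in parts:
        text = part.strip()
        if not text:
            continue  # skip sections the author left empty
        if not text.endswith((".", "!", "?")):
            text += "."
        sentences.append(text)
    prompt = " ".join(sentences)
    word_count = len(prompt.split())
    if word_count >= max_words:
        raise ValueError(f"prompt has {word_count} words; keep it under {max_words}")
    return prompt


print(build_prompt([
    "A golden retriever runs through a sunny meadow",
    "Its ears flap as it bounds over tall grass",
    "The dog has a glossy golden coat",
    "Wildflowers and distant trees fill the background",
    "The camera tracks alongside at ground level",
    "Warm afternoon light gives the scene a soft yellow tone",
]))
```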

See Lightricks prompting guide for more tips.

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • ~25GB RAM (128GB recommended for high resolution)
  • ffmpeg: brew install ffmpeg
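The platform and ffmpeg requirements above can be checked before a long download. A minimal sketch using only the standard library; the `missing_requirements` helper is hypothetical, and the RAM requirement is not checked because the standard library has no portable call for it.

```python
import platform
import shutil
from typing import List, Optional


def missing_requirements(system: str, machine: str, ffmpeg_path: Optional[str]) -> List[str]:
    """Return a human-readable list of unmet requirements (empty when all are met)."""
    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    # A Python interpreter running under Rosetta reports x86_64 here.
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if ffmpeg_path is None:
        problems.append("ffmpeg not found on PATH; install with: brew install ffmpeg")
    return problems


if __name__ == "__main__":
    problems = missing_requirements(
        platform.system(), platform.machine(), shutil.which("ffmpeg")
    )
    print("All checks passed" if not problems else "\n".join(problems))
```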

Documentation

License

For research and educational use. See LTX-2 for model licensing.

Acknowledgments
