Linum v2 is a pair of 2B-parameter text-to-video generation models (360p and 720p variants; 2-5 second clips at 24 FPS).
- Python 3.10-3.12
- NVIDIA GPU with CUDA 12.8 support
First, install uv:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then clone and install dependencies:

```shell
git clone https://github.com/Linum-AI/linum-v2.git
cd linum-v2
uv sync
```

Generate your first video:
```shell
# 720p (default)
uv run python generate_video.py \
--prompt "In a charming hand-drawn 2D animation style, a rust-orange fox with cream chest fur and alert triangular ears grips a cherry-red steering wheel with both paws, its bushy tail curled on the passenger seat. Stylized trees and pastel houses whoosh past the windows in smooth parallax layers. The fox's golden eyes focus intently ahead, whiskers twitching as it navigates a winding country road rendered in soft watercolor textures." \
--output fox.mp4 \
--seed 20 \
--cfg 7.0
```

Demo output: `fox_720p_demo.mp4`
```shell
# 360p (faster, lower VRAM)
uv run python generate_video.py \
--prompt "A cute 3D animated baby goat with shaggy gray fur, a fluffy white chin tuft, and stubby curved horns perches on a round wooden stool. Warm golden studio lights bounce off its glossy cherry-red acoustic guitar as it rhythmically strums with a confident hoof, hind legs dangling. Framed family portraits of other barnyard animals line the cream-colored walls, a leafy potted ficus sits in the back corner, and dust motes drift through the cozy, sun-speckled room." \
--output goat.mp4 \
--seed 16 \
--cfg 10.0 \
--resolution 360p
```
Weights are automatically downloaded from HuggingFace Hub on first run (~20GB per model).
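If you prefer to pre-fetch the weights (for example, on a machine with faster networking), the repos listed in the Model Weights section can be pulled with `huggingface_hub`. A minimal sketch — the repo IDs come from that section, but the helper function is illustrative and not part of the CLI:

```python
# Map a resolution flag to its HuggingFace repo ID
# (repo IDs from the Model Weights section; helper name is illustrative).
REPOS = {
    "360p": "Linum-AI/linum-v2-360p",
    "720p": "Linum-AI/linum-v2-720p",
}

def repo_for_resolution(resolution: str) -> str:
    try:
        return REPOS[resolution]
    except KeyError:
        raise ValueError(f"resolution must be one of {sorted(REPOS)}, got {resolution!r}")

# Pre-download (~20GB) into the local HF cache so the first generation
# doesn't stall on the network:
# from huggingface_hub import snapshot_download
# snapshot_download(repo_for_resolution("720p"))
```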
```shell
# 720p video, 2 seconds (default)
uv run python generate_video.py --prompt "Your prompt here" --output output.mp4

# 360p video, 2 seconds (faster, lower VRAM)
uv run python generate_video.py --prompt "Your prompt here" --output output.mp4 --resolution 360p

# 720p video, longer duration
uv run python generate_video.py --prompt "Your prompt here" --duration 4.0
```

Full list of options:

```shell
uv run python generate_video.py \
--prompt "Your detailed prompt" \
--output output.mp4 \
--resolution 720p \
--duration 2.0 \
--seed 42 \
--cfg 7.0 \
--num_steps 50 \
--negative_prompt "blurry, low quality"
```

| Argument | Default | Description |
|---|---|---|
| `--prompt` | (required) | Text description of the video |
| `--output` | `output.mp4` | Output file path |
| `--resolution` | `720p` | Resolution: `360p` or `720p` |
| `--duration` | `2.0` | Video duration in seconds (2.0-5.0) |
| `--seed` | `20` | Random seed for reproducibility |
| `--cfg` | `10.0` | Classifier-free guidance scale (recommended: 7-10; higher values follow the prompt more closely but may oversaturate) |
| `--num_steps` | `50` | Number of sampler steps |
| `--negative_prompt` | `""` | What to avoid in generation |
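The `--cfg` scale trades prompt adherence against saturation. Conceptually, classifier-free guidance extrapolates from the model's unconditional prediction toward its conditional one. A sketch of that combination step (illustrative only, not Linum's internal sampler code):

```python
import numpy as np

def apply_cfg(uncond_pred: np.ndarray, cond_pred: np.ndarray, cfg: float) -> np.ndarray:
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward (and past) the conditional one."""
    return uncond_pred + cfg * (cond_pred - uncond_pred)

# cfg=1.0 reproduces the conditional prediction exactly; larger values
# amplify the prompt direction, which is why high --cfg can oversaturate.
u = np.zeros(4)
c = np.ones(4)
print(apply_cfg(u, c, 7.0))  # [7. 7. 7. 7.]
```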
If you've downloaded weights manually:
```shell
uv run python generate_video.py \
--prompt "Your prompt" \
--model-path /path/to/dit.safetensors \
--vae-path /path/to/vae.safetensors \
--t5-encoder-path /path/to/t5/text_encoder \
--t5-tokenizer-path /path/to/t5/tokenizer
```

| Resolution | VRAM Required |
|---|---|
| 360p | ~25GB |
| 720p | ~35GB |
Recommended hardware: H100, A100-80GB, or similar high-VRAM GPUs.
| Resolution | Duration | Generation Time |
|---|---|---|
| 360p | 2 seconds | ~40 seconds |
| 360p | 5 seconds | ~2 minutes |
| 720p | 2 seconds | ~4 minutes |
| 720p | 5 seconds | ~15 minutes |
Linum V2 uses a Diffusion Transformer (DiT) architecture with:
- DiT Backbone: 2B parameters, trained from scratch with flow matching objective
- Text Encoder: T5-XXL
- VAE: WAN 2.1 VAE
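The flow matching objective mentioned above can be sketched in a few lines: sample a time t, linearly interpolate between data and noise, and regress the model onto the constant velocity between them. This is a generic rectified-flow sketch under standard assumptions, not Linum's training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x0: np.ndarray) -> float:
    """Loss for one sample under the linear flow matching objective:
    x_t = (1 - t) * x0 + t * eps, regression target v = eps - x0."""
    eps = rng.standard_normal(x0.shape)   # noise endpoint of the path
    t = rng.uniform()                     # time sampled uniformly in [0, 1]
    x_t = (1.0 - t) * x0 + t * eps        # point on the straight data -> noise path
    v_target = eps - x0                   # constant velocity along that path
    return float(np.mean((model(x_t, t) - v_target) ** 2))

# E.g. a trivial "predict zero velocity" model yields a finite positive loss:
x0 = rng.standard_normal((2, 3))
loss = flow_matching_loss(lambda x_t, t: np.zeros_like(x_t), x0)
```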
Weights are hosted on HuggingFace Hub:
- Linum-AI/linum-v2-360p - 360p model
- Linum-AI/linum-v2-720p - 720p model
```bibtex
@software{linum_v2_2026,
  title = {Linum V2: Text-to-Video Generation},
  author = {Linum AI},
  year = {2026},
  url = {https://github.com/Linum-AI/linum-v2}
}
```

This project is licensed under the Apache License 2.0 - see the LICENSE file.
Linum is a team of two brothers building a tiny-yet-powerful AI research lab. We train our own generative media models from scratch.
Subscribe to Field Notes — technical deep dives on building generative video models from the ground up, plus updates on new releases from Linum.
Contact: hello@linum.ai — Reach out if you're selling high-quality video data.
This project uses the following components under the Apache 2.0 License:
- Wan Video 2.1 3D Causal Video VAE
- Google T5-XXL
- PyTorch, HuggingFace Transformers, HuggingFace Diffusers
Thank you to our investors and infrastructure partners: