Linum v2 is a pair of 2B-parameter text-to-video generation models (360p and 720p variants; 2-5 second clips at 24 FPS).
- Python 3.10-3.12
- NVIDIA GPU with CUDA 12.8 support
First, install uv:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Then clone and install dependencies:

```shell
git clone https://github.com/Linum-AI/linum-v2.git
cd linum-v2
uv sync
```

Generate your first video:
```shell
# 720p (default)
uv run python generate_video.py \
--prompt "In a charming hand-drawn 2D animation style, a rust-orange fox with cream chest fur and alert triangular ears grips a cherry-red steering wheel with both paws, its bushy tail curled on the passenger seat. Stylized trees and pastel houses whoosh past the windows in smooth parallax layers. The fox's golden eyes focus intently ahead, whiskers twitching as it navigates a winding country road rendered in soft watercolor textures." \
--output fox.mp4 \
--seed 20 \
--cfg 7.0
```

Demo output: `fox_720p_demo.mp4`
```shell
# 360p (faster, lower VRAM)
uv run python generate_video.py \
--prompt "A cute 3D animated baby goat with shaggy gray fur, a fluffy white chin tuft, and stubby curved horns perches on a round wooden stool. Warm golden studio lights bounce off its glossy cherry-red acoustic guitar as it rhythmically strums with a confident hoof, hind legs dangling. Framed family portraits of other barnyard animals line the cream-colored walls, a leafy potted ficus sits in the back corner, and dust motes drift through the cozy, sun-speckled room." \
--output goat.mp4 \
--seed 16 \
--cfg 10.0 \
--resolution 360p
```
Weights are automatically downloaded from HuggingFace Hub on first run (~20GB per model).
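If you prefer to pre-fetch the weights (for example, on a machine with faster networking), the repos listed in the Model Weights section can be pulled with `huggingface_hub`. A minimal sketch — the repo IDs come from that section, but the helper function is illustrative and not part of the CLI:

```python
# Map a resolution flag to its HuggingFace repo ID
# (repo IDs from the Model Weights section; helper name is illustrative).
REPOS = {
    "360p": "Linum-AI/linum-v2-360p",
    "720p": "Linum-AI/linum-v2-720p",
}

def repo_for_resolution(resolution: str) -> str:
    try:
        return REPOS[resolution]
    except KeyError:
        raise ValueError(f"resolution must be one of {sorted(REPOS)}, got {resolution!r}")

# Pre-download (~20GB) into the local HF cache so the first generation
# doesn't stall on the network:
# from huggingface_hub import snapshot_download
# snapshot_download(repo_for_resolution("720p"))
```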
```shell
# 720p video, 2 seconds (default)
uv run python generate_video.py --prompt "Your prompt here" --output output.mp4

# 360p video, 2 seconds (faster, lower VRAM)
uv run python generate_video.py --prompt "Your prompt here" --output output.mp4 --resolution 360p

# 720p video, longer duration
uv run python generate_video.py --prompt "Your prompt here" --duration 4.0
```

Full list of options:

```shell
uv run python generate_video.py \
--prompt "Your detailed prompt" \
--output output.mp4 \
--resolution 720p \
--duration 2.0 \
--seed 42 \
--cfg 7.0 \
--num_steps 50 \
--negative_prompt "blurry, low quality"
```

| Argument | Default | Description |
|---|---|---|
| `--prompt` | (required) | Text description of the video |
| `--output` | `output.mp4` | Output file path |
| `--resolution` | `720p` | Resolution: `360p` or `720p` |
| `--duration` | `2.0` | Video duration in seconds (2.0-5.0) |
| `--seed` | `20` | Random seed for reproducibility |
| `--cfg` | `10.0` | Classifier-free guidance scale (recommended: 7-10; higher values follow the prompt more closely but may oversaturate) |
| `--num_steps` | `50` | Number of sampler steps |
| `--negative_prompt` | `""` | What to avoid in generation |
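The `--cfg` scale trades prompt adherence against saturation. Conceptually, classifier-free guidance extrapolates from the model's unconditional prediction toward its conditional one. A sketch of that combination step (illustrative only, not Linum's internal sampler code):

```python
import numpy as np

def apply_cfg(uncond_pred: np.ndarray, cond_pred: np.ndarray, cfg: float) -> np.ndarray:
    """Classifier-free guidance: push the prediction away from the
    unconditional output, toward (and past) the conditional one."""
    return uncond_pred + cfg * (cond_pred - uncond_pred)

# cfg=1.0 reproduces the conditional prediction exactly; larger values
# amplify the prompt direction, which is why high --cfg can oversaturate.
u = np.zeros(4)
c = np.ones(4)
print(apply_cfg(u, c, 7.0))  # [7. 7. 7. 7.]
```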
If you've downloaded weights manually:
```shell
uv run python generate_video.py \
--prompt "Your prompt" \
--model-path /path/to/dit.safetensors \
--vae-path /path/to/vae.safetensors \
--t5-encoder-path /path/to/t5/text_encoder \
--t5-tokenizer-path /path/to/t5/tokenizer
```

| Resolution | VRAM Required |
|---|---|
| 360p | ~25GB |
| 720p | ~35GB |
Recommended hardware: H100, A100-80GB, or similar high-VRAM GPUs.
| Resolution | Duration | Generation Time |
|---|---|---|
| 360p | 2 seconds | ~40 seconds |
| 360p | 5 seconds | ~2 minutes |
| 720p | 2 seconds | ~4 minutes |
| 720p | 5 seconds | ~15 minutes |
Linum V2 uses a Diffusion Transformer (DiT) architecture with:
- DiT Backbone: 2B parameters, trained from scratch with flow matching objective
- Text Encoder: T5-XXL
- VAE: WAN 2.1 VAE
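The flow matching objective mentioned above can be sketched in a few lines: sample a time t, linearly interpolate between data and noise, and regress the model onto the constant velocity between them. This is a generic rectified-flow sketch under standard assumptions, not Linum's training code:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x0: np.ndarray) -> float:
    """Loss for one sample under the linear flow matching objective:
    x_t = (1 - t) * x0 + t * eps, regression target v = eps - x0."""
    eps = rng.standard_normal(x0.shape)   # noise endpoint of the path
    t = rng.uniform()                     # time sampled uniformly in [0, 1]
    x_t = (1.0 - t) * x0 + t * eps        # point on the straight data -> noise path
    v_target = eps - x0                   # constant velocity along that path
    return float(np.mean((model(x_t, t) - v_target) ** 2))

# E.g. a trivial "predict zero velocity" model yields a finite positive loss:
x0 = rng.standard_normal((2, 3))
loss = flow_matching_loss(lambda x_t, t: np.zeros_like(x_t), x0)
```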
Weights are hosted on HuggingFace Hub:
- Linum-AI/linum-v2-360p - 360p model
- Linum-AI/linum-v2-720p - 720p model
```bibtex
@software{linum_v2_2026,
  title = {Linum V2: Text-to-Video Generation},
  author = {Linum AI},
  year = {2026},
  url = {https://github.com/Linum-AI/linum-v2}
}
```

This project is licensed under the Apache License 2.0 - see the LICENSE file.
Linum is a team of two brothers building a tiny-yet-powerful AI research lab. We train our own generative media models from scratch.
Subscribe to Field Notes — technical deep dives on building generative video models from the ground up, plus updates on new releases from Linum.
Contact: hello@linum.ai — Reach out if you're selling high-quality video data.
This project uses the following components under the Apache 2.0 License:
- Wan Video 2.1 3D Causal Video VAE
- Google T5-XXL
- PyTorch, HuggingFace Transformers, HuggingFace Diffusers
Thank you to our investors and infrastructure partners: