Linum AI

✨ Check out the launch

Linum v2: Text-to-Video Generation

Linum v2 is a pair of 2B-parameter text-to-video generation models (360p or 720p, 2-5 seconds, 24 FPS).

Installation

Prerequisites

  • Python 3.10-3.12
  • NVIDIA GPU with CUDA 12.8 support

Install with uv

First, install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then clone and install dependencies:

git clone https://github.com/Linum-AI/linum-v2.git
cd linum-v2
uv sync

Quick Start

Generate your first video:

# 720p (default)
uv run python generate_video.py \
    --prompt "In a charming hand-drawn 2D animation style, a rust-orange fox with cream chest fur and alert triangular ears grips a cherry-red steering wheel with both paws, its bushy tail curled on the passenger seat. Stylized trees and pastel houses whoosh past the windows in smooth parallax layers. The fox's golden eyes focus intently ahead, whiskers twitching as it navigates a winding country road rendered in soft watercolor textures." \
    --output fox.mp4 \
    --seed 20 \
    --cfg 7.0
# 360p (faster, lower VRAM)
uv run python generate_video.py \
    --prompt "A cute 3D animated baby goat with shaggy gray fur, a fluffy white chin tuft, and stubby curved horns perches on a round wooden stool. Warm golden studio lights bounce off its glossy cherry-red acoustic guitar as it rhythmically strums with a confident hoof, hind legs dangling. Framed family portraits of other barnyard animals line the cream-colored walls, a leafy potted ficus sits in the back corner, and dust motes drift through the cozy, sun-speckled room." \
    --output goat.mp4 \
    --seed 16 \
    --cfg 10.0 \
    --resolution 360p

Weights are automatically downloaded from HuggingFace Hub on first run (~20GB per model).

Usage

Basic Usage

# 720p video, 2 seconds (default)
uv run python generate_video.py --prompt "Your prompt here" --output output.mp4

# 360p video, 2 seconds (faster, lower VRAM)
uv run python generate_video.py --prompt "Your prompt here" --output output.mp4 --resolution 360p

# 720p video, longer duration
uv run python generate_video.py --prompt "Your prompt here" --duration 4.0

All Options

uv run python generate_video.py \
    --prompt "Your detailed prompt" \
    --output output.mp4 \
    --resolution 720p \
    --duration 2.0 \
    --seed 42 \
    --cfg 7.0 \
    --num_steps 50 \
    --negative_prompt "blurry, low quality"
| Argument | Default | Description |
|---|---|---|
| `--prompt` | (required) | Text description of the video |
| `--output` | `output.mp4` | Output file path |
| `--resolution` | `720p` | Resolution: `360p` or `720p` |
| `--duration` | `2.0` | Video duration in seconds (2.0-5.0) |
| `--seed` | `20` | Random seed for reproducibility |
| `--cfg` | `10.0` | Classifier-free guidance scale (recommended: 7-10; higher values follow prompts more closely but may oversaturate) |
| `--num_steps` | `50` | Number of sampler steps |
| `--negative_prompt` | `""` | What to avoid in generation |
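To sweep seeds or run several prompts in a batch, the CLI can be driven from Python. A minimal sketch under the assumption that `generate_video.py` accepts the flags documented above; it only builds and prints the command lines, and the actual `subprocess.run` call is left commented out so nothing is executed by accident.

```python
import subprocess  # noqa: F401  (used only when execution is enabled below)

def build_command(prompt: str, output: str, seed: int,
                  resolution: str = "720p", cfg: float = 10.0) -> list[str]:
    """Assemble one generate_video.py invocation from the documented flags."""
    return [
        "uv", "run", "python", "generate_video.py",
        "--prompt", prompt,
        "--output", output,
        "--seed", str(seed),
        "--cfg", str(cfg),
        "--resolution", resolution,
    ]

# Sweep a few seeds for the same prompt, then keep the best take.
commands = [
    build_command("A fox driving a car", f"fox_seed{s}.mp4", seed=s)
    for s in (16, 20, 42)
]
for cmd in commands:
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually generate
```

Because generation is slow, printing the commands first and piping them into a scheduler (or uncommenting the `subprocess.run` line) keeps long sweeps easy to audit.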

Using Local Weights

If you've downloaded weights manually:

uv run python generate_video.py \
    --prompt "Your prompt" \
    --model-path /path/to/dit.safetensors \
    --vae-path /path/to/vae.safetensors \
    --t5-encoder-path /path/to/t5/text_encoder \
    --t5-tokenizer-path /path/to/t5/tokenizer

Hardware Requirements

| Resolution | VRAM Required |
|---|---|
| 360p | ~25GB |
| 720p | ~35GB |

Recommended GPUs: H100, A100-80GB, or similar high-VRAM GPUs

Speed Benchmarks (H100, 50 steps)

| Resolution | Duration | Generation Time |
|---|---|---|
| 360p | 2 seconds | ~40 seconds |
| 360p | 5 seconds | ~2 minutes |
| 720p | 2 seconds | ~4 minutes |
| 720p | 5 seconds | ~15 minutes |
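Since the models run at 24 FPS, the benchmark numbers translate into a rough per-frame cost. Illustrative arithmetic only, using the approximate wall times from the table; it shows that 720p is several times more expensive per frame than 360p, and that longer clips cost somewhat more per frame than short ones.

```python
FPS = 24  # frame rate from the model spec

# (resolution, duration_s, wall_time_s) from the benchmark table (H100, 50 steps).
benchmarks = [
    ("360p", 2, 40),
    ("360p", 5, 120),
    ("720p", 2, 240),
    ("720p", 5, 900),
]

for res, dur, wall in benchmarks:
    frames = FPS * dur
    print(f"{res} {dur}s: {frames} frames, {wall / frames:.1f} s/frame")
# → e.g. 360p 2s works out to ~0.8 s/frame, 720p 5s to ~7.5 s/frame
```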

Model Architecture

Linum V2 uses a Diffusion Transformer (DiT) architecture with:

  • DiT Backbone: 2B parameters, trained from scratch with flow matching objective
  • Text Encoder: T5-XXL
  • VAE: WAN 2.1 VAE
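The flow matching objective mentioned above can be illustrated with a toy example. This is a minimal NumPy sketch, not the actual training code: it uses the common linear-interpolation (rectified-flow-style) formulation, where the network regresses the constant velocity that carries a data sample to noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(x0: np.ndarray, predict_velocity) -> float:
    """Toy flow matching loss with a linear interpolation path.

    x_t = (1 - t) * x0 + t * eps, and the regression target is the
    constant velocity along that path, eps - x0.
    """
    eps = rng.standard_normal(x0.shape)                      # noise sample
    t = rng.uniform(size=(x0.shape[0],) + (1,) * (x0.ndim - 1))
    x_t = (1 - t) * x0 + t * eps                             # point on the path
    target = eps - x0                                        # velocity target
    pred = predict_velocity(x_t, t)
    return float(np.mean((pred - target) ** 2))              # MSE

# A trivial "model" that predicts zero velocity everywhere.
x0 = rng.standard_normal((8, 16))
loss = flow_matching_loss(x0, lambda x_t, t: np.zeros_like(x_t))
print(f"loss: {loss:.3f}")
```

In training, `predict_velocity` would be the 2B-parameter DiT conditioned on the T5-XXL text embeddings, operating on VAE latents rather than raw arrays.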

Model Weights

Weights are hosted on HuggingFace Hub:

Citation

@software{linum_v2_2026,
  title = {Linum V2: Text-to-Video Generation},
  author = {Linum AI},
  year = {2026},
  url = {https://github.com/Linum-AI/linum-v2}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE file.

About Linum

Linum is a team of two brothers building a tiny-yet-powerful AI research lab. We train our own generative media models from scratch.

Subscribe to Field Notes — technical deep dives on building generative video models from the ground up, plus updates on new releases from Linum.

Contact: hello@linum.ai — Reach out if you're selling high-quality video data.

Acknowledgments

This project uses the following components under the Apache 2.0 License:

Thank you to our investors and infrastructure partners:

Y Combinator · Adverb Ventures · Crusoe · Together AI · Cloudflare · Ubicloud
