Qualitative results of video generation using Wan-Alpha v2.0. Our model successfully generates various scenes with accurate and clearly rendered transparency. Notably, it can synthesize diverse semi-transparent objects, glowing effects, and fine-grained details such as hair.
- [2026.03.17] Released checkpoints for Wan-Alpha v1.0 and v2.0 VAE.
- [2026.03.10] Released the Wan-Alpha v2.0 and Wan-Alpha VAE training code and training datasets.
- [2026.02.21] Wan-Alpha v2.0 has been accepted by CVPR 2026!
- [2025.12.16] Released Wan-Alpha v2.0; the Wan2.1-14B-T2V-adapted weights and inference code are now open-sourced.
- [2025.12.16] Updated our paper on arXiv.
- [2025.09.30] Our technical report is available on arXiv.
- [2025.09.30] Released Wan-Alpha v1.0; the Wan2.1-14B-T2V-adapted weights and inference code are now open-sourced.
- Paper: Available on arXiv.
- Inference Code: Released inference pipeline for Wan-Alpha v1.0 and v2.0.
- Model Weights: Released checkpoints for Wan-Alpha v1.0 and v2.0.
- Dataset: Open-sourced the VAE and T2V training datasets.
- Training Code (VAE & T2V): Released training scripts for the VAE and text-to-RGBA video generation.
- VAE Checkpoints: Released checkpoints for the Wan-Alpha v1.0 and v2.0 VAE.
- Image-to-Video: Release Wan-Alpha-I2V model weights.
| Prompt | Preview Video | Alpha Video |
|---|---|---|
| "The background of this video is transparent. It features a beige, woven rattan hanging chair with soft seat and back cushions. Realistic style. Medium shot." | ![]() |
![]() |
For more results, please visit our website.
# Clone the project repository
git clone https://github.com/WeChatCV/Wan-Alpha.git
cd Wan-Alpha
# Create and activate Conda environment
conda create -n Wan-Alpha python=3.11 -y
conda activate Wan-Alpha
# Install dependencies
pip install -r requirements.txt
Download Wan2.1-T2V-14B
Download Lightx2v-T2V-14B
Download Wan-Alpha-v1.0, Wan-Alpha-v2.0
Download Wan-Alpha-v1.0-VAE, Wan-Alpha-v2.0-VAE
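If you prefer the command line, the checkpoints can also be fetched with the Hugging Face CLI. The snippet below is only a sketch: the Wan-Alpha and LightX2V repository IDs and the local directories are placeholders, so substitute the exact sources from the download links above.
# Sketch: download checkpoints with the Hugging Face CLI (repo IDs and paths below are placeholders)
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.1-T2V-14B --local-dir ./Wan2.1-T2V-14B
huggingface-cli download <wan-alpha-repo-id> --local-dir ./Wan-Alpha        # v1.0/v2.0 LoRA and VAE weights
huggingface-cli download <lightx2v-repo-id> --local-dir ./Lightx2v-T2V-14B  # step-distillation LoRA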
You can test our model with the following command:
torchrun --nproc_per_node=8 --master_port=29501 generate_dora_lightx2v_mask.py --size 832*480 \
--ckpt_dir "path/to/your/Wan-2.1/Wan2.1-T2V-14B" \
--dit_fsdp --t5_fsdp --ulysses_size 8 \
--vae_lora_checkpoint "path/to/your/decoder.bin" \
--lora_path "path/to/your/t2v.safetensors" \
--lightx2v_path "path/to/your/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors" \
--sample_guide_scale 1.0 \
--frame_num 81 \
--sample_steps 4 \
--lora_ratio 1.0 \
--lora_prefix "" \
--alpha_shift_mean 0.05 \
--cache_path_mask "path/to/your/gauss_mask" \
--prompt_file ./data/prompt.txt \
--output_dir ./output
You can specify the weights of Wan2.1-T2V-14B with --ckpt_dir, LightX2V-T2V-14B with --lightx2v_path, Wan-Alpha-VAE with --vae_lora_checkpoint, and Wan-Alpha-T2V with --lora_path. Finally, you can find the rendered RGBA videos, composited over a checkerboard background, together with PNG frames in the directory given by --output_dir.
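If you want a single video file that keeps a real alpha channel rather than the checkerboard preview, the exported PNG frames can be packed with ffmpeg. This is a sketch: the frame-naming pattern and frame rate below are assumptions, so adjust them to the files actually written to --output_dir.
# Sketch: pack RGBA PNG frames into a video with alpha (frame_%04d.png and 16 fps are assumptions)
ffmpeg -framerate 16 -i ./output/frame_%04d.png -c:v prores_ks -pix_fmt yuva444p10le output_rgba.mov
# Alternative: VP9 WebM with alpha (smaller, web-friendly)
ffmpeg -framerate 16 -i ./output/frame_%04d.png -c:v libvpx-vp9 -pix_fmt yuva420p output_rgba.webm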
We provide an example Gaussian mask. You can also use gen_gaussian_mask.py to generate a Gaussian mask from an existing alpha video. Alternatively, you can directly create a Gaussian ellipse video, which can be either static or dynamic (e.g., moving from left to right); a minimal sketch of this is given below. Note that alpha_shift_mean is a fixed parameter.
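Below is a minimal, self-contained Python sketch of such a dynamic Gaussian ellipse mask video. It is not the repository's gen_gaussian_mask.py; the resolution, frame count, frame rate, and output path are assumptions that you should match to your generation settings.
# Minimal sketch: render a moving Gaussian ellipse as a grayscale mask video
# (not the repo's gen_gaussian_mask.py; resolution/frames/fps/output path are assumptions)
import numpy as np
import cv2

width, height, num_frames, fps = 832, 480, 81, 16
yy, xx = np.mgrid[0:height, 0:width]
writer = cv2.VideoWriter("gauss_mask.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

for t in range(num_frames):
    # Ellipse center drifts from left to right across the frames.
    cx = width * (0.25 + 0.5 * t / (num_frames - 1))
    cy = height * 0.5
    sigma_x, sigma_y = width * 0.15, height * 0.25  # ellipse radii (standard deviations)
    gauss = np.exp(-(((xx - cx) ** 2) / (2 * sigma_x ** 2) + ((yy - cy) ** 2) / (2 * sigma_y ** 2)))
    frame = (gauss * 255).astype(np.uint8)
    writer.write(cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR))

writer.release()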
Prompt Writing Tip: In your prompt, state that the background of the video is transparent, and specify the visual style, the shot type (such as close-up, medium shot, wide shot, or extreme close-up), and a description of the main subject. Prompts support both Chinese and English input.
# An example prompt.
This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style.
VAE
For the VAE training dataset, please refer to Section 3.3 of our paper for preparation details.
You can train our VAE model with the following command:
save_path="./checkpoints"
mkdir -p $save_path
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 train_vae.py \
--train_dataset_path data.csv \
--val_dataset_path val.csv \
--output_path $save_path \
--vae_path "Wan-2.1/Wan2.1-T2V-14B/Wan2.1_VAE.pth" \
--num_frames 17 \
--height 272 \
--width 272 \
--training_strategy deepspeed_stage_2 \
--max_epochs 100 \
--learning_rate 1e-4 \
--lora_rank 128 \
--batch_size 2 \
--model_id_with_origin_paths "Wan2.1-T2V-1.3B:diffusion_pytorch_model*.safetensors" \
--job_name "prompt" \
--use_gradient_checkpointing_offload \
--dataloader_num_workers 8 2>&1 | tee -a ${save_path}/train.log
Before VAE training, you need to cache the empty text prompt ("").
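The caching step itself is handled by the released training code; purely as an illustration of what gets cached, the sketch below encodes the empty prompt with the public UMT5-XXL encoder (a stand-in for models_t5_umt5-xxl-enc-bf16.pth) and saves the resulting embedding. The model ID, sequence length, and output path are assumptions, not the repository's actual interface.
# Rough sketch (assumption): pre-compute and cache the empty-prompt ("") text embedding.
# Uses the public UMT5-XXL encoder as a stand-in for models_t5_umt5-xxl-enc-bf16.pth;
# the caching script shipped with the training code may differ.
import torch
from transformers import AutoTokenizer, UMT5EncoderModel

model_id = "google/umt5-xxl"  # assumption: stand-in for the Wan2.1 text encoder
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = UMT5EncoderModel.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval().to("cuda")

with torch.no_grad():
    tokens = tokenizer("", return_tensors="pt", padding="max_length", max_length=512).to("cuda")
    empty_emb = encoder(input_ids=tokens.input_ids, attention_mask=tokens.attention_mask).last_hidden_state

torch.save(empty_emb.cpu(), "empty_text_embedding.pt")  # cache path is an assumption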
T2V
You can download the training dataset from Google Drive or Hugging Face.
You can train our T2V model with the following command:
job_name="12"
output_path="./checkpoints"
mkdir -p ${output_path}
# cache data
for id in {0..7}; do
CUDA_VISIBLE_DEVICES=$id accelerate launch \
--mixed_precision='bf16' \
--num_processes=1 \
--num_machines=$WORLD_SIZE \
--machine_rank=$RANK \
--main_process_ip=$MASTER_ADDR \
--main_process_port=$MASTER_PORT \
examples/wanvideo/model_training/prepare_trans_alpha_mask.py \
--dataset_metadata_path data.csv \
--height 640 \
--width 624 \
--data_file_keys video_fgr,video_pha \
--dataset_repeat 1 \
--model_id_with_origin_paths "Wan2.1-T2V-14B:models_t5_umt5-xxl-enc-bf16.pth" \
--learning_rate 1e-4 \
--num_epochs 1 \
--remove_prefix_in_ckpt "pipe.t2v." \
--output_path "" \
--lora_base_model "dit" \
--lora_target_modules "q,k,v,o,ffn.0,ffn.2" \
--lora_rank 32 \
--job_name $job_name \
--job_id $id \
--job_num 8 \
--new_vae_path "VAE/pytorch_model.bin" \
--use_gradient_checkpointing_offload 2>&1 | tee -a ${output_path}/train_prepare.log &
done
trap 'kill 0' SIGINT
wait
# train
accelerate launch \
--mixed_precision='bf16' \
--num_processes=$RANK_NUM \
--num_machines=$WORLD_SIZE \
--machine_rank=$RANK \
--main_process_ip=$MASTER_ADDR \
--main_process_port=$MASTER_PORT \
--config_file $ACCELERATE_CONFIG_FILE \
examples/wanvideo/model_training/train_gauss_ellipse.py \
--dataset_metadata_path data.csv \
--height 640 \
--width 624 \
--dataset_repeat 1 \
--model_id_with_origin_paths "Wan2.1-T2V-14B:diffusion_pytorch_model*.safetensors" \
--learning_rate 1e-4 \
--num_epochs 20 \
--remove_prefix_in_ckpt "pipe.dit." \
--output_path $output_path \
--lora_base_model "dit" \
--lora_target_modules "q,k,v,o,ffn.0,ffn.2" \
--lora_rank 32 \
--initial_learnable_value 0.05 \
--job_name $job_name \
--use_gradient_checkpointing_offload 2>&1 | tee -a ${output_path}/train.log
We recommend caching the processed data (the "cache data" step above) before starting training to improve efficiency.
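For reference, the sketch below shows one plausible layout of the data.csv metadata file, inferred only from the --data_file_keys video_fgr,video_pha flag above; the prompt column name and the file paths are assumptions, so treat the released dataset as the authoritative format.
video_fgr,video_pha,prompt
videos/0001_fgr.mp4,videos/0001_pha.mp4,"This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style."
videos/0002_fgr.mp4,videos/0002_pha.mp4,"This video has a transparent background. Medium shot. A beige rattan hanging chair. Realistic style."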
Coming soon...
This project is built upon the following excellent open-source projects:
- DiffSynth-Studio (training/inference framework)
- Wan2.1 (base video generation model)
- LightX2V (inference acceleration)
- WanVideo_comfy (inference acceleration)
We sincerely thank the authors and contributors of these projects.
If you find our work helpful for your research, please consider citing our paper:
@misc{dong2025wanalpha,
title={Video Generation with Stable Transparency via Shiftable RGB-A Distribution Learner},
author={Haotian Dong and Wenjing Wang and Chen Li and Jing Lyu and Di Lin},
year={2025},
eprint={2509.24979},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.24979},
}
If you have any questions or suggestions, feel free to reach out via GitHub Issues. We look forward to your feedback!


