Hyeongju Mun* · In-Hwan Jin* · Sohyeong Kim · Kyeongbo Kong†
*Equal contribution. †Corresponding author.
Installation requires a CUDA-compatible GPU. Running LivingWorld requires 48 GB of GPU memory.
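Before installing, it can help to confirm that a GPU on the machine meets the memory requirement. The following is a minimal sketch, not part of the LivingWorld codebase: it queries `nvidia-smi` for total memory per GPU and lists the GPUs above a threshold. The helper names and the `MIN_MEMORY_MIB` value are assumptions; the threshold is set a little below 48 * 1024 MiB because cards sold as 48 GB usually report slightly less.

```python
import shutil
import subprocess

# Assumed threshold (not from the LivingWorld docs): leaves a margin below
# 48 * 1024 MiB because reported totals are usually slightly under nominal.
MIN_MEMORY_MIB = 47_000


def parse_memory_mib(smi_output: str) -> list[int]:
    """Parse one total-memory value (MiB) per line, skipping non-numeric lines."""
    values = []
    for line in smi_output.splitlines():
        token = line.strip()
        if token.isdigit():
            values.append(int(token))
    return values


def gpus_with_enough_memory(memory_mib: list[int], minimum: int = MIN_MEMORY_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory meets the minimum."""
    return [i for i, mib in enumerate(memory_mib) if mib >= minimum]


def query_gpu_memory() -> list[int]:
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_memory_mib(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    usable = gpus_with_enough_memory(query_gpu_memory())
    print("GPUs meeting the memory requirement:", usable or "none found")
```

The returned indices can be passed to `CUDA_VISIBLE_DEVICES` when launching `run.py`.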
Clone the repo and create the environment:
git clone https://github.com/<YOUR_GITHUB_USERNAME>/LivingWorld.git
cd LivingWorld
conda create --name livingworld python=3.10
conda activate livingworld
Install PyTorch and CUDA-dependent rendering modules. We tested with torch==2.7.1 and CUDA 12.8. Other CUDA versions may work, but the PyTorch/CUDA versions should match your system.
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
conda install -c conda-forge cmake ninja git
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
pip install submodules/depth-diff-gaussian-rasterization-min/
pip install submodules/simple-knn/
pip install "git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch"
Install the rest of the requirements:
pip install -r requirements.txt
cd RepViT/sam
pip install -e .
cd ../..
python -m spacy download en_core_web_sm
Export your OpenAI API key if you want to use GPT-4 to generate scene descriptions:
export OPENAI_API_KEY='your_api_key_here'
Export your Hugging Face token to download gated or authenticated model weights:
export HF_TOKEN='your_huggingface_token_here'
Download the RepViT-SAM checkpoint and put it in the root directory:
wget https://github.com/THU-MIG/RepViT/releases/download/v1.0/repvit_sam.pt
If you use the bundled MoGe depth/normal model and SAM3 modules, their weights are downloaded automatically from Hugging Face on first use.
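The setup steps above leave three things that are easy to forget: the two environment variables and the RepViT-SAM checkpoint in the repository root. The sketch below collects them into one preflight check. It is an illustration under assumptions, not a script shipped with the repository: the function name `preflight` is hypothetical, and treating a missing `HF_TOKEN` as a problem is a conservative choice, since not every model download may need it.

```python
import os
from pathlib import Path


def preflight(root: Path, use_gpt: bool, env=None) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    env = os.environ if env is None else env
    problems = []
    # The OpenAI key only matters when GPT-generated descriptions are enabled.
    if use_gpt and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set but use_gpt is enabled")
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; gated Hugging Face downloads may fail")
    # The checkpoint is expected in the repository root, per the step above.
    if not (root / "repvit_sam.pt").is_file():
        problems.append("repvit_sam.pt was not found in the repository root")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path("."), use_gpt=False):
        print("warning:", problem)
```

Run it from the repository root, with `use_gpt` set to the same value as in your config.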
- Example config file
To run an example, first you need to write a config. An example config ./config/example.yaml is shown below (more examples are located at config/more_examples, feel free to try):
runs_dir: output/como
example_name: como
seed: 1
# enable guided depth diffusion
depth_conditioning: True
# use gpt to generate scene description
use_gpt: False
debug: True
# depth model and camera/depth parameters
depth_model: moge
camera_speed: 0.001
fg_depth_range: 0.015
depth_shift: 0.001
sky_hard_depth: 0.02
init_focal_length: 960
# re-generate sky panorama images
gen_sky_image: False
# generate sky point cloud
gen_sky: False
# enable layer-wise generation
gen_layer: True
# load previously generated gaussians
load_gen: False
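One way to derive a config for another scene from ./config/example.yaml is to copy its text and override a few keys. The sketch below does this line by line so the comments in the template survive. It is a hypothetical helper, not part of the repository, and it assumes the config is flat (one top-level `key: value` per line), as in the example above. A nested config would need a real YAML library instead.

```python
import re


def override_config(config_text: str, overrides: dict) -> str:
    """Replace the values of top-level `key: value` lines, keeping comments."""
    remaining = dict(overrides)
    out = []
    for line in config_text.splitlines():
        match = re.match(r"^([A-Za-z_]\w*):", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key}: {remaining.pop(key)}"
        out.append(line)
    # Keys that do not appear in the template are appended at the end.
    out.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    template = "runs_dir: output/como\nexample_name: como\n# use gpt\nuse_gpt: False\n"
    # "new_example" is a placeholder name used only for illustration.
    print(override_config(template, {"runs_dir": "output/new_example",
                                     "example_name": "new_example"}))
```

Python renders booleans as `True` and `False`, which matches the capitalization the example config uses.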
- Run
Use the splat/ viewer included in this repository and open splat/index_stream.html on your local laptop.
To enable interactive visualization through the local browser, follow these steps:
- Ensure you have 'ssh' installed on your local machine.
- The main program will run on user_id@server_name.
- The socket port in splat/main_stream.js must match the --port argument used to run run.py. For example, if main_stream.js connects to http://127.0.0.1:7778, run the server with --port 7778 and forward port 7778.
# On your local machine
ssh -L 7778:localhost:7778 server_name
On the server, run the main program:
# On user_id@server_name
CUDA_VISIBLE_DEVICES=0 python run.py \
    --example_config config/more_examples/venice.yaml \
    --port 7778 \
    --input_dir input/mhj/venice
More examples are located at config/more_examples, feel free to try!
Open index_stream.html on your local machine to visualize and navigate the generated scene. This spatial expansion interface follows the WonderWorld-style interactive workflow, while LivingWorld extends the pipeline toward 4D world generation with environmental dynamics.
- If use_gpt=True, the next-scene prompt is generated automatically. Otherwise, enter the desired scene description in the browser interface.
- Navigate to a target viewpoint and press R to spatially expand the scene from the selected view.
- If the result is unsatisfactory, press Z to undo the latest spatial generation and try another prompt or viewpoint.
- Repeat the process to build a connected spatial scene.
- Press X to save the current scene. The saved Gaussian scene and motion model are stored under --input_dir/model, and can be loaded later by setting load_gen=True.
This is the main LivingWorld step for adding environmental dynamics. After pressing R during spatial generation, the selected view is shown as a paused image for motion annotation.
- Enter a text prompt for the region where motion should be applied.
- Specify the motion direction by placing paired points on the image. Each pair represents a start point and an end point, forming a direction arrow. Adding multiple point pairs helps define a more stable and detailed motion field.
- After the motion is applied, adjust the motion magnitude in the GUI until the dynamics match the desired strength.
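To make the paired-point annotation concrete, the sketch below turns (start, end) pixel pairs into unit direction arrows and then applies a global magnitude, mirroring the two annotation steps above. It only illustrates the arrow representation. How LivingWorld builds its motion field from these arrows is not described here, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def direction_arrows(pairs: list[tuple[Point, Point]]) -> list[tuple[Point, Point]]:
    """Turn (start, end) pairs into (start, unit direction) arrows."""
    arrows = []
    for (sx, sy), (ex, ey) in pairs:
        dx, dy = ex - sx, ey - sy
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # identical start and end points carry no direction
        arrows.append(((sx, sy), (dx / length, dy / length)))
    return arrows


def scale_arrows(arrows: list[tuple[Point, Point]], magnitude: float) -> list[tuple[Point, Point]]:
    """Apply a global motion magnitude to unit direction arrows."""
    return [(start, (ux * magnitude, uy * magnitude)) for start, (ux, uy) in arrows]


if __name__ == "__main__":
    # Two annotated pairs in pixel coordinates (illustrative values).
    pairs = [((100.0, 200.0), (160.0, 200.0)), ((120.0, 260.0), (170.0, 250.0))]
    for start, vector in scale_arrows(direction_arrows(pairs), magnitude=2.0):
        print(start, vector)
```

Separating direction from magnitude matches the workflow in the GUI: the point pairs fix where the motion goes, and the magnitude control sets how strong it is.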
We highly encourage you to add new images and try new stuff! You would need to prepare the image-caption pairing separately. For real-world examples, you can collect images from sources such as Pexels or Unsplash and write the description manually or generate it with a vision-language model.
- Add a new image in ./examples/images/.
- Add content of this new image in ./examples/examples.yaml. Here is an example:

    - name: new_example
      image_filepath: examples/images/new_example.png
      style_prompt: DSLR 35mm landscape
      content_prompt: scene name, object 1, object 2, object 3
      negative_prompt: ''
      background: ''

  - content_prompt: "scene name", "object 1", "object 2", "object 3"
  - negative_prompt and background are optional
- Write a config config/new_example.yaml like ./config/example.yaml for the new example.
- Run the program following the previous section. (On first use, the model automatically generates the panorama sky images for the example, which takes about 20 minutes on an A6000 GPU. Once the sky images for the example are stored, later runs of this example skip this step.)
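The examples.yaml entry described above can also be produced programmatically, which helps when adding several images. The sketch below renders one entry in the same layout, joining the scene name and objects into `content_prompt`. The function name is hypothetical and the defaults are copied from the example entry. Values are written verbatim, so text containing YAML special characters such as colons or quotes would need escaping first.

```python
def render_example_entry(name, image_filepath, scene, objects,
                         style_prompt="DSLR 35mm landscape",
                         negative_prompt="", background=""):
    """Render one list entry in the layout shown for examples.yaml."""
    # content_prompt is the scene name followed by its objects, comma-separated.
    content_prompt = ", ".join([scene, *objects])
    lines = [
        f"- name: {name}",
        f"  image_filepath: {image_filepath}",
        f"  style_prompt: {style_prompt}",
        f"  content_prompt: {content_prompt}",
        f"  negative_prompt: '{negative_prompt}'",
        f"  background: '{background}'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_example_entry(
        "new_example", "examples/images/new_example.png",
        "scene name", ["object 1", "object 2", "object 3"],
    ))
```

The printed text can be appended to ./examples/examples.yaml after checking it against the existing entries.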
@article{mun2026livingworld,
title={LivingWorld: Interactive 4D World Generation with Environmental Dynamics},
author={Mun, Hyeongju and Jin, In-Hwan and Kim, Sohyeong and Kong, Kyeongbo},
journal={arXiv preprint arXiv:2604.01641},
year={2026}
}
Big thanks to the authors of WonderWorld. This codebase is built upon WonderWorld, which provides the baseline framework for interactive 3D scene generation from a single image.
We appreciate the authors of WonderWorld, Marigold, SyncDiffusion, RepViT, Stable Diffusion, and OneFormer for sharing their code and models.
