mova with sglang not working #39

@elvizlai

Description

mkdir mova-sglang
cd mova-sglang

uv init --python 3.12
uv sync

# sglang==0.5.9
uv add sglang

# also add missing modules
uv add remote-pdb imageio diffusers addict cache-dit

# download the model if it doesn't exist
# hf download OpenMOSS-Team/MOVA-720p --local-dir MOVA-720p

# make sure nvcc is on PATH
export PATH=/usr/local/cuda/bin:$PATH

# specify GPU
export CUDA_VISIBLE_DEVICES=1

uv run sglang generate \
  --model-path MOVA-720p \
  --prompt "A man in a blue blazer and glasses speaks in a formal indoor setting, \
  framed by wooden furniture and a filled bookshelf. \
  Quiet room acoustics underscore his measured tone as he delivers his remarks. \
  At one point, he says, \"I would also believe that this advance in AI recently wasn’t unexpected.\"" \
  --image-path "./assets/single_person.jpg" \
  --save-output

lyz@a100:/workspace/mova-sglang$ uv run sglang generate   --model-path /workspace/mova-gen/MOVA-720p   --prompt "A man in a blue blazer and glasses speaks in a formal indoor setting, \
  framed by wooden furniture and a filled bookshelf. \
  Quiet room acoustics underscore his measured tone as he delivers his remarks. \
  At one point, he says, \"I would also believe that this advance in AI recently wasn’t unexpected.\""   --image-path "./assets/single_person.jpg"   --save-output
[02-25 14:16:21] Automatically enable dit_layerwise_offload for MOVA720PConfig for low memory and performance balance
[02-25 14:16:21] server_args: {"model_path": "/workspace/mova-gen/MOVA-720p", "backend": "auto", "attention_backend": null, "attention_backend_config": {}, "cache_dit_config": null, "nccl_port": null, "trust_remote_code": false, "revision": null, "num_gpus": 1, "tp_size": 1, "sp_degree": 1, "ulysses_degree": 1, "ring_degree": 1, "dp_size": 1, "dp_degree": 1, "enable_cfg_parallel": false, "hsdp_replicate_dim": 1, "hsdp_shard_dim": 1, "dist_timeout": 3600, "pipeline_class_name": null, "lora_path": null, "lora_nickname": "default", "lora_scale": 1.0, "vae_path": null, "lora_target_modules": null, "dit_cpu_offload": true, "dit_layerwise_offload": true, "dit_offload_prefetch_size": 0.0, "text_encoder_cpu_offload": true, "image_encoder_cpu_offload": true, "vae_cpu_offload": true, "use_fsdp_inference": false, "pin_cpu_memory": true, "comfyui_mode": false, "enable_torch_compile": false, "warmup": false, "warmup_resolutions": null, "disable_autocast": false, "master_port": 30092, "host": "127.0.0.1", "port": 30000, "webui": false, "webui_port": 12312, "scheduler_port": 5627, "output_path": "outputs/", "prompt_file_path": null, "model_paths": {}, "model_loaded": {"transformer": true, "vae": true, "video_vae": true, "audio_vae": true, "video_dit": true, "audio_dit": true, "dual_tower_bridge": true}, "boundary_ratio": null, "log_level": "info"}
[02-25 14:16:21] Local mode: True
[02-25 14:16:21] Starting server...
[02-25 14:16:28] Scheduler bind at endpoint: tcp://127.0.0.1:5627
[02-25 14:16:28] Initializing distributed environment with world_size=1, device=cuda:0, timeout=3600
[02-25 14:16:28] Setting distributed timeout to 3600 seconds
[02-25 14:16:29] No pipeline_class_name specified, using model_index.json
[02-25 14:16:29] Diffusers version: 0.36.0
[02-25 14:16:29] Diffusers version: 0.36.0
[02-25 14:16:29] Using native sglang backend for model '/workspace/mova-gen/MOVA-720p'
[02-25 14:16:29] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.mova_pipeline.MOVAPipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.mova.MOVA_720P_SamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.mova.MOVA720PConfig'>)
[02-25 14:16:29] Using pipeline from model_index.json: MOVAPipeline
[02-25 14:16:29] Loading pipeline modules...
[02-25 14:16:29] Model already exists locally and is complete
[02-25 14:16:29] Model path: /workspace/mova-gen/MOVA-720p
[02-25 14:16:29] Diffusers version: 0.36.0
[02-25 14:16:29] Loading pipeline modules from config: {'_class_name': 'MOVA', '_diffusers_version': '0.36.0', '_name_or_path': 'checkpoints/MOVA-720p', 'audio_dit': ['mova.diffusion.models.wan_audio_dit', 'WanAudioModel'], 'audio_vae': ['mova.diffusion.models.dac_vae', 'DAC'], 'audio_vae_type': 'dac', 'boundary_ratio': 0.9, 'dual_tower_bridge': ['mova.diffusion.models.interactionv2', 'DualTowerConditionalBridge'], 'scheduler': ['mova.diffusion.schedulers.flow_match_pair', 'FlowMatchPairScheduler'], 'text_encoder': ['transformers', 'UMT5EncoderModel'], 'tokenizer': ['transformers', 'T5TokenizerFast'], 'video_dit': ['mova.diffusion.models.wan_video_dit', 'WanModel'], 'video_dit_2': ['mova.diffusion.models.wan_video_dit', 'WanModel'], 'video_vae': ['diffusers', 'AutoencoderKLWan']}
[02-25 14:16:29] Boundary ratio found in model_index.json without transformers; using it for pipeline config only.
[02-25 14:16:29] Setting boundary ratio to 0.9
[02-25 14:16:29] Loading required components: ['video_vae', 'audio_vae', 'text_encoder', 'tokenizer', 'scheduler', 'video_dit', 'video_dit_2', 'audio_dit', 'dual_tower_bridge']
Loading required modules:   0%|                                                                                                                        | 0/9 [00:00<?, ?it/s][02-25 14:16:29] Loading video_vae from /workspace/mova-gen/MOVA-720p/video_vae. avail mem: 74.13 GB
[02-25 14:16:29] Loaded video_vae: AutoencoderKLWan (sgl-diffusion version). consumed: 0.00 GB, avail mem: 74.13 GB
Loading required modules:  11%|████████████▍                                                                                                   | 1/9 [00:00<00:01,  6.78it/s][02-25 14:16:29] Loading audio_vae from /workspace/mova-gen/MOVA-720p/audio_vae. avail mem: 74.13 GB
[02-25 14:16:31] Loaded audio_vae: DAC (sgl-diffusion version). consumed: 0.00 GB, avail mem: 74.13 GB
Loading required modules:  22%|████████████████████████▉                                                                                       | 2/9 [00:02<00:10,  1.53s/it][02-25 14:16:31] Loading text_encoder from /workspace/mova-gen/MOVA-720p/text_encoder. avail mem: 74.13 GB

Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:01<00:02,  1.06s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:01<00:00,  1.37it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00,  1.05it/s]

[02-25 14:17:05] Loaded text_encoder: FSDPUMT5EncoderModel (sgl-diffusion version). consumed: 0.82 GB, avail mem: 73.32 GB
Loading required modules:  33%|█████████████████████████████████████▎                                                                          | 3/9 [00:35<01:36, 16.03s/it][02-25 14:17:05] Loading tokenizer from /workspace/mova-gen/MOVA-720p/tokenizer. avail mem: 73.32 GB
[02-25 14:17:05] Loaded tokenizer: T5TokenizerFast (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules:  44%|█████████████████████████████████████████████████▊                                                              | 4/9 [00:36<00:49,  9.87s/it][02-25 14:17:05] Loading scheduler from /workspace/mova-gen/MOVA-720p/scheduler. avail mem: 73.32 GB
[02-25 14:17:05] Loaded scheduler: FlowMatchPairScheduler (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
[02-25 14:17:05] Loading video_dit from /workspace/mova-gen/MOVA-720p/video_dit. avail mem: 73.32 GB
[02-25 14:17:05] Loading WanModel from 3 safetensors files , param_dtype: torch.bfloat16
[02-25 14:17:06] Using FlashAttention (FA3 for hopper, FA4 for blackwell) backend

Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:00<00:00, 32.93it/s]

[02-25 14:17:29] Loaded model with 14.29B parameters
[02-25 14:17:29] Loaded video_dit: WanModel (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules:  67%|██████████████████████████████████████████████████████████████████████████▋                                     | 6/9 [01:00<00:33, 11.04s/it][02-25 14:17:29] Loading video_dit_2 from /workspace/mova-gen/MOVA-720p/video_dit_2. avail mem: 73.32 GB
[02-25 14:17:29] Loading WanModel from 3 safetensors files , param_dtype: torch.bfloat16

Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:00<00:00, 95.22it/s]

[02-25 14:17:47] Loaded model with 14.29B parameters
[02-25 14:17:47] Loaded video_dit_2: WanModel (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules:  78%|███████████████████████████████████████████████████████████████████████████████████████                         | 7/9 [01:17<00:25, 12.75s/it][02-25 14:17:47] Loading audio_dit from /workspace/mova-gen/MOVA-720p/audio_dit. avail mem: 73.32 GB
[02-25 14:17:47] Loading WanAudioModel from 1 safetensors files , param_dtype: torch.bfloat16

Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 41.16it/s]

[02-25 14:17:49] Loaded model with 1.42B parameters
[02-25 14:17:49] Loaded audio_dit: WanAudioModel (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules:  89%|███████████████████████████████████████████████████████████████████████████████████████████████████▌            | 8/9 [01:19<00:09,  9.77s/it][02-25 14:17:49] Loading dual_tower_bridge from /workspace/mova-gen/MOVA-720p/dual_tower_bridge. avail mem: 73.32 GB
[02-25 14:17:49] Loading DualTowerConditionalBridge from 1 safetensors files, default_dtype: torch.bfloat16

Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 58.85it/s]

[02-25 14:17:52] Loaded bridge model with 2659.74M parameters
[02-25 14:17:52] Loaded dual_tower_bridge: DualTowerConditionalBridge (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [01:23<00:00,  9.26s/it]
[02-25 14:17:52] Creating pipeline stages...
[02-25 14:17:52] Pipeline instantiated
[02-25 14:18:10] LayerwiseOffloadManager initialized with num prefetched layer: 1, total num layers: 40
[02-25 14:18:10] Enabled layerwise offload for WanModel on modules: ['blocks']
[02-25 14:18:25] LayerwiseOffloadManager initialized with num prefetched layer: 1, total num layers: 40
[02-25 14:18:25] Enabled layerwise offload for WanModel on modules: ['blocks']
[02-25 14:18:26] LayerwiseOffloadManager initialized with num prefetched layer: 1, total num layers: 30
[02-25 14:18:26] Enabled layerwise offload for WanAudioModel on modules: ['blocks']
[02-25 14:18:26] Worker 0: Initialized device, model, and distributed environment.
[02-25 14:18:26] Worker 0: Scheduler loop started.
[02-25 14:18:26] Diffusers version: 0.36.0
[02-25 14:18:26] Diffusers version: 0.36.0
[02-25 14:18:26] Using native sglang backend for model '/workspace/mova-gen/MOVA-720p'
[02-25 14:18:26] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.mova_pipeline.MOVAPipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.mova.MOVA_720P_SamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.mova.MOVA720PConfig'>)
[02-25 14:18:26] Processing prompt 1/1: A man in a blue blazer and glasses speaks in a formal indoor setting,   framed by wooden furniture a
[02-25 14:18:26] Running pipeline stages: ['input_validation_stage', 'text_encoding_stage', 'image_v_a_e_encoding_stage', 'mova_latent_preparation_stage', 'mova_timestep_preparation_stage', 'mova_denoising_stage', 'mova_decoding_stage']
[02-25 14:18:26] [InputValidationStage] started...
[02-25 14:18:26] Center cropping and resizing image to 1280x720
[02-25 14:18:26] [InputValidationStage] finished in 0.0293 seconds
[02-25 14:18:26] [TextEncodingStage] started...
[02-25 14:18:30] [TextEncodingStage] finished in 4.0646 seconds
[02-25 14:18:30] [ImageVAEEncodingStage] started...
[02-25 14:18:50] [ImageVAEEncodingStage] finished in 19.5377 seconds
[02-25 14:18:50] [MOVALatentPreparationStage] started...
[02-25 14:18:50] [MOVALatentPreparationStage] finished in 0.0033 seconds
[02-25 14:18:50] [MOVATimestepPreparationStage] started...
[02-25 14:18:50] [MOVATimestepPreparationStage] finished in 0.0012 seconds
[02-25 14:18:50] [MOVADenoisingStage] started...
  0%|                                                                                                                                                 | 0/50 [00:00<?, ?it/s]
[02-25 14:18:50] [MOVADenoisingStage] Error during execution after 755.2026 ms: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in __call__
    result = self.forward(batch, server_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 480, in forward
    pos = self._predict(
          ^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 184, in _predict
    return self.inference_single_step(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 738, in inference_single_step
    visual_x, audio_x = self.forward_dual_tower_dit(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 833, in forward_dual_tower_dit
    visual_x, audio_x = self.dual_tower_bridge(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 638, in forward
    visual_conditioned = self.apply_conditional_control(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 602, in apply_conditional_control
    conditioned_features = conditioner(
                           ^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 393, in forward
    y = self.y_norm(y)
        ^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/normalization.py", line 229, in forward
    return F.layer_norm(
           ^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/functional.py", line 2901, in layer_norm
    return torch.layer_norm(
           ^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/managers/gpu_worker.py", line 227, in execute_forward
    result = self.pipeline.forward(req, self.server_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/composed_pipeline_base.py", line 622, in forward
    return self.executor.execute_with_profiling(self.stages, batch, server_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/executors/pipeline_executor.py", line 57, in execute_with_profiling
    batch = self.execute(stages, batch, server_args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 101, in execute
    batch = self._execute(stages, batch, server_args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 92, in _execute
    batch = stage(batch, server_args)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in __call__
    result = self.forward(batch, server_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 480, in forward
    pos = self._predict(
          ^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 184, in _predict
    return self.inference_single_step(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 738, in inference_single_step
    visual_x, audio_x = self.forward_dual_tower_dit(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 833, in forward_dual_tower_dit
    visual_x, audio_x = self.dual_tower_bridge(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 638, in forward
    visual_conditioned = self.apply_conditional_control(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 602, in apply_conditional_control
    conditioned_features = conditioner(
                           ^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 393, in forward
    y = self.y_norm(y)
        ^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/normalization.py", line 229, in forward
    return F.layer_norm(
           ^^^^^^^^^^^^^
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/functional.py", line 2901, in layer_norm
    return torch.layer_norm(
           ^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Failed to generate output for prompt 1: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/utils/logging_utils.py", line 466, in log_generation_timer
    yield timer
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py", line 192, in generate
    raise Exception(f"{output_batch.error}")
Exception: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Generation failed for prompt 1/1: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
  File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py", line 192, in generate
    raise Exception(f"{output_batch.error}")
Exception: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Completed batch processing. Generated 0 outputs in 24.42 seconds
[02-25 14:18:50] Generator was garbage collected without being shut down. Attempting to shut down the local server and client.
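For context on the failure above: the traceback ends in `F.layer_norm` with the norm's weight still on CPU while the activations are on `cuda:0`, which suggests the layerwise-offload path never moved the bridge's `y_norm` parameters onto the GPU. A minimal, hypothetical diagnostic (not part of sglang; `find_mismatched_params` is an illustrative helper) that walks a module and reports any parameters whose device differs from the expected compute device:

```python
import torch
import torch.nn as nn

def find_mismatched_params(module: nn.Module, expected: torch.device):
    """Return (name, device) for every parameter not on the expected device type."""
    return [
        (name, str(param.device))
        for name, param in module.named_parameters()
        if param.device.type != expected.type
    ]

# Simulated stand-in for the bridge: everything still on CPU while the
# pipeline expects CUDA, mirroring the "weight is on cpu" error above.
bridge = nn.Sequential(nn.LayerNorm(8), nn.Linear(8, 8))
for name, dev in find_mismatched_params(bridge, torch.device("cuda")):
    print(f"parameter {name!r} is on {dev}, expected cuda")
```

Running a check like this against `dual_tower_bridge` after pipeline instantiation (or disabling the automatically enabled `dit_layerwise_offload`, if the server exposes a way to do so) could help confirm whether the offload manager is skipping the bridge's normalization layers.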
