mkdir mova-sglang
cd mova-sglang
uv init --python 3.12
uv sync
# sglang==0.5.9
uv add sglang
# also add the missing runtime modules
uv add remote-pdb imageio diffusers addict cache-dit
# download the model if it is not already present
# hf download OpenMOSS-Team/MOVA-720p --local-dir MOVA-720p
# nvcc
export PATH=/usr/local/cuda/bin:$PATH
# specify GPU
export CUDA_VISIBLE_DEVICES=1
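Note that `CUDA_VISIBLE_DEVICES` renumbers devices per process, so the GPU selected here shows up as `cuda:0` in the log and the error below. A minimal sketch of the environment setup:

```shell
# Put the CUDA toolkit on PATH so nvcc resolves.
export PATH=/usr/local/cuda/bin:$PATH
# Expose only physical GPU 1; inside the process it is addressed as cuda:0.
export CUDA_VISIBLE_DEVICES=1
echo "physical GPU(s) exposed: $CUDA_VISIBLE_DEVICES"
```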
uv run sglang generate \
--model-path MOVA-720p \
--prompt "A man in a blue blazer and glasses speaks in a formal indoor setting, \
framed by wooden furniture and a filled bookshelf. \
Quiet room acoustics underscore his measured tone as he delivers his remarks. \
At one point, he says, \"I would also believe that this advance in AI recently wasn’t unexpected.\"" \
--image-path "./assets/single_person.jpg" \
--save-output
Running the same command with the absolute model path fails during the denoising stage with a device mismatch:
lyz@a100:/workspace/mova-sglang$ uv run sglang generate --model-path /workspace/mova-gen/MOVA-720p --prompt "A man in a blue blazer and glasses speaks in a formal indoor setting, \
framed by wooden furniture and a filled bookshelf. \
Quiet room acoustics underscore his measured tone as he delivers his remarks. \
At one point, he says, \"I would also believe that this advance in AI recently wasn’t unexpected.\"" --image-path "./assets/single_person.jpg" --save-output
[02-25 14:16:21] Automatically enable dit_layerwise_offload for MOVA720PConfig for low memory and performance balance
[02-25 14:16:21] server_args: {"model_path": "/workspace/mova-gen/MOVA-720p", "backend": "auto", "attention_backend": null, "attention_backend_config": {}, "cache_dit_config": null, "nccl_port": null, "trust_remote_code": false, "revision": null, "num_gpus": 1, "tp_size": 1, "sp_degree": 1, "ulysses_degree": 1, "ring_degree": 1, "dp_size": 1, "dp_degree": 1, "enable_cfg_parallel": false, "hsdp_replicate_dim": 1, "hsdp_shard_dim": 1, "dist_timeout": 3600, "pipeline_class_name": null, "lora_path": null, "lora_nickname": "default", "lora_scale": 1.0, "vae_path": null, "lora_target_modules": null, "dit_cpu_offload": true, "dit_layerwise_offload": true, "dit_offload_prefetch_size": 0.0, "text_encoder_cpu_offload": true, "image_encoder_cpu_offload": true, "vae_cpu_offload": true, "use_fsdp_inference": false, "pin_cpu_memory": true, "comfyui_mode": false, "enable_torch_compile": false, "warmup": false, "warmup_resolutions": null, "disable_autocast": false, "master_port": 30092, "host": "127.0.0.1", "port": 30000, "webui": false, "webui_port": 12312, "scheduler_port": 5627, "output_path": "outputs/", "prompt_file_path": null, "model_paths": {}, "model_loaded": {"transformer": true, "vae": true, "video_vae": true, "audio_vae": true, "video_dit": true, "audio_dit": true, "dual_tower_bridge": true}, "boundary_ratio": null, "log_level": "info"}
[02-25 14:16:21] Local mode: True
[02-25 14:16:21] Starting server...
[02-25 14:16:28] Scheduler bind at endpoint: tcp://127.0.0.1:5627
[02-25 14:16:28] Initializing distributed environment with world_size=1, device=cuda:0, timeout=3600
[02-25 14:16:28] Setting distributed timeout to 3600 seconds
[02-25 14:16:29] No pipeline_class_name specified, using model_index.json
[02-25 14:16:29] Diffusers version: 0.36.0
[02-25 14:16:29] Diffusers version: 0.36.0
[02-25 14:16:29] Using native sglang backend for model '/workspace/mova-gen/MOVA-720p'
[02-25 14:16:29] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.mova_pipeline.MOVAPipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.mova.MOVA_720P_SamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.mova.MOVA720PConfig'>)
[02-25 14:16:29] Using pipeline from model_index.json: MOVAPipeline
[02-25 14:16:29] Loading pipeline modules...
[02-25 14:16:29] Model already exists locally and is complete
[02-25 14:16:29] Model path: /workspace/mova-gen/MOVA-720p
[02-25 14:16:29] Diffusers version: 0.36.0
[02-25 14:16:29] Loading pipeline modules from config: {'_class_name': 'MOVA', '_diffusers_version': '0.36.0', '_name_or_path': 'checkpoints/MOVA-720p', 'audio_dit': ['mova.diffusion.models.wan_audio_dit', 'WanAudioModel'], 'audio_vae': ['mova.diffusion.models.dac_vae', 'DAC'], 'audio_vae_type': 'dac', 'boundary_ratio': 0.9, 'dual_tower_bridge': ['mova.diffusion.models.interactionv2', 'DualTowerConditionalBridge'], 'scheduler': ['mova.diffusion.schedulers.flow_match_pair', 'FlowMatchPairScheduler'], 'text_encoder': ['transformers', 'UMT5EncoderModel'], 'tokenizer': ['transformers', 'T5TokenizerFast'], 'video_dit': ['mova.diffusion.models.wan_video_dit', 'WanModel'], 'video_dit_2': ['mova.diffusion.models.wan_video_dit', 'WanModel'], 'video_vae': ['diffusers', 'AutoencoderKLWan']}
[02-25 14:16:29] Boundary ratio found in model_index.json without transformers; using it for pipeline config only.
[02-25 14:16:29] Setting boundary ratio to 0.9
[02-25 14:16:29] Loading required components: ['video_vae', 'audio_vae', 'text_encoder', 'tokenizer', 'scheduler', 'video_dit', 'video_dit_2', 'audio_dit', 'dual_tower_bridge']
Loading required modules: 0%| | 0/9 [00:00<?, ?it/s][02-25 14:16:29] Loading video_vae from /workspace/mova-gen/MOVA-720p/video_vae. avail mem: 74.13 GB
[02-25 14:16:29] Loaded video_vae: AutoencoderKLWan (sgl-diffusion version). consumed: 0.00 GB, avail mem: 74.13 GB
Loading required modules: 11%|████████████▍ | 1/9 [00:00<00:01, 6.78it/s][02-25 14:16:29] Loading audio_vae from /workspace/mova-gen/MOVA-720p/audio_vae. avail mem: 74.13 GB
[02-25 14:16:31] Loaded audio_vae: DAC (sgl-diffusion version). consumed: 0.00 GB, avail mem: 74.13 GB
Loading required modules: 22%|████████████████████████▉ | 2/9 [00:02<00:10, 1.53s/it][02-25 14:16:31] Loading text_encoder from /workspace/mova-gen/MOVA-720p/text_encoder. avail mem: 74.13 GB
Loading safetensors checkpoint shards: 0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 33% Completed | 1/3 [00:01<00:02, 1.06s/it]
Loading safetensors checkpoint shards: 67% Completed | 2/3 [00:01<00:00, 1.37it/s]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:02<00:00, 1.05it/s]
[02-25 14:17:05] Loaded text_encoder: FSDPUMT5EncoderModel (sgl-diffusion version). consumed: 0.82 GB, avail mem: 73.32 GB
Loading required modules: 33%|█████████████████████████████████████▎ | 3/9 [00:35<01:36, 16.03s/it][02-25 14:17:05] Loading tokenizer from /workspace/mova-gen/MOVA-720p/tokenizer. avail mem: 73.32 GB
[02-25 14:17:05] Loaded tokenizer: T5TokenizerFast (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules: 44%|█████████████████████████████████████████████████▊ | 4/9 [00:36<00:49, 9.87s/it][02-25 14:17:05] Loading scheduler from /workspace/mova-gen/MOVA-720p/scheduler. avail mem: 73.32 GB
[02-25 14:17:05] Loaded scheduler: FlowMatchPairScheduler (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
[02-25 14:17:05] Loading video_dit from /workspace/mova-gen/MOVA-720p/video_dit. avail mem: 73.32 GB
[02-25 14:17:05] Loading WanModel from 3 safetensors files , param_dtype: torch.bfloat16
[02-25 14:17:06] Using FlashAttention (FA3 for hopper, FA4 for blackwell) backend
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:00<00:00, 32.93it/s]
[02-25 14:17:29] Loaded model with 14.29B parameters
[02-25 14:17:29] Loaded video_dit: WanModel (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules: 67%|██████████████████████████████████████████████████████████████████████████▋ | 6/9 [01:00<00:33, 11.04s/it][02-25 14:17:29] Loading video_dit_2 from /workspace/mova-gen/MOVA-720p/video_dit_2. avail mem: 73.32 GB
[02-25 14:17:29] Loading WanModel from 3 safetensors files , param_dtype: torch.bfloat16
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:00<00:00, 95.22it/s]
[02-25 14:17:47] Loaded model with 14.29B parameters
[02-25 14:17:47] Loaded video_dit_2: WanModel (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules: 78%|███████████████████████████████████████████████████████████████████████████████████████ | 7/9 [01:17<00:25, 12.75s/it][02-25 14:17:47] Loading audio_dit from /workspace/mova-gen/MOVA-720p/audio_dit. avail mem: 73.32 GB
[02-25 14:17:47] Loading WanAudioModel from 1 safetensors files , param_dtype: torch.bfloat16
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 41.16it/s]
[02-25 14:17:49] Loaded model with 1.42B parameters
[02-25 14:17:49] Loaded audio_dit: WanAudioModel (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules: 89%|███████████████████████████████████████████████████████████████████████████████████████████████████▌ | 8/9 [01:19<00:09, 9.77s/it][02-25 14:17:49] Loading dual_tower_bridge from /workspace/mova-gen/MOVA-720p/dual_tower_bridge. avail mem: 73.32 GB
[02-25 14:17:49] Loading DualTowerConditionalBridge from 1 safetensors files, default_dtype: torch.bfloat16
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 58.85it/s]
[02-25 14:17:52] Loaded bridge model with 2659.74M parameters
[02-25 14:17:52] Loaded dual_tower_bridge: DualTowerConditionalBridge (sgl-diffusion version). consumed: 0.00 GB, avail mem: 73.32 GB
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [01:23<00:00, 9.26s/it]
[02-25 14:17:52] Creating pipeline stages...
[02-25 14:17:52] Pipeline instantiated
[02-25 14:18:10] LayerwiseOffloadManager initialized with num prefetched layer: 1, total num layers: 40
[02-25 14:18:10] Enabled layerwise offload for WanModel on modules: ['blocks']
[02-25 14:18:25] LayerwiseOffloadManager initialized with num prefetched layer: 1, total num layers: 40
[02-25 14:18:25] Enabled layerwise offload for WanModel on modules: ['blocks']
[02-25 14:18:26] LayerwiseOffloadManager initialized with num prefetched layer: 1, total num layers: 30
[02-25 14:18:26] Enabled layerwise offload for WanAudioModel on modules: ['blocks']
[02-25 14:18:26] Worker 0: Initialized device, model, and distributed environment.
[02-25 14:18:26] Worker 0: Scheduler loop started.
[02-25 14:18:26] Diffusers version: 0.36.0
[02-25 14:18:26] Diffusers version: 0.36.0
[02-25 14:18:26] Using native sglang backend for model '/workspace/mova-gen/MOVA-720p'
[02-25 14:18:26] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.mova_pipeline.MOVAPipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.mova.MOVA_720P_SamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.mova.MOVA720PConfig'>)
[02-25 14:18:26] Processing prompt 1/1: A man in a blue blazer and glasses speaks in a formal indoor setting, framed by wooden furniture a
[02-25 14:18:26] Running pipeline stages: ['input_validation_stage', 'text_encoding_stage', 'image_v_a_e_encoding_stage', 'mova_latent_preparation_stage', 'mova_timestep_preparation_stage', 'mova_denoising_stage', 'mova_decoding_stage']
[02-25 14:18:26] [InputValidationStage] started...
[02-25 14:18:26] Center cropping and resizing image to 1280x720
[02-25 14:18:26] [InputValidationStage] finished in 0.0293 seconds
[02-25 14:18:26] [TextEncodingStage] started...
[02-25 14:18:30] [TextEncodingStage] finished in 4.0646 seconds
[02-25 14:18:30] [ImageVAEEncodingStage] started...
[02-25 14:18:50] [ImageVAEEncodingStage] finished in 19.5377 seconds
[02-25 14:18:50] [MOVALatentPreparationStage] started...
[02-25 14:18:50] [MOVALatentPreparationStage] finished in 0.0033 seconds
[02-25 14:18:50] [MOVATimestepPreparationStage] started...
[02-25 14:18:50] [MOVATimestepPreparationStage] finished in 0.0012 seconds
[02-25 14:18:50] [MOVADenoisingStage] started...
0%| | 0/50 [00:00<?, ?it/s]
[02-25 14:18:50] [MOVADenoisingStage] Error during execution after 755.2026 ms: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in __call__
result = self.forward(batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 480, in forward
pos = self._predict(
^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 184, in _predict
return self.inference_single_step(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 738, in inference_single_step
visual_x, audio_x = self.forward_dual_tower_dit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 833, in forward_dual_tower_dit
visual_x, audio_x = self.dual_tower_bridge(
^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 638, in forward
visual_conditioned = self.apply_conditional_control(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 602, in apply_conditional_control
conditioned_features = conditioner(
^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 393, in forward
y = self.y_norm(y)
^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/normalization.py", line 229, in forward
return F.layer_norm(
^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/functional.py", line 2901, in layer_norm
return torch.layer_norm(
^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/managers/gpu_worker.py", line 227, in execute_forward
result = self.pipeline.forward(req, self.server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/composed_pipeline_base.py", line 622, in forward
return self.executor.execute_with_profiling(self.stages, batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/executors/pipeline_executor.py", line 57, in execute_with_profiling
batch = self.execute(stages, batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 101, in execute
batch = self._execute(stages, batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/executors/parallel_executor.py", line 92, in _execute
batch = stage(batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/base.py", line 203, in __call__
result = self.forward(batch, server_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 480, in forward
pos = self._predict(
^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 184, in _predict
return self.inference_single_step(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 738, in inference_single_step
visual_x, audio_x = self.forward_dual_tower_dit(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/pipelines_core/stages/model_specific_stages/mova.py", line 833, in forward_dual_tower_dit
visual_x, audio_x = self.dual_tower_bridge(
^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 638, in forward
visual_conditioned = self.apply_conditional_control(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 602, in apply_conditional_control
conditioned_features = conditioner(
^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/models/bridges/mova_dual_tower.py", line 393, in forward
y = self.y_norm(y)
^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/modules/normalization.py", line 229, in forward
return F.layer_norm(
^^^^^^^^^^^^^
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/torch/nn/functional.py", line 2901, in layer_norm
return torch.layer_norm(
^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Failed to generate output for prompt 1: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/utils/logging_utils.py", line 466, in log_generation_timer
yield timer
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py", line 192, in generate
raise Exception(f"{output_batch.error}")
Exception: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Generation failed for prompt 1/1: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
Traceback (most recent call last):
File "/workspace/mova-sglang/.venv/lib/python3.12/site-packages/sglang/multimodal_gen/runtime/entrypoints/diffusion_generator.py", line 192, in generate
raise Exception(f"{output_batch.error}")
Exception: Error executing request e385bc70-168d-4ac7-ab6f-0b0e7c2a93fd: Expected all tensors to be on the same device, but got weight is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__native_layer_norm)
[02-25 14:18:50] Completed batch processing. Generated 0 outputs in 24.42 seconds
[02-25 14:18:50] Generator was garbage collected without being shut down. Attempting to shut down the local server and client.
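Both tracebacks point at `y_norm` inside `DualTowerConditionalBridge` (`mova_dual_tower.py:393`): the LayerNorm weight is still on the CPU, presumably left there by `dit_cpu_offload`/`dit_layerwise_offload`, while the activations are already on `cuda:0`. A minimal CPU-only sketch of the co-location pattern the offload path would need for the bridge submodules (the hook and module here are illustrative, not sglang code):

```python
import torch
import torch.nn as nn

def ensure_on_input_device(module: nn.Module, args):
    # Pre-forward hook: move the module's weights to the device of its
    # first tensor argument, mirroring what a layerwise-offload manager
    # must do for every submodule it leaves on the CPU.
    tensors = [a for a in args if isinstance(a, torch.Tensor)]
    if tensors:
        module.to(tensors[0].device)

norm = nn.LayerNorm(8)  # weights start on CPU, like y_norm in the bridge
norm.register_forward_pre_hook(ensure_on_input_device)

x = torch.randn(2, 8)   # CPU here; on the failing run this tensor is on cuda:0
y = norm(x)             # the hook keeps weight and input co-located
print(y.shape)          # torch.Size([2, 8])
```

Without such a hook (or an explicit `.to(device)` when the bridge is skipped by the offload manager), `F.layer_norm` raises exactly the `Expected all tensors to be on the same device` error seen above.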