
Runtime crash with FlashInfer on RTX 5090 + installation command not working #41

@Ian-wyy


Hi, thanks for releasing this great project!

I tried to run lingbot-map strictly following the README instructions, but hit a FlashInfer-related failure on a Blackwell GPU (RTX 5090). I'd like to report two things: a reproducibility issue (the install command in the README) and a compatibility issue (a runtime crash).


1. Installation issue (FlashInfer)

The README suggests installing FlashInfer with:

pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

However, this command does not work:

ERROR: Could not find a version that satisfies the requirement flashinfer-python
ERROR: No matching distribution found for flashinfer-python

It seems that this index URL no longer provides the package.

Instead, I had to install it via:

pip install flashinfer-python

So the installation instructions in the README appear to be outdated.
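As a side note on why the command fails outright: pip's `-i`/`--index-url` flag *replaces* PyPI as the package index, so if that wheel index no longer lists `flashinfer-python`, pip finds nothing at all. Keeping PyPI as a fallback might make the README command more robust (untested suggestion; assumes the index URL is otherwise still meant to be valid):

```shell
# --extra-index-url adds the FlashInfer wheel index on top of PyPI instead
# of replacing it, so the install can still fall back to PyPI if the
# custom index goes stale.
pip install flashinfer-python --extra-index-url https://flashinfer.ai/whl/cu128/torch2.9/
```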

2. Runtime issue

Environment:

  • GPU: RTX 5090
  • CUDA: 12.8
  • PyTorch: 2.9
  • Python: 3.10

Running the demo:

python demo.py \
  --model_path ./model/lingbot-map-long.pt \
  --image_folder ../../datasets/oxford/data/observatory-quarter/2024-03-13-observatory-quarter-01/cam0/data/ \
  --mask_sky

The program fails during streaming inference with the following error:

Loading 5746 images...
Loading images: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5746/5746 [00:16<00:00, 341.74it/s]
Preprocessed images to 518x392 using canonical crop mode
Failed to get device capability: SM 12.x requires CUDA >= 12.9.
Failed to get device capability: SM 12.x requires CUDA >= 12.9.
torchtitan not available for ulysses cp
Building model...
pretrained_path: 
Failed to load pretrained weights: [Errno 2] No such file or directory: ''
Loading checkpoint: ./model/lingbot-map-long.pt
  Missing keys: 62
  Checkpoint loaded.
Total load time: 61.6s
Casting aggregator to torch.bfloat16 (heads kept in fp32)
Input: 5746 frames, shape (5746, 3, 392, 518)
Mode: streaming
GPU mem after load: alloc=16.95 GB, reserved=16.97 GB
Auto-selected --keyframe_interval=18 (num_frames=5746 > 320).
Keyframe streaming enabled: interval=18 (after the first 8 scale frames).
Running streaming inference (dtype=torch.bfloat16)...
Streaming inference:   0%|▏                                                                                                                                         | 8/5746 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/demo.py", line 522, in <module>
    main()
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/demo.py", line 466, in main
    predictions = model.inference_streaming(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/models/gct_stream.py", line 390, in inference_streaming
    frame_output = self.forward(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/models/gct_base.py", line 322, in forward
    aggregated_tokens_list, patch_start_idx = self._aggregate_features(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/models/gct_stream.py", line 225, in _aggregate_features
    aggregated_tokens_list, patch_start_idx = self.aggregator(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/aggregator/base.py", line 589, in forward
    tokens, global_idx, global_intermediates = self._process_global_attention(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/aggregator/stream.py", line 409, in _process_global_attention
    return self._process_causal_stream(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/aggregator/stream.py", line 509, in _process_causal_stream
    tokens = self.global_blocks[global_idx](
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/layers/block.py", line 264, in forward
    attn_x = manager.compute_attention(global_idx, q_nhd)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/layers/flashinfer_cache.py", line 381, in compute_attention
    self.prefill_wrapper.plan(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/prefill.py", line 1999, in plan
    self._cached_module = get_batch_prefill_module(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/prefill.py", line 454, in get_batch_prefill_module
    module = gen_batch_prefill_module(backend, *args).build_and_load()
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/attention/modules.py", line 1058, in gen_batch_prefill_module
    return gen_customize_batch_prefill_module(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/attention/modules.py", line 1618, in gen_customize_batch_prefill_module
    return gen_jit_spec(uri, source_paths)
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/core.py", line 415, in gen_jit_spec
    check_cuda_arch()
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/core.py", line 108, in check_cuda_arch
    raise RuntimeError("FlashInfer requires GPUs with sm75 or higher")
RuntimeError: FlashInfer requires GPUs with sm75 or higher
[W423 00:55:43.765978640 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
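For what it's worth, my reading of the log is that the "sm75 or higher" message is a red herring: capability detection fails first ("Failed to get device capability: SM 12.x requires CUDA >= 12.9"), so by the time `check_cuda_arch` runs it presumably sees no usable architecture at all, and the fallback error text names the minimum requirement rather than the real cause. A minimal sketch of that reading (hypothetical logic, not FlashInfer's actual `check_cuda_arch`):

```python
# Hypothetical sketch of the arch gate, NOT FlashInfer's real source.
# The gate only sees capabilities that were detected successfully; if
# detection itself fails (SM 12.x needs CUDA >= 12.9), the list is empty
# and the "sm75 or higher" error fires even on a newer GPU.

def check_cuda_arch(capabilities):
    """Raise unless at least one detected GPU is SM 7.5 or newer.

    `capabilities` is a list of (major, minor) tuples, e.g. [(12, 0)].
    """
    if not any(major * 10 + minor >= 75 for major, minor in capabilities):
        raise RuntimeError("FlashInfer requires GPUs with sm75 or higher")

check_cuda_arch([(12, 0)])   # SM 12.0 easily clears the numeric bar...

try:
    check_cuda_arch([])      # ...but failed detection yields an empty list
except RuntimeError as e:
    print(e)                 # -> FlashInfer requires GPUs with sm75 or higher
```

If that reading is right, the fix would be a FlashInfer build (and CUDA toolkit) that actually recognizes SM 12.x, rather than anything about the sm75 floor itself.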

Do you have a recommended setup for:

  • Blackwell GPUs (SM 12.x)?
  • CUDA ≥ 12.9?
  • Or a specific FlashInfer version that works?

This issue affects reproducibility on newer GPU platforms.

Thanks again for the great work!
