
Runtime crash with FlashInfer on RTX 5090 + installation command not working #41

@Ian-wyy


Hi, thanks for releasing this great project!

I tried to run lingbot-map strictly following the README instructions, but hit a FlashInfer-related failure on a Blackwell GPU (RTX 5090). I'd like to report two things: a reproducibility issue (the install command in the README) and a compatibility issue (a runtime crash).


1. Installation issue (FlashInfer)

The README suggests installing FlashInfer with:

pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/

However, this command does not work:

ERROR: Could not find a version that satisfies the requirement flashinfer-python
ERROR: No matching distribution found for flashinfer-python

It seems that this index URL no longer provides the package.

Instead, I had to install it via:

pip install flashinfer-python

So the installation instructions in the README appear to be outdated.
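As a side note on why the command fails outright: pip's `-i`/`--index-url` flag *replaces* PyPI as the package index, so if that wheel index no longer lists `flashinfer-python`, pip finds nothing at all. Keeping PyPI as a fallback might make the README command more robust (untested suggestion; assumes the index URL is otherwise still meant to be valid):

```shell
# --extra-index-url adds the FlashInfer wheel index on top of PyPI instead
# of replacing it, so the install can still fall back to PyPI if the
# custom index goes stale.
pip install flashinfer-python --extra-index-url https://flashinfer.ai/whl/cu128/torch2.9/
```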

2. Runtime issue

Environment:

  • GPU: RTX 5090
  • CUDA: 12.8
  • PyTorch: 2.9
  • Python: 3.10

Running the demo:

python demo.py \
  --model_path ./model/lingbot-map-long.pt \
  --image_folder ../../datasets/oxford/data/observatory-quarter/2024-03-13-observatory-quarter-01/cam0/data/ \
  --mask_sky

The program fails during streaming inference with the following error:

Loading 5746 images...
Loading images: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5746/5746 [00:16<00:00, 341.74it/s]
Preprocessed images to 518x392 using canonical crop mode
Failed to get device capability: SM 12.x requires CUDA >= 12.9.
Failed to get device capability: SM 12.x requires CUDA >= 12.9.
torchtitan not available for ulysses cp
Building model...
pretrained_path: 
Failed to load pretrained weights: [Errno 2] No such file or directory: ''
Loading checkpoint: ./model/lingbot-map-long.pt
  Missing keys: 62
  Checkpoint loaded.
Total load time: 61.6s
Casting aggregator to torch.bfloat16 (heads kept in fp32)
Input: 5746 frames, shape (5746, 3, 392, 518)
Mode: streaming
GPU mem after load: alloc=16.95 GB, reserved=16.97 GB
Auto-selected --keyframe_interval=18 (num_frames=5746 > 320).
Keyframe streaming enabled: interval=18 (after the first 8 scale frames).
Running streaming inference (dtype=torch.bfloat16)...
Streaming inference:   0%|▏                                                                                                                                         | 8/5746 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/demo.py", line 522, in <module>
    main()
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/demo.py", line 466, in main
    predictions = model.inference_streaming(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/models/gct_stream.py", line 390, in inference_streaming
    frame_output = self.forward(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/models/gct_base.py", line 322, in forward
    aggregated_tokens_list, patch_start_idx = self._aggregate_features(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/models/gct_stream.py", line 225, in _aggregate_features
    aggregated_tokens_list, patch_start_idx = self.aggregator(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/aggregator/base.py", line 589, in forward
    tokens, global_idx, global_intermediates = self._process_global_attention(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/aggregator/stream.py", line 409, in _process_global_attention
    return self._process_causal_stream(
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/aggregator/stream.py", line 509, in _process_causal_stream
    tokens = self.global_blocks[global_idx](
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/layers/block.py", line 264, in forward
    attn_x = manager.compute_attention(global_idx, q_nhd)
  File "/mnt/crucial/slam_benchmark/baselines/lingbot-map/lingbot_map/layers/flashinfer_cache.py", line 381, in compute_attention
    self.prefill_wrapper.plan(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/prefill.py", line 1999, in plan
    self._cached_module = get_batch_prefill_module(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/prefill.py", line 454, in get_batch_prefill_module
    module = gen_batch_prefill_module(backend, *args).build_and_load()
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/attention/modules.py", line 1058, in gen_batch_prefill_module
    return gen_customize_batch_prefill_module(
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/attention/modules.py", line 1618, in gen_customize_batch_prefill_module
    return gen_jit_spec(uri, source_paths)
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/core.py", line 415, in gen_jit_spec
    check_cuda_arch()
  File "/home/wangyiyu/miniconda3/envs/lingbot-map/lib/python3.10/site-packages/flashinfer/jit/core.py", line 108, in check_cuda_arch
    raise RuntimeError("FlashInfer requires GPUs with sm75 or higher")
RuntimeError: FlashInfer requires GPUs with sm75 or higher
[W423 00:55:43.765978640 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
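For what it's worth, my reading of the log is that the "sm75 or higher" message is a red herring: capability detection fails first ("Failed to get device capability: SM 12.x requires CUDA >= 12.9"), so by the time `check_cuda_arch` runs it presumably sees no usable architecture at all, and the fallback error text names the minimum requirement rather than the real cause. A minimal sketch of that reading (hypothetical logic, not FlashInfer's actual `check_cuda_arch`):

```python
# Hypothetical sketch of the arch gate, NOT FlashInfer's real source.
# The gate only sees capabilities that were detected successfully; if
# detection itself fails (SM 12.x needs CUDA >= 12.9), the list is empty
# and the "sm75 or higher" error fires even on a newer GPU.

def check_cuda_arch(capabilities):
    """Raise unless at least one detected GPU is SM 7.5 or newer.

    `capabilities` is a list of (major, minor) tuples, e.g. [(12, 0)].
    """
    if not any(major * 10 + minor >= 75 for major, minor in capabilities):
        raise RuntimeError("FlashInfer requires GPUs with sm75 or higher")

check_cuda_arch([(12, 0)])   # SM 12.0 easily clears the numeric bar...

try:
    check_cuda_arch([])      # ...but failed detection yields an empty list
except RuntimeError as e:
    print(e)                 # -> FlashInfer requires GPUs with sm75 or higher
```

If that reading is right, the fix would be a FlashInfer build (and CUDA toolkit) that actually recognizes SM 12.x, rather than anything about the sm75 floor itself.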

Do you have a recommended setup for:

  • Blackwell GPUs (SM 12.x)?
  • CUDA ≥ 12.9?
  • Or a specific FlashInfer version that works?

This issue affects reproducibility on newer GPU platforms.

Thanks again for the great work!
