
fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch #9557

Merged
mudler merged 1 commit into mudler:master from richiejp:fix/vllm-flash-attn on Apr 25, 2026

Conversation

@richiejp
Collaborator

The pinned flash-attn 2.8.3+cu12torch2.7 wheel breaks at import time
once vllm 0.19.1 upgrades torch to its hard-pinned 2.10.0:

    ImportError: .../flash_attn_2_cuda...so: undefined symbol:
    _ZN3c104cuda29c10_cuda_check_implementationEiPKcS2_ib

That C10 CUDA symbol is libtorch-version-specific. Dao-AILab has not yet
published flash-attn wheels for torch 2.10 -- the latest release (2.8.3)
tops out at torch 2.8 -- so any wheel pinned here is silently ABI-broken
the moment vllm completes its install.
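For reference, that mangled name demangles to roughly c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool), a libtorch-internal helper, which is why the wheel only loads against the torch line it was built for. A minimal sketch of how to confirm the mismatch in the affected environment (nothing LocalAI-specific, just a sanity check):

```python
# Sanity-check sketch for the broken environment described above
# (assumes torch and the pinned flash-attn wheel are both installed).
import importlib

import torch

print("torch:", torch.__version__)  # reports 2.10.0 after the vllm upgrade

try:
    importlib.import_module("flash_attn")
    print("flash_attn imports cleanly")
except ImportError as exc:
    # With the 2.8.3+cu12torch2.7 wheel this raises the undefined-symbol
    # error quoted above, because the .so was linked against libtorch 2.7.
    print("flash_attn import failed:", exc)
```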

vllm 0.19.1 lists flashinfer-python==0.6.6 as a hard dep, which already
covers the attention path. The only other use of flash-attn in vllm is
the rotary apply_rotary import in
vllm/model_executor/layers/rotary_embedding/common.py, which is guarded
by find_spec("flash_attn") and falls back cleanly when absent.
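The guard in question is the usual optional-import idiom; the sketch below is a paraphrase of that pattern (module path and names approximated, not the verbatim vllm source):

```python
# Paraphrased sketch of vllm's optional flash-attn import guard; the exact
# module path differs between flash-attn versions, so treat this as the
# shape of the code, not a copy of it.
from importlib.util import find_spec

if find_spec("flash_attn") is not None:
    # Fast rotary kernel from flash-attn, used only when the wheel works.
    from flash_attn.ops.triton.rotary import apply_rotary
else:
    # flash-attn absent (or dropped, as in this PR): callers fall back to
    # a plain-torch rotary implementation.
    apply_rotary = None
```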

Also unpin torch in requirements-cublas12.txt: the 2.7.0 pin only
existed to give the flash-attn wheel a matching torch to link against.
With flash-attn gone, vllm's own torch==2.10.0 dep is the binding
constraint regardless of what we put here.
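A quick way to confirm the binding constraint after an install, assuming vllm and torch share the environment, is to compare the installed torch against whatever vllm's metadata declares:

```python
# Sketch: verify that the torch actually installed is the one vllm pins,
# now that requirements-cublas12.txt no longer forces a version.
from importlib.metadata import requires, version

print("installed torch:", version("torch"))

# Loose filter; vllm's metadata should contain something like "torch==2.10.0".
torch_pins = [r for r in (requires("vllm") or []) if r.startswith("torch")]
print("vllm declares:", torch_pins)
```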

Notes for Reviewers

I don't know what the performance implications are, but practically speaking we need Dao-AILab to update flash-attn.

Signed commits

  • Yes, I signed my commits.

@richiejp richiejp force-pushed the fix/vllm-flash-attn branch from bb9f22b to e85629f on April 25, 2026 09:22
@mudler
Owner

mudler commented Apr 25, 2026

mh interesting, we should check closer as not having flash attn is a huge slowdown

@richiejp
Collaborator Author

> mh interesting, we should check closer as not having flash attn is a huge slowdown

oh, it's still there, but implemented in flashinfer, which seems to be the default in vLLM.
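For anyone who wants to double-check that, a smoke test that forces the FlashInfer backend explicitly might look like the sketch below (the model name is only a placeholder; VLLM_ATTENTION_BACKEND is vLLM's backend override environment variable):

```python
# Smoke-test sketch: force the FlashInfer attention backend and make sure
# generation still works with no flash-attn package installed.
import os

os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder: any small model works
out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(out[0].outputs[0].text)
```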


Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
@richiejp richiejp force-pushed the fix/vllm-flash-attn branch from e85629f to e011550 on April 25, 2026 12:39
@mudler mudler enabled auto-merge (squash) April 25, 2026 13:11
@mudler mudler merged commit 73aacad into mudler:master Apr 25, 2026
48 checks passed