
[pull] main from inclusionAI:main #21

Merged
pull[bot] merged 4 commits into axistore80-coder:main from inclusionAI:main
Mar 31, 2026

Conversation


@pull pull bot commented Mar 31, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

garrett4wade and others added 4 commits March 31, 2026 10:35
Restore Qwen VLM training by matching FSDP weight names to the layout expected by SGLang during distributed updates.

Key changes:
- special-case Qwen VL parameter naming for SGLang weight sync
- pin the cuDNN dependency needed for SGLang vision runs
- replace the skipped VLM example with Geometry3K coverage for SGLang and vLLM
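The weight-name matching described above can be sketched as a small remapping helper. This is an illustrative assumption, not the repo's actual code: both the FSDP wrapper prefix and the `model.visual.` special case below are hypothetical names chosen to show the pattern.

```python
# Hypothetical sketch of the Qwen VL special case: translate an
# FSDP-side parameter name into the name SGLang expects during a
# distributed weight update. The prefixes are assumptions.

def remap_qwen_vl_name(fsdp_name: str) -> str:
    # FSDP nests parameters under wrapper prefixes; strip them first.
    name = fsdp_name.replace("_fsdp_wrapped_module.", "")
    # Assumed special case: SGLang serves the vision tower under
    # "visual." rather than "model.visual.".
    if name.startswith("model.visual."):
        name = name[len("model."):]
    return name
```

Language-model parameters pass through unchanged; only vision-tower names are rewritten.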

Enable blockwise 128x128 FP8 e4m3fn matmuls for the Archon engine via
torchao, with BF16 master weights and on-the-fly quantization.

Key changes:
- Add FP8 linear patching (enable_fp8_linear) and expert patching
  (enable_fp8_experts) with deepcopy-safe types.MethodType binding
- Add FP8 checkpoint detection, preparation, and dequantization
  with CPU fallback and Shard(0) DTensor support
- Add ArchonFP8Config with mode/exclude_modules/include_experts/use_triton
- Add post-parallelism shard alignment validation for TP safety
- Disable torch.compile when FP8 is active (incompatible with 2D scales)
- Add a comprehensive test suite: forward/backward correctness, checkpoint
  detection/preparation/dequantization, MoE dispatch, and distributed
  sharded dequantization
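The deepcopy-safe `types.MethodType` binding noted above works because Python's `copy` module rebinds bound methods to the copied instance. A minimal stand-in of the patching pattern, with no torch and no real FP8 math (the `Linear` class and `enable_fp8_linear_sketch` name are hypothetical, loosely modeled on the `enable_fp8_linear` mentioned in the commit):

```python
import copy
import types

class Linear:
    """Toy stand-in for a framework linear layer (illustration only)."""
    def __init__(self, scale):
        self.scale = scale
    def forward(self, x):
        return x * self.scale

def enable_fp8_linear_sketch(module):
    # Sketch of the patching pattern: wrap the original forward. The
    # real patch would quantize to FP8 e4m3fn with blockwise 128x128
    # scales before the matmul; this stand-in just delegates.
    orig = type(module).forward
    def fp8_forward(self, x):
        return orig(self, x)
    # Binding via types.MethodType is deepcopy-safe: copy.deepcopy
    # rebinds the stored method to the copied instance, not the original.
    module.forward = types.MethodType(fp8_forward, module)
```

After `copy.deepcopy`, the clone's patched `forward` is bound to the clone itself, so the override survives model cloning; a plain function stored on the instance would not rebind this way.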

…ervice (#1112)

Wire VLLMBridgeBackend into the full inference service stack with feature
parity to the SGLang path, including per-process env isolation,
context-window capping, and end-to-end test coverage.

Key changes:
- Add backend_type config field and --backend-type CLI arg for data proxy
- Wire VLLMBridgeBackend selection in data proxy app factory
- Delegate InfBridge abort/resubmit to backend protocol methods instead
  of hardcoded SGLang payload access
- Cap max_new_tokens in VLLMBridgeBackend to match SGLangBridgeBackend
- Add env override support to RPCGuard /fork endpoint
- Pass TRITON_CACHE_PATH, VLLM_CACHE_ROOT, VLLM_ALLOW_RUNTIME_LORA_UPDATING
  as per-process env vars when launching vLLM inference servers
- Add launch_vllm_server() helper to integration_utils
- Add 8 vLLM unit tests and 7 vLLM e2e integration test variants
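The per-process env isolation above can be sketched as building a dedicated environment per server launch. `vllm_server_env` is a hypothetical helper (the three variable names come from the commit message; the cache layout and the "1" value are assumptions):

```python
import os

def vllm_server_env(cache_dir: str) -> dict:
    # Start from the parent environment, then point each cache at a
    # per-process directory so concurrently launched vLLM servers do
    # not clash on disk.
    env = dict(os.environ)
    env["TRITON_CACHE_PATH"] = os.path.join(cache_dir, "triton")
    env["VLLM_CACHE_ROOT"] = os.path.join(cache_dir, "vllm")
    # Allow LoRA adapters to be swapped at runtime (value assumed).
    env["VLLM_ALLOW_RUNTIME_LORA_UPDATING"] = "1"
    return env
```

The resulting dict would then be passed as the child process environment when launching each vLLM inference server.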
@pull pull bot locked and limited conversation to collaborators Mar 31, 2026
@pull pull bot added the ⤵️ pull label Mar 31, 2026
@pull pull bot merged commit d366571 into axistore80-coder:main Mar 31, 2026
