[pull] main from inclusionAI:main #21
Merged
pull[bot] merged 4 commits into axistore80-coder:main from Mar 31, 2026
Restore Qwen VLM training by matching FSDP weight names to the layout expected by SGLang during distributed updates.

Key changes:
- special-case Qwen VL parameter naming for SGLang weight sync
- pin the cuDNN dependency needed for SGLang vision runs
- replace the skipped VLM example with Geometry3K coverage for SGLang and vLLM

Enable blockwise 128x128 FP8 e4m3fn matmuls for Archon engine via torchao, with BF16 master weights and on-the-fly quantization.

Key changes:
- Add FP8 linear patching (enable_fp8_linear) and expert patching (enable_fp8_experts) with deepcopy-safe types.MethodType binding
- Add FP8 checkpoint detection, preparation, and dequantization with CPU fallback and Shard(0) DTensor support
- Add ArchonFP8Config with mode/exclude_modules/include_experts/use_triton
- Add post-parallelism shard alignment validation for TP safety
- Disable torch.compile when FP8 is active (incompatible with 2D scales)
- Comprehensive test suite: forward/backward correctness, checkpoint detect/prepare/dequant, MoE dispatch, distributed sharded dequant

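The "deepcopy-safe types.MethodType binding" point can be illustrated with a small standalone sketch. It is not the Archon implementation: `ToyLinear`, `patched_forward`, `patch_bound`, and `patch_closure` are hypothetical names, and a plain Python object stands in for a real module. The sketch only shows why binding a replacement `forward` with `types.MethodType` behaves well under `copy.deepcopy`, while a closure that captures the instance does not.

```python
import copy
import types


class ToyLinear:
    """Hypothetical stand-in for a linear layer (a real one holds a weight tensor)."""

    def __init__(self, scale):
        self.scale = scale

    def forward(self, x):
        return x * self.scale


def patched_forward(self, x):
    # Placeholder for a quantized matmul; tags the result so the patch is visible.
    return ("patched", x * self.scale)


def patch_bound(module):
    # Bind the replacement to this instance. copy.deepcopy rebuilds bound
    # methods against the copied instance, so a copy calls into itself.
    module.forward = types.MethodType(patched_forward, module)
    return module


def patch_closure(module):
    # Counter-example: the lambda captures `module`. deepcopy treats plain
    # functions as atomic, so a copy keeps calling into the original instance.
    module.forward = lambda x: patched_forward(module, x)
    return module


original = patch_bound(ToyLinear(scale=2))
clone = copy.deepcopy(original)
clone.scale = 3
bound_result = clone.forward(1)  # reads clone.scale

leaky = patch_closure(ToyLinear(scale=2))
leaky_clone = copy.deepcopy(leaky)
leaky_clone.scale = 3
closure_result = leaky_clone.forward(1)  # still reads leaky.scale
```

With the bound-method patch the copy sees its own state; with the closure patch the copy silently keeps using the original object, which is the kind of aliasing a deepcopy-safe binding avoids.
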
…ervice (#1112)

Wire VLLMBridgeBackend into the full inference service stack with feature parity to the SGLang path, including per-process env isolation, context-window capping, and end-to-end test coverage.

Key changes:
- Add backend_type config field and --backend-type CLI arg for data proxy
- Wire VLLMBridgeBackend selection in data proxy app factory
- Delegate InfBridge abort/resubmit to backend protocol methods instead of hardcoded SGLang payload access
- Cap max_new_tokens in VLLMBridgeBackend to match SGLangBridgeBackend
- Add env override support to RPCGuard /fork endpoint
- Pass TRITON_CACHE_PATH, VLLM_CACHE_ROOT, VLLM_ALLOW_RUNTIME_LORA_UPDATING as per-process env vars when launching vLLM inference servers
- Add launch_vllm_server() helper to integration_utils
- Add 8 vLLM unit tests and 7 vLLM e2e integration test variants
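
The per-process env isolation described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code from the commit: `per_process_env`, `launch_with_env`, the directory layout, and the value `"True"` are all hypothetical. Only the three variable names come from the commit message. The idea is to give each child process its own copy of the environment plus overrides, rather than mutating `os.environ` in the parent.

```python
import os
import subprocess
import sys


def per_process_env(server_index, base_dir):
    # Hypothetical layout: one cache directory tree per server process.
    root = os.path.join(base_dir, f"server_{server_index}")
    return {
        "TRITON_CACHE_PATH": os.path.join(root, "triton"),
        "VLLM_CACHE_ROOT": os.path.join(root, "vllm"),
        "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "True",  # assumed value
    }


def launch_with_env(cmd, env_overrides):
    # Child sees the parent's environment plus the overrides; the parent's
    # os.environ is copied, never modified.
    child_env = {**os.environ, **env_overrides}
    return subprocess.run(
        cmd, env=child_env, capture_output=True, text=True, check=True
    )


# Demo: a tiny child process stands in for an inference server and simply
# echoes one of the variables it was launched with.
demo_env = per_process_env(0, os.path.join("tmp", "demo"))
demo = launch_with_env(
    [sys.executable, "-c", "import os; print(os.environ['VLLM_CACHE_ROOT'])"],
    demo_env,
)
child_saw = demo.stdout.strip()
```

Because every launch builds its own dictionary, two servers started with different indices get disjoint cache paths without racing on shared process-wide state.
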
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)