[feat] Make ATOM work with SGLang out-of-tree #355

Draft: zhuyuhua-v wants to merge 7 commits into ROCm:main
Conversation

Signed-off-by: zhuyuhua-v <yuhzhu@amd.com>
Motivation
ATOM currently integrates with SGLang by maintaining a forked version of SGLang that adds `--model-impl atom` support. This approach requires invasive modifications to SGLang's codebase (6+ places) and imposes a continuous maintenance burden to keep the fork synchronized with upstream SGLang.

SGLang PR #13429 introduced the `SGLANG_EXTERNAL_MODEL_PACKAGE` mechanism, which allows external packages to register custom model implementations without modifying upstream SGLang. This provides a clean, officially supported extension point that fits ATOM's use case perfectly. By adopting this out-of-tree (OOT) approach, we eliminate the need for an SGLang fork entirely: ATOM users can run optimized models on AMD GPUs using unmodified upstream SGLang with a single environment variable, `SGLANG_EXTERNAL_MODEL_PACKAGE`.
Comparison with the fork-based approach

| | `--model-impl atom` (fork) | `SGLANG_EXTERNAL_MODEL_PACKAGE` (OOT) |
|---|---|---|
| Model registration | `ATOMForCausalLM` + `ModelImpl.ATOM` | `EntryClass` in a `.py` file under `atom/plugin/sglang/oot/` |

Design
SGLang's External Model Package Mechanism
SGLang's `SGLANG_EXTERNAL_MODEL_PACKAGE` (PR #13429) works through an automatic module-discovery mechanism in `sglang/srt/models/registry.py`. When `SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.oot` is set, the registry:

- imports the package (its `__init__.py`)
- uses `pkgutil.iter_modules` to discover all `.py` submodules
- collects their `EntryClass` attributes
- registers each class under `cls.__name__`, with `overwrite=True` to replace built-in implementations

ATOM OOT Package Structure
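Applied to a package like `atom.plugin.sglang.oot`, that discovery flow can be mimicked in a few lines. This is a simplified sketch, not SGLang's actual registry code; the function name `discover_entry_classes` is invented for illustration.

```python
import importlib
import pkgutil

def discover_entry_classes(package_name):
    """Import every .py submodule of `package_name` and collect the model
    classes each one exposes through its `EntryClass` attribute."""
    registry = {}
    package = importlib.import_module(package_name)  # runs the package __init__.py
    for mod_info in pkgutil.iter_modules(package.__path__):  # find .py submodules
        module = importlib.import_module(f"{package_name}.{mod_info.name}")
        entry = getattr(module, "EntryClass", None)
        if entry is None:
            continue
        entries = entry if isinstance(entry, (list, tuple)) else [entry]
        for cls in entries:
            # keyed by class name; SGLang registers with overwrite=True,
            # so an external class shadows a built-in of the same name
            registry[cls.__name__] = cls
    return registry
```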
Each submodule defines thin wrapper classes that:

- conform to SGLang's model interface (`__init__`, `forward`, `load_weights` signatures)
- delegate model construction to `atom.prepare_model(config, engine="sglang")`
- adapt the `hidden_states` output to SGLang's `LogitsProcessorOutput`

Architecture Diagram
Execution Flow
Model Wrapper Interface
Each OOT wrapper conforms to SGLang's model interface. The wrapper class name must match the HF config's `architectures` field (e.g. `"Qwen3MoeForCausalLM"`), because SGLang uses `cls.__name__` as the registry key.

Attention
SGLang provides the `register_attention_backend` decorator for registering custom attention backends. ATOM leverages this mechanism to inject its optimized attention implementation.
The `ATOMAttnBackendForSgl` backend provides:

- `flash_attn_varlen_func` from aiter for variable-length attention
- `pa_persistent_fwd` / `pa_fwd_asm` for paged attention with optimized AMD GPU kernels
- `reshape_and_cache_shuffle_triton` for KV-cache updates

This registration uses the name `"aiter"` to align with SGLang's existing attention backend selection, ensuring the ATOM backend is transparently selected when running on AMD GPUs.

Supported Models
| Model | OOT file |
|---|---|
| `DeepseekV3ForCausalLM` | `deepseek.py` |
| `Qwen3MoeForCausalLM` | `qwen3_moe.py` |

How to Add a New Model
Adding a new model to ATOM's OOT package requires no changes to upstream SGLang and minimal boilerplate:

1. Implement the model in ATOM (under `atom/models/`)
2. Register it in `atom/plugin/register.py`'s `_ATOM_SUPPORTED_MODELS`
3. Add a `.py` file under `atom/plugin/sglang/oot/` with the wrapper class and `EntryClass`

Taking Qwen3 MoE as an example:
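A hedged sketch of what `atom/plugin/sglang/oot/qwen3_moe.py` might contain. The class name, the `EntryClass` attribute, and `atom.prepare_model(config, engine="sglang")` appear in this PR; the method signatures and bodies are stubbed assumptions so the sketch imports cleanly without SGLang or ATOM installed.

```python
class Qwen3MoeForCausalLM:
    """Class name must equal the HF config's `architectures` entry,
    since SGLang keys its model registry on cls.__name__."""

    def __init__(self, config, **kwargs):
        self.config = config
        # Real wrapper delegates construction to ATOM:
        #   self.model = atom.prepare_model(config, engine="sglang")

    def forward(self, input_ids, positions, forward_batch):
        # Real wrapper runs ATOM's forward pass and adapts hidden_states
        # into SGLang's LogitsProcessorOutput.
        raise NotImplementedError

    def load_weights(self, weights):
        # Real wrapper maps HF checkpoint tensors into ATOM's layout.
        raise NotImplementedError

# SGLang's discovery collects this attribute from every submodule in the OOT package.
EntryClass = [Qwen3MoeForCausalLM]
```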
No changes to `__init__.py`, environment variables, or SGLang code are needed: SGLang's `pkgutil.iter_modules` automatically discovers the new file.

Usage
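Launch commands might look like the following. `python -m sglang.launch_server` is SGLang's standard entry point and the environment variable comes from this PR; the model paths are illustrative.

```shell
# Unmodified upstream SGLang; the env var points the registry at ATOM's OOT package.
SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.oot \
  python -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3

# Switching models: only --model-path changes.
SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.oot \
  python -m sglang.launch_server --model-path Qwen/Qwen3-30B-A3B
```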
Switching models only requires changing `--model-path`; SGLang automatically selects the correct ATOM wrapper based on the `architectures` field in the model's HF `config.json`.

Limitations
PRs