[pull] main from inclusionAI:main#20
Merged
pull[bot] merged 8 commits into axistore80-coder:main on Mar 30, 2026
Conversation
* feat: support model training in IPv6-only environment
  Co-authored-by: bingyechen <bingyechen@bytedance.com>
  Co-authored-by: root <root@dc05-p13-t0-n028.byted.org>
  Co-authored-by: truongnp5 <v.truongnp5@vinsmartfuture.tech>
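The IPv6-only commit above gives no implementation detail, but a typical change of this kind is to stop assuming IPv4 and resolve the address family before binding. A hypothetical sketch (none of these names come from the PR):

```python
import socket

def make_listener(host: str, port: int) -> socket.socket:
    """Bind a listener using whatever address family the host resolves to,
    so an IPv6-only machine gets AF_INET6 instead of a hardcoded AF_INET."""
    family, _, _, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(addr)  # addr already has the right shape for the family
    sock.listen(1)
    return sock

# Prefer the IPv6 loopback when the interpreter was built with IPv6 support.
s = make_listener("::1" if socket.has_ipv6 else "127.0.0.1", 0)
```

The point of routing through `getaddrinfo` is that the same call site works on IPv4-only, dual-stack, and IPv6-only hosts without branching.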
* feat: megatron-bridge adaptation and dependency conflict resolution
  - tested TP,PP>1 megatron-bridge integration with mbridge backward compatibility
  - Darwin on x86_64 needs special handling, as torch >2.9.1 drops support for it
  - some package versions conflicting with megatron-bridge are overridden to previous versions
* chore: added docs for the megatron-bridge feature
* fix: handle the case where load/save in megatron-bridge does not support the critic
  Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
…accurate metrics (pre-commit checked) (#1100)
* fix(archon): add missing POST /data/batch endpoint to data proxy
  PR #1077 added batch RTensor fetching via POST /data/batch but only implemented the endpoint on the Flask RPC server (rpc_server.py), missing the FastAPI data proxy. This caused RTensor.localize() to fail with HTTP 405 in integration tests that use the data proxy.
  Refs: #1077
* fix(archon): harden data proxy batch endpoint with Flask-parity error handling
  Align POST /data/batch error responses, JSON parsing, and exception handling with the Flask rpc_server.py counterpart to ensure identical behavior across both servers.
  Key changes:
  - Replace HTTPException with JSONResponse for Flask-compatible error bodies
  - Add outer try/except with traceback logging matching the Flask pattern
  - Normalise falsy/non-dict JSON payloads via `or {}` plus an isinstance guard
  - Add 12 unit tests for all RTensor data proxy endpoints (no GPU)
  Refs: #1077, #1105
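The "Flask-parity" hardening described above (error bodies instead of raised exceptions, `or {}` normalisation, an outer catch-all with traceback logging) can be sketched framework-agnostically. This is a toy stand-in, not the repo's handler; `handle_batch` and its error strings are invented for illustration:

```python
import json
import traceback

def handle_batch(raw_body: bytes, store: dict) -> tuple[dict, int]:
    """Return (json_body, status_code) for a POST /data/batch-style request,
    mirroring a Flask handler that always responds with an error body
    rather than letting the framework raise."""
    try:
        # Normalise unparsable, falsy, or non-dict payloads to {}.
        try:
            payload = json.loads(raw_body) or {}
        except json.JSONDecodeError:
            payload = {}
        if not isinstance(payload, dict):
            payload = {}

        keys = payload.get("keys", [])
        if not keys:
            # Flask-compatible error body instead of an HTTPException.
            return {"error": "no keys provided"}, 400
        return {k: store.get(k) for k in keys}, 200
    except Exception:
        # Outer catch-all: log the traceback, return a 500 with a JSON body.
        traceback.print_exc()
        return {"error": "internal server error"}, 500
```

In FastAPI the same shape is achieved by returning `JSONResponse(body, status_code=...)` from the route, which is what makes the two servers byte-compatible on errors.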
…rOptimWrapper (#1108)
  Replace hardcoded torch.cuda.Stream, torch.cuda.Event, torch.cuda.stream(), torch.cuda.current_stream(), and torch.cuda.empty_cache() with current_platform equivalents to support non-CUDA accelerators. Resolves two TODO comments about platform abstraction.
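The commit above routes device primitives through a `current_platform` object rather than `torch.cuda.*` directly. A minimal sketch of that pattern, with a hypothetical CPU no-op fallback (only the name `current_platform` comes from the PR; everything else is illustrative):

```python
class _NoOpStream:
    """Stands in for a device stream when no accelerator is present."""
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False
    def synchronize(self):
        pass  # nothing in flight on CPU

class CpuPlatform:
    """Fallback platform: stream and cache calls become harmless no-ops,
    so call sites need no per-backend branching."""
    def stream(self, s):
        return s  # a context manager, mirroring torch.cuda.stream(s)
    def current_stream(self):
        return _NoOpStream()
    def empty_cache(self):
        pass  # no device cache to free

# Real code would select a CUDA/NPU/... platform here based on availability.
current_platform = CpuPlatform()

# Call sites then read identically on every backend:
with current_platform.stream(current_platform.current_stream()):
    pass
current_platform.current_stream().synchronize()
current_platform.empty_cache()
```

The win is that the optimizer wrapper never imports backend-specific modules; swapping accelerators means swapping the platform object, not editing call sites.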
- Split weight update into async bucket start + explicit wait
- Add _PendingWeightUpdateBucket dataclass for async tracking
- Overlap bucket N-1 broadcast with bucket N all-gather
- Keep training ranks aligned before entering next collective
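The split-start/explicit-wait pipelining above can be illustrated with a toy dataclass. A thread stands in for the asynchronous collective here; only the class-name idea comes from the commit, and the real code would overlap NCCL broadcasts, not Python threads:

```python
from dataclasses import dataclass, field
import threading

@dataclass
class PendingWeightUpdateBucket:
    """Tracks one in-flight bucket so starting the transfer and waiting
    for it are separate steps that other work can slot between."""
    bucket_id: int
    _done: threading.Event = field(default_factory=threading.Event)

    def start(self, work):
        # Kick off the "collective" asynchronously and return immediately.
        threading.Thread(
            target=lambda: (work(), self._done.set())).start()
        return self

    def wait(self):
        self._done.wait()  # explicit wait keeps ranks aligned

results = []
pending = None
for i in range(3):
    # Start bucket i, then wait on bucket i-1: the two overlap in flight.
    nxt = PendingWeightUpdateBucket(i).start(lambda i=i: results.append(i))
    if pending is not None:
        pending.wait()
    pending = nxt
pending.wait()  # drain the last bucket before leaving the update
```

The final `wait()` before returning is what the last bullet refers to: every rank drains its pending bucket before entering the next collective, so none of them deadlocks on a mismatched operation.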
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)