[pull] main from inclusionAI:main#20
Merged
pull[bot] merged 8 commits into axistore80-coder:main on Mar 30, 2026
Conversation
* feat: support model training in IPv6-only environment
  Co-authored-by: bingyechen <bingyechen@bytedance.com>
  Co-authored-by: root <root@dc05-p13-t0-n028.byted.org>
  Co-authored-by: truongnp5 <v.truongnp5@vinsmartfuture.tech>
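The IPv6-only commit above gives no implementation detail, but a typical change of this kind is to stop assuming IPv4 and resolve the address family before binding. A hypothetical sketch (none of these names come from the PR):

```python
import socket

def make_listener(host: str, port: int) -> socket.socket:
    """Bind a listener using whatever address family the host resolves to,
    so an IPv6-only machine gets AF_INET6 instead of a hardcoded AF_INET."""
    family, _, _, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(addr)  # addr already has the right shape for the family
    sock.listen(1)
    return sock

# Prefer the IPv6 loopback when the interpreter was built with IPv6 support.
s = make_listener("::1" if socket.has_ipv6 else "127.0.0.1", 0)
```

The point of routing through `getaddrinfo` is that the same call site works on IPv4-only, dual-stack, and IPv6-only hosts without branching.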
* feat: megatron-bridge adaptation and dependency conflict resolution
  - tested TP,PP>1 megatron-bridge integration with mbridge backward compatibility
  - Darwin on x86_64 needs special handling, as torch >2.9.1 drops support for it
  - some package versions conflicting with megatron-bridge are overridden to previous versions
* chore: added docs for the megatron-bridge feature
* fix: handle the case where load/save in megatron-bridge does not support the critic
  Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
…accurate metrics (pre-commit checked) (#1100)
* fix(archon): add missing POST /data/batch endpoint to data proxy
  PR #1077 added batch RTensor fetching via POST /data/batch but only implemented the endpoint on the Flask RPC server (rpc_server.py), missing the FastAPI data proxy. This caused RTensor.localize() to fail with HTTP 405 in integration tests that use the data proxy.
  Refs: #1077
* fix(archon): harden data proxy batch endpoint with Flask-parity error handling
  Align POST /data/batch error responses, JSON parsing, and exception handling with the Flask rpc_server.py counterpart to ensure identical behavior across both servers.
  Key changes:
  - Replace HTTPException with JSONResponse for Flask-compatible error bodies
  - Add outer try/except with traceback logging matching the Flask pattern
  - Normalise falsy/non-dict JSON payloads via `or {}` plus an isinstance guard
  - Add 12 unit tests for all RTensor data proxy endpoints (no GPU)
  Refs: #1077, #1105
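The "Flask-parity" hardening described above (error bodies instead of raised exceptions, `or {}` normalisation, an outer catch-all with traceback logging) can be sketched framework-agnostically. This is a toy stand-in, not the repo's handler; `handle_batch` and its error strings are invented for illustration:

```python
import json
import traceback

def handle_batch(raw_body: bytes, store: dict) -> tuple[dict, int]:
    """Return (json_body, status_code) for a POST /data/batch-style request,
    mirroring a Flask handler that always responds with an error body
    rather than letting the framework raise."""
    try:
        # Normalise unparsable, falsy, or non-dict payloads to {}.
        try:
            payload = json.loads(raw_body) or {}
        except json.JSONDecodeError:
            payload = {}
        if not isinstance(payload, dict):
            payload = {}

        keys = payload.get("keys", [])
        if not keys:
            # Flask-compatible error body instead of an HTTPException.
            return {"error": "no keys provided"}, 400
        return {k: store.get(k) for k in keys}, 200
    except Exception:
        # Outer catch-all: log the traceback, return a 500 with a JSON body.
        traceback.print_exc()
        return {"error": "internal server error"}, 500
```

In FastAPI the same shape is achieved by returning `JSONResponse(body, status_code=...)` from the route, which is what makes the two servers byte-compatible on errors.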
…rOptimWrapper (#1108)
  Replace hardcoded torch.cuda.Stream, torch.cuda.Event, torch.cuda.stream(), torch.cuda.current_stream(), and torch.cuda.empty_cache() with current_platform equivalents to support non-CUDA accelerators. Resolves two TODO comments about platform abstraction.
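The commit above routes device primitives through a `current_platform` object rather than `torch.cuda.*` directly. A minimal sketch of that pattern, with a hypothetical CPU no-op fallback (only the name `current_platform` comes from the PR; everything else is illustrative):

```python
class _NoOpStream:
    """Stands in for a device stream when no accelerator is present."""
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False
    def synchronize(self):
        pass  # nothing in flight on CPU

class CpuPlatform:
    """Fallback platform: stream and cache calls become harmless no-ops,
    so call sites need no per-backend branching."""
    def stream(self, s):
        return s  # a context manager, mirroring torch.cuda.stream(s)
    def current_stream(self):
        return _NoOpStream()
    def empty_cache(self):
        pass  # no device cache to free

# Real code would select a CUDA/NPU/... platform here based on availability.
current_platform = CpuPlatform()

# Call sites then read identically on every backend:
with current_platform.stream(current_platform.current_stream()):
    pass
current_platform.current_stream().synchronize()
current_platform.empty_cache()
```

The win is that the optimizer wrapper never imports backend-specific modules; swapping accelerators means swapping the platform object, not editing call sites.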
- Split weight update into async bucket start + explicit wait
- Add _PendingWeightUpdateBucket dataclass for async tracking
- Overlap bucket N-1 broadcast with bucket N all-gather
- Keep training ranks aligned before entering next collective
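The split-start/explicit-wait pipelining above can be illustrated with a toy dataclass. A thread stands in for the asynchronous collective here; only the class-name idea comes from the commit, and the real code would overlap NCCL broadcasts, not Python threads:

```python
from dataclasses import dataclass, field
import threading

@dataclass
class PendingWeightUpdateBucket:
    """Tracks one in-flight bucket so starting the transfer and waiting
    for it are separate steps that other work can slot between."""
    bucket_id: int
    _done: threading.Event = field(default_factory=threading.Event)

    def start(self, work):
        # Kick off the "collective" asynchronously and return immediately.
        threading.Thread(
            target=lambda: (work(), self._done.set())).start()
        return self

    def wait(self):
        self._done.wait()  # explicit wait keeps ranks aligned

results = []
pending = None
for i in range(3):
    # Start bucket i, then wait on bucket i-1: the two overlap in flight.
    nxt = PendingWeightUpdateBucket(i).start(lambda i=i: results.append(i))
    if pending is not None:
        pending.wait()
    pending = nxt
pending.wait()  # drain the last bucket before leaving the update
```

The final `wait()` before returning is what the last bullet refers to: every rank drains its pending bucket before entering the next collective, so none of them deadlocks on a mismatched operation.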
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.4)