[pull] main from inclusionAI:main by pull[bot] · Pull Request #24 · axistore80-coder/AReaL

pull · 2026-04-01T13:21:33Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


Integrate Trackio (Hugging Face) as a new experiment tracking option alongside existing WandB, SwanLab, and TensorBoard backends.

Key changes:
- Add TrackioConfig dataclass with mode/project/name/space_id fields
- Integrate trackio init/log/finish lifecycle in StatsLogger
- Add trackio.log() fallback in logging.py helper function
- Add trackio to pyproject.toml dependencies
- Update CLI docs generator to include TrackioConfig
- Add unit tests for config and StatsLogger integration
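The init/log/finish lifecycle described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not AReaL's StatsLogger: only the four TrackioConfig field names come from the PR. The defaults, the `"disabled"` mode value, and the `MiniStatsLogger` class are assumptions. The backend is injected so the sketch runs without trackio installed; in real use it would be the `trackio` module, assuming its wandb-style `init(project=..., name=..., space_id=...)`, `log(metrics, step=...)` and `finish()` calls.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrackioConfig:
    # Field names are taken from the PR description; the defaults and the
    # "disabled" sentinel value are hypothetical.
    mode: str = "disabled"
    project: Optional[str] = None
    name: Optional[str] = None
    space_id: Optional[str] = None


class MiniStatsLogger:
    """Hypothetical stand-in showing the Trackio init/log/finish lifecycle."""

    def __init__(self, config: TrackioConfig, backend: Any) -> None:
        self.config = config
        self.backend = backend  # e.g. the imported `trackio` module (assumed)
        self.active = False

    def start(self) -> None:
        # Skip tracking entirely when the (assumed) "disabled" mode is set.
        if self.config.mode == "disabled":
            return
        self.backend.init(
            project=self.config.project,
            name=self.config.name,
            space_id=self.config.space_id,
        )
        self.active = True

    def commit(self, stats: dict, step: int) -> None:
        # Only forward metrics once a run has been initialized.
        if self.active:
            self.backend.log(stats, step=step)

    def close(self) -> None:
        # Finish the run exactly once.
        if self.active:
            self.backend.finish()
            self.active = False
```

Guarding every call on `self.active` keeps the disabled path free of side effects, which is one plausible way to add a backend next to WandB, SwanLab, and TensorBoard without changing their code paths.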

…1126)

Extract the monolithic rpc_server.py (1109 lines) into a reusable guard package with composable Flask blueprints, enabling code sharing between the RPC server and the inference service guard.

Key changes:
- New areal/infra/rpc/guard/ package with GuardState, create_app(), data_blueprint (RTensor /data/* endpoints), and engine_blueprint (/create_engine, /call, /set_env + engine thread)
- rpc_server.py reduced to a 62-line thin composition of guard + blueprints
- inference_service/guard/app.py reduced to a 25-line wrapper over the shared guard
- /fork endpoint simplified to a single raw_cmd mode (module-path mode removed)
- Schedulers (local, slurm) migrated to build raw commands client-side via /alloc_ports + /fork with raw_cmd
- All guard tests (27) and rtensor tests (32) pass
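The composition idea above (one shared guard, with optional route groups layered on top) can be illustrated without Flask. In the real package this is done with Flask blueprints; to keep the sketch dependency-free, a hypothetical `RouteGroup` class stands in for `flask.Blueprint`. The `/health` route, the `storage` field on `GuardState`, and the `/data/put` and `/data/get` handlers are all invented for illustration and do not reflect AReaL's actual endpoints or state.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Handler = Callable[["GuardState", dict], Any]


@dataclass
class GuardState:
    # Hypothetical shared state; the PR does not describe GuardState's fields.
    storage: Dict[str, Any] = field(default_factory=dict)


class RouteGroup:
    """Dependency-free stand-in for a Flask Blueprint: a named bundle of routes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.routes: Dict[str, Handler] = {}

    def route(self, path: str) -> Callable[[Handler], Handler]:
        def decorator(fn: Handler) -> Handler:
            self.routes[path] = fn
            return fn
        return decorator


class App:
    """Minimal dispatcher that owns one GuardState shared by every group."""

    def __init__(self, state: GuardState) -> None:
        self.state = state
        self.routes: Dict[str, Handler] = {}

    def register(self, group: RouteGroup) -> None:
        # Refuse silent overrides so two groups cannot claim the same path.
        for path, fn in group.routes.items():
            if path in self.routes:
                raise ValueError(f"route {path} registered twice")
            self.routes[path] = fn

    def dispatch(self, path: str, payload: dict) -> Any:
        return self.routes[path](self.state, payload)


# Hypothetical core routes that every guard exposes.
core = RouteGroup("core")


@core.route("/health")
def health(state: GuardState, payload: dict) -> str:
    return "ok"


def create_app(extra_groups: List[RouteGroup]) -> App:
    """Build a guard app: core routes plus whichever groups the caller composes in."""
    app = App(GuardState())
    app.register(core)
    for group in extra_groups:
        app.register(group)
    return app


# Hypothetical data group, loosely modeled on the /data/* endpoints.
data = RouteGroup("data")


@data.route("/data/put")
def put(state: GuardState, payload: dict) -> str:
    state.storage[payload["key"]] = payload["value"]
    return "stored"


@data.route("/data/get")
def get(state: GuardState, payload: dict) -> Any:
    return state.storage[payload["key"]]
```

Under this reading, an RPC-server-style app would be `create_app([data, ...])`, while a thin guard wrapper would call `create_app([])` and reuse the same core without the extra groups.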

Move /review-pr classification from change types to a domain/signal
model so the harness can cover newer runtime surfaces without
platform drift.

Key changes:
- add canonical domains-and-signals references and migration guide
- add sync_review_pr_refs.py to regenerate Claude and OpenCode data
- align review templates and commit-conventions mirrors, including
  fsdp/megatron runtime signal coverage

* feat(infra): add client-side fetch buffer for RTensor

  Add a per-process cache (_fetch_buffer) keyed by shard_id so that repeated to_local() / localize() calls for the same rollout batch avoid redundant network round-trips. Entries are evicted by clear_node() at the end of each train step.

  Key changes:
  - Cache check in to_local() before backend fetch
  - Batch buffer resolution in localize() (fetch only misses)
  - clear_node() evicts buffer entries before deleting remote shards
  - Add buffer_stats() for operational monitoring
  - Add strict=True to zip in localize() for safety
  - Add TestFetchBuffer integration test suite (8 tests)
* refactor(rtensor): remove clear_fetch_buffer and buffer_stats functions

guozhihao-224 and others added 4 commits April 1, 2026 17:12

pull bot locked and limited conversation to collaborators Apr 1, 2026

pull bot added the ⤵️ pull label Apr 1, 2026

pull bot merged commit 44d54cf into axistore80-coder:main Apr 1, 2026

pull[bot] merged 4 commits into axistore80-coder:main from inclusionAI:main

pull bot commented Apr 1, 2026


3 participants
