[pull] main from inclusionAI:main #24
Merged
pull[bot] merged 4 commits into axistore80-coder:main on Apr 1, 2026
Integrate Trackio (Hugging Face) as a new experiment tracking option alongside the existing WandB, SwanLab, and TensorBoard backends.

Key changes:
- Add TrackioConfig dataclass with mode/project/name/space_id fields
- Integrate trackio init/log/finish lifecycle in StatsLogger
- Add trackio.log() fallback in logging.py helper function
- Add trackio to pyproject.toml dependencies
- Update CLI docs generator to include TrackioConfig
- Add unit tests for config and StatsLogger integration
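To make the config and lifecycle items above concrete, here is a minimal sketch of how a config dataclass can gate an init/log/finish lifecycle. Only the four field names (mode, project, name, space_id) come from the commit message. The field types, the defaults, the `"disabled"` mode value, and the `StatsLoggerSketch` and `RecordingTracker` classes are assumptions made for illustration, and the tracker is injected so the example runs without the trackio package.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class TrackioConfig:
    # Field names follow the commit message; types and defaults are assumed.
    mode: str = "disabled"
    project: Optional[str] = None
    name: Optional[str] = None
    space_id: Optional[str] = None


class RecordingTracker:
    """Hypothetical stand-in backend that records calls instead of logging remotely."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any]] = []

    def init(self, **kwargs: Any) -> None:
        self.calls.append(("init", kwargs))

    def log(self, metrics: dict) -> None:
        self.calls.append(("log", dict(metrics)))

    def finish(self) -> None:
        self.calls.append(("finish", None))


class StatsLoggerSketch:
    """Illustrative logger: init on construction, log per commit, finish on close."""

    def __init__(self, config: TrackioConfig, tracker: Any) -> None:
        self.tracker = tracker
        # Assumption: any mode other than "disabled" turns the backend on.
        self.enabled = config.mode != "disabled"
        if self.enabled:
            tracker.init(
                project=config.project,
                name=config.name,
                space_id=config.space_id,
            )

    def commit(self, stats: dict) -> None:
        if self.enabled:
            self.tracker.log(stats)

    def close(self) -> None:
        if self.enabled:
            self.tracker.finish()
```

Keeping the backend behind a small init/log/finish surface is what lets a new tracker sit next to existing ones: the logger only needs to know whether the backend is enabled.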
…1126)

Extract the monolithic rpc_server.py (1109 lines) into a reusable guard package with composable Flask blueprints, enabling code sharing between the RPC server and inference service guard.

Key changes:
- New areal/infra/rpc/guard/ package with GuardState, create_app(), data_blueprint (RTensor /data/* endpoints), and engine_blueprint (/create_engine, /call, /set_env + engine thread)
- rpc_server.py reduced to 62-line thin composition of guard + blueprints
- inference_service/guard/app.py reduced to 25-line wrapper over shared guard
- /fork endpoint simplified to single raw_cmd mode (removed module-path mode)
- Schedulers (local, slurm) migrated to build raw commands client-side via /alloc_ports + /fork with raw_cmd
- All guard tests (27) and rtensor tests (32) pass
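The composition described above can be sketched with plain Flask blueprints. This is not the code from the PR: the names `GuardState`, `create_app`, `data_blueprint`, and `engine_blueprint` and the `/set_env` route are taken from the commit message, while the state fields, the `/health` route, the handler bodies, the JSON payload shapes, and `build_rpc_server` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

from flask import Blueprint, Flask, current_app, jsonify, request


@dataclass
class GuardState:
    # Hypothetical shared state; the real GuardState fields are not shown here.
    env: dict = field(default_factory=dict)
    shards: dict = field(default_factory=dict)


def _state() -> GuardState:
    # Blueprints reach the shared state through the app that registered them.
    return current_app.extensions["guard_state"]


data_blueprint = Blueprint("data", __name__, url_prefix="/data")


@data_blueprint.route("/<shard_id>", methods=["PUT"])
def put_shard(shard_id):
    _state().shards[shard_id] = request.get_json()
    return jsonify(status="ok")


@data_blueprint.route("/<shard_id>", methods=["GET"])
def get_shard(shard_id):
    shards = _state().shards
    if shard_id not in shards:
        return jsonify(error="not found"), 404
    return jsonify(shards[shard_id])


engine_blueprint = Blueprint("engine", __name__)


@engine_blueprint.route("/set_env", methods=["POST"])
def set_env():
    _state().env.update(request.get_json())
    return jsonify(status="ok")


def create_app(state: Optional[GuardState] = None) -> Flask:
    """Base guard app: shared state plus routes every service needs."""
    app = Flask(__name__)
    app.extensions["guard_state"] = GuardState() if state is None else state

    @app.route("/health")
    def health():
        return jsonify(status="ok")

    return app


def build_rpc_server() -> Flask:
    """Thin composition: base guard plus the data and engine blueprints."""
    app = create_app()
    app.register_blueprint(data_blueprint)
    app.register_blueprint(engine_blueprint)
    return app
```

With this layout a second service can call `create_app()` alone, or register only the blueprints it needs, which is the kind of sharing the commit describes between the RPC server and the inference service guard.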
Move /review-pr classification from change types to a domain/signal model so the harness can cover newer runtime surfaces without platform drift.

Key changes:
- add canonical domains-and-signals references and migration guide
- add sync_review_pr_refs.py to regenerate Claude and OpenCode data
- align review templates and commit-conventions mirrors, including fsdp/megatron runtime signal coverage
* feat(infra): add client-side fetch buffer for RTensor

  Add a per-process cache (_fetch_buffer) keyed by shard_id so that repeated to_local() / localize() calls for the same rollout batch avoid redundant network round-trips. Entries are evicted by clear_node() at the end of each train step.

  Key changes:
  - Cache check in to_local() before backend fetch
  - Batch buffer resolution in localize() (fetch only misses)
  - clear_node() evicts buffer entries before deleting remote shards
  - Add buffer_stats() for operational monitoring
  - Add strict=True to zip in localize() for safety
  - Add TestFetchBuffer integration test suite (8 tests)

* refactor(rtensor): remove clear_fetch_buffer and buffer_stats functions
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )