
[pull] main from inclusionAI:main#24

Merged
pull[bot] merged 4 commits into axistore80-coder:main from inclusionAI:main
Apr 1, 2026


@pull pull bot commented Apr 1, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

guozhihao-224 and others added 4 commits April 1, 2026 17:12
Integrate Trackio (Hugging Face) as a new experiment tracking option
alongside existing WandB, SwanLab, and TensorBoard backends.

Key changes:
- Add TrackioConfig dataclass with mode/project/name/space_id fields
- Integrate trackio init/log/finish lifecycle in StatsLogger
- Add trackio.log() fallback in logging.py helper function
- Add trackio to pyproject.toml dependencies
- Update CLI docs generator to include TrackioConfig
- Add unit tests for config and StatsLogger integration
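A minimal sketch of how the config and the init/log/finish lifecycle described above could fit together. Only the field names (mode, project, name, space_id) come from the commit message; the types, defaults, allowed modes, and the `RecordingBackend` and `MiniStatsLogger` classes are hypothetical stand-ins, not the actual AReaL or Trackio code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

# Assumption: the allowed modes are illustrative; the real set may differ.
_ALLOWED_MODES = ("disabled", "online", "local")


@dataclass
class TrackioConfig:
    """Field names follow the commit message; defaults are assumptions."""

    mode: str = "disabled"
    project: Optional[str] = None
    name: Optional[str] = None
    space_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.mode not in _ALLOWED_MODES:
            raise ValueError(f"unsupported trackio mode: {self.mode!r}")

    @property
    def enabled(self) -> bool:
        return self.mode != "disabled"


class RecordingBackend:
    """Hypothetical stand-in for a tracking module; records calls only."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def init(self, **kwargs: Any) -> None:
        self.calls.append(("init", kwargs))

    def log(self, data: Dict[str, float], step: Optional[int] = None) -> None:
        self.calls.append(("log", data, step))

    def finish(self) -> None:
        self.calls.append(("finish",))


class MiniStatsLogger:
    """Hypothetical logger that drives init -> log -> finish on one backend."""

    def __init__(self, config: TrackioConfig, backend: RecordingBackend) -> None:
        # A disabled config means the backend is never touched.
        self._backend = backend if config.enabled else None
        if self._backend is not None:
            self._backend.init(
                project=config.project,
                name=config.name,
                space_id=config.space_id,
            )

    def commit(self, step: int, stats: Dict[str, float]) -> None:
        if self._backend is not None:
            self._backend.log(stats, step=step)

    def close(self) -> None:
        if self._backend is not None:
            self._backend.finish()
```

Injecting the backend keeps the sketch independent of any tracking library, which is also one way a unit test could exercise the lifecycle without network access.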
…1126)

Extract the monolithic rpc_server.py (1109 lines) into a reusable
guard package with composable Flask blueprints, enabling code sharing
between the RPC server and inference service guard.

Key changes:
- New areal/infra/rpc/guard/ package with GuardState, create_app(),
  data_blueprint (RTensor /data/* endpoints), and engine_blueprint
  (/create_engine, /call, /set_env + engine thread)
- rpc_server.py reduced to 62-line thin composition of guard + blueprints
- inference_service/guard/app.py reduced to 25-line wrapper over shared guard
- /fork endpoint simplified to single raw_cmd mode (removed module-path mode)
- Schedulers (local, slurm) migrated to build raw commands client-side
  via /alloc_ports + /fork with raw_cmd
- All guard tests (27) and rtensor tests (32) pass

Move /review-pr classification from change types to a domain/signal
model so the harness can cover newer runtime surfaces without
platform drift.

Key changes:
- add canonical domains-and-signals references and migration guide
- add sync_review_pr_refs.py to regenerate Claude and OpenCode data
- align review templates and commit-conventions mirrors, including
  fsdp/megatron runtime signal coverage

* feat(infra): add client-side fetch buffer for RTensor

Add a per-process cache (_fetch_buffer) keyed by shard_id so that
repeated to_local() / localize() calls for the same rollout batch
avoid redundant network round-trips.  Entries are evicted by
clear_node() at the end of each train step.

Key changes:
- Cache check in to_local() before backend fetch
- Batch buffer resolution in localize() (fetch only misses)
- clear_node() evicts buffer entries before deleting remote shards
- Add buffer_stats() for operational monitoring
- Add strict=True to zip in localize() for safety
- Add TestFetchBuffer integration test suite (8 tests)

* refactor(rtensor): remove clear_fetch_buffer and buffer_stats functions
@pull pull bot locked and limited conversation to collaborators Apr 1, 2026
@pull pull bot added the ⤵️ pull label Apr 1, 2026
@pull pull bot merged commit 44d54cf into axistore80-coder:main Apr 1, 2026
