[pull] main from inclusionAI:main #43

Merged
pull[bot] merged 3 commits into axistore80-coder:main from inclusionAI:main
Apr 20, 2026


pull bot commented Apr 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

TaoZex and others added 3 commits April 20, 2026 15:28
…acking (#1151)

Add Karmarkar-Karp (KK, the Largest Differencing Method) as an
alternative to first-fit decreasing (FFD) for micro-batch allocation.
KK produces more balanced partitions with a lower max-min spread in
total tokens per micro-batch, which benefits RL workloads with
variable sequence lengths.

Key changes:
- Add _KKSet, _KKState, _kk_partition, kk_allocate in seqpack.py
- Add packing_algorithm field to MicroBatchSpec (ffd/kk)
- Wire KK allocation through dist_rollout and data utils
- Add sequence_packing docs (en/zh) and CLI reference updates
- Add comprehensive unit tests and torchrun benchmark

Refs: #1151
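
For readers unfamiliar with the Largest Differencing Method, the following is a minimal sketch of k-way Karmarkar-Karp partitioning. It is not the code from seqpack.py: the function name `kk_partition_sketch` and its signature are hypothetical, and the real `_kk_partition` / `kk_allocate` may differ in how they represent state and handle constraints.

```python
import heapq
from itertools import count


def kk_partition_sketch(lengths, k):
    """Split indices of `lengths` into k groups with balanced total length.

    Hypothetical illustration of the Largest Differencing Method; not the
    implementation shipped in the commit.
    """
    if not lengths:
        return [[] for _ in range(k)]
    tie = count()  # tie-breaker so the heap never has to compare bin lists
    heap = []
    for idx, n in enumerate(lengths):
        # One state per item: k bins of (sum, indices), sorted by sum descending.
        bins = [(n, [idx])] + [(0, []) for _ in range(k - 1)]
        spread = bins[0][0] - bins[-1][0]
        heapq.heappush(heap, (-spread, next(tie), bins))
    while len(heap) > 1:
        # Take the two states with the largest max-min spread.
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # Pair a's largest bin with b's smallest, and so on, so the
        # differences partially cancel out.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda t: t[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(tie), merged))
    return [indices for _, indices in heap[0][2]]


lengths = [8, 7, 6, 5, 4]
groups = kk_partition_sketch(lengths, k=2)
print([sum(lengths[i] for i in g) for g in groups])
```

Unlike FFD, which fills bins up to a capacity, this sketch fixes the number of groups and minimizes the spread heuristically; it is not guaranteed to find the optimal partition.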
* fix(infra): move data service seed to worker-level config

Set random seed once at worker startup instead of per-request during
dataset load and epoch reset. This prevents seed re-initialization
from interfering with data shuffling across multiple datasets.

Key changes:
- Add seed field to DataServiceConfig and DataWorkerConfig
- Pass seed as CLI arg to worker process, set once in lifespan
- Remove seed from WorkerLoadDatasetRequest and _DatasetState
- Add datasets_lock for thread-safe dataset load/unload
- Update all trainers to pass seed via DataServiceConfig
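
The difference between per-request and worker-level seeding can be sketched as follows. This is an illustration only, using Python's standard `random` module; the class `SeededWorkerSketch` and the function `reset_epoch_reseeding` are hypothetical names, not the data service's actual API.

```python
import random


def reset_epoch_reseeding(items, seed):
    """Old pattern (assumed): reseed on every request.

    Each call restarts the global RNG, so every epoch gets the same order
    and any other dataset sharing the RNG is disturbed as well.
    """
    random.seed(seed)
    order = list(items)
    random.shuffle(order)
    return order


class SeededWorkerSketch:
    """New pattern (assumed): seed once when the worker starts."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # set once, never reset per request
        self.datasets = {}

    def load_dataset(self, dataset_id, items):
        # No seed argument here: loading does not touch the RNG state.
        self.datasets[dataset_id] = list(items)

    def reset_epoch(self, dataset_id):
        order = list(self.datasets[dataset_id])
        self.rng.shuffle(order)  # advances the worker-level RNG stream
        return order


worker = SeededWorkerSketch(seed=1)
worker.load_dataset("train", range(100))
epoch_1 = worker.reset_epoch("train")
epoch_2 = worker.reset_epoch("train")
```

With a single worker-level seed, runs stay reproducible across restarts while successive epochs still see different shuffles.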

* fix(infra): harden data worker lifecycle concurrency

Prevent races between load/unload and stateful endpoints
(fetch, reset, save, load) on the data service worker.

Key changes:
- Add _loading_ids reservation set so load_dataset does not
  hold datasets_lock across slow I/O (asyncio.to_thread)
- Add unloading flag to _DatasetState; unload_dataset drains
  in-flight state ops via state.lock before dict removal
- Introduce _locked_active_state context manager that checks
  the unloading flag; apply to fetch, reset, save, load
- Add 4 deterministic concurrency regression tests covering
  duplicate-load rejection, unload drain, stale-fetch 409,
  and cross-dataset non-blocking
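
The locking scheme described above can be sketched with plain asyncio. This is a simplified illustration under assumptions, not the worker's actual code: `DataWorkerSketch`, `DatasetStateSketch`, and `ConflictError` (standing in for an HTTP 409 response) are hypothetical names, and only `fetch` is shown among the stateful endpoints.

```python
import asyncio
from contextlib import asynccontextmanager
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Stand-in for an HTTP 409 response (assumption)."""


@dataclass
class DatasetStateSketch:
    data: list
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)
    unloading: bool = False


class DataWorkerSketch:
    def __init__(self):
        self.datasets = {}
        self.loading_ids = set()  # reservations for in-progress loads
        self.datasets_lock = asyncio.Lock()

    async def load_dataset(self, dataset_id, loader):
        async with self.datasets_lock:
            if dataset_id in self.datasets or dataset_id in self.loading_ids:
                raise ConflictError(f"{dataset_id} already loaded or loading")
            self.loading_ids.add(dataset_id)  # reserve, then release the lock
        try:
            data = await asyncio.to_thread(loader)  # slow I/O outside the lock
            async with self.datasets_lock:
                self.datasets[dataset_id] = DatasetStateSketch(data)
        finally:
            self.loading_ids.discard(dataset_id)

    async def unload_dataset(self, dataset_id):
        async with self.datasets_lock:
            state = self.datasets.get(dataset_id)
            if state is None or state.unloading:
                raise ConflictError(f"{dataset_id} not loaded")
            state.unloading = True  # new operations are rejected from now on
        async with state.lock:
            pass  # drain: wait for any in-flight operation to finish
        async with self.datasets_lock:
            self.datasets.pop(dataset_id, None)

    @asynccontextmanager
    async def locked_active_state(self, dataset_id):
        state = self.datasets.get(dataset_id)
        if state is None or state.unloading:
            raise ConflictError(f"{dataset_id} unavailable")
        async with state.lock:
            if state.unloading:  # re-check after waiting for the lock
                raise ConflictError(f"{dataset_id} is unloading")
            yield state

    async def fetch(self, dataset_id):
        async with self.locked_active_state(dataset_id) as state:
            return list(state.data)


async def demo():
    worker = DataWorkerSketch()
    await worker.load_dataset("train", lambda: [1, 2, 3])
    fetched = await worker.fetch("train")
    try:
        await worker.load_dataset("train", lambda: [])
        duplicate_rejected = False
    except ConflictError:
        duplicate_rejected = True
    await worker.unload_dataset("train")
    try:
        await worker.fetch("train")
        stale_rejected = False
    except ConflictError:
        stale_rejected = True
    return fetched, duplicate_rejected, stale_rejected
```

Because each dataset has its own state lock and the global lock is held only for dictionary updates, a slow load or fetch on one dataset should not block operations on another.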

---------

Co-authored-by: Wentai Zhang <zhangwentai.zwt@antgroup.com>
)

Record the first AReaL community biweekly meeting (2026/04/18) with
agenda, slides, and recording links. Add news entry to README
announcing the meeting and the next scheduled session (2026/05/01).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
pull bot locked and limited conversation to collaborators Apr 20, 2026
pull bot added the ⤵️ pull label Apr 20, 2026
pull bot merged commit c3ba6fa into axistore80-coder:main Apr 20, 2026

2 participants