Add 32b fully async swe agent training example in skyrl by Weili-0234 · Pull Request #26 · ThunderAgent-org/ThunderAgent

Weili-0234 · 2026-03-04T03:42:13Z

This PR extends the existing ThunderAgent SkyRL example with a 32B Qwen3 fully-async RL example, including:

- Add a complete 32B training example, including launch orchestration, profiling, and SWE-bench image pre-pull
- Add a 6-node SLURM reproducibility pipeline and separate wrappers for default/TR router
- Update the training/inference integration path for fully-async Megatron workloads (generation loop, inference endpoint/client integration, worker/runtime behavior, and related trainer logic)
- Add docs and setup templates for credentials and cluster-specific configs

HaoKang-Timmy · 2026-03-04T20:44:54Z

I wonder if we could separate this into another branch of this repo so we can keep improving it, thanks!

Weili-0234 · 2026-03-04T21:12:43Z

Yeah sure!

Weili-0234 added 4 commits March 3, 2026 19:23

migrate(skyrl): sync 32b fully-async core trainer/megatron changes

cb80d72

add 6-node megatron repro pipeline scripts

0f17595

add doc/guide for 32b rl exp on mini swe

8bd9af4

add credential/cluster templates

0438893

Weili-0234 marked this pull request as draft March 4, 2026 21:12

Weili-0234 changed the base branch from main to dev/skyrl-fully-async-swe-32b March 4, 2026 21:17

Weili-0234 marked this pull request as ready for review March 4, 2026 21:18
