
Async GRPO #5293

Open
qgallouedec wants to merge 8 commits into main from async-grpo

Conversation

qgallouedec (Member) commented Mar 16, 2026

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Adds a new async RL training path that introduces background concurrency, HTTP interaction with a vLLM server, and NCCL weight-transfer coordination; these pieces are prone to runtime, deadlock, and configuration issues despite being isolated to the experimental module.

Overview
Adds a new experimental AsyncGRPOTrainer that decouples rollout generation from training by running an AsyncRolloutWorker in a background asyncio thread and consuming samples from a queue-backed IterableDataset.

Implements vLLM integration for generation and periodic weight synchronization (pause/resume + NCCL weight transfer), plus rollout scoring/advantage computation, tool-call execution support, and metric plumbing through the dataloader.

Updates docs to include a new async_grpo_trainer page in the experimental toctree, and adds a core dependency on aiohttp for async HTTP requests.
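To make the decoupling described above concrete, here is a minimal sketch of a background worker feeding a queue-backed iterable (a stand-in for the queue-backed IterableDataset the trainer consumes). All names here are illustrative, not the actual AsyncGRPOTrainer implementation:

```python
# Hedged sketch: a background thread produces rollouts into a queue, and a
# queue-backed iterable yields them to the training loop. In the real
# trainer the worker is an asyncio thread talking to a vLLM server.
import queue
import threading


class QueueBackedIterable:
    """Yields samples produced by a background rollout worker."""

    def __init__(self, q: queue.Queue, num_samples: int):
        self.q = q
        self.num_samples = num_samples

    def __iter__(self):
        for _ in range(self.num_samples):
            yield self.q.get()  # blocks until the worker produces a sample


def rollout_worker(q: queue.Queue, num_samples: int) -> None:
    # Stand-in for async generation + scoring against a vLLM server.
    for i in range(num_samples):
        q.put({"prompt": f"p{i}", "completion": f"c{i}", "reward": 0.0})


q = queue.Queue(maxsize=8)  # bounded so generation can't run far ahead of training
t = threading.Thread(target=rollout_worker, args=(q, 4), daemon=True)
t.start()
samples = list(QueueBackedIterable(q, 4))
t.join()
```

The bounded queue is what lets rollout generation overlap with optimizer steps while still applying backpressure when the trainer falls behind.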

Written by Cursor Bugbot for commit 7228399.


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 495f9676da


```python
        logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...")
        await asyncio.sleep(1)
    else:
        raise
```


Raw int passed as aiohttp timeout instead of ClientTimeout

High Severity

The _post, _get, and _get_status_code methods pass a raw int to the timeout parameter of aiohttp request methods. aiohttp expects an aiohttp.ClientTimeout object — passing a plain integer can cause an AttributeError at runtime when aiohttp tries to access .total on the int. Elsewhere in this repo (e.g., examples/scripts/nemo_gym/train_multi_environment.py), aiohttp.ClientTimeout(total=timeout) is used correctly.

Additional Locations (1)

```python
EnvironmentFactory = Callable[[], _SupportsReset]


class StepIntervalCallback(TrainerCallback):
```
Member


StepIntervalCallback could be seen as a thin wrapper over TrainerCallback that just provides the hook. The main issue I see is that it's not a concrete class: it doesn't implement any behavior that TrainerCallback doesn't already provide.

I prefer the WeightSyncCallback, but it's a matter of taste; both work.

```python
except (TimeoutError, asyncio.TimeoutError):
    if attempt < max_retries - 1:
        logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...")
        await asyncio.sleep(1)
```
Member


should probably be configurable as vllm_timeout

```python
    else:
        raise

async def _get(self, path: str, timeout: int = 30) -> dict:
```
Member


_get should have the same retry mechanism as _post. It could be good to have them in a separate vLLMClient implementation.
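The two reviewer suggestions above (a configurable vllm_timeout and a shared retry path for _get and _post) could be sketched like this. The class, its name, and the asyncio.wait_for-based timeout are assumptions for illustration, not the PR's actual design:

```python
# Hypothetical vLLMClient sketch: retry logic factored into one helper so
# _get and _post share it, with the timeout exposed as vllm_timeout.
import asyncio
import logging

logger = logging.getLogger(__name__)


class VLLMClient:
    def __init__(self, base_url: str, vllm_timeout: float = 30.0, max_retries: int = 3):
        self.base_url = base_url
        self.vllm_timeout = vllm_timeout
        self.max_retries = max_retries

    async def _with_retries(self, path, coro_factory):
        # coro_factory is called per attempt so each retry gets a fresh coroutine.
        for attempt in range(self.max_retries):
            try:
                return await asyncio.wait_for(coro_factory(), timeout=self.vllm_timeout)
            except (TimeoutError, asyncio.TimeoutError):
                if attempt < self.max_retries - 1:
                    logger.warning(
                        f"Request to {path} timed out (attempt {attempt + 1}/{self.max_retries}), retrying..."
                    )
                    await asyncio.sleep(1)
                else:
                    raise
        # Reached only when max_retries <= 0: fail loudly instead of returning None.
        raise RuntimeError(f"No attempts made for {path} (max_retries={self.max_retries})")
```

With this shape, _get and _post each become a one-liner delegating to `_with_retries`, so their retry behavior can never drift apart.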


cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).



```python
        logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...")
        await asyncio.sleep(1)
    else:
        raise
```


_post silently returns None when retries exhausted

Low Severity

The _post method's for loop has no fallback return or raise after it. If max_retries is 0, the function silently returns None, and the caller (_generate_one_turn) would crash with a TypeError when indexing output["choices"]. While the default max_retries=3 prevents this in normal use, the function's return type annotation (-> dict) makes an implicit None return incorrect.


