
Async GRPO #5293

Open
qgallouedec wants to merge 8 commits into main from async-grpo

Conversation

qgallouedec (Member) commented Mar 16, 2026

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Adds a new async RL training path that introduces background concurrency, HTTP interaction with a vLLM server, and NCCL weight-transfer coordination; these pieces are prone to runtime, deadlock, and configuration issues despite being isolated to the experimental module.

Overview
Adds a new experimental AsyncGRPOTrainer that decouples rollout generation from training by running an AsyncRolloutWorker in a background asyncio thread and consuming samples from a queue-backed IterableDataset.

Implements vLLM integration for generation and periodic weight synchronization (pause/resume + NCCL weight transfer), plus rollout scoring/advantage computation, tool-call execution support, and metric plumbing through the dataloader.

Updates docs to include a new async_grpo_trainer page in the experimental toctree, and adds a core dependency on aiohttp for async HTTP requests.
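To make the decoupling described above concrete, here is a minimal sketch of a background worker feeding a queue-backed iterable (a stand-in for the queue-backed IterableDataset the trainer consumes). All names here are illustrative, not the actual AsyncGRPOTrainer implementation:

```python
# Hedged sketch: a background thread produces rollouts into a queue, and a
# queue-backed iterable yields them to the training loop. In the real
# trainer the worker is an asyncio thread talking to a vLLM server.
import queue
import threading


class QueueBackedIterable:
    """Yields samples produced by a background rollout worker."""

    def __init__(self, q: queue.Queue, num_samples: int):
        self.q = q
        self.num_samples = num_samples

    def __iter__(self):
        for _ in range(self.num_samples):
            yield self.q.get()  # blocks until the worker produces a sample


def rollout_worker(q: queue.Queue, num_samples: int) -> None:
    # Stand-in for async generation + scoring against a vLLM server.
    for i in range(num_samples):
        q.put({"prompt": f"p{i}", "completion": f"c{i}", "reward": 0.0})


q = queue.Queue(maxsize=8)  # bounded so generation can't run far ahead of training
t = threading.Thread(target=rollout_worker, args=(q, 4), daemon=True)
t.start()
samples = list(QueueBackedIterable(q, 4))
t.join()
```

The bounded queue is what lets rollout generation overlap with optimizer steps while still applying backpressure when the trainer falls behind.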

Written by Cursor Bugbot for commit 7228399.


chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 495f9676da


```python
        logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...")
        await asyncio.sleep(1)
    else:
        raise
```


Raw int passed as aiohttp timeout instead of ClientTimeout

High Severity

The _post, _get, and _get_status_code methods pass a raw int to the timeout parameter of aiohttp request methods. aiohttp expects an aiohttp.ClientTimeout object — passing a plain integer can cause an AttributeError at runtime when aiohttp tries to access .total on the int. Elsewhere in this repo (e.g., examples/scripts/nemo_gym/train_multi_environment.py), aiohttp.ClientTimeout(total=timeout) is used correctly.

Additional Locations (1)

```python
EnvironmentFactory = Callable[[], _SupportsReset]


class StepIntervalCallback(TrainerCallback):
```
Member


StepIntervalCallback could be seen as a thin wrapper over TrainerCallback that just provides the hook. The main issue I see is that it's not a concrete class: it doesn't implement any behavior that TrainerCallback doesn't already provide.

I prefer the WeightSyncCallback, but it's a matter of taste; both work.

```python
except (TimeoutError, asyncio.TimeoutError):
    if attempt < max_retries - 1:
        logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...")
        await asyncio.sleep(1)
```
Member


should probably be configurable as vllm_timeout

```python
    else:
        raise

async def _get(self, path: str, timeout: int = 30) -> dict:
```
Member


_get should have the same retry mechanism as _post. It could be good to have them in a separate vLLMClient implementation.
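The two reviewer suggestions above (a configurable vllm_timeout and a shared retry path for _get and _post) could be sketched like this. The class, its name, and the asyncio.wait_for-based timeout are assumptions for illustration, not the PR's actual design:

```python
# Hypothetical vLLMClient sketch: retry logic factored into one helper so
# _get and _post share it, with the timeout exposed as vllm_timeout.
import asyncio
import logging

logger = logging.getLogger(__name__)


class VLLMClient:
    def __init__(self, base_url: str, vllm_timeout: float = 30.0, max_retries: int = 3):
        self.base_url = base_url
        self.vllm_timeout = vllm_timeout
        self.max_retries = max_retries

    async def _with_retries(self, path, coro_factory):
        # coro_factory is called per attempt so each retry gets a fresh coroutine.
        for attempt in range(self.max_retries):
            try:
                return await asyncio.wait_for(coro_factory(), timeout=self.vllm_timeout)
            except (TimeoutError, asyncio.TimeoutError):
                if attempt < self.max_retries - 1:
                    logger.warning(
                        f"Request to {path} timed out (attempt {attempt + 1}/{self.max_retries}), retrying..."
                    )
                    await asyncio.sleep(1)
                else:
                    raise
        # Reached only when max_retries <= 0: fail loudly instead of returning None.
        raise RuntimeError(f"No attempts made for {path} (max_retries={self.max_retries})")
```

With this shape, _get and _post each become a one-liner delegating to `_with_retries`, so their retry behavior can never drift apart.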


cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).



```python
        logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...")
        await asyncio.sleep(1)
    else:
        raise
```


_post silently returns None when retries exhausted

Low Severity

The _post method's for loop has no fallback return or raise after it. If max_retries is 0, the function silently returns None, and the caller (_generate_one_turn) would crash with a TypeError when indexing output["choices"]. While the default max_retries=3 prevents this in normal use, the function's return type annotation (-> dict) makes an implicit None return incorrect.


