Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 495f9676da
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...") | ||
| await asyncio.sleep(1) | ||
| else: | ||
| raise |
There was a problem hiding this comment.
Raw int passed as aiohttp timeout instead of ClientTimeout
High Severity
The _post, _get, and _get_status_code methods pass a raw int to the timeout parameter of aiohttp request methods. aiohttp expects an aiohttp.ClientTimeout object — passing a plain integer can cause an AttributeError at runtime when aiohttp tries to access .total on the int. Elsewhere in this repo (e.g., examples/scripts/nemo_gym/train_multi_environment.py), aiohttp.ClientTimeout(total=timeout) is used correctly.
Additional Locations (1)
| EnvironmentFactory = Callable[[], _SupportsReset] | ||
|
|
||
|
|
||
| class StepIntervalCallback(TrainerCallback): |
There was a problem hiding this comment.
The step StepIntervalCallback could be seen like a function wrapper over Callback that provides the hook. The main issue I see it's not a concrete class, it doesn't impl any behavior that TrainerCallback doesn't provide.
I prefer the WeightSyncCallback, but it's a matter of taste both work.
| except (TimeoutError, asyncio.TimeoutError): | ||
| if attempt < max_retries - 1: | ||
| logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...") | ||
| await asyncio.sleep(1) |
There was a problem hiding this comment.
should probably be configurable as vllm_timeout
| else: | ||
| raise | ||
|
|
||
| async def _get(self, path: str, timeout: int = 30) -> dict: |
There was a problem hiding this comment.
get should have same retry mechanism as _post. could be good to have them in separate vllmCllient implementation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| logger.warning(f"Request to {path} timed out (attempt {attempt + 1}/{max_retries}), retrying...") | ||
| await asyncio.sleep(1) | ||
| else: | ||
| raise |
There was a problem hiding this comment.
_post silently returns None when retries exhausted
Low Severity
The _post method's for loop has no fallback return or raise after it. If max_retries is 0, the function silently returns None, and the caller (_generate_one_turn) would crash with a TypeError when indexing output["choices"]. While the default max_retries=3 prevents this in normal use, the function's return type annotation (-> dict) makes an implicit None return incorrect.


What does this PR do?
Fixes # (issue)
Before submitting
Pull Request section?
to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Note
Medium Risk
Adds a new async RL training path that introduces background concurrency, HTTP interaction with a vLLM server, and NCCL weight-transfer coordination; these pieces are prone to runtime/deadlock/config issues despite being isolated to
experimental.Overview
Adds a new experimental
AsyncGRPOTrainerthat decouples rollout generation from training by running anAsyncRolloutWorkerin a background asyncio thread and consuming samples from a queue-backedIterableDataset.Implements vLLM integration for generation and periodic weight synchronization (pause/resume + NCCL weight transfer), plus rollout scoring/advantage computation, tool-call execution support, and metric plumbing through the dataloader.
Updates docs to include a new
async_grpo_trainerpage in the experimental toctree, and adds a core dependency onaiohttpfor async HTTP requests.Written by Cursor Bugbot for commit 7228399. This will update automatically on new commits. Configure here.