Fix accuracy_reward crash when called from non-main thread #5281
qgallouedec merged 2 commits into main
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    # Suppress the "Timeout is disabled" warnings from math_verify when we intentionally disable timeouts
    if not is_main_thread:
        logging.getLogger("math_verify.parser").setLevel(logging.ERROR)
        logging.getLogger("math_verify.grader").setLevel(logging.ERROR)
Logger level permanently modified as global side effect
Medium Severity
logging.getLogger() returns a process-wide singleton, so calling setLevel(logging.ERROR) permanently suppresses warnings from math_verify.parser and math_verify.grader for the entire process. After any single call from a non-main thread, all subsequent calls — including from the main thread — will have these warnings silenced. The log levels are never restored after the function returns. A scoped approach (e.g., saving and restoring the original level in a try/finally) would avoid this persistent global side effect.
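The scoped approach suggested above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: `temporary_log_level` is a hypothetical helper name, and only the two logger names come from the diff.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_log_level(logger_names, level):
    """Set `level` on the named loggers and restore the originals on exit.

    Hypothetical helper for illustration; not part of TRL or math_verify.
    """
    loggers = [logging.getLogger(name) for name in logger_names]
    # Remember each logger's own level (NOTSET if it was never set explicitly).
    original_levels = [lg.level for lg in loggers]
    try:
        for lg in loggers:
            lg.setLevel(level)
        yield
    finally:
        # Runs on normal exit and on exceptions, so the change never leaks.
        for lg, original in zip(loggers, original_levels):
            lg.setLevel(original)


# Usage: silence the two math_verify loggers only around the guarded calls.
with temporary_log_level(["math_verify.parser", "math_verify.grader"], logging.ERROR):
    pass  # parse() / verify() calls would go here
```

Logger levels remain process-wide while the block is active, so overlapping uses from several threads can still interleave their save and restore steps. Where that matters, a `logging.Filter` attached to the loggers may be a better fit than changing levels.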


Summary
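- math_verify's parse() and verify() use signal.alarm() timeouts internally, which raise ValueError when called from a non-main thread, since Python only allows signal handlers to be installed from the main thread. Calling from a non-main thread is required for asynchronous training
- Timeouts are disabled when not on the main thread (parsing_timeout=None, timeout_seconds=None)
- The fix applies to both accuracy_reward and reasoning_accuracy_reward
Reproduce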
Before:
After:
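Separately from the Before/After snippets, the underlying failure mode can be shown with the standard library alone. The sketch below assumes the crash comes from installing a signal handler off the main thread. It uses `SIGINT` only because that signal exists on every platform, and it does not call math_verify at all.

```python
import signal
import threading

errors = []


def install_handler():
    """Try to install a signal handler from whatever thread runs this."""
    try:
        # CPython rejects this call outside the main thread before changing anything.
        signal.signal(signal.SIGINT, signal.SIG_DFL)
    except ValueError as exc:
        errors.append(exc)


worker = threading.Thread(target=install_handler)
worker.start()
worker.join()

# The worker thread recorded the ValueError instead of installing the handler.
print(type(errors[0]).__name__, errors[0])
```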
Note
Medium Risk
Changes reward evaluation behavior by disabling math_verify timeouts off the main thread, which avoids crashes but can allow parses/verifications to run unbounded in worker threads.
Overview
Fixes accuracy_reward and reasoning_accuracy_reward crashing in non-main threads by detecting thread context and disabling math_verify's signal-based parse/verify timeouts when not on the main thread.
Also suppresses the expected "timeout disabled" warnings from math_verify in worker threads and adds a regression test that calls accuracy_reward from a threading.Thread.
Written by Cursor Bugbot for commit e0e6a76.
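The thread detection described in the overview can be sketched as follows. This is an illustration under assumptions, not the merged implementation: `math_verify_timeout_kwargs` and `run_in_thread` are hypothetical helpers, and only the keyword names `parsing_timeout` and `timeout_seconds` come from the PR summary.

```python
import threading


def math_verify_timeout_kwargs():
    """Return keyword overrides for math_verify's parse() and verify().

    Hypothetical helper. On the main thread it returns no overrides, so the
    library's own timeout defaults stay in effect; elsewhere it disables the
    signal-based timeouts.
    """
    is_main_thread = threading.current_thread() is threading.main_thread()
    if is_main_thread:
        return {"parse": {}, "verify": {}}
    return {
        "parse": {"parsing_timeout": None},
        "verify": {"timeout_seconds": None},
    }


def run_in_thread(fn):
    """Run `fn` in a worker thread and return (result, exception).

    Exceptions raised inside a thread do not propagate to the caller, so a
    regression test has to capture them explicitly like this.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = fn()
        except Exception as exc:  # captured so the caller can assert on it
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome.get("result"), outcome.get("error")


result, error = run_in_thread(math_verify_timeout_kwargs)
```

The returned dictionaries would be unpacked into the calls, for example `parse(text, **kwargs["parse"])`. As the risk note says, a parse or verification can then run unbounded in a worker thread, so callers that need a bound would have to enforce one without signals.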