Add dense true-on-policy CI coverage#1245
Conversation
There was a problem hiding this comment.
Code Review
This pull request removes the --sglang-rl-on-policy-target fsdp argument across several scripts, refactors imports in miles/backends/training_utils/data.py to optimize imports, and fixes a bug in policy_loss_function by correctly assigning log_probs to train_scored_log_probs. It also introduces new end-to-end tests for Megatron true on-policy configurations and registers CPU CI for fast tests. The reviewer feedback suggests sanitizing the GITHUB_COMMIT_NAME environment variable in the new test scripts to prevent command-line or logging issues, and formatting long lines in run_simple.py and run_mcore_fsdp.py to comply with PEP 8 line length recommendations.
| commit_name = os.environ.get("GITHUB_COMMIT_NAME") | ||
| run_id = f"{commit_name}-tp2-cp4" if commit_name else f"qwen3-dense-top-tp2-cp4-ci-{U.create_run_id()}" |
There was a problem hiding this comment.
If GITHUB_COMMIT_NAME is set (for example, to a commit message or branch name), it may contain spaces, colons, slashes, or other special characters. Using it directly in run_id can cause issues with command-line parsing, directory creation, or external logging services (like WandB). Please sanitize the commit_name to ensure it only contains safe alphanumeric characters, hyphens, or underscores.
commit_name = os.environ.get("GITHUB_COMMIT_NAME")
if commit_name:
sanitized = "".join(c if c.isalnum() or c in "-_" else "_" for c in commit_name)
run_id = f"{sanitized}-tp2-cp4"
else:
run_id = f"qwen3-dense-top-tp2-cp4-ci-{U.create_run_id()}"| commit_name = os.environ.get("GITHUB_COMMIT_NAME") | ||
| run_id = f"{commit_name}-tp4-cp2" if commit_name else f"qwen3-dense-top-tp4-cp2-ci-{U.create_run_id()}" |
There was a problem hiding this comment.
If GITHUB_COMMIT_NAME is set (for example, to a commit message or branch name), it may contain spaces, colons, slashes, or other special characters. Using it directly in run_id can cause issues with command-line parsing, directory creation, or external logging services (like WandB). Please sanitize the commit_name to ensure it only contains safe alphanumeric characters, hyphens, or underscores.
commit_name = os.environ.get("GITHUB_COMMIT_NAME")
if commit_name:
sanitized = "".join(c if c.isalnum() or c in "-_" else "_" for c in commit_name)
run_id = f"{sanitized}-tp4-cp2"
else:
run_id = f"qwen3-dense-top-tp4-cp2-ci-{U.create_run_id()}"| "--deterministic-mode " | ||
| "--true-on-policy-mode " | ||
| ) | ||
| true_on_policy_args = "--sglang-enable-deterministic-inference " "--deterministic-mode " "--true-on-policy-mode " |
There was a problem hiding this comment.
The line length here is 115 characters, which violates the PEP 8 style guide recommendation of keeping lines under 79 characters. For better readability and consistency with other files (such as examples/true_on_policy/run_simple.py), please keep the parenthesized multi-line format.
true_on_policy_args = (
"--sglang-enable-deterministic-inference "
"--deterministic-mode "
"--true-on-policy-mode "
)References
- PEP 8 recommends limiting all lines to a maximum of 79 characters. (link)
| true_on_policy_args = ( | ||
| "--sglang-enable-deterministic-inference " | ||
| "--sglang-rl-on-policy-target fsdp " | ||
| "--deterministic-mode " | ||
| "--true-on-policy-mode " | ||
| "--sglang-enable-deterministic-inference " "--deterministic-mode " "--true-on-policy-mode " | ||
| ) |
There was a problem hiding this comment.
For better readability and adherence to PEP 8 line length limits, please split the concatenated string arguments across multiple lines within the parentheses, consistent with other configuration blocks in this repository.
true_on_policy_args = (
"--sglang-enable-deterministic-inference "
"--deterministic-mode "
"--true-on-policy-mode "
)References
- PEP 8 recommends limiting all lines to a maximum of 79 characters and using parenthesized implicit line continuation for readability. (link)
Summary
Tests