Skip to content

Add dense true-on-policy CI coverage#1245

Open
maocheng23 wants to merge 4 commits into
mainfrom
maocheng/dense-ci-coverage-pr1198
Open

Add dense true-on-policy CI coverage#1245
maocheng23 wants to merge 4 commits into
mainfrom
maocheng/dense-ci-coverage-pr1198

Conversation

@maocheng23
Copy link
Copy Markdown
Contributor

@maocheng23 maocheng23 commented May 28, 2026

Summary

  • build on the dense Qwen3 true-on-policy e2e gates from [1/12] Add dense Qwen3 true-on-policy CI gates #1198 and make the per-layout run IDs unique when GITHUB_COMMIT_NAME is set
  • register the existing true-on-policy fast/unit tests in stage-a-cpu
  • wire the existing run-ci-true-on-policy GitHub label through the CI registry so it selects the dense H100 e2e cases
  • fix stale true-on-policy test imports/config assumptions and the train-vs-rollout logprob diff metric
  • make tests/ci/run_suite.py directly executable by ensuring the repo root is importable

Tests

  • git diff --check
  • python tests/ci/run_suite.py --hw cpu --suite stage-a-cpu --list-only
  • python tests/ci/run_suite.py --hw cuda --suite stage-c-8-gpu-h100 --labels run-ci-true-on-policy --list-only
  • python tests/ci/run_suite.py --hw cuda --suite stage-c-8-gpu-h100 --labels run-ci-precision --list-only
  • python -m pytest -q -o addopts='' --confcutdir=tests/fast/true_on_policy tests/fast/true_on_policy/test_config.py tests/fast/true_on_policy/test_run_qwen3_4b.py
  • python -m pytest -q -o addopts='' --confcutdir=tests/fast/backends/training_utils tests/fast/backends/training_utils/test_true_on_policy_data_log_dtype.py tests/fast/backends/training_utils/test_true_on_policy_loss_metrics.py
  • python -m pytest -q -o addopts='' --confcutdir=tests/fast/utils tests/fast/utils/test_true_on_policy_logprobs.py
  • python -m pytest -q tests/ci/test_ci_register.py tests/ci/test_run_suite.py tests/ci/test_labels.py

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the --sglang-rl-on-policy-target fsdp argument across several scripts, refactors imports in miles/backends/training_utils/data.py to optimize imports, and fixes a bug in policy_loss_function by correctly assigning log_probs to train_scored_log_probs. It also introduces new end-to-end tests for Megatron true on-policy configurations and registers CPU CI for fast tests. The reviewer feedback suggests sanitizing the GITHUB_COMMIT_NAME environment variable in the new test scripts to prevent command-line or logging issues, and formatting long lines in run_simple.py and run_mcore_fsdp.py to comply with PEP 8 line length recommendations.

Comment on lines +17 to +18
commit_name = os.environ.get("GITHUB_COMMIT_NAME")
run_id = f"{commit_name}-tp2-cp4" if commit_name else f"qwen3-dense-top-tp2-cp4-ci-{U.create_run_id()}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If GITHUB_COMMIT_NAME is set (for example, to a commit message or branch name), it may contain spaces, colons, slashes, or other special characters. Using it directly in run_id can cause issues with command-line parsing, directory creation, or external logging services (like WandB). Please sanitize the commit_name to ensure it only contains safe alphanumeric characters, hyphens, or underscores.

    commit_name = os.environ.get("GITHUB_COMMIT_NAME")
    if commit_name:
        sanitized = "".join(c if c.isalnum() or c in "-_" else "_" for c in commit_name)
        run_id = f"{sanitized}-tp2-cp4"
    else:
        run_id = f"qwen3-dense-top-tp2-cp4-ci-{U.create_run_id()}"

Comment on lines +17 to +18
commit_name = os.environ.get("GITHUB_COMMIT_NAME")
run_id = f"{commit_name}-tp4-cp2" if commit_name else f"qwen3-dense-top-tp4-cp2-ci-{U.create_run_id()}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If GITHUB_COMMIT_NAME is set (for example, to a commit message or branch name), it may contain spaces, colons, slashes, or other special characters. Using it directly in run_id can cause issues with command-line parsing, directory creation, or external logging services (like WandB). Please sanitize the commit_name to ensure it only contains safe alphanumeric characters, hyphens, or underscores.

    commit_name = os.environ.get("GITHUB_COMMIT_NAME")
    if commit_name:
        sanitized = "".join(c if c.isalnum() or c in "-_" else "_" for c in commit_name)
        run_id = f"{sanitized}-tp4-cp2"
    else:
        run_id = f"qwen3-dense-top-tp4-cp2-ci-{U.create_run_id()}"

"--deterministic-mode "
"--true-on-policy-mode "
)
true_on_policy_args = "--sglang-enable-deterministic-inference " "--deterministic-mode " "--true-on-policy-mode "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The line length here is 115 characters, which violates the PEP 8 style guide recommendation of keeping lines under 79 characters. For better readability and consistency with other files (such as examples/true_on_policy/run_simple.py), please keep the parenthesized multi-line format.

    true_on_policy_args = (
        "--sglang-enable-deterministic-inference "
        "--deterministic-mode "
        "--true-on-policy-mode "
    )
References
  1. PEP 8 recommends limiting all lines to a maximum of 79 characters. (link)

Comment thread scripts/run_mcore_fsdp.py
Comment on lines 217 to 219
true_on_policy_args = (
"--sglang-enable-deterministic-inference "
"--sglang-rl-on-policy-target fsdp "
"--deterministic-mode "
"--true-on-policy-mode "
"--sglang-enable-deterministic-inference " "--deterministic-mode " "--true-on-policy-mode "
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For better readability and adherence to PEP 8 line length limits, please split the concatenated string arguments across multiple lines within the parentheses, consistent with other configuration blocks in this repository.

        true_on_policy_args = (
            "--sglang-enable-deterministic-inference "
            "--deterministic-mode "
            "--true-on-policy-mode "
        )
References
  1. PEP 8 recommends limiting all lines to a maximum of 79 characters and using parenthesized implicit line continuation for readability. (link)

@maocheng23 maocheng23 added the run-ci-true-on-policy Run dense true-on-policy E2E CI label May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

run-ci-true-on-policy Run dense true-on-policy E2E CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant