Add gh-aw rerun review scanner#35685
Conversation
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Treat label application as successful when the ready-for-rerun label is present after the GitHub API call, avoiding false failures from brittle gh exit-code handling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post label additions with the GitHub Issues API JSON payload shape so /review rerun can reliably apply s/agent-ready-for-rerun from GitHub Actions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Grant pull-requests: write to the /review rerun labeling job so it can apply s/agent-ready-for-rerun to pull requests after deterministic eligibility passes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generate a deterministic rerun context artifact listing new comments and commits since the latest AI Summary or previous /review rerun checkpoint, and instruct pre-flight to read it before reviewing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an hourly gh-aw scanner that reviews PRs labeled s/agent-ready-for-rerun, asks AI for a trigger/skip decision, then uses a custom safe-output job to react to the /review rerun comment, remove the queue label, and trigger the existing AzDO review pipeline deterministically. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow Resolve-RerunEligibility.ps1 to be dot-sourced by the queued PR query helper without binding required CLI parameters. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the GitHub repository owner context in the gh-aw scanner safe-output job while keeping the repository name fixed, so fork dry-runs can inspect their own queued PRs without template injection risk. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| --- | ||
| on: | ||
| schedule: | ||
| - cron: "0 * * * *" |
There was a problem hiding this comment.
What is your concurrency strategy for reviews?
What prevents 2+ reviews for the same PR to run ?
There was a problem hiding this comment.
I was thinking about running it once per hour. WDYT?
There was a problem hiding this comment.
Well the job that is triggered happens in an async-fashion, and I do not see anything acting like a "lock" (either repo-memory function of GH AW - to keep track if disposed review ;; or perhaps some label or reaction on the PR to review).
If the run at 9AM triggers the (async) job to do the review, what prevents the run at 10AM to do the same? (if the first one is not finished)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass Owner and Repo through when dot-sourcing Resolve-RerunEligibility so Query-RerunReadyPRs does not reset to dotnet/maui and can scan fork/test repositories correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the latest normal /review command and carry its platform and pipeline branch through rerun scanner candidates and safe-output triggering so /review rerun follows the same target rules as the original review request. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require /review rerun commenters to have OWNER, MEMBER, COLLABORATOR, or CONTRIBUTOR association before applying the rerun-ready label. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen
left a comment
There was a problem hiding this comment.
Adversarial PR Review — 3 reviewers, consensus-validated
3 independent reviewers (different model families) analyzed this PR in parallel without seeing each other's findings or this PR's existing review threads. Findings below survived adversarial consensus (3/3 unanimous, 2/3 corroborated, or 1/3 with direct source-code verification).
Verdict
Concerns before merge. Three blocking issues plus a 2/3-consensus race condition that directly corroborates @T-Gro's unresolved concurrency concern.
Blocking (❌)
| # | Issue | Consensus |
|---|---|---|
| 1 | /review rerun bypasses the actor permission check that /review enforces |
3/3 unanimous |
| 2 | Scanner aborts the entire batch on a skip with missing rerun_comment_id — which the prompt itself instructs the agent to send as 0 |
1/3 + verified |
| 3 | Manual /review never removes s/agent-ready-for-rerun, causing a duplicate review on the next hourly scanner run |
1/3 + verified |
Should fix (⚠️ )
| # | Issue | Consensus |
|---|---|---|
| 4 | Label-based "lock" is non-atomic check-then-set, and the scanner and /review workflows have no shared concurrency group — directly corroborates @T-Gro's concern |
2/3 |
| 5 | /review --branch refs/heads/foo produces refs/heads/refs/heads/foo and breaks the AzDO trigger |
1/3 + verified |
| 6 | --platform=ios (equals form) is silently ignored — user thinks iOS was selected; pipeline runs on Android |
1/3 + verified |
| 7 | Editing the AI Summary advances the rerun checkpoint via updated_at, silently dropping eligible reruns |
1/3 |
| 8 | +1 reaction posted before AzDO trigger; on failure, the misleading +1 stays as a "review started" signal |
1/3 + verified |
(Inline comments below have file:line context, repro scenarios, and fix suggestions.)
Design-level observations (not posted as inline)
expected_head_shaempty-string bypass (Invoke-RerunReviewTrigger.ps1:181) —if ($expectedHeadSha -and …)short-circuits on empty string, skipping the head-SHA staleness check. gh-aw'srequired: trueenforces presence, not non-emptiness; treating empty as invalid is cheap defense-in-depth.- Stale-lock recovery direction (
Test-AgentReviewInProgressIsStale) — on events-API error or missinglabeledevent, returns$false(fresh). If the labeling event ages out or is unreadable, a lock that was never cleared can wedge a PR indefinitely. Falling back to issueupdated_atwould let the 18-hour window still elapse. - Heredoc delimiter
EOF(rerun-review-scanner.md:114-116) — un-randomized. Practically near-impossible to trigger via JSON-encoded content, butEOF_$(openssl rand -hex 8)is the documented hardening. - Test coverage gap —
Invoke-RerunReviewTrigger.ps1andQuery-RerunReadyPRs.ps1have no Pester tests. Issues #2, #5, #6, #8 above would have been caught by basic argument-validation andNormalize-PipelineRefcoverage.
Prior review threads (acknowledgement)
- @T-Gro asked twice (2026-06-01 and again on 2026-06-02 against
b84cc7f3f): "What is your concurrency strategy? What prevents 2+ reviews for the same PR?" — Finding #4 above (the 2/3-consensus race) is the same issue, now with a concrete trigger sequence: scanner-vs-/reviewworkflows have no shared concurrency group andSet-AgentReviewInProgressis check-then-set. Hourly serialization only protects scanner-vs-scanner. - @JanKrivanek asked where
trigger_rerun_reviewis defined. It is generated by gh-aw from thesafe-outputs.jobs.trigger-rerun-review:block in.github/workflows/rerun-review-scanner.md. The hyphenated job key compiles into an underscored tool name (trigger_rerun_review) for the agent. See compiled lock.yml around line 469.
Methodology
3 reviewers with different model families ran in parallel:
- Reviewer 1: deep-reasoning model, focused on subtle logic and concurrency walkthrough
- Reviewer 2: pattern-matching model, focused on security and common bug classes
- Reviewer 3: cross-family model, focused on edge cases
Disputed (1/3) findings were either verified directly against source code or discarded. Findings about issues already raised in existing review threads were merged with attribution rather than re-flagged. CI build/test status was intentionally out of scope.
| group: review-rerun-${{ github.event.issue.number }} | ||
| cancel-in-progress: false | ||
| timeout-minutes: 5 | ||
| permissions: |
There was a problem hiding this comment.
❌ Security — mark-rerun-ready lacks the actor permission check that trigger-review enforces (lines 127-138 in this same file). Any commenter — including a first-time contributor on their own PR — can post /review rerun and successfully apply s/agent-ready-for-rerun, since their own comment is fresh evidence past the AI Summary checkpoint. The hourly scanner then picks it up and triggers a full AzDO review. This is a privilege escalation: the direct-trigger path is gated; the rerun path is not.
Flagged by: 3/3 reviewers (unanimous)
Fix: Add the same Check actor permission step (write/maintain/admin) to mark-rerun-ready before invoking Resolve-RerunEligibility.ps1 -ApplyLabel.
| if ($prNumber -le 0) { | ||
| throw "Invalid PR number '$($item.pr_number)'." | ||
| } | ||
| if ($rerunCommentId -le 0) { |
There was a problem hiding this comment.
❌ Logic — This validation runs before branching on $decision. But the scanner prompt explicitly tells the agent: "If it is missing, choose skip and use 0." (rerun-review-scanner.md:157). And Query-RerunReadyPRs.ps1:99 already produces rerunCommentId = $null (→ 0 after [Int64] cast) whenever a PR was queued without a /review rerun comment (e.g. a maintainer applied the queue label manually).
So a valid skip with 0 always throws here. Worse: the foreach at line 158 has no per-item try/catch, so $ErrorActionPreference = 'Stop' propagates the throw and aborts processing of every remaining candidate in the batch — one bad item silently drops the rest.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: Only require rerun_comment_id > 0 on the trigger branch; allow skip with 0 (and guard Add-CommentReaction on >0). Wrap the per-candidate body in try/catch { Write-Error; continue } so one malformed item doesn't kill the whole batch.
| id: review_lock | ||
| shell: pwsh | ||
| env: | ||
| GH_TOKEN: ${{ github.token }} |
There was a problem hiding this comment.
❌ State Bug — trigger-review acquires s/agent-review-in-progress here, but the job never removes s/agent-ready-for-rerun if it's present.
Scenario: PR has s/agent-ready-for-rerun (queued). Maintainer types /review directly. trigger-review sets the in-progress lock, runs AzDO, AzDO's CleanupReviewLock stage removes the in-progress label. The queue label is still there. Next hourly scanner pass sees a queued PR with no in-progress lock → triggers a second review.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: After successful Set-AgentReviewInProgress, also remove s/agent-ready-for-rerun (best-effort with --silent is fine — it may not be present).
| # ============================================================ | ||
| # Set-AgentReviewInProgress | ||
| # ============================================================ | ||
| function Set-AgentReviewInProgress { |
There was a problem hiding this comment.
Set-AgentReviewInProgress is check-then-set: Get-AgentLabels → if (-not contains) → Add-Label → Get-AgentLabels (verify). GitHub's add-label API is idempotent — two concurrent callers will both pass the -notcontains check and both call Add-Label, and both success returns will report "the label is present".
This matters because the scanner workflow uses concurrency: gh-aw-${{ github.workflow }} (scanner-vs-scanner only) and review-trigger.yml uses concurrency: review-trigger-${{ github.event.issue.number }} (per-PR direct trigger only). These are different groups in different workflows, so the safe-output job and the /review job can race on the same PR with no serialization. This is exactly the unresolved concern @T-Gro raised twice on this PR.
Flagged by: 2/3 reviewers (lowered to
Fix: Either (a) route both workflows through the same concurrency group keyed by PR number (e.g. pr-review-${{ pr_number }}), or (b) make acquisition tag-based — write the label with a unique run-id marker (e.g. as a comment or via a label-prefix the script stamps with its own run-id), then re-read and abort if the marker doesn't match this run.
| resources = @{ | ||
| repositories = @{ | ||
| self = @{ | ||
| refName = "refs/heads/$PipelineRef" |
There was a problem hiding this comment.
refName = "refs/heads/$PipelineRef" blindly prepends refs/heads/ even when $PipelineRef already starts with it. Normalize-PipelineRef strips most metacharacters but allows /, so refs/heads/main survives unchanged → final refName is refs/heads/refs/heads/main, which AzDO rejects.
User scenario: someone types /review --branch refs/heads/main (a perfectly reasonable expectation given the rest of .github/ uses refs/heads/...). The pipeline trigger fails, the catch block re-throws, the entire scanner batch aborts (see comment on line 171). Same hazard exists in the bash version of this prepend at review-trigger.yml:369.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: In Normalize-PipelineRef (and the equivalent block in Resolve-RerunEligibility.ps1 and review-trigger.yml), strip a leading refs/heads/ before sanitization.
| } | ||
| continue | ||
| } | ||
| '^(--platform|-p)$' { |
There was a problem hiding this comment.
^(--platform|-p)$ only matches the flag in its bare form. /review --platform=ios (with an equals sign — the form most CLIs accept) does NOT match. Falls through to the default case which checks $candidate = $token.ToLowerInvariant() against the valid platform list — --platform=ios is not a valid platform name, so it's silently dropped and $platform stays empty.
User thinks they selected iOS; the pipeline runs on android (the fallback in Get-PlatformFromLabels).
Same hazard exists in review-trigger.yml's bash parser (lines 170-200) — it also only handles the space-separated form.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: Pre-process tokens to split on = before iterating ($tokens = $tokens | ForEach-Object { $_ -split '=', 2 }), or extend the regex: ^(--platform|-p)(=(.+))?$ and capture group 3.
|
|
||
| return @($Comments | | ||
| Where-Object { $_.body -and ([string]$_.body).Contains($AISummaryMarker) } | | ||
| Sort-Object @{ Expression = { Get-ObjectDate $_ 'updated_at' }; Descending = $true }, @{ Expression = { [Int64]$_.id }; Descending = $true } | |
There was a problem hiding this comment.
Get-LatestAISummaryComment sorts by updated_at, and downstream summaryUpdatedAt is used as the eligibility checkpoint (lines 353, 490). If a maintainer edits the AI Summary comment after new reviewer comments/commits have landed (typo fix, formatting), updated_at jumps ahead of that real activity → Test-HasEvidenceCommentAfter and Test-HasCommitAfter see nothing past the checkpoint → eligibility returns no-new-comments-or-commits and a legitimate /review rerun is silently dropped.
Flagged by: 1/3 reviewer.
Fix: Anchor the checkpoint to the comment's created_at rather than updated_at, or (better) to the SHA embedded in the SESSION marker via Get-LatestReviewedSha, which is the actual state the previous review observed.
| } | ||
| } | ||
|
|
||
| Add-CommentReaction -CommentId $rerunCommentId -Content '+1' |
There was a problem hiding this comment.
+1 reaction is posted before Invoke-AzDOReviewPipeline. If the AzDO trigger throws, the catch block at line 221 clears the in-progress lock but does NOT remove the +1 reaction or restore the queue label. The user sees a +1 (meaning "your rerun is starting") but no review actually started. Combined with the loop-abort issue at line 171, the same throw also kills processing of every later candidate in the batch.
Flagged by: 1/3 reviewer; verified directly against source.
Fix: Post the +1 reaction only after Invoke-AzDOReviewPipeline returns successfully. On failure, either remove the reaction, post a -1, or leave the queue label in place so the next scanner pass retries.
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Summary
/review reruncommand implementation out of Enhance MauiBot AI summary review output #35677 and into this follow-up PR./review reruneligibility checks for new comments or commits and appliess/agent-ready-for-rerunwhen another review is justified.rerun-review-scanner.md+ compiled.lock.yml) that scans queued PRs, asks AI for atrigger/skipdecision, and uses a custom safe-output job for reactions, queue-label cleanup, and AzDO triggering.Validation
gh aw compile .github/workflows/rerun-review-scanner.mdSystem.Management.Automation.Language.Parser..github/workflows/rerun-review-scanner.lock.ymland.github/workflows/review-trigger.ymlas YAML.Invoke-Pester .github/scripts/Resolve-RerunEligibility.Tests.ps1,.github/scripts/Post-AISummaryComment.Tests.ps1,.github/scripts/Remove-StaleMauiBotComments.Tests.ps1 -CIQuery-RerunReadyPRs.ps1 -MaxPRs 1 -OutputPath /tmp/rerun-candidates.jsonDepends on
/review reruninstruction text.