Skip to content

Add gh-aw rerun review scanner#35685

Open
kubaflo wants to merge 13 commits into
feature/enhanced-reviewerfrom
feature/rerun-review-scanner
Open

Add gh-aw rerun review scanner#35685
kubaflo wants to merge 13 commits into
feature/enhanced-reviewerfrom
feature/rerun-review-scanner

Conversation

@kubaflo
Copy link
Copy Markdown
Contributor

@kubaflo kubaflo commented Jun 1, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

  • Moves the actual /review rerun command implementation out of Enhance MauiBot AI summary review output #35677 and into this follow-up PR.
  • Adds deterministic /review rerun eligibility checks for new comments or commits and applies s/agent-ready-for-rerun when another review is justified.
  • Adds deterministic rerun context for pre-flight so the next review can focus on activity since the prior AI Summary / rerun checkpoint.
  • Adds an hourly gh-aw workflow (rerun-review-scanner.md + compiled .lock.yml) that scans queued PRs, asks AI for a trigger / skip decision, and uses a custom safe-output job for reactions, queue-label cleanup, and AzDO triggering.
  • Adds helper scripts for querying queued PRs and processing scanner decisions.

Validation

  • gh aw compile .github/workflows/rerun-review-scanner.md
  • Parsed changed PowerShell scripts with System.Management.Automation.Language.Parser.
  • Parsed .github/workflows/rerun-review-scanner.lock.yml and .github/workflows/review-trigger.yml as YAML.
  • Invoke-Pester .github/scripts/Resolve-RerunEligibility.Tests.ps1,.github/scripts/Post-AISummaryComment.Tests.ps1,.github/scripts/Remove-StaleMauiBotComments.Tests.ps1 -CI
  • Read-only query helper run: Query-RerunReadyPRs.ps1 -MaxPRs 1 -OutputPath /tmp/rerun-candidates.json

Depends on

Copilot AI added 8 commits June 1, 2026 15:26
Teach /review rerun to run a deterministic activity check for new comments or commits and apply s/agent-ready-for-rerun when another AI review is justified. Also add rerun guidance to generated AI Summary comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Treat label application as successful when the ready-for-rerun label is present after the GitHub API call, avoiding false failures from brittle gh exit-code handling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Post label additions with the GitHub Issues API JSON payload shape so /review rerun can reliably apply s/agent-ready-for-rerun from GitHub Actions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Grant pull-requests: write to the /review rerun labeling job so it can apply s/agent-ready-for-rerun to pull requests after deterministic eligibility passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Generate a deterministic rerun context artifact listing new comments and commits since the latest AI Summary or previous /review rerun checkpoint, and instruct pre-flight to read it before reviewing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an hourly gh-aw scanner that reviews PRs labeled s/agent-ready-for-rerun, asks AI for a trigger/skip decision, then uses a custom safe-output job to react to the /review rerun comment, remove the queue label, and trigger the existing AzDO review pipeline deterministically.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Allow Resolve-RerunEligibility.ps1 to be dot-sourced by the queued PR query helper without binding required CLI parameters.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the GitHub repository owner context in the gh-aw scanner safe-output job while keeping the repository name fixed, so fork dry-runs can inspect their own queued PRs without template injection risk.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions label Jun 1, 2026
---
on:
schedule:
- cron: "0 * * * *"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is your concurrency strategy for reviews?
What prevents 2+ reviews for the same PR to run ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about running it once per hour. WDYT?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the job that is triggered happens in an async-fashion, and I do not see anything acting like a "lock" (either repo-memory function of GH AW - to keep track if disposed review ;; or perhaps some label or reaction on the PR to review).

If the run at 9AM triggers the (async) job to do the review, what prevents the run at 10AM to do the same? (if the first one is not finished)

Comment thread .github/workflows/rerun-review-scanner.md Outdated
@PureWeen PureWeen added the area-ai-agents Copilot CLI agents, agent skills, AI-assisted development label Jun 1, 2026
Copilot AI added 2 commits June 2, 2026 13:14
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI added 3 commits June 2, 2026 21:26
Pass Owner and Repo through when dot-sourcing Resolve-RerunEligibility so Query-RerunReadyPRs does not reset to dotnet/maui and can scan fork/test repositories correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Parse the latest normal /review command and carry its platform and pipeline branch through rerun scanner candidates and safe-output triggering so /review rerun follows the same target rules as the original review request.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Require /review rerun commenters to have OWNER, MEMBER, COLLABORATOR, or CONTRIBUTOR association before applying the rerun-ready label.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@PureWeen PureWeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial PR Review — 3 reviewers, consensus-validated

3 independent reviewers (different model families) analyzed this PR in parallel without seeing each other's findings or this PR's existing review threads. Findings below survived adversarial consensus (3/3 unanimous, 2/3 corroborated, or 1/3 with direct source-code verification).

Verdict

Concerns before merge. Three blocking issues plus a 2/3-consensus race condition that directly corroborates @T-Gro's unresolved concurrency concern.

Blocking (❌)

# Issue Consensus
1 /review rerun bypasses the actor permission check that /review enforces 3/3 unanimous
2 Scanner aborts the entire batch on a skip with missing rerun_comment_id — which the prompt itself instructs the agent to send as 0 1/3 + verified
3 Manual /review never removes s/agent-ready-for-rerun, causing a duplicate review on the next hourly scanner run 1/3 + verified

Should fix (⚠️)

# Issue Consensus
4 Label-based "lock" is non-atomic check-then-set, and the scanner and /review workflows have no shared concurrency group — directly corroborates @T-Gro's concern 2/3
5 /review --branch refs/heads/foo produces refs/heads/refs/heads/foo and breaks the AzDO trigger 1/3 + verified
6 --platform=ios (equals form) is silently ignored — user thinks iOS was selected; pipeline runs on Android 1/3 + verified
7 Editing the AI Summary advances the rerun checkpoint via updated_at, silently dropping eligible reruns 1/3
8 +1 reaction posted before AzDO trigger; on failure, the misleading +1 stays as a "review started" signal 1/3 + verified

(Inline comments below have file:line context, repro scenarios, and fix suggestions.)

Design-level observations (not posted as inline)

  • expected_head_sha empty-string bypass (Invoke-RerunReviewTrigger.ps1:181) — if ($expectedHeadSha -and …) short-circuits on empty string, skipping the head-SHA staleness check. gh-aw's required: true enforces presence, not non-emptiness; treating empty as invalid is cheap defense-in-depth.
  • Stale-lock recovery direction (Test-AgentReviewInProgressIsStale) — on events-API error or missing labeled event, returns $false (fresh). If the labeling event ages out or is unreadable, a lock that was never cleared can wedge a PR indefinitely. Falling back to issue updated_at would let the 18-hour window still elapse.
  • Heredoc delimiter EOF (rerun-review-scanner.md:114-116) — un-randomized. Practically near-impossible to trigger via JSON-encoded content, but EOF_$(openssl rand -hex 8) is the documented hardening.
  • Test coverage gapInvoke-RerunReviewTrigger.ps1 and Query-RerunReadyPRs.ps1 have no Pester tests. Issues #2, #5, #6, #8 above would have been caught by basic argument-validation and Normalize-PipelineRef coverage.

Prior review threads (acknowledgement)

  • @T-Gro asked twice (2026-06-01 and again on 2026-06-02 against b84cc7f3f): "What is your concurrency strategy? What prevents 2+ reviews for the same PR?" — Finding #4 above (the 2/3-consensus race) is the same issue, now with a concrete trigger sequence: scanner-vs-/review workflows have no shared concurrency group and Set-AgentReviewInProgress is check-then-set. Hourly serialization only protects scanner-vs-scanner.
  • @JanKrivanek asked where trigger_rerun_review is defined. It is generated by gh-aw from the safe-outputs.jobs.trigger-rerun-review: block in .github/workflows/rerun-review-scanner.md. The hyphenated job key compiles into an underscored tool name (trigger_rerun_review) for the agent. See compiled lock.yml around line 469.

Methodology

3 reviewers with different model families ran in parallel:

  • Reviewer 1: deep-reasoning model, focused on subtle logic and concurrency walkthrough
  • Reviewer 2: pattern-matching model, focused on security and common bug classes
  • Reviewer 3: cross-family model, focused on edge cases

Disputed (1/3) findings were either verified directly against source code or discarded. Findings about issues already raised in existing review threads were merged with attribution rather than re-flagged. CI build/test status was intentionally out of scope.

group: review-rerun-${{ github.event.issue.number }}
cancel-in-progress: false
timeout-minutes: 5
permissions:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Securitymark-rerun-ready lacks the actor permission check that trigger-review enforces (lines 127-138 in this same file). Any commenter — including a first-time contributor on their own PR — can post /review rerun and successfully apply s/agent-ready-for-rerun, since their own comment is fresh evidence past the AI Summary checkpoint. The hourly scanner then picks it up and triggers a full AzDO review. This is a privilege escalation: the direct-trigger path is gated; the rerun path is not.
Flagged by: 3/3 reviewers (unanimous)

Fix: Add the same Check actor permission step (write/maintain/admin) to mark-rerun-ready before invoking Resolve-RerunEligibility.ps1 -ApplyLabel.

if ($prNumber -le 0) {
throw "Invalid PR number '$($item.pr_number)'."
}
if ($rerunCommentId -le 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic — This validation runs before branching on $decision. But the scanner prompt explicitly tells the agent: "If it is missing, choose skip and use 0." (rerun-review-scanner.md:157). And Query-RerunReadyPRs.ps1:99 already produces rerunCommentId = $null (→ 0 after [Int64] cast) whenever a PR was queued without a /review rerun comment (e.g. a maintainer applied the queue label manually).

So a valid skip with 0 always throws here. Worse: the foreach at line 158 has no per-item try/catch, so $ErrorActionPreference = 'Stop' propagates the throw and aborts processing of every remaining candidate in the batch — one bad item silently drops the rest.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: Only require rerun_comment_id > 0 on the trigger branch; allow skip with 0 (and guard Add-CommentReaction on >0). Wrap the per-candidate body in try/catch { Write-Error; continue } so one malformed item doesn't kill the whole batch.

id: review_lock
shell: pwsh
env:
GH_TOKEN: ${{ github.token }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State Bugtrigger-review acquires s/agent-review-in-progress here, but the job never removes s/agent-ready-for-rerun if it's present.

Scenario: PR has s/agent-ready-for-rerun (queued). Maintainer types /review directly. trigger-review sets the in-progress lock, runs AzDO, AzDO's CleanupReviewLock stage removes the in-progress label. The queue label is still there. Next hourly scanner pass sees a queued PR with no in-progress lock → triggers a second review.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: After successful Set-AgentReviewInProgress, also remove s/agent-ready-for-rerun (best-effort with --silent is fine — it may not be present).

# ============================================================
# Set-AgentReviewInProgress
# ============================================================
function Set-AgentReviewInProgress {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Race ConditionSet-AgentReviewInProgress is check-then-set: Get-AgentLabelsif (-not contains)Add-LabelGet-AgentLabels (verify). GitHub's add-label API is idempotent — two concurrent callers will both pass the -notcontains check and both call Add-Label, and both success returns will report "the label is present".

This matters because the scanner workflow uses concurrency: gh-aw-${{ github.workflow }} (scanner-vs-scanner only) and review-trigger.yml uses concurrency: review-trigger-${{ github.event.issue.number }} (per-PR direct trigger only). These are different groups in different workflows, so the safe-output job and the /review job can race on the same PR with no serialization. This is exactly the unresolved concern @T-Gro raised twice on this PR.
Flagged by: 2/3 reviewers (lowered to ⚠️ per consensus rule).

Fix: Either (a) route both workflows through the same concurrency group keyed by PR number (e.g. pr-review-${{ pr_number }}), or (b) make acquisition tag-based — write the label with a unique run-id marker (e.g. as a comment or via a label-prefix the script stamps with its own run-id), then re-read and abort if the marker doesn't match this run.

resources = @{
repositories = @{
self = @{
refName = "refs/heads/$PipelineRef"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ LogicrefName = "refs/heads/$PipelineRef" blindly prepends refs/heads/ even when $PipelineRef already starts with it. Normalize-PipelineRef strips most metacharacters but allows /, so refs/heads/main survives unchanged → final refName is refs/heads/refs/heads/main, which AzDO rejects.

User scenario: someone types /review --branch refs/heads/main (a perfectly reasonable expectation given the rest of .github/ uses refs/heads/...). The pipeline trigger fails, the catch block re-throws, the entire scanner batch aborts (see comment on line 171). Same hazard exists in the bash version of this prepend at review-trigger.yml:369.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: In Normalize-PipelineRef (and the equivalent block in Resolve-RerunEligibility.ps1 and review-trigger.yml), strip a leading refs/heads/ before sanitization.

}
continue
}
'^(--platform|-p)$' {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Logic — The regex ^(--platform|-p)$ only matches the flag in its bare form. /review --platform=ios (with an equals sign — the form most CLIs accept) does NOT match. Falls through to the default case which checks $candidate = $token.ToLowerInvariant() against the valid platform list — --platform=ios is not a valid platform name, so it's silently dropped and $platform stays empty.

User thinks they selected iOS; the pipeline runs on android (the fallback in Get-PlatformFromLabels).

Same hazard exists in review-trigger.yml's bash parser (lines 170-200) — it also only handles the space-separated form.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: Pre-process tokens to split on = before iterating ($tokens = $tokens | ForEach-Object { $_ -split '=', 2 }), or extend the regex: ^(--platform|-p)(=(.+))?$ and capture group 3.


return @($Comments |
Where-Object { $_.body -and ([string]$_.body).Contains($AISummaryMarker) } |
Sort-Object @{ Expression = { Get-ObjectDate $_ 'updated_at' }; Descending = $true }, @{ Expression = { [Int64]$_.id }; Descending = $true } |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ LogicGet-LatestAISummaryComment sorts by updated_at, and downstream summaryUpdatedAt is used as the eligibility checkpoint (lines 353, 490). If a maintainer edits the AI Summary comment after new reviewer comments/commits have landed (typo fix, formatting), updated_at jumps ahead of that real activity → Test-HasEvidenceCommentAfter and Test-HasCommitAfter see nothing past the checkpoint → eligibility returns no-new-comments-or-commits and a legitimate /review rerun is silently dropped.
Flagged by: 1/3 reviewer.

Fix: Anchor the checkpoint to the comment's created_at rather than updated_at, or (better) to the SHA embedded in the SESSION marker via Get-LatestReviewedSha, which is the actual state the previous review observed.

}
}

Add-CommentReaction -CommentId $rerunCommentId -Content '+1'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Error Handling — The +1 reaction is posted before Invoke-AzDOReviewPipeline. If the AzDO trigger throws, the catch block at line 221 clears the in-progress lock but does NOT remove the +1 reaction or restore the queue label. The user sees a +1 (meaning "your rerun is starting") but no review actually started. Combined with the loop-abort issue at line 171, the same throw also kills processing of every later candidate in the batch.
Flagged by: 1/3 reviewer; verified directly against source.

Fix: Post the +1 reaction only after Invoke-AzDOReviewPipeline returns successfully. On failure, either remove the reaction, post a -1, or leave the queue label in place so the next scanner pass retries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-ai-agents Copilot CLI agents, agent skills, AI-assisted development area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants