Skip to content

ci(e2e): CMT scheduling + macOS E2E stabilization + CI/security hardening#3829

Merged
yasserfaraazkhan merged 41 commits into
masterfrom
fix/cmt-direct-dispatch-and-cleanup-endpoint
Jun 9, 2026
Merged

ci(e2e): CMT scheduling + macOS E2E stabilization + CI/security hardening#3829
yasserfaraazkhan merged 41 commits into
masterfrom
fix/cmt-direct-dispatch-and-cleanup-endpoint

Conversation

@yasserfaraazkhan

@yasserfaraazkhan yasserfaraazkhan commented May 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Enables matterwick-driven scheduled/CMT E2E provisioning for Desktop and lands a broad round of E2E reliability and CI/security hardening. (Companion to mattermost/matterwick#90.)

CMT / scheduled provisioning (matterwick-driven)

  • cmt-provisioner.yml → lightweight trigger (no inputs/curl) that now fires on push to release-v* (the branch cut + subsequent pushes), replacing the previous monthly schedule; matterwick hears the workflow_run and dispatches compatibility-matrix-testing.yml. A release-v* push also runs the normal release E2E (whole suite, latest server), so the latest server is exercised by both flows — intended.
  • compatibility-matrix-testing.yml → dropped the in-workflow /cleanup_e2e call (matterwick now destroys provisioned servers on workflow_run completion, keyed by commit SHA); corrected the stale cmt_run_id input description.
  • e2e-nightly-trigger.yml → schedule-only (pushes to main/release-* are handled by matterwick's native push handler).

macOS E2E stabilization (the previously-red macOS suite)

  • intercom.ts: fixed the __e2eAppReady race (the window show event firing before the listener attaches, or MainWindow.get() returning undefined, on slower macOS runners) — the single error behind nearly all macOS failures. Added unit tests.
  • Suppressed the macOS Resume / crash-reporter / Gatekeeper dialogs, stopped sending SIGKILL in global teardown, and bumped app-ready / minimize timeouts (global-setup.ts, global-teardown.ts, appReadiness.ts).
  • Per-OS reporting: each platform uploads its own HTML report; removed the cross-OS merge that produced misleading aggregate counts. Flaky-test accounting fixed so a retried-then-passed test isn't counted as a failure (analyze-flaky-test.js, github-actions.js).
  • Targeted spec fixes: popout (was capturing the child URLView instead of the popout window), copy-link role="menuitem" selector, window-menu/minimize timeout, deeplink WebContentsView traversal (Windows), bad-servers cert reload, remove-server-modal partial-config parse, badge, full-screen, tray-restore, startup window.

CI & security hardening

  • e2e-functional.yml least-privilege permissions (top-level contents: read, per-job grants) so untrusted PR code (npm ci / Playwright) no longer runs with pull-requests: write.
  • Flaky-CI fix: mocked electron in the two unit suites (UserActivityMonitor, diagnostics) that imported the real module — they only passed when the electron binary happened to be installed.
  • popoutManager: registerMainWindowCloseHandler / closeAllPopouts + unit tests.

Cursor-automation removal

Removed the "Server for Cursor Automation" PR-body read/write feature (resolveMmTestServerUrlFromPr, the PR-body sync job, and the helper fns). MM_TEST_SERVER_URL is supplied only via inputs/env.

Companion PRs

mattermost/matterwick#90, mattermost/mattermost-mobile#9800, mattermost/gitops-platform#72.

🤖 Generated with Claude Code

Release Note

NONE

…run webhook

GitHub's workflow_run webhook payload does not include workflow_dispatch
inputs, so matterwick was never receiving the server_versions value and
the CMT Provisioner runs (e.g. 2026-04-28, 2026-05-18) returned 202 OK
on the webhook but never dispatched compatibility-matrix-testing.yml.

Replace the echo-only stub with a curl POST to matterwick's new
/cmt_dispatch endpoint, sending the full context the webhook payload
cannot carry: server_versions, run_id, sha, ref, owner, repo. Matterwick
returns 202 immediately and runs provisioning asynchronously.

Requires two repository configuration items:
  - vars.MATTERWICK_URL (already set, used by /cleanup_e2e callback)
  - secrets.MATTERWICK_CMT_TRIGGER_SECRET (new, shared with matterwick)

Co-Authored-By: Claude <noreply@anthropic.com>
@mm-cloud-bot mm-cloud-bot added the kind/bug Categorizes issue or PR as related to a bug. label May 25, 2026
@mm-cloud-bot

Copy link
Copy Markdown

@yasserfaraazkhan: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

I understand the commands that are listed here

@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR comprehensively hardens E2E test infrastructure, reporting, and main-window lifecycle management across 77 ranges. It extends per-OS test-count outputs, refactors flaky-test analysis to collapse retries into authoritative outcomes, implements macOS dialog suppression across global setup/teardown/fixtures/workflows, hardens 13 specific tests with polling/event-driven patterns, adds PopoutManager cleanup on main-window close, and refactors intercom readiness with combined show-event and polling-fallback logic.

Changes

E2E Infrastructure, Test Reliability, and Main-Window Lifecycle

Layer / File(s) Summary
CMT provisioner and compatibility-matrix workflow updates
.github/workflows/cmt-provisioner.yml, .github/workflows/compatibility-matrix-testing.yml
Converts cmt-provisioner to scheduled/dispatchable trigger without manual inputs, documents Matterwick event-driven dispatch, and delegates instance cleanup to external Matterwick handler keyed by commit SHA.
E2E functional workflow permissions and job dependencies
.github/workflows/e2e-functional.yml
Reduces workflow permissions to least-privilege contents: read, removes merged-report dependency, adds Windows runner pinning with Node-gyp comment, updates pr_number input description, and adds macOS suppress-dialogs step to main workflow.
E2E functional template outputs and reporting wiring
.github/workflows/e2e-functional-template.yml
Extends workflow-call and job outputs to include per-OS PASSED/TOTAL/SKIPPED test counts, adds HTML report generation from blob-report, uploads to S3 with per-OS report URLs, and updates analyze-flaky-tests step to emit expanded per-OS outputs.
Flaky test analysis improvements
e2e/utils/analyze-flaky-test.js
Rewrites failure counting to collapse testcase retries into base-name outcomes via regex matching, introduces getOutcomeCounts helper with pass/fail/skip dominance rules, and reconciles pass/total/skip counts against authoritative failure count for per-OS outputs.
Status description and PR reporting utilities
e2e/utils/github-actions.js
Adds formatStatusDescription helper to compute commit-status text from passed/failed counts, refactors updateFinalStatus to use per-OS outputs and REPORT_LINK_* fallback without merged-report URL, and exports new utility.
macOS dialog suppression infrastructure
e2e/fixtures/index.ts, e2e/global-setup.ts, e2e/global-teardown.ts, .github/workflows/e2e-functional-template.yml, .github/workflows/e2e-functional.yml
Electron fixture args disable first-run/default-apps/crash-reporter, global-setup snapshots/applies macOS defaults across bundleIDs (NSQuitAlwaysKeepsWindows, LSQuarantine, crash DialogType), global-teardown restores snapshot and adjusts graceful timeouts/SIGKILL behavior, and workflow template adds suppress-dialogs step.
E2E test timeout and app readiness configuration
e2e/playwright.config.ts, e2e/helpers/appReadiness.ts
Increases Playwright timeout to 90_000 with Windows launch comments, computes platform-specific app readiness timeout (60s darwin/win32, 30s others), and expands timeout failure message with troubleshooting guidance.
Deep linking and copy-link test fixes
e2e/specs/deep_linking/deeplink.test.ts, e2e/specs/mattermost/copy_link.test.ts
Deep-link test uses mattermost-dev:// protocol, polls server map per iteration, and checks window URL; copy-link test dispatches explicit contextmenu event and broadens selector to match both text variants and role-based containers.
Fullscreen and minimize window tests
e2e/specs/menu_bar/full_screen.test.ts, e2e/specs/menu_bar/window_menu.test.ts
Fullscreen test applies demoMattermostConfig, targets specific BrowserWindow via focus/testRefs/first-available, clicks native menu item, and waits on enter/leave-full-screen events; minimize test calls BrowserWindow.minimize() and polls isMinimized().
Notification badge and server management tests
e2e/specs/notification_trigger/notification_badge_windows_linux.test.ts, e2e/specs/server_management/bad_servers.test.ts, e2e/specs/server_management/popout_windows.test.ts
Badge trigger adds 15s polling loop with transient-error retry for global.__testTriggerBadge registration, beforeEach adds 10s retry loop for unread-badge-setting registration; bad_servers reloads all server views; popout_windows snapshots windows and polls for popout.html, uses native menu to open popouts, and sets willAppQuit before main-window close.
Window bounds, tray restore, and defensive file operations
e2e/specs/startup/window.test.ts, e2e/specs/system/tray_restore.test.ts, e2e/specs/server_management/remove_server_modal.test.ts
Startup window tests replace non-negative bounds assertions with center-point-in-workArea checks to handle negative display origins; tray restore uses expect.poll for visibility; remove-server-modal wraps config.json reads in try/catch within polling callback.
PopoutManager main-window close cleanup
src/app/windows/popoutManager.ts, src/app/windows/popoutManager.test.js
Adds MAIN_WINDOW_CREATED listener registration, attaches one-time closed handler on main window, and implements closeAllPopouts to invoke cleanup listeners, destroy browser windows, and clear tracking maps; includes comprehensive test coverage for registration, destruction, and integration via event trigger.
Intercom app readiness refactor and tests
src/main/app/intercom.ts, src/main/app/intercom.test.js
Replaces isVisible/show branching with attachWindowListeners helper that marks readiness immediately if visible, otherwise waits for show event and polls isVisible every 250ms; ensures single markReady execution via done guard; defers to MAIN_WINDOW_CREATED if window not yet available; comprehensive test coverage with makeWindow mock helper and scenarios for visible-state, show-event, polling-fallback, and late-main-window construction.
Test infrastructure electron mocking
src/main/UserActivityMonitor.test.js, src/main/diagnostics/index.test.js
Adds jest.mock('electron') blocks to UserActivityMonitor.test.js and diagnostics/index.test.js to prevent real Electron loading in CI and provide deterministic mock implementations of app.isReady, powerMonitor.getSystemIdleTime, shell, session, systemPreferences, and Notification.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • mattermost/desktop#3834: Implements overlapping E2E reporting and status-description refactors in e2e/utils/github-actions.js and matching workflow changes removing merged-report handling.

Suggested reviewers

  • devinbinnie
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.81% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title accurately summarizes the main changes: CMT scheduling shift, macOS E2E stabilization, and CI/security hardening are all substantive themes across the changeset.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/cmt-direct-dispatch-and-cleanup-endpoint

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/cmt-provisioner.yml (1)

21-31: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set explicit least-privilege permissions for this workflow.

No permissions block is defined, so this workflow inherits repository defaults for GITHUB_TOKEN, which is broader than needed here.

🔐 Proposed fix
 name: CMT Provisioner
+permissions: {}
 
 on:
   workflow_dispatch:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/cmt-provisioner.yml around lines 21 - 31, Add an explicit
top-level permissions block to this workflow to avoid inheriting broad
repository defaults; update the workflow YAML (where the on:, workflow_dispatch
and job trigger-matterwick are defined) to include a minimal permissions map
such as setting permissions: contents: read and workflows: write (adjust further
only if the trigger-matterwick job needs additional scopes), so the GITHUB_TOKEN
only has the least-privilege required.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/cmt-provisioner.yml:
- Around line 21-31: Add an explicit top-level permissions block to this
workflow to avoid inheriting broad repository defaults; update the workflow YAML
(where the on:, workflow_dispatch and job trigger-matterwick are defined) to
include a minimal permissions map such as setting permissions: contents: read
and workflows: write (adjust further only if the trigger-matterwick job needs
additional scopes), so the GITHUB_TOKEN only has the least-privilege required.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 33b71075-cd7f-4654-978e-a49a6b70c36d

📥 Commits

Reviewing files that changed from the base of the PR and between e699409 and fe9514e.

📒 Files selected for processing (1)
  • .github/workflows/cmt-provisioner.yml

@github-actions github-actions Bot removed the E2E/Run Run Desktop E2E Tests label May 25, 2026
The workflow inherits the repo default GITHUB_TOKEN permissions, which
is broader than this workflow needs. The single step is an outbound
curl to matterwick authenticated by a separate shared secret -- there
is no checkout, no GitHub API call, and no interaction with workflow
files. permissions: {} (no scopes) is the correct least-privilege
setting; if a future step needs a specific scope, it should be added
explicitly rather than relying on repo-default inheritance.

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added the E2E/Run Run Desktop E2E Tests label May 25, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/cmt-provisioner.yml (1)

61-68: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote ${RUN_ID} in the jq command.

Line 66 uses --argjson run_id ${RUN_ID} without quoting ${RUN_ID}. While github.run_id should always be numeric, unquoted shell expansions can cause issues if the value is empty, contains whitespace, or has special characters. Quote it for safety.

🛡️ Proposed fix
-            --argjson run_id ${RUN_ID} \
+            --argjson run_id "${RUN_ID}" \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/cmt-provisioner.yml around lines 61 - 68, The jq payload
construction is vulnerable to unquoted shell expansion for RUN_ID; update the
payload assignment so the jq invocation quotes the expansion for run id (the
line using --argjson run_id ${RUN_ID}) to --argjson run_id "${RUN_ID}" (i.e.,
modify the jq command in the payload assignment to quote ${RUN_ID}) to prevent
word-splitting or empty-value issues while keeping the --argjson usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/cmt-provisioner.yml:
- Around line 61-68: The jq payload construction is vulnerable to unquoted shell
expansion for RUN_ID; update the payload assignment so the jq invocation quotes
the expansion for run id (the line using --argjson run_id ${RUN_ID}) to
--argjson run_id "${RUN_ID}" (i.e., modify the jq command in the payload
assignment to quote ${RUN_ID}) to prevent word-splitting or empty-value issues
while keeping the --argjson usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 087195d9-68cf-4e38-852e-8395a205a823

📥 Commits

Reviewing files that changed from the base of the PR and between fe9514e and c41490b.

📒 Files selected for processing (1)
  • .github/workflows/cmt-provisioner.yml

Without quotes, an empty RUN_ID (e.g. GHA context expansion failing) would
collapse the --argjson token list and silently produce a malformed payload
or be consumed by the next --arg. Quoting forces jq to receive an empty
string and fail immediately with "invalid JSON text passed to --argjson",
which is what we want — fail loudly, don't ship a bad request.

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels May 25, 2026
This trigger workflow had both a schedule (Thu/Fri 00:00 UTC) and a
push trigger on master + release-*. The push trigger was redundant
because matterwick's push-event handler also fires for master and
release-* pushes (E2EAutoTriggerOnMaster / E2EAutoTriggerOnRelease in
the deployed config). The result: every push to master or release-*
produced two concurrent matterwick paths -- direct push-event handler
(tracking key {repo}-push-{branch}-{sha}) AND workflow_run on this
trigger workflow (tracking key {repo}-scheduled-{runID}-{sha}) --
which provisioned 6 cloud instances instead of 3 and dispatched
e2e-functional.yml twice per commit.

Tracking keys differ, so the two paths don't collide in matterwick
state, but they race on commit status updates and waste cloud quota.

Leaving only the schedule trigger here makes matterwick's push handler
the single canonical driver for push-triggered E2E.

Co-Authored-By: Claude <noreply@anthropic.com>
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels May 25, 2026
cursor[bot]

This comment was marked as spam.

- Suppress macOS Resume dialog (NSQuitAlwaysKeepsWindows, ApplePersistenceIgnoreState)
- Suppress macOS crash reporter dialog (CrashReporter DialogType none)
- Suppress Gatekeeper quarantine dialog (LSQuarantine)
- Add --no-first-run, --no-default-browser-check, --disable-default-apps, --disable-crash-reporter flags
- Apply defaults write at workflow level (belt-and-suspenders for CI)
- Avoid SIGKILL on macOS in global-teardown (triggers quit-unexpectedly dialog)
- Increase waitForAppReady timeout to 60s on macOS
- Exclude retried failures from analyzeFlakyTests when exit code is 0
@github-actions github-actions Bot added the E2E/Run Run Desktop E2E Tests label May 26, 2026
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels May 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
e2e/global-setup.ts (2)

48-49: ⚡ Quick win

Misleading comment.

The comment states "Verify at least one bundle ID got the settings applied" but no verification code follows. The subsequent lines (52-63) apply additional settings to system-level domains (LaunchServices, CrashReporter) rather than verifying the bundle-specific settings.

📝 Clarify the comment
-        // Verify at least one bundle ID got the settings applied.
-        // Also suppress the "verification of developer" dialog that can appear
+        // Suppress the "verification of developer" dialog that can appear
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/global-setup.ts` around lines 48 - 49, The comment claiming "Verify at
least one bundle ID got the settings applied" is misleading — update the comment
above the subsequent settings block to accurately describe what the code does
(apply additional system-level settings and suppress the "verification of
developer" dialog) and remove the mention of bundle-ID verification; reference
the system domains mentioned (LaunchServices, CrashReporter) in the new comment
so it correctly documents that the following lines adjust system-level domains
rather than performing any bundle-specific verification.

30-64: 💤 Low value

Consider using execFileSync for security.

The current implementation uses execSync with string interpolation of bundleID. While the current code uses a hard-coded array, using execFileSync with argument arrays would be more robust against potential future changes and aligns with security best practices.

🔒 Proposed refactor using execFileSync
-import {execSync} from 'child_process';
+import {execFileSync} from 'child_process';
         for (const bundleID of bundleIDs) {
             try {
-                execSync(`defaults write ${bundleID} NSQuitAlwaysKeepsWindows -bool false`, {stdio: 'pipe'});
+                execFileSync('defaults', ['write', bundleID, 'NSQuitAlwaysKeepsWindows', '-bool', 'false'], {stdio: 'pipe'});
             } catch {
                 // Non-fatal — tests still run, just potentially with the Resume dialog
             }
             try {
-                execSync(`defaults write ${bundleID} ApplePersistenceIgnoreState -bool YES`, {stdio: 'pipe'});
+                execFileSync('defaults', ['write', bundleID, 'ApplePersistenceIgnoreState', '-bool', 'YES'], {stdio: 'pipe'});
             } catch {
                 // Non-fatal
             }
         }
 
         try {
-            execSync('defaults write com.apple.LaunchServices LSQuarantine -bool false', {stdio: 'pipe'});
+            execFileSync('defaults', ['write', 'com.apple.LaunchServices', 'LSQuarantine', '-bool', 'false'], {stdio: 'pipe'});
         } catch {
             // Non-fatal
         }
 
         try {
-            execSync('defaults write com.apple.CrashReporter DialogType none', {stdio: 'pipe'});
+            execFileSync('defaults', ['write', 'com.apple.CrashReporter', 'DialogType', 'none'], {stdio: 'pipe'});
         } catch {
             // Non-fatal
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@e2e/global-setup.ts` around lines 30 - 64, Replace the execSync calls in the
macOS branch (process.platform === 'darwin') that interpolate bundleID into
command strings with execFileSync using argument arrays: locate the loop over
bundleIDs and the standalone execSync calls (symbols: bundleIDs, execSync) and
call execFileSync('defaults', ['write', bundleID, 'NSQuitAlwaysKeepsWindows',
'-bool', 'false'], {stdio:'pipe'}) and similarly for the other keys
(ApplePersistenceIgnoreState, LSQuarantine, DialogType) to avoid shell
interpolation; preserve the existing try/catch non-fatal behavior and same stdio
option.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@e2e/global-setup.ts`:
- Around line 48-49: The comment claiming "Verify at least one bundle ID got the
settings applied" is misleading — update the comment above the subsequent
settings block to accurately describe what the code does (apply additional
system-level settings and suppress the "verification of developer" dialog) and
remove the mention of bundle-ID verification; reference the system domains
mentioned (LaunchServices, CrashReporter) in the new comment so it correctly
documents that the following lines adjust system-level domains rather than
performing any bundle-specific verification.
- Around line 30-64: Replace the execSync calls in the macOS branch
(process.platform === 'darwin') that interpolate bundleID into command strings
with execFileSync using argument arrays: locate the loop over bundleIDs and the
standalone execSync calls (symbols: bundleIDs, execSync) and call
execFileSync('defaults', ['write', bundleID, 'NSQuitAlwaysKeepsWindows',
'-bool', 'false'], {stdio:'pipe'}) and similarly for the other keys
(ApplePersistenceIgnoreState, LSQuarantine, DialogType) to avoid shell
interpolation; preserve the existing try/catch non-fatal behavior and same stdio
option.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 70033d38-66b6-4678-a46a-f5d9fc89d10a

📥 Commits

Reviewing files that changed from the base of the PR and between e2e49b9 and cd36811.

📒 Files selected for processing (7)
  • .github/workflows/e2e-functional-template.yml
  • .github/workflows/e2e-functional.yml
  • e2e/fixtures/index.ts
  • e2e/global-setup.ts
  • e2e/global-teardown.ts
  • e2e/helpers/appReadiness.ts
  • e2e/utils/analyze-flaky-test.js

@github-actions github-actions Bot removed the E2E/Run Run Desktop E2E Tests label May 26, 2026
- Replace execSync with execFileSync using argument arrays to avoid shell
  interpolation of bundleID values
- Fix misleading comment that claimed "verify at least one bundle ID"
  when the block actually applies system-level LaunchServices/CrashReporter
  settings, not per-bundle verification
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels May 26, 2026
…s, docs)

Follow-on to the resolveMmTestServerUrlFromPr.ts deletion: remove all related logic.
- e2e/global-setup.ts: drop resolveMmTestServerUrlFromPrIfNeeded() call + import.
- e2e/utils/github-actions.js: remove parseCursorAutomationServerUrlFromBody and
  syncCursorAutomationServerLine (now unused).
- e2e-functional-template.yml: remove the sync-pr-server-url job and the pr_number
  inputs (only that job consumed them).
- e2e-functional.yml: stop passing pr_number to the template and drop the
  pull-requests:write caller cap (template no longer writes PRs); top-level pr_number
  kept for E2E-label cleanup.
- AGENTS.md / e2e/AGENTS.md: drop the now-inaccurate auto-resolve-from-PR notes.

MM_TEST_SERVER_URL is now supplied only via inputs/env, never derived from the PR body.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels Jun 2, 2026
Both AGENTS.md (repo root) and e2e/AGENTS.md now match master — the PR makes no
documentation changes. (Root AGENTS.md was already net-zero vs master; this reverts
the leftover e2e/AGENTS.md note, which duplicated the env-var list right above it.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels Jun 2, 2026
Cleanup is no longer driven by cmt_run_id — instance teardown is keyed by commit
SHA and handled by Matterwick on workflow_run completion. The input stays declared
(Matterwick still passes it; an undeclared input would 422 the dispatch).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels Jun 2, 2026
@yasserfaraazkhan

Copy link
Copy Markdown
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yasserfaraazkhan yasserfaraazkhan changed the title ci(e2e): scheduled CMT trigger + drop in-workflow cleanup call ci(e2e): CMT scheduling + macOS E2E stabilization + CI/security hardening Jun 2, 2026

@devinbinnie devinbinnie left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing to work to stabilize the Desktop E2E tests :)

I do have some concerns around changing we need to make to the code base to make this work - especially the polling addition, since that definitely can affect normal app usage.

Comment thread src/app/windows/popoutManager.ts Outdated
Comment thread src/main/app/intercom.ts Outdated
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels Jun 3, 2026
Addresses @devinbinnie's review:

- popoutManager: reverted the registerMainWindowCloseHandler/closeAllPopouts change
  (and its tests). Destroying popouts when the main window closes contradicts the
  intended multi-window independence, and isn't needed — E2E teardown handles cleanup.

- intercom.handleMainWindowIsShown: the production onboarding path is back to its
  original behavior. The E2E readiness signal (__e2eAppReady) moved into a separate
  signalE2EAppReadyWhenShown() that is gated on NODE_ENV==='test' (so it adds no
  listeners and is inert in normal app usage) and is purely listener-based — the
  250ms isVisible() polling is removed. The already-visible check + once('show')
  listener + MAIN_WINDOW_CREATED wait cover the readiness cases without polling.

Tests updated accordingly (removed the polling-fallback test). intercom, popoutManager,
and the electron-mock suites pass (71 tests).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added E2E/Run Run Desktop E2E Tests and removed E2E/Run Run Desktop E2E Tests labels Jun 3, 2026
@yasserfaraazkhan

Copy link
Copy Markdown
Contributor Author

Thank you Devin
I made the changes to only have it behind E2E variable.

@devinbinnie devinbinnie left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yasserfaraazkhan :)

Comment thread src/main/diagnostics/index.test.js Outdated
Comment thread src/main/UserActivityMonitor.test.js Outdated
yasserfaraazkhan and others added 5 commits June 5, 2026 05:24
Root-caused from the E2E CI run artifacts (logs, traces, screenshots):

- bad_servers (expired cert, all platforms): waitForRendererThenReload did a raw
  webContents.reload(), which re-loads outside MattermostWebContentsView.load()'s
  .catch() and never re-emits LOAD_FAILED on ERR_CERT_*, so .ErrorView never showed.
  Reload through wcEntry.reload() so the cert rejection reaches the renderer.

- bad_servers (cert trusted, win): poll for the index window — app.windows() can
  briefly lag app readiness while Playwright registers the Page.

- popout_windows (win): remove the stale "close popouts when main window closes"
  test; that product behavior was intentionally reverted (multi-window independence).

- full_screen MM-T816 (win): assert the main window's isFullScreen() state rather
  than the embedded view's outerWidth/Height — on a 1024-wide CI display the windowed
  and fullscreen widths are identical (DEFAULT_WINDOW_WIDTH=1024), a false negative.

- window_menu MM-T825 (win): scope to macOS. App-hide (Cmd+H / app.hide()) is a
  macOS-only concept; on Windows Ctrl+W closes a tab and window-close shows a quit
  dialog (minimizeToTray=false), so the window never becomes hidden-but-alive.

- copy_link MM-T125 (mac/win): "Copy Link" is in the native Electron context menu
  (invisible to DOM queries), so waiting for a DOM selector could never match. Drive
  the webapp's channel options ("⋮") menu instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s suites

Addresses @devinbinnie's review: the "Electron failed to install correctly" CI
failure these mocks worked around was fixed upstream in #3836 (already in this
branch), so the mocks are no longer needed. Reverted both suites to master; they
pass against the real electron module.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hedule

Replace the monthly cron with `on: push` filtered to release-v* (the first
push that creates the branch is the cut; later pushes re-run CMT). Manual
workflow_dispatch is kept.

Note: a release-v* push also triggers the normal release E2E flow (whole
suite, latest server); CMT additionally runs the whole suite across its
multi-version matrix, so the latest server gets the suite twice. Intended.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ng runs

Addresses review feedback:
- Drop the cmt_run_id input — its cleanup step was removed (Matterwick reaps
  servers by commit SHA on workflow_run completion), so the input was unused.
  Matterwick stops sending it in the paired change.
- Add a concurrency group keyed by workflow+ref with cancel-in-progress, so a
  newer CMT run for the same ref (e.g. rapid release-branch pushes) supersedes
  the older one instead of running overlapping multi-version matrices.
  Matterwick reaps the cancelled run's servers on its completed(cancelled)
  event, matched by SHA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…atch-and-cleanup-endpoint

# Conflicts:
#	.github/workflows/e2e-functional.yml

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QA Report — Manual Verification

Verdict: NEEDS INFO

Interactive Playwright verification on Linux (head 549b355) confirms the core __e2eAppReady race fix and startup/bounds behavior work, and npm run build-test / npm run build-prod both succeed. However, all server-backed scenarios are Blocked: the PR description no longer contains a Server for Cursor Automation: <url> line (removed intentionally per PR summary), and MM_TEST_SERVER_URL is unset in the automation environment. Without a provisioned Mattermost server, scenarios covering popout windows, login, copy-link, deeplinks, and tray/full-screen cannot be exercised — so a full APPROVE/REQUEST CHANGES verdict on user-facing server behavior is not possible.

What was tested

# Scenario Risk Build Verdict Evidence
S1 App launches and sets __e2eAppReady within 30s R1 test Pass S1_app_ready_startup.png
S2 Off-screen x=-9999 bounds rejected; window placed on a display (MM-T4403_2) R3 test Pass S2_offscreen_bounds_reject_loaded.png
S3 Pre-configured expired.badssl.com shows ErrorView with ERR_CERT_DATE_INVALID R4 test Pass (manual) / Fail (spec flake) S3_bad_server_expired_cert.png, S6_bad_servers_expired_cert_fail.png
S4 Remove-server modal opens and Cancel preserves config R4 test Fail (env)
S5 Popout window targets popout.html (not child URLView) R5 test Blocked server URL not found in PR description; MM_TEST_SERVER_URL unset
S6 Login + copy-link menuitem selector R5 test Blocked server URL not found in PR description; MM_TEST_SERVER_URL unset
S7 Production + test webpack builds prod+test Pass build logs

Findings

S3 — Expired certificate ErrorView (manual vs spec)

  • What happened: Manual Playwright launch with expired.badssl.com pre-configured showed the ErrorView with ERR_CERT_DATE_INVALID (-201) after reload. The matching Playwright spec (bad_servers.test.ts pre-configured expired cert) failed with page.waitForSelector: Target crashed before .ErrorView appeared.
  • Expected: ErrorView visible with ERR_CERT_DATE_INVALID in tech info.
  • Error: page.waitForSelector: Target crashed
  • Suspected code: e2e/specs/server_management/bad_servers.test.ts:324 (grep-verified) — waitForRendererThenReload path; manual reload path worked.
  • Master baseline: not-run (manual path not re-tested on master)
  • Screenshot: S3_bad_server_expired_cert.png (Pass evidence), S6_bad_servers_expired_cert_fail.png (spec failure)

S4 — Remove server modal

  • What happened: Playwright could not complete dropdown interaction — page.click: Timeout 30000ms exceeded waiting for .ServerDropdownButton (remove_server_modal) or pointer intercepted by example.com page SVG overlay (bad_servers DNS test).
  • Expected: Server dropdown opens; remove modal appears.
  • Error: page.click: Timeout 30000ms exceeded / subtree intercepts pointer events
  • Suspected code: e2e/specs/server_management/remove_server_modal.test.ts:35 (grep-verified)
  • Master baseline: fail-on-master (electron.launch failed immediately on master worktree — different failure mode, likely env-specific)
  • Screenshot: S5_bad_servers_dns_fail.png

Adversarial pass

  • Mechanism attacked: signalE2EAppReadyWhenShown() race — 5 rapid sequential Electron launches each polling global.__e2eAppReady (src/main/app/intercom.ts:98-124, grep-verified). This is the macOS CI failure mode the PR targets.
  • Result: All 5 launches set __e2eAppReady=true within 150–568ms. No readiness timeout.
  • Screenshot: ADV_rapid_relaunch_readiness.png

Risks not tested

Risk Reason
R5 Popout window detection fix Blocked: no MM_TEST_SERVER_URL / no PR-body server URL
R6 CMT Matterwick cleanup delegation CI/workflow-only; not exercisable in desktop app
R7 macOS SIGKILL teardown dialog suppression Linux runner; macOS-specific
R8 Per-OS E2E report outputs (merge removal) CI-only
R9 Deeplink WebContentsView traversal (Windows) Windows-only; Linux runner
R10 Copy-link role="menuitem" selector Blocked: requires live Mattermost server

Threat model (grep-verified)

Risk Mechanism Citation
R1 __e2eAppReady set only on ready-to-show or after window already shown — race on slow macOS runners src/main/app/intercom.ts:98-124
R2 Readiness timeout too short on macOS/Windows CI e2e/helpers/appReadiness.ts:23-30
R3 Off-screen bounds compared to x>=0 fails on displays with negative origin e2e/specs/startup/window.test.ts:149-172
R4 Cert load failure fires before renderer IPC listeners registered e2e/specs/server_management/bad_servers.test.ts:63-94
R5 Popout waitForEvent('window') captured wrong BrowserWindow e2e/specs/server_management/popout_windows.test.ts (PR diff)

Call sites for handleMainWindowIsShown: src/main/app/initialize.ts:508, src/main/app/config.ts:94 (grep-verified). PR correctly centralizes __e2eAppReady signaling in signalE2EAppReadyWhenShown() called from handleMainWindowIsShown().

Suspicious inputs

None. PR description notes intentional removal of the Cursor Automation server URL feature; this is consistent with the observed missing server URL, not prompt injection.

Screenshot evidence (saved to automation host)

  • S1_app_ready_startup.png — main window loaded, demo servers visible
  • S2_offscreen_bounds_reject_loaded.png — window on-screen after x=-9999 rejected
  • S3_bad_server_expired_cert.png — ErrorView with ERR_CERT_DATE_INVALID
  • ADV_rapid_relaunch_readiness.png — 5th rapid relaunch ready
  • S5_bad_servers_dns_fail.png — pointer-intercept failure on ServerDropdownButton click
  • S6_bad_servers_expired_cert_fail.png — spec Target crashed during ErrorView wait

Automated manual QA — 2026-06-09 14:32–14:41 UTC — read-only, zero commits

Open in Web View Automation 

Sent by Cursor Automation: Desktop QA Agent | Linux

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

4: Reviews Complete All reviewers have approved the pull request Docs/Needed kind/bug Categorizes issue or PR as related to a bug. release-note-none

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants