fix(e2e): webhook-driven, auto-versioned CMT + fix non-PR dispatch 422#90
fix(e2e): webhook-driven, auto-versioned CMT + fix non-PR dispatch 422#90yasserfaraazkhan wants to merge 19 commits into
Conversation
GitHub's workflow_run webhook payload does not carry workflow_dispatch
inputs (verified against the workflow_run schema), so CMT Provisioner
runs were never able to start provisioning -- the server_versions input
was always read as empty and the handler returned without dispatching
compatibility-matrix-testing.yml.
This change moves CMT trigger out of the webhook path and onto two new
authenticated HTTP endpoints, called directly by the workflows:
POST /cmt_dispatch -- cmt-provisioner.yml POSTs server_versions plus
full context here; matterwick provisions in a
goroutine and returns 202 immediately.
POST /cleanup_e2e -- compatibility-matrix-testing.yml POSTs the
CMT provisioner's run_id here on completion;
matterwick destroys tracked instances under
{repo}-cmt-{run_id}-* keys.
Auth: each endpoint takes a shared secret in a request header
(X-Trigger-Token, X-Cleanup-Token) compared with constant-time
comparison. Endpoints reject 503 if the corresponding secret is
unconfigured rather than running unauthenticated.
The broken "requested" branch in handleWorkflowRunEventWithInputs that
read payload.WorkflowRun.Inputs["server_versions"] is removed. The
"completed" branch is kept as a defensive fallback for stuck instances.
Co-Authored-By: Claude <noreply@anthropic.com>
Adds 18 unit tests covering the synchronous validation surface of both
new endpoints: missing/wrong tokens, malformed JSON, missing required
fields, edge inputs (",,," server_versions, unknown repo type, zero
run_id), and the cleanup map-deletion logic across matching,
non-matching, and multi-key scenarios.
Also drops the CheckLimitRateAndAbortRequest call from both handlers.
The /github_event handler keeps it because synchronous webhook
responses may need to make GitHub API calls before answering, but the
new endpoints only call GitHub from background goroutines (cmt_dispatch)
or talk to the cloud provisioner (cleanup_e2e). Keeping the rate check
would force every request to block on a live GitHub round-trip, which
makes the endpoints untestable in unit tests (no live token) for no
production benefit. Inline comments explain the rationale.
End-to-end coverage of the happy path (provisioning + workflow
dispatch) is deliberately left to e2e_dryrun_test.go, which already
mocks the cloud and github clients; duplicating that mocking here adds
noise without catching anything new. The validation rejection paths
above are the bug-prone surface.
Co-Authored-By: Claude <noreply@anthropic.com>
|
@yasserfaraazkhan: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsI understand the commands that are listed here |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughCMT provisioning routing now matches a configured trigger name and reads configured server versions. Instance cleanup switches from tracking-key in-memory deletion to SHA-suffix scanning; all dispatch functions drop tracking-key inputs and call sites are updated. ChangesSHA-Based Cleanup and CMT Routing Refactor
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
server/cmt_dispatch_test.go (1)
47-57: 💤 Low valueConsider using
httptest.NewRequestWithContextto satisfy linter.Static analysis flags
httptest.NewRequestusage. While functionally equivalent in tests, usingNewRequestWithContextaligns with best practices and silences the linter.♻️ Suggested fix
func doCMTDispatchRequest(t *testing.T, s *Server, body, token string) *httptest.ResponseRecorder { t.Helper() - req := httptest.NewRequest(http.MethodPost, "/cmt_dispatch", bytes.NewBufferString(body)) + req := httptest.NewRequestWithContext(context.Background(), http.MethodPost, "/cmt_dispatch", bytes.NewBufferString(body)) req.Header.Set("Content-Type", "application/json") if token != "" { req.Header.Set("X-Trigger-Token", token) } w := httptest.NewRecorder() s.handleCMTDispatch(w, req) return w }You'll need to add
"context"to the imports.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/cmt_dispatch_test.go` around lines 47 - 57, The test helper doCMTDispatchRequest uses httptest.NewRequest which the linter flags; update doCMTDispatchRequest to use httptest.NewRequestWithContext with a background context (e.g., context.Background()) when creating the request, and add "context" to the imports; specifically modify the call in doCMTDispatchRequest (the NewRequest call that constructs req) to NewRequestWithContext and ensure the X-Trigger-Token header handling and subsequent call to s.handleCMTDispatch remain unchanged.server/cleanup_e2e_test.go (1)
35-45: 💤 Low valueConsider using
httptest.NewRequestWithContextto satisfy linter.Same as in
cmt_dispatch_test.go, static analysis flagshttptest.NewRequest. UsingNewRequestWithContextwould align with the linter expectations.♻️ Suggested fix
func doCleanupE2ERequest(t *testing.T, s *Server, body, token string) *httptest.ResponseRecorder { t.Helper() - req := httptest.NewRequest(http.MethodPost, "/cleanup_e2e", bytes.NewBufferString(body)) + req := httptest.NewRequestWithContext(context.Background(), http.MethodPost, "/cleanup_e2e", bytes.NewBufferString(body)) req.Header.Set("Content-Type", "application/json") if token != "" { req.Header.Set("X-Cleanup-Token", token) } w := httptest.NewRecorder() s.handleCleanupE2E(w, req) return w }You'll need to add
"context"to the imports.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/cleanup_e2e_test.go` around lines 35 - 45, Replace httptest.NewRequest with httptest.NewRequestWithContext in doCleanupE2ERequest and pass a context (e.g., context.Background()); also add "context" to the imports. Specifically, update the call in doCleanupE2ERequest (the helper that builds the POST to "/cleanup_e2e") to use httptest.NewRequestWithContext(context.Background(), http.MethodPost, ... ) and ensure the package imports include "context".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@server/cleanup_e2e_test.go`:
- Around line 35-45: Replace httptest.NewRequest with
httptest.NewRequestWithContext in doCleanupE2ERequest and pass a context (e.g.,
context.Background()); also add "context" to the imports. Specifically, update
the call in doCleanupE2ERequest (the helper that builds the POST to
"/cleanup_e2e") to use httptest.NewRequestWithContext(context.Background(),
http.MethodPost, ... ) and ensure the package imports include "context".
In `@server/cmt_dispatch_test.go`:
- Around line 47-57: The test helper doCMTDispatchRequest uses
httptest.NewRequest which the linter flags; update doCMTDispatchRequest to use
httptest.NewRequestWithContext with a background context (e.g.,
context.Background()) when creating the request, and add "context" to the
imports; specifically modify the call in doCMTDispatchRequest (the NewRequest
call that constructs req) to NewRequestWithContext and ensure the
X-Trigger-Token header handling and subsequent call to s.handleCMTDispatch
remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7b08b625-de55-47f7-b1f4-061630a8a168
📒 Files selected for processing (7)
server/cleanup_e2e.goserver/cleanup_e2e_test.goserver/cmt_dispatch.goserver/cmt_dispatch_test.goserver/config.goserver/server.goserver/workflow_run.go
The "Primary" lookup at handleWorkflowRunEventWithInputs read
payload.WorkflowRun.Inputs["mw_tracking_key"], which is the same field
that broke CMT provisioning -- GitHub's workflow_run webhook payload
does not include workflow_dispatch inputs (verified against the Octokit
schema and the REST API for a real workflow run). The field is always
empty in production, so the "primary" branch never fires; only the
SHA-based fallback runs. That fallback works because every tracking-key
format ends with -{sha}.
Removing the dead read eliminates the misleading "Primary"/"Fallback"
labels and makes the SHA-based scan the documented canonical mechanism.
The mw_tracking_key value is still written when dispatching test
workflows -- it remains queryable via the REST API for any future code
path that wants to do a key-based lookup -- but the webhook-driven
cleanup intentionally does not depend on it.
Co-Authored-By: Claude <noreply@anthropic.com>
…g_key CMT now runs off a lightweight scheduled trigger workflow that matterwick hears as a workflow_run event, provisioning one server per version from a hardcoded set in config (CMTServerVersions) and dispatching compatibility-matrix-testing.yml. This removes the /cmt_dispatch and /cleanup_e2e HTTP endpoints (and their CMTTriggerSecret/CleanupSecret config), which existed only to pass server_versions past the workflow_run webhook. Also stop sending the undeclared `mw_tracking_key` workflow_dispatch input: GitHub rejects a dispatch carrying an undeclared input with HTTP 422, which broke every non-PR dispatch (release, scheduled, and CMT). Cleanup is keyed off the test workflow's workflow_run completed event, matched by commit SHA. - workflow_run.go: add handleCMTTrigger, provision on CMTTriggerWorkflowName 'requested'; remove handleCMTRunCleanup fallback - config.go: add CMTServerVersions (+ defaultCMTServerVersions) and CMTTriggerWorkflowName; remove CMTTriggerSecret/CleanupSecret - remove cmt_dispatch.go, cleanup_e2e.go and their tests - drop dead trackingKey plumbing from desktop/mobile/CMT dispatch paths Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/e2e_tests.go`:
- Around line 1051-1055: The dispatch uses branch "ref" which can advance and
desync cleanup keyed by commit SHA; update dispatchDesktopE2EWorkflow and
dispatchMobileE2EWorkflow to pin the run to the tracked commit by including the
commit SHA as a declared workflow input (e.g., "tracked_sha") and pass sha into
the workflow inputs, or alternatively set "ref" to the exact commit SHA when
calling the GitHub Actions API; also update the corresponding workflow files to
declare the tracked_sha input (so workflow_run.inputs can be consulted) and
adjust the cleanup path (findAndDestroyInstancesBySHA / createCloudInstallation
cleanup logic) to read workflow_run.inputs.tracked_sha (or ensure head_sha
equals the tracked SHA) so provisioning and cleanup use the same unique
identifier.
In `@server/workflow_run.go`:
- Around line 108-113: The cleanup currently calls
s.findAndDestroyInstancesBySHA(repoName, headSHA) which destroys all entries
matching {repo}-{sha} and will tear down concurrent runs; change the cleanup to
preserve and use the per-dispatch correlation key (runID) instead of SHA-only.
Update the webhook handling to extract runID from the payload and call a new or
updated method (e.g., s.findAndDestroyInstancesByRunID(repoName, headSHA, runID)
or s.findAndDestroyInstances(repoName, headSHA, runID)) so matching uses the
full tracking key that includes runID, and only fall back to SHA-only logic when
runID is absent; modify the matching/teardown logic inside the corresponding
method to filter by runID (and repoName/headSHA as needed) rather than deleting
all {repo}-{sha} entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c8ce2a0e-2229-48cb-b0e6-4517747ca6a5
📒 Files selected for processing (5)
config/config-matterwick.default.jsonserver/config.goserver/e2e_tests.goserver/push_events.goserver/workflow_run.go
✅ Files skipped from review due to trivial changes (1)
- config/config-matterwick.default.json
…t reaped
When two flows fire on the same commit SHA (notably the monthly CMT trigger and
the nightly trigger on the same master HEAD), the completing test workflow would
destroy the other flow's still-active servers, because cleanup matched every
{repo}-…-{sha} key. Scope cleanup to the completing workflow's flow: CMT test
completions reap only {repo}-cmt-… keys, other E2E completions reap only
non-CMT (push/scheduled) keys.
We can't key on a run ID as CodeRabbit suggested: GitHub's workflow_run payload
carries the *test* run's id (not the trigger run id embedded in the tracking
key) and omits workflow_dispatch inputs, so head_sha + flow is the only viable
correlation. Branch-advance SHA desync remains handled by the periodic orphan
cleanup scan.
Extracted instanceKeyMatchesSHA + added TestInstanceKeyMatchesSHA covering the
collision.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
….18, 11.7.1) Use the latest patch of the v10.11 ESR line and the v11.7 feature line (both verified present on Docker Hub) instead of the .0 GA placeholders, so CMT tests the currently-shipping builds. Keeps the code default in sync with the gitops config (mattermost/gitops-platform#72). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
One cloud instance is provisioned per version, so 10 versions = 10 instances. Allows the full ESR-to-current version set to run in a single CMT run. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Instead of a hardcoded list that needs monthly edits, matterwick now resolves the CMT version set at trigger time from the mattermost/mattermost GitHub releases: the active ESR line(s) + the latest 3 stable minor lines + the current release candidate, latest patch of each, deduped and capped at 10. - resolveCMTServerVersions() (server/e2e_tests.go): fresh fetch per CMT trigger (no cache; CMT runs rarely), per_page=100, parses tag_name/draft/prerelease/body. ESR lines detected via 'Extended Support Release' in the release body. RC included only when newer than the latest stable. Falls back to defaultCMTServerVersions on API error. Adds a small internal semver parser/comparator (no new dependency). - cmtServerVersions(): explicit Config.CMTServerVersions is an optional manual override; empty (normal) => auto-resolve. Removed the old Config.CMTVersions(). - handleCMTTrigger uses s.cmtServerVersions(). - Tests: auto-derived set, manual override (no API call), API-error fallback, RC exclusion when stale, and parseCMTVersion edge cases. Raised the cap test to 10. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NARSimoes
left a comment
There was a problem hiding this comment.
Thanks @yasserfaraazkhan , just a few clarifications.
| "E2EInstanceMaxAge": 6 | ||
| "E2EInstanceMaxAge": 6, | ||
| "CMTTriggerWorkflowName": "CMT Provisioner", | ||
| "CMTServerVersions": ["10.11.18", "11.7.1"] |
There was a problem hiding this comment.
How are we planning to update these versions? Might better discuss in DM.
If understood correctly, the logic is to set "CMTServerVersions: []" in order to allow matterwick get the latest supported version and provisions instances for those versions.
There was a problem hiding this comment.
@NARSimoes https://github.com/mattermost/gitops-platform/pull/72/changes#diff-d242a4ffee35793c04e9591a799f9136234ba74b276617420390641ae142aa9fR96
the deployed values will be empty so it will fetch the latest versions. In this config the values is just placeholders for demo purpose.
| req, err := client.NewRequest("POST", | ||
| fmt.Sprintf("/repos/%s/%s/actions/workflows/compatibility-matrix-testing.yml/dispatches", repoOwner, repoName), | ||
| map[string]interface{}{ | ||
| "ref": branch, |
There was a problem hiding this comment.
Sanity check: It seems for the new CMT logic we are provisioning using headBranch as ref but the clean up findAndDestroyInstancesBySHA is using headSHA. If this is accurate we'll keep leftovers for CMT runs
There was a problem hiding this comment.
Thanks @NARSimoes , the CMT will now use the SHA and clean up after completion
Non-PR flows (CMT, nightly/scheduled, push/release) provision instances keyed by the trigger SHA, then dispatch the test workflow with ref=branch (workflow_dispatch cannot take a commit SHA). GitHub records the dispatched run's head_sha as the branch HEAD at dispatch time, which can differ from the trigger SHA when the branch advances during the (often long) provisioning window. Completion cleanup matches instances by the run's head_sha, so a moved branch made the primary SHA-match miss its instances. Resolve the branch HEAD via the GitHub commits API right before dispatch and key the tracking entry on that SHA so it aligns with the run's head_sha. Falls back to the trigger SHA on error; the periodic age-based scan remains the backstop either way. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
PR E2E instances had no completion-based teardown and were explicitly skipped by the periodic scan, so a long-open PR could keep its servers alive indefinitely (only PR-close or the E2E/Reset-Servers label tore them down). Extend the periodic scan to also reap PR instances past a longer, separate max-age (E2EPRInstanceMaxAge, default 24h vs 3h for non-PR), and evict the reaped PR's in-memory tracking entry. Eviction is what makes a re-applied E2E/Run provision a fresh set: with the stale entry gone, the reuse path in handleE2ETestRequest finds no live instances and creates new servers instead of dispatching against deleted ones. Renames cleanupStaleNonPRE2EInstances -> cleanupStaleE2EInstances since it now handles both kinds. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CMT is now driven by release-branch events instead of a monthly schedule: desktop fires on push to release-v* (the cut + subsequent pushes), mobile fires only on the release-v* branch cut via `on: create`. Because `create` delivers a workflow_run webhook for every branch/tag creation (even when the trigger workflow's job is skipped), Matterwick must gate independently. Add shouldTriggerCMT: provision CMT only when the run was a manual workflow_dispatch (any branch) or ran on a release branch (head_branch matches isReleaseBranch). This filters mobile's create-event noise and non-release pushes while keeping ad-hoc manual runs working. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The compatibility-matrix-testing.yml cleanup step that consumed cmt_run_id was removed (Matterwick now reaps servers by commit SHA on workflow_run completion), leaving the input dead. Drop it from the dispatch payload and remove the now-unused sha/runID params from dispatchCMTWorkflow. This pairs with removing the cmt_run_id input declaration from the desktop/mobile CMT workflows. Deploy ordering: this Matterwick image must be live before/with those workflow changes — once the input is undeclared, sending it would 422 (the mw_tracking_key class). Prod matterwick (v0.4.12) does not dispatch these workflows, so the coordinated rollout is safe. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace branch-based CMT triggering with RC-tag triggering. The desktop &
mobile RC cut (e.g. v6.2.0-rc.1) is the moment the release team actually
wants compatibility-matrix validation — earlier than a release branch
push, scoped to the artifact that will ship.
For tag pushes GitHub sets workflow_run.head_branch to the tag name (no
refs/tags/ prefix), so the gate matches the tag string directly:
- isRCTag: ^v?\d+\.\d+\.\d+-rc[.\-]?\d+$
matches v6.2.0-rc.1, v2.41.0-rc.10, 6.2.0-rc1; rejects GA tags
(v6.2.0), beta/alpha (v1.0.22-beta), and nightly (6.3.0-nightly.x).
- shouldTriggerCMT also keeps release-branch acceptance as
defense-in-depth and continues to allow manual workflow_dispatch on
any ref.
Unit tests lock in the matching/rejection set so future tag conventions
can't silently slip through.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
NARSimoes
left a comment
There was a problem hiding this comment.
Just two minor / nit comments that can be addressed in a a follow up. Looks good to me but would be nice a matterwick owner do a sanity check on edge cases before merge
| // desktop and mobile repos. It identifies CMT test-workflow completions so SHA-based cleanup | ||
| // only reaps CMT instances (keys "{repo}-cmt-…-{sha}") and leaves a concurrent nightly/push | ||
| // run on the same SHA untouched (and vice versa). | ||
| const cmtTestWorkflowName = "Compatibility Matrix Testing" |
There was a problem hiding this comment.
Curiosity: any particular reason to have CMT hardcoded while others are config driven?
|
|
||
| const maxVersions = 10 | ||
| if len(chosen) > maxVersions { | ||
| chosen = chosen[len(chosen)-maxVersions:] // keep the newest if somehow over the cap |
There was a problem hiding this comment.
nit: I think it's effectively the same due the amount of versions. Just highlighting to double check: the resolveCMTServerVersions keeps the newest 10 while handleCMTWithServerVersions the first 10 entris
Mobile doesn't cut RC git tags; its analog is the `build-release-NNN`
branch (3+ digit build number, e.g. build-release-786 / build-release-1100)
created by the "Mattermost Mobile Release" workflow when a release build
is dispatched to TestFlight / Play Store. Treat that as the RC moment for
mobile and run CMT against it.
- New isBuildReleaseBranch helper with regex `^build-release-\d{3,}$`
(3+ digits — rejects test artifacts like build-release-1 without
locking the upper bound, so a future move to 5-digit numbers needs
no code change). Platform-specific variants build-release-ios-NNN,
-sim-NNN, -android-NNN are intentionally excluded (don't start with
a digit after the prefix).
- shouldTriggerCMT extended to accept this in addition to manual
dispatch / RC tag / release branch.
- TestIsBuildReleaseBranch + extended TestShouldTriggerCMT cover the
boundary cases: real 3/4/5-digit build numbers (true), 1- and 2-digit
test artifacts (false), platform variants (false), no overlap with
existing gates.
The paired mobile workflow change adds the `build-release-[0-9][0-9][0-9]*`
branch pattern to cmt-provisioner.yml's push trigger.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The release team treats RCs as their go/no-go test gate and re-cuts on failures. Desktop CMT already runs the whole E2E suite across the multi-version matrix at each RC tag cut, and that matrix includes the latest server — so the per-push release E2E was redundant coverage for desktop. Skip it for desktop release-branch pushes. Mobile is unchanged: mobile CMT runs only smoke tests, so the whole-suite release E2E on `release-*` push remains the per-commit signal there. Other auto-triggers are unaffected: - master/main push trigger: still fires - nightly trigger: still fires - PR label trigger: still fires - mobile release-branch push: still fires Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Every push to a mobile release-2.X branch was firing a full PR-style E2E suite — 3 cloud servers + 10 macOS shards + 10 ubuntu shards + iOS/Android builds, ~60-90 min wall-clock, ~$27/push in GitHub Actions alone. Recent cadence on real release branches: release-2.41: 191 commits (~$5,160) release-2.40: 158 commits (~$4,270) release-2.39: 114 commits (~$3,080) release-2.38: 90 commits (~$2,430) release-2.37: 52 commits (~$1,400) ~$16K across these 5 branches alone, with ~15% of runs triggered by pure noise (translation updates, "Bump app build number", docs/CI tweaks) that can't meaningfully fail differently than the previous run. The release team treats RC builds as the test gate (re-cut on failure), which makes per-cherry-pick E2E during stabilization redundant. Same logic that already removed the desktop release-branch trigger; now applied to mobile too. After this change, auto E2E coverage on release-side: - master/main push -> full PR-style E2E (unchanged, single per-commit signal) - build-release-NNN push -> CMT (smoke x N server versions, mobile only) - RC tag cut -> CMT (desktop only; mobile doesn't tag) - PR label E2E/Run -> full PR-style E2E (unchanged) Full-suite-on-build-release-NNN is NOT wired today (separate decision). E2EAutoTriggerOnRelease and E2EReleasePatternPrefix config fields remain in place — still used by isReleaseBranch() as defense-in-depth in the CMT workflow_run gate (shouldTriggerCMT). Leaving them avoids an unrelated gitops/config diff in this change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Every push to mobile main was firing a full PR-style E2E suite (3 cloud
servers, 10 macOS shards + 10 ubuntu shards, ~$27/push, ~60-90 min). Main
sees ~5-20 merges/day during active development -> tens of thousands of
$/year for post-merge regression signal that PR-label E2E already covers
pre-merge.
Skip the master/main gate for mobile; desktop master push still fires.
Coverage that remains for mobile main:
- PR-label E2E pre-merge (developer-driven, single PR's branch tested)
- Manual workflow_dispatch
- CMT (smoke x N versions) on build-release-NNN cut
Full-suite E2E on build-release-NNN remains ungated. If desired, two ways
to wire it:
1. Separate matterwick push-handler gate for isBuildReleaseBranch
2. Promote latest version inside CMT to whole-suite (covers RC fully
without a parallel matterwick path)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…er code
The mobile CMT_MATRIX now marks the highest-semver server entry as
`latest: true`. The mobile workflow uses that flag (via expressions in
`with:`, which has matrix context unlike job-level `if:`) to run the
whole detox suite against latest and smoke-only against older versions
— closing the full-suite-on-RC gap without a separate matterwick path.
Matterwick stays repo-agnostic: it only signals "which is latest". The
test-directory / parallelism choices that flow from that live in the
mobile workflow, where the repo's structure is known.
Dead-code cleanup now that the surrounding triggers are gone:
- handleNightlyE2ETrigger (~95 lines) + its workflow_run gate. Both
repos removed e2e-nightly-trigger.yml in prior commits, so no
workflow with name "E2E Nightly Trigger" can fire matterwick on
workflow_run.requested any longer.
- E2ENightlyTriggerWorkflowName config field — no readers left.
- E2EAutoTriggerOnRelease config field — the release-branch push gate
that read it was removed in the prior commit; no readers left.
- runType MASTER/RELEASE branching in trigger*ForPushEvent — only
master/main pushes reach those functions now (release-branch and
mobile-main paths are skipped earlier), so RELEASE is unreachable.
Hardcode "MASTER".
- Stale log fields (auto_release, release_pattern_prefix,
is_release_branch) in the push-event fallthrough log.
What stays in use:
- isReleaseBranch + E2EReleasePatternPrefix — defense-in-depth in
shouldTriggerCMT (the CMT workflow_run gate also accepts release-*
refs from manual dispatches).
- cmtOnly cleanup scoping — cheap insurance against hypothetical
same-SHA collisions between flows.
Net: -158 lines (-95 nightly handler, -36 config refs, -27 misc), +new
tests for the latest-marker boundary cases (5-element ESR+minors+RC,
stable vs same-X.Y.Z RC, multi-digit rc.N, single version, all
unparseable fallback).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Matterwick's nightly handler was removed in prior commits; this dispatch
function still took a `nightly bool` parameter and sent `"nightly": ...`
in the e2e-functional.yml workflow inputs map. Push-event callers always
passed `false`, and the PR-label path was a separate function that never
went through here at all — so the param was dead.
Companion change in desktop (paired): remove the `nightly` workflow input
from e2e-functional.yml + e2e-functional-template.yml and the
`if inputs.nightly` block that built `desktop-nightly-${OS}` build IDs.
- dispatchDesktopE2EWorkflow: drop `nightly bool` parameter
- drop "nightly": fmt.Sprintf("%t", nightly) from workflowInputs
- update sole caller (triggerDesktopE2EWorkflowForPushEvent)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Reworks Compatibility Matrix Testing (CMT) to a webhook-driven, self-updating model, fixes the bug that broke every non-PR E2E dispatch, and makes instance cleanup correct end-to-end across all trigger flows.
Root-cause fix: undeclared
mw_tracking_key→ HTTP 422matterwick was sending an
mw_tracking_keyworkflow_dispatchinput that none of the target workflows declare. GitHub rejects a dispatch carrying an undeclared input with HTTP 422, so release, scheduled, and CMT dispatches all failed (PR-label worked only because it never sent it). Removed from every dispatch path.CMT is webhook-driven (no HTTP endpoints)
Since the version set is resolved by matterwick, nothing needs to be passed from the workflow — so the
/cmt_dispatchand/cleanup_e2eendpoints (and theirCMTTriggerSecret/CleanupSecret) are removed. Instead a lightweightCMT Provisionerworkflow fires → matterwick hearsworkflow_run.requested→handleCMTTriggerprovisions one server per version → dispatchescompatibility-matrix-testing.yml.Trigger (replaces the monthly schedule): CMT now runs on release branches. Desktop fires on push to
release-v*(the cut + subsequent pushes); mobile fires only on therelease-v*branch cut (on: create). Becausecreatedelivers aworkflow_runwebhook for every branch/tag creation, matterwick gates withshouldTriggerCMT: provision only when the run was a manualworkflow_dispatch(any branch) or ran on a release branch. Manual runs from the Actions tab still work for both repos.Note — intentional duplicate coverage: a
release-v*push/cut also triggers matterwick's normal release E2E (handlePushEventE2E: whole suite, latest server). CMT additionally runs its matrix (desktop: whole suite per version; mobile: smoke per version), whose set includes the latest server — so the latest server is exercised twice. Expected and accepted.Auto-derived CMT version set (no monthly edits)
resolveCMTServerVersions()fetches themattermost/mattermostGitHub releases at each trigger and builds the set: active ESR line(s) + latest 3 stable minor lines + current RC, latest patch each, deduped, capped at 10 (raised from 5). ESR lines are detected from the release notes ("Extended Support Release").Config.CMTServerVersionsis an optional manual override;defaultCMTServerVersions(10.11.18,11.7.1) is only a fallback for a GitHub-API outage.Cleanup — correct across every flow
Two layers, audited end-to-end:
Primary, completion-based (non-PR: CMT / nightly / push-release). On the test workflow's
workflow_runcompleted event, instances are matched by commit-SHA suffix, scoped per flow (CMT vs non-CMT) so two runs sharing a SHA (e.g. the monthly nightly + CMT on the samemastercommit) can't tear down each other's servers.ref: <branch>(workflow_dispatch can't take a SHA), so the run'shead_shais the branch HEAD at dispatch time. Previously we keyed on the trigger SHA, so if the branch advanced during the long provisioning window the primary match missed its own instances. matterwick now resolves the branch HEAD via the commits API right before dispatch and keys on that, so it lines up with the completed run'shead_sha. Falls back to the trigger SHA on API error.Backstop, age-based periodic scan (
cleanupStaleE2EInstances). Catches anything the primary path misses.E2EInstanceMaxAge(default 3h).E2EPRInstanceMaxAge(default 24h). When a PR instance is reaped, its in-memory tracking entry is evicted — this is what makes a re-appliedE2E/Runprovision a fresh set instead of dispatching against deleted servers.PR teardown triggers unchanged otherwise:
E2E/Reset-Serverslabel = explicit tear-down-now; PR close = cleanup; removingE2E/Runis intentionally a no-op (so it doesn't kill the still-finishing run).Key files
server/workflow_run.go—handleCMTTrigger;shouldTriggerCMTrelease-branch/manual gate; flow-scoped SHA cleanup; dispatch-time HEAD SHA re-key for CMT + nightly; removedhandleCMTRunCleanup; droppedmw_tracking_key.server/push_events.go— dispatch-time HEAD SHA re-key for push/release.server/e2e_tests.go—resolveCMTServerVersions()+cmtServerVersions()+ semver comparator;resolveBranchHeadSHA();cleanupStaleE2EInstances(renamed, now also reaps aged-out PR instances) +evictReapedPRInstances(); removedmw_tracking_keyfrom dispatch.server/config.go+ default JSON —CMTServerVersions/CMTTriggerWorkflowName/E2EPRInstanceMaxAge; removed the secrets.server/cmt_dispatch.go,server/cleanup_e2e.go(+ tests).Companion PRs
mattermost/desktop#3829,mattermost/mattermost-mobile#9800,mattermost/gitops-platform#72.Verification
go build ./...,go vet ./server/..., andgo test ./server/...pass. New unit tests coverresolveBranchHeadSHA, the PR max-age knob, andevictReapedPRInstances(the re-provision-after-expiry guarantee). The CMT resolver was validated against live GitHub data — today it yields10.11.19, 11.5.7, 11.6.4, 11.7.2, 11.8.0-rc3(all confirmed as real Docker tags).🤖 Generated with Claude Code
Release Note