Percona-Lab · plebioda · Jun 22, 2026 · Jun 26, 2026
diff --git a/README.md b/README.md
@@ -72,6 +72,18 @@ fork:
       - important_files:
           - src/core/critical_module.cpp
           - src/api/public_api.h
+  merge_gate:
+    min_commits: 50          # merge once this many unmerged commits accumulate
+    max_age_days: 2          # ...or sooner if the oldest unmerged commit is older than this
+    max_commits: 150         # never advance more than this many commits in one merge
+    force_strategies:        # ...or as soon as one of these strategies matches
+      - conflict
+      - important_files
+  ai_pick:
+    enabled: false                             # false = deterministic pick (default)
+    agent: claude-cli:claude-opus-4-5          # same format as resolve.agent
+    rules_file: .mergai/merge_pick_rules.md    # project-specific rules (optional)
+    fallback: deterministic                    # on agent error / invalid sha: deterministic | error
 
 resolve:
   # Agent format: <agent-type>[:<model>]
@@ -130,6 +142,47 @@ The `fork.merge_picks.strategies` list defines how `mergai fork merge-pick` prio
 
 Set `most_recent_fallback: true` to select the most recent unmerged commit when no strategy matches.
 
+### Merge Gate and AI Pick
+
+A deterministic **gate** decides *when* to merge; the **pick** decides *which* upstream commit to merge to. The gate is a pure go/no-go decision over already-computed fork status, so it needs no AI tokens.
+
+`fork.merge_gate` opens the gate when any of the following conditions holds; they are checked in this order:
+
+| Setting | Default | Opens the gate when... |
+|---------|---------|------------------------|
+| `force_strategies` | `[conflict, important_files]` | any prioritized commit matches one of these strategies (reason `force:<name>`) |
+| `min_commits` | `50` | at least this many unmerged commits have accumulated |
+| `max_age_days` | `2` | the oldest unmerged commit is at least this many days old |
+| `max_commits` | `150` | (not a trigger) batch ceiling: a single merge never advances more than this many commits |
+
+`max_commits` defines the **candidate window**: the oldest `max_commits` unmerged commits (`base..base+max_commits`). It bounds both the merge batch size and the AI prompt size; commits newer than the window are left out of the current pick (they are only counted) and are drained by later merges. The defaults follow the historical merge cadence (median ~47 commits/merge, p75 ~67).
+
+`fork.ai_pick` configures the AI pick (`mergai fork merge-pick --ai`):
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `enabled` | `false` | `false` = deterministic pick (default). |
+| `agent` | `""` | Agent descriptor (e.g. `claude-cli:claude-opus-4-5`), same format as `resolve.agent`. Empty falls back to `resolve.agent`. |
+| `rules_file` | `""` | Optional path to a project-specific merge-pick rules markdown file, appended to the built-in system prompt. |
+| `fallback` | `deterministic` | On agent error / invalid sha: `deterministic` (resilient) or `error`. |
+
+The gate decision and the picks are separate, explicit commands:
+
+```bash
+mergai fork merge-pick --plan            # token-free gate decision (JSON): action + reason
+mergai fork merge-pick --gate            # gate-respecting deterministic pick; prints the chosen sha
+mergai fork merge-pick --ai --next       # AI pick within the window; prints the chosen sha
+mergai fork merge-pick --ai --force      # skip the gate re-check and pick regardless
+```
+
+`--plan` emits the gate's go/no-go decision, which the periodic workflow consumes. Two example outputs:
+
+```json
+{ "action": "wait", "reason": "wait (12 < 50 commits; oldest 0.3d < 2d)" }
+{ "action": "merge", "reason": "min_commits (63 >= 50)" }
+```
+
+Which commit to merge to is then chosen explicitly with `--gate` (deterministic) or `--ai`. The gate decision is also surfaced in `mergai fork status` (text and `--json`).
+
 ### Branch Naming Format
 
 The `branch.name_format` setting controls how mergai names branches. Available tokens:
@@ -196,6 +249,9 @@ Notes are automatically attached when using `mergai commit` subcommands. Use `me
 |---------|-------------|
 | `mergai config` | Configure git settings (conflictstyle, notes display) |
 | `mergai fork merge-pick` | Get prioritized commits from upstream based on configured strategies |
+| `mergai fork merge-pick --plan` | Token-free merge-gate decision (JSON): whether to merge (`action`) and why (`reason`) |
+| `mergai fork merge-pick --gate` | Token-free gate-respecting deterministic pick within the candidate window (bare sha) |
+| `mergai fork merge-pick --ai` | AI-assisted pick of the merge boundary within the candidate window |
 | `mergai fork fetch` | Fetch upstream repository |
 | `mergai context init` | Initialize merge context with commit SHA and target branch |
 | `mergai notes update` | Fetch and merge notes from remote |

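Illustrative sketch, not part of the patch: the gate ordering documented in the README hunk above can be read as a small pure function over already-computed fork status. The code below assumes the documented defaults and check order. `MergeGateConfig`, `decide_gate`, and `candidate_window` are hypothetical names, not mergai's actual API, and the `max_age_days` reason string is a guess modelled on the documented `min_commits` example.

```python
from dataclasses import dataclass, field


@dataclass
class MergeGateConfig:
    # Defaults mirror the fork.merge_gate table in the README hunk.
    min_commits: int = 50
    max_age_days: float = 2
    max_commits: int = 150
    force_strategies: list[str] = field(
        default_factory=lambda: ["conflict", "important_files"]
    )


def decide_gate(unmerged, oldest_age_days, matched_strategies, cfg=None):
    """Return {"action", "reason"}. Hypothetical helper for illustration only."""
    cfg = cfg or MergeGateConfig()
    # 1. A forcing strategy opens the gate as soon as it matches.
    for name in cfg.force_strategies:
        if name in matched_strategies:
            return {"action": "merge", "reason": f"force:{name}"}
    # 2. Enough unmerged commits have accumulated.
    if unmerged >= cfg.min_commits:
        reason = f"min_commits ({unmerged} >= {cfg.min_commits})"
        return {"action": "merge", "reason": reason}
    # 3. The oldest unmerged commit is old enough (reason format is assumed).
    if oldest_age_days >= cfg.max_age_days:
        reason = f"max_age_days ({oldest_age_days:g}d >= {cfg.max_age_days:g}d)"
        return {"action": "merge", "reason": reason}
    reason = (
        f"wait ({unmerged} < {cfg.min_commits} commits; "
        f"oldest {oldest_age_days:g}d < {cfg.max_age_days:g}d)"
    )
    return {"action": "wait", "reason": reason}


def candidate_window(unmerged_shas, cfg=None):
    """Oldest max_commits unmerged commits; input is assumed ordered oldest first."""
    cfg = cfg or MergeGateConfig()
    return unmerged_shas[: cfg.max_commits]
```

Note that `max_commits` never appears in `decide_gate`: it is a batch ceiling applied when choosing the merge target, not a trigger, which matches the "(not a trigger)" wording in the table.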
diff --git a/src/mergai/ci/context_builders/bazel.py b/src/mergai/ci/context_builders/bazel.py
@@ -88,15 +88,24 @@ def build_context(
         )
 
         failures: list[dict[str, Any]] = []
-        bep_path: Path | None = None
+        bep_paths: list[Path] = []
         if artifact_dir is not None:
-            candidate = artifact_dir / "bazel-bep.json"
-            if candidate.is_file():
-                bep_path = candidate
-                failures = self._parse_bep(candidate)
+            # Discover every BEP stream in the artifact. The build/unittests
+            # jobs upload a single `bazel-bep.json`, but the jstests job runs
+            # resmoke in several invocations and uploads one BEP per invocation
+            # (`bazel-bep.json` for the reliable batch plus
+            # `bazel-bep-<suite>.json` per load-sensitive suite). Parsing only
+            # the fixed name would miss failures isolated to a load-sensitive
+            # suite, so glob and concatenate all of them.
+            bep_paths = sorted(
+                p for p in artifact_dir.glob("bazel-bep*.json") if p.is_file()
+            )
+            if bep_paths:
+                for p in bep_paths:
+                    failures.extend(self._parse_bep(p))
             else:
                 log.info(
-                    "Artifact %s has no bazel-bep.json; BEP summary unavailable",
+                    "Artifact %s has no bazel-bep*.json; BEP summary unavailable",
                     artifact_dir.name,
                 )
         else:
@@ -136,7 +145,7 @@ def build_context(
         details = self._render_details(
             artifacts_dir=artifacts_dir,
             artifact_dir=artifact_dir,
-            bep_path=bep_path,
+            bep_paths=bep_paths,
             failures=failures,
             job_logs=job_logs,
         )
@@ -205,7 +214,7 @@ def _render_details(
         *,
         artifacts_dir: str,
         artifact_dir: Path | None,
-        bep_path: Path | None,
+        bep_paths: list[Path],
         failures: list[dict[str, Any]],
         job_logs: list[tuple[str, Path]],
     ) -> str:
@@ -231,24 +240,25 @@ def _render_details(
 
         if failures:
             lines = ["## Failing bazel targets"]
-            if bep_path is not None:
-                lines.append(f"_Source: `{bep_path}`_")
+            if bep_paths:
+                src = ", ".join(f"`{p}`" for p in bep_paths)
+                lines.append(f"_Source: {src}_")
             lines.append("")
             for entry in failures[:_MAX_FAILURE_LINES]:
                 lines.append(f"- `{entry['label']}` ({entry['kind']})")
             if len(failures) > _MAX_FAILURE_LINES:
                 lines.append(
                     f"- ...{len(failures) - _MAX_FAILURE_LINES} more "
-                    f"(read `{bep_path}` for the full list)"
+                    "(read the Build Event Protocol stream(s) for the full list)"
                 )
             sections.append("\n".join(lines))
 
         nav_lines = ["## Where to find more"]
         nav_lines.append(f"- Artifacts directory: `{artifacts_dir}`")
         if artifact_dir is not None:
             nav_lines.append(f"- Bazel artifact directory: `{artifact_dir}`")
-        if bep_path is not None:
-            nav_lines.append(f"- Build Event Protocol stream: `{bep_path}`")
+        for p in bep_paths:
+            nav_lines.append(f"- Build Event Protocol stream: `{p}`")
         nav_lines.append("")
         nav_lines.append(
             "Use your filesystem tools (Read, Bash, Glob, Grep) to "

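Illustrative sketch, not part of the patch: the discovery change in `bazel.py` comes down to globbing `bazel-bep*.json` and concatenating what each stream yields. The standalone version below assumes a parser callable with the same shape as `_parse_bep`; `discover_bep_streams`, `collect_failures`, `fake_parse`, and the `sharding` suite name are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def discover_bep_streams(artifact_dir: Path) -> list[Path]:
    """Every bazel-bep*.json file in the artifact directory, in sorted order."""
    return sorted(p for p in artifact_dir.glob("bazel-bep*.json") if p.is_file())


def collect_failures(
    artifact_dir: Path,
    parse_bep: Callable[[Path], list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Concatenate the failures reported by each discovered BEP stream."""
    failures: list[dict[str, Any]] = []
    for path in discover_bep_streams(artifact_dir):
        failures.extend(parse_bep(path))
    return failures


def fake_parse(path: Path) -> list[dict[str, Any]]:
    # Stand-in for the real parser: reports one fake failure per stream.
    return [{"label": f"//fake:{path.stem}", "kind": "test"}]


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp)
    # One single-stream name, one per-suite name, and one unrelated file.
    for name in ("bazel-bep.json", "bazel-bep-sharding.json", "build.log"):
        (artifact / name).write_text("{}")
    names = [p.name for p in discover_bep_streams(artifact)]
    failures = collect_failures(artifact, fake_parse)
```

Because the glob also matches the plain `bazel-bep.json`, jobs that upload a single stream behave as before; only the multi-invocation case gains extra entries.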
diff --git a/src/mergai/ci/dispatch.py b/src/mergai/ci/dispatch.py
@@ -252,6 +252,23 @@ def act(
         return skip("incomplete")
 
     if run.conclusion == "cancelled":
+        # A cancelled run is usually nothing to fix: a user/timeout
+        # cancellation, or a run superseded by a newer push. But under the
+        # default fail-fast matrix policy GitHub cancels the sibling jobs the
+        # moment one matrix job fails, and the run conclusion rolls up to
+        # `cancelled` even though that one job has a genuine failing step.
+        # Detect that and route it through the failure path so `ci fix`
+        # handles it instead of skipping. Unlike a `failure` run (whose cheap
+        # default is "actionable", with side-calls only downgrading it), a
+        # cancelled run's cheap default is "skip", so the per-job look is the
+        # only way to surface the masked failure, and it fails *closed*
+        # (`fail_open=False`): without positive evidence of a failing step a
+        # plain cancellation stays skipped rather than spinning the agent
+        # against an empty context. Only head-current cancelled runs reach
+        # here (superseded/obsolete short-circuit above), so the one extra
+        # call is bounded even on the `ci list` path.
+        if pr_number is not None and _has_failing_step(app, run, fail_open=False):
+            return act("failure", findings_queried=False)
         return skip("cancelled")
 
     if run.conclusion == "failure":
@@ -381,21 +398,33 @@ def _approval_was_rejected(
     )
 
 
-def _has_failing_step(app: AppContext, run: "github.WorkflowRun.WorkflowRun") -> bool:
+def _has_failing_step(
+    app: AppContext,
+    run: "github.WorkflowRun.WorkflowRun",
+    *,
+    fail_open: bool = True,
+) -> bool:
     """Whether any job in ``run`` has a step that reported a failure.
 
     General safety net for non-code failures: a rejected approval, a
     cancelled/timed-out run, runner death, or ``startup_failure`` all yield a
     ``failure`` conclusion with no failing step, so there is nothing for the
-    agent to fix. Errors fail open (return ``True``) so a side-call failure
-    never blocks a genuine fix.
+    agent to fix. It also distinguishes a fail-fast-cancelled run (one matrix
+    job failed, siblings cancelled) from a plain cancellation.
+
+    ``fail_open`` sets the side-call error default. For a ``failure``
+    conclusion (``fail_open=True``) errors return ``True`` so a flaky call
+    never blocks a genuine fix; the run already looks broken. For a
+    ``cancelled`` conclusion (``fail_open=False``) promotion to a fix relies
+    on positive evidence, so errors return ``False`` and the run keeps its
+    plain-cancellation skip.
     """
     try:
         return any(
             s.conclusion == "failure" for j in run.jobs() for s in (j.steps or [])
         )
-    except Exception:  # noqa: BLE001 — best-effort detection; fail open
-        return True
+    except Exception:  # noqa: BLE001 - best-effort detection
+        return fail_open
 
 
 def _skip_message(