feat(review): flag merge conflicts in posted review body by aiolibsbot · Pull Request #1 · aiolibsbot/koan

aiolibsbot · 2026-05-17T16:33:30Z

What

Make the /review skill surface PR merge-conflict state in the posted review comment.

Why

The review skill never read the PR's mergeable field, so reviewers got a clean technical critique with no mention of conflicts. Real-world hit: aio-libs/yarl#1638 was in CONFLICTING state when reviewed — the bot stayed silent about it, leaving the author to discover the merge block on their own.

How

fetch_pr_context() now requests mergeable from gh pr view --json … and surfaces it as context["mergeable"] (defaults to "UNKNOWN" for compatibility).
run_review() prepends a > ⚠️ **Merge conflicts detected** … blockquote to the final review_body whenever mergeable == "CONFLICTING", before posting. The notice references the base branch by name so the author knows what to rebase against.
Runs after both the JSON and regex-fallback formatting paths, so the warning shows regardless of how Claude's response was parsed.

Testing

New tests in test_review_runner.py::TestRunReview:
- test_conflict_notice_prepended_for_conflicting_pr — verifies the warning is inserted before ## PR Review when mergeable=CONFLICTING.
- test_no_conflict_notice_for_mergeable_pr — no warning for healthy PRs.
New test_rebase_pr.py::TestFetchPrContext::test_extracts_mergeable_status plus default-UNKNOWN assertion.
make lint clean (also collapsed a pre-existing SIM105 in cli_exec.py).
make test: 12 780 pass; one pre-existing flake (test_git_sync.py::TestGitSyncCLI::test_cli_runs_sync) — passes in isolation.

🤖 Generated with Claude Code

Adds ruff lint config with the PERF ruleset, excludes tests/ from PERF checks (fixture loops add little perf value), and rewrites the production + skill violations: list append in for-loop → list.extend / comprehension, dict iterator key→value, dict-build loop → dict comprehension, slice copy → list.copy. All 12633 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`gh run list --branch X --limit 1` was returning the "Dependabot auto-merge" workflow (conclusion=skipped) on non-Dependabot PRs, which check_ci_status, wait_for_ci, and check_existing_ci all treated as a failure. drain_one then kept injecting /ci_check fix missions against a PR whose real CI was green — see aio-libs/yarl#1681 for the field report. Add aggregate_ci_runs(runs) and fetch_branch_ci_runs(branch, repo) helpers in claude_step. The aggregator filters out skipped, cancelled, neutral, and action_required conclusions, then reduces the remaining runs to (status, run_id) with failure > pending > success priority. All three CI status callers now share the same filtering, fetching up to 20 runs per branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When GitHub's Private Vulnerability Reporting is enabled on a target repo, /security_audit now submits critical/high findings as private security advisories instead of public issues. Lower-severity findings remain public. Configurable per-project via projects.yaml security section (pvrs mode + threshold). Graceful fallback to public issues on any PVRS API failure. Implements: Anantys-oss#1341 Plan: PR Anantys-oss#1269 (docs/plan-pvrs-security-audit.md) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ails publicly

Ensure all remote tracking refs for the base branch are fresh before any mission work begins — both new branches and rebases. Two complementary changes: - git_prep: after primary sync, fetch base from all secondary remotes - claude_step: pre-fetch all relevant remotes before the rebase loop Addresses review feedback on PR Anantys-oss#1333 about keeping main up to date regardless of its name before working on branches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a Linting section to CLAUDE.md documenting ruff as the project's linter, and add a `make lint` Makefile target so contributors can check compliance before pushing. Addresses review feedback on Anantys-oss#1345. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds a third messaging provider option (KOAN_MESSAGING_PROVIDER=matrix) implemented as a thin synchronous wrapper around the Matrix Client-Server HTTP API. No new Python dependencies — uses the existing requests package, mirroring the Telegram provider's style. What: - app/messaging/matrix.py — MatrixProvider with send/poll/typing using PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId} and long-polling GET /_matrix/client/v3/sync. - Auto-registered via _PROVIDER_MODULES in app/messaging/__init__.py. - Filters: only m.room.message events with msgtype=m.text, ignores the bot's own user_id and events from rooms other than the configured one. - Initial /sync discards historical events; only the next_batch cursor is kept, so the bot doesn't replay history on startup. - 5xx errors are retried via retry_with_backoff; 4xx fail fast. - Onboarding wizard gains a Matrix option that writes the four env vars. - docs/messaging-matrix.md walks through setup (account, access token, room ID); README, INSTALL.md, and cross-doc links updated. - 29 unit tests cover config validation, send chunking, URL encoding, sync cursor handling, message filtering, and the typing indicator. Why: The codebase already had a clean MessagingProvider abstraction with Telegram and Slack implementations — Matrix slots in without touching any of the bridge code, giving self-hosted / federated users a third option without reaching for slack-sdk or a Telegram bot account. Note: end-to-end encryption (Olm/Megolm) is intentionally out of scope. Operators should use unencrypted rooms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…legacy

…ppress Add SIM105 to the enforced ruff rule set and refactor all 64 existing violations to use `contextlib.suppress(...)`. This catches the try/except/pass pattern automatically so future PRs cannot reintroduce it. Tests untouched by intent — only the suppression idiom changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…trying When a fork PR is opened by a first-time contributor, GitHub Actions gates every workflow run on a maintainer "Approve and run" click. The gh API exposes these runs with status="action_required" (or "waiting" for environment-protected jobs) and no conclusion. Until this change, aggregate_ci_runs() classified them as "pending" and drain_one() left the PR in ## CI indefinitely — every iteration polled again, and retried the /ci_check fix loop on stagnation. See aio-libs/aiohttp#12553 where Kōan retried multiple times against runs that physically could not start until a human clicked Approve. Now aggregate_ci_runs() returns a new "blocked_approval" status when any run is gated this way (failure still wins so a genuinely broken workflow on the same push gets surfaced). drain_one() removes the PR from ## CI and notifies the outbox so the operator can either approve in the GitHub UI or ping the maintainer. run_ci_check_and_fix() and rebase_pr._run_ci_check_and_fix() exit early without attempting fixes when CI is blocked, since pushing more commits triggers the same gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Skill runners (rebase_pr, recreate_pr) invoke Claude via run_claude(), which used subprocess.run with capture_output=True. That buffers all stdout until the subprocess exits, so the run.py liveness watchdog (600s no-output kill) fires whenever a single Claude call exceeds the threshold — observed on the aioesphomeapi #1660 rebase. A secondary failure mode: subprocess.run timeout sends SIGKILL to the direct child but not grandchildren. When Claude has spawned helper subprocesses that inherit stdout, communicate()'s post-kill drain can block indefinitely, so the inner 120s/300s/600s timeouts never fire. Switch run_claude to popen_cli with start_new_session=True and stream stdout line-by-line through sys.stdout. Every emitted line resets the parent watchdog and surfaces real-time progress to /live. A threading Timer-backed watchdog kills the entire process group via os.killpg on timeout, breaking the grandchild-pipe hang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…og race

…eanup

Three file writes used non-atomic operations (write_text / json.dump), risking corruption if the process is killed mid-write: - quarantine cap (_enforce_quarantine_cap): truncated quarantine could lose entries on crash during write - outbox staging (flush): partial staging file causes garbled Telegram messages on recovery - stagnation retry tracker: corrupted JSON resets all retry counters All three now use atomic_write / atomic_write_json (temp + fsync + rename). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace raw "blocked_approval" string literals across ci_queue_runner, claude_step, and rebase_pr with a single exported constant to prevent typos and improve grep-ability. Closes review comment from PR Anantys-oss#1351. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Five pre-existing isolation bugs surfaced when running the suite under pytest-xdist. Each failure was a real test pollution issue that also showed up in stand-alone subset runs — xdist just made it routine. - conftest: give every xdist worker its own KOAN_ROOT directory. `app.utils` snapshots KOAN_ROOT at import time, so workers that inherit the same env race on shared files under that path. - test_github_command_handler / test_github_subscribe: autouse-mock `_is_subject_closed`, which otherwise hits the real GitHub API (network-flaky in serial, parallel-unsafe under xdist). - test_mission_retry: autouse-reset the `_last_mission_*` flags on `app.run`. Other tests leave them set; xdist surfaced the leak. - test_skill_dispatch: patch parse_pr_url on the handler module's local binding instead of on `app.github_url_parser`. The handler does `from app.github_url_parser import parse_pr_url` at module load, so patching the source after that import leaks a stale binding into later tests that import the same handler. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The full suite takes ~3m38s serial. Adding pytest-xdist drops it to ~1m58s on 2 workers (1.85x) and ~1m08s with -n auto (3.2x) on this box. Makefile picks the worker count based on environment: - CI / GitHub Actions -> -n auto - Remote SSH session -> -n 2 (be polite on shared hosts) - Local terminal -> -n auto Override via `make test PYTEST_WORKERS=N` (PYTEST_WORKERS=0 disables xdist). All commands use `--dist loadfile` so each test file lands on one worker; that keeps file-internal collection order intact and avoids surprises from class-level state. CI (.github/workflows/tests.yml) installs pytest-xdist and passes `-n auto --dist loadfile` to every pytest invocation.

…eshold Adds two behavioral tests to TestCompactLearnings confirming that the git subprocess (_get_file_tree) is never spawned when compaction is skipped due to the line-count threshold or hash-unchanged guard. Closes Anantys-oss#1322 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When a /rebase, /review, /recreate, /squash, /ci_check, /fix, /check, or /gh_request mission is queued for a URL that already has an identical command pending or in progress, the insert is silently skipped and the user is notified with a warning message. This prevents the queue from filling with redundant missions when the same action is triggered from multiple sources (Telegram, GitHub notifications, check_runner). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e duplicate handling boilerplate

The stale-module detector in `_refresh_stale_app_modules` only reloaded `app.*` modules whose mtime had changed since a previously cached value. On the very first observation of a module the function silently recorded the current mtime without reloading, on the assumption that the first encounter happens during a fresh boot when in-memory and on-disk content already agree. That assumption breaks when auto-update (or a manual `git pull`) rewrites a file BEFORE the first post-update skill invocation. The new file mtime is then recorded as the baseline while `sys.modules` still holds the pre-update module, and no later refresh ever notices the divergence. Reproduction: a Kōan process booted before `5742248` (`refactor(missions): consolidate project-tag regexes`) pulled the commit, then received `/list`. The handler does `from app.utils import PROJECT_NAME_CHARS` and crashed with: Skill error (core.list): cannot import name 'PROJECT_NAME_CHARS' from 'app.utils' (.../koan/app/utils.py) Because the trap is "any new symbol added to any `app.*` module after the process started", every future addition to `app.utils` (or any sibling) would re-trigger the same failure on first use after update. Fix: capture `_PROCESS_START_TIME = time.time()` when `app.skills` is imported, and treat a first-time observation as stale whenever the source file's mtime is greater than that timestamp. Clean boots stay zero-cost (file mtime ≤ process start ⇒ skip the reload, just record the baseline) while post-update first observations now reload as intended. Tests: update the existing "first encounter, no reload" case to stub `_PROCESS_START_TIME` past the fake file's mtime so it keeps exercising the old-file branch, and add a new regression test asserting that a first observation of a file newer than process start triggers `importlib.reload`. Operators carrying the bad state need a one-time restart; the fix prevents recurrence for any future `app.*` addition.

`request_restart` previously wrote a single shared `.koan-restart` marker that both the bridge (`awake.py`) and the runner (`run.py`) polled. The runner's wrapper-restart cycle is fast: after `request_restart`, the runner exits with code 42, its wrapper relaunches it, and the fresh runner's startup wipe (`run.py:785`) deletes the marker within ~1 s. The bridge polls every `POLL_INTERVAL` (≈3 s) at `awake.py:778`, so the marker was usually gone before the bridge's next tick, and the bridge never re-execed. Observable failure: after `/update`, the runner pulled the new code and the runner process did restart, but the bridge kept its pre-update Python interpreter alive. The bridge's `sys.modules['app.utils']` remained the stale copy, so `/list` raised `cannot import name 'PROJECT_NAME_CHARS' from 'app.utils'` even after the upstream stale-module fix (`75bf3e2`) had landed on disk — the detector itself was in the stale `sys.modules`. Give each consumer its own marker file (`.koan-restart-bridge`, `.koan-restart-run`). `request_restart` writes both, plus the legacy `.koan-restart` for backward compatibility so a pre-upgrade bridge still polling the old name can pick up the first post-upgrade request and re-exec into the new code. `check_restart` and `clear_restart` take an optional `target=` ("bridge" / "run" / None); each consumer only clears its own marker, so the runner's startup wipe can no longer silence the bridge's signal. Unknown targets raise `ValueError` to catch typos at call sites. Bridge calls in `awake.py` (L754, L755, L778) pass `target="bridge"`. Runner calls in `run.py` (L734, L785, L854, L856) pass `target="run"`. The pause-loop check at L734 keeps its existing no-`since` semantics because the startup clear at L785 already handles stale markers from a previous incarnation. Tests: - `TestPerProcessRestartMarkers` in `test_restart_manager.py` adds 5 cases including a direct race regression (`test_runner_wrapper_restart_does_not_silence_bridge`). - `test_uses_atomic_write` now asserts all three markers are written. - `test_run.py` and `test_restart.py` updated to expect the new `target=` kwarg and the per-process marker filenames.

`_ensure_providers_loaded` short-circuited the moment `_providers` was non-empty: def _ensure_providers_loaded(): if _providers: return for module_name in _PROVIDER_MODULES: __import__(module_name) The intent ("skip the loop on repeat calls") was fine; the sentinel was wrong. `_providers` becomes non-empty as a side-effect of *any* code importing one of the provider modules — including the default startup path where the bridge imports `app.messaging.telegram` before anything asks for the full provider list. Once that happens, the very first `_ensure_providers_loaded` call hits the early return, the `matrix` / `slack` imports never run, and `_create_provider("matrix")` SystemExits with `Unknown messaging provider: 'matrix'` even though the module ships on disk. Symptom on CI: `test_matrix_provider.py::TestRegistry::test_matrix_registered` flapped under pytest-xdist depending on whether a sibling test (e.g. `test_telegram_provider.py`'s module-level `from app.messaging.telegram import TelegramProvider`) had already populated `_providers` in the worker before the matrix test ran. Fix: track loop-completion explicitly with a `_modules_loaded` flag rather than inferring it from `bool(_providers)`. Same short-circuit on repeat calls, correct semantics on the first one. Tests: - `TestEnsureProvidersLoaded.test_loads_matrix_when_telegram_imported_first` reproduces the production scenario in a fresh subprocess. Verified the test fails without the fix: AssertionError: missing providers after load: {'matrix', 'slack'} - `test_idempotent_when_called_repeatedly` (also subprocess) guards against future regressions where repeat calls might drift. - `test_matrix_provider.py::TestRegistry::test_matrix_registered` switched to the same subprocess pattern so it no longer flaps based on cross-file test ordering: `_providers` is process-wide module state and the `clean_registry` fixture in `test_messaging_provider.py` can leave the in-process registry out of sync with `sys.modules`, which no in-process call to `_ensure_providers_loaded` can recover from (cached modules don't re-run their `@register_provider` decorators). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace ❌ with 🚦 in the CI-failure outbox notification to avoid confusion with system-level errors. The traffic light emoji signals "CI gate not passing" without implying a crash. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The /review skill never surfaced PR mergeable status, so reviewers were told nothing when a PR couldn't be merged due to conflicts (e.g. aio-libs/yarl#1638 — clean review, but the author had no idea conflicts were blocking merge). Fetch the mergeable field in fetch_pr_context and, when it equals "CONFLICTING", prepend a prominent warning block to the posted review body asking the author to rebase and resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pre-existing SIM105 lint error blocking make lint. Equivalent semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

bluetoothbot and others added 30 commits May 16, 2026 22:49

fix: redact PVRS fallback issues to prevent leaking vulnerability det…

04476fc

…ails publicly

fix: address review — remove double-fetch, fix fragile rebase tests

3b39ba7

fix: resolve CI failures on Anantys-oss#1334 (attempt 1)

97fcac3

docs(matrix): advertise instance/config.yaml as primary, env vars as …

2e3fac7

…legacy

refactor(cli_exec): extract stream_with_timeout helper and fix watchd…

d5ff6d8

…og race

refactor(cli_exec): use contextlib.suppress in stream_with_timeout cl…

b21a5ab

…eanup

test: address review feedback on pytest-xdist isolation fixtures

a87abf7

test: mock resolve_pr_location in conftest to fix xdist gh flakes

dbdca5e

fix(missions): handle duplicate return value in all skill handlers

6f6a4d4

refactor(missions): extract queue_github_mission_once helper to reduc…

50e9a89

…e duplicate handling boilerplate

chore(cli_exec): collapse nested try/except into contextlib.suppress

3f5a2f5

Pre-existing SIM105 lint error blocking make lint. Equivalent semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(review): flag merge conflicts in posted review body#1

feat(review): flag merge conflicts in posted review body#1
aiolibsbot wants to merge 31 commits into
mainfrom
koan/review-flag-merge-conflicts

aiolibsbot commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

aiolibsbot commented May 17, 2026

What

Why

How

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants