feat(review): flag merge conflicts in posted review body#1
Draft
aiolibsbot wants to merge 31 commits into
Draft
Conversation
Adds ruff lint config with the PERF ruleset, excludes tests/ from PERF checks (fixture loops add little perf value), and rewrites the production + skill violations: list append in for-loop → list.extend / comprehension, dict iterator key→value, dict-build loop → dict comprehension, slice copy → list.copy. All 12633 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`gh run list --branch X --limit 1` was returning the "Dependabot auto-merge" workflow (conclusion=skipped) on non-Dependabot PRs, which check_ci_status, wait_for_ci, and check_existing_ci all treated as a failure. drain_one then kept injecting /ci_check fix missions against a PR whose real CI was green — see aio-libs/yarl#1681 for the field report. Add aggregate_ci_runs(runs) and fetch_branch_ci_runs(branch, repo) helpers in claude_step. The aggregator filters out skipped, cancelled, neutral, and action_required conclusions, then reduces the remaining runs to (status, run_id) with failure > pending > success priority. All three CI status callers now share the same filtering, fetching up to 20 runs per branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When GitHub's Private Vulnerability Reporting is enabled on a target repo, /security_audit now submits critical/high findings as private security advisories instead of public issues. Lower-severity findings remain public. Configurable per-project via projects.yaml security section (pvrs mode + threshold). Graceful fallback to public issues on any PVRS API failure. Implements: Anantys-oss#1341 Plan: PR Anantys-oss#1269 (docs/plan-pvrs-security-audit.md) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ensure all remote tracking refs for the base branch are fresh before any mission work begins — both new branches and rebases. Two complementary changes: - git_prep: after primary sync, fetch base from all secondary remotes - claude_step: pre-fetch all relevant remotes before the rebase loop Addresses review feedback on PR Anantys-oss#1333 about keeping main up to date regardless of its name before working on branches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a Linting section to CLAUDE.md documenting ruff as the project's linter, and add a `make lint` Makefile target so contributors can check compliance before pushing. Addresses review feedback on Anantys-oss#1345. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a third messaging provider option (KOAN_MESSAGING_PROVIDER=matrix)
implemented as a thin synchronous wrapper around the Matrix Client-Server
HTTP API. No new Python dependencies — uses the existing requests
package, mirroring the Telegram provider's style.
What:
- app/messaging/matrix.py — MatrixProvider with send/poll/typing using
PUT /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId}
and long-polling GET /_matrix/client/v3/sync.
- Auto-registered via _PROVIDER_MODULES in app/messaging/__init__.py.
- Filters: only m.room.message events with msgtype=m.text, ignores
the bot's own user_id and events from rooms other than the configured one.
- Initial /sync discards historical events; only the next_batch cursor
is kept, so the bot doesn't replay history on startup.
- 5xx errors are retried via retry_with_backoff; 4xx fail fast.
- Onboarding wizard gains a Matrix option that writes the four env vars.
- docs/messaging-matrix.md walks through setup (account, access token,
room ID); README, INSTALL.md, and cross-doc links updated.
- 29 unit tests cover config validation, send chunking, URL encoding,
sync cursor handling, message filtering, and the typing indicator.
Why:
The codebase already had a clean MessagingProvider abstraction with
Telegram and Slack implementations — Matrix slots in without touching
any of the bridge code, giving self-hosted / federated users a third
option without reaching for slack-sdk or a Telegram bot account.
Note: end-to-end encryption (Olm/Megolm) is intentionally out of scope.
Operators should use unencrypted rooms.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ppress Add SIM105 to the enforced ruff rule set and refactor all 64 existing violations to use `contextlib.suppress(...)`. This catches the try/except/pass pattern automatically so future PRs cannot reintroduce it. Tests untouched by intent — only the suppression idiom changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trying When a fork PR is opened by a first-time contributor, GitHub Actions gates every workflow run on a maintainer "Approve and run" click. The gh API exposes these runs with status="action_required" (or "waiting" for environment-protected jobs) and no conclusion. Until this change, aggregate_ci_runs() classified them as "pending" and drain_one() left the PR in ## CI indefinitely — every iteration polled again, and retried the /ci_check fix loop on stagnation. See aio-libs/aiohttp#12553 where Kōan retried multiple times against runs that physically could not start until a human clicked Approve. Now aggregate_ci_runs() returns a new "blocked_approval" status when any run is gated this way (failure still wins so a genuinely broken workflow on the same push gets surfaced). drain_one() removes the PR from ## CI and notifies the outbox so the operator can either approve in the GitHub UI or ping the maintainer. run_ci_check_and_fix() and rebase_pr._run_ci_check_and_fix() exit early without attempting fixes when CI is blocked, since pushing more commits triggers the same gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skill runners (rebase_pr, recreate_pr) invoke Claude via run_claude(), which used subprocess.run with capture_output=True. That buffers all stdout until the subprocess exits, so the run.py liveness watchdog (600s no-output kill) fires whenever a single Claude call exceeds the threshold — observed on the aioesphomeapi #1660 rebase. A secondary failure mode: subprocess.run timeout sends SIGKILL to the direct child but not grandchildren. When Claude has spawned helper subprocesses that inherit stdout, communicate()'s post-kill drain can block indefinitely, so the inner 120s/300s/600s timeouts never fire. Switch run_claude to popen_cli with start_new_session=True and stream stdout line-by-line through sys.stdout. Every emitted line resets the parent watchdog and surfaces real-time progress to /live. A threading Timer-backed watchdog kills the entire process group via os.killpg on timeout, breaking the grandchild-pipe hang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three file writes used non-atomic operations (write_text / json.dump), risking corruption if the process is killed mid-write: - quarantine cap (_enforce_quarantine_cap): truncated quarantine could lose entries on crash during write - outbox staging (flush): partial staging file causes garbled Telegram messages on recovery - stagnation retry tracker: corrupted JSON resets all retry counters All three now use atomic_write / atomic_write_json (temp + fsync + rename). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace raw "blocked_approval" string literals across ci_queue_runner, claude_step, and rebase_pr with a single exported constant to prevent typos and improve grep-ability. Closes review comment from PR Anantys-oss#1351. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Five pre-existing isolation bugs surfaced when running the suite under pytest-xdist. Each failure was a real test pollution issue that also showed up in stand-alone subset runs — xdist just made it routine. - conftest: give every xdist worker its own KOAN_ROOT directory. `app.utils` snapshots KOAN_ROOT at import time, so workers that inherit the same env race on shared files under that path. - test_github_command_handler / test_github_subscribe: autouse-mock `_is_subject_closed`, which otherwise hits the real GitHub API (network-flaky in serial, parallel-unsafe under xdist). - test_mission_retry: autouse-reset the `_last_mission_*` flags on `app.run`. Other tests leave them set; xdist surfaced the leak. - test_skill_dispatch: patch parse_pr_url on the handler module's local binding instead of on `app.github_url_parser`. The handler does `from app.github_url_parser import parse_pr_url` at module load, so patching the source after that import leaks a stale binding into later tests that import the same handler. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The full suite takes ~3m38s serial. Adding pytest-xdist drops it to ~1m58s on 2 workers (1.85x) and ~1m08s with -n auto (3.2x) on this box. Makefile picks the worker count based on environment: - CI / GitHub Actions -> -n auto - Remote SSH session -> -n 2 (be polite on shared hosts) - Local terminal -> -n auto Override via `make test PYTEST_WORKERS=N` (PYTEST_WORKERS=0 disables xdist). All commands use `--dist loadfile` so each test file lands on one worker; that keeps file-internal collection order intact and avoids surprises from class-level state. CI (.github/workflows/tests.yml) installs pytest-xdist and passes `-n auto --dist loadfile` to every pytest invocation.
…eshold Adds two behavioral tests to TestCompactLearnings confirming that the git subprocess (_get_file_tree) is never spawned when compaction is skipped due to the line-count threshold or hash-unchanged guard. Closes Anantys-oss#1322 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a /rebase, /review, /recreate, /squash, /ci_check, /fix, /check, or /gh_request mission is queued for a URL that already has an identical command pending or in progress, the insert is silently skipped and the user is notified with a warning message. This prevents the queue from filling with redundant missions when the same action is triggered from multiple sources (Telegram, GitHub notifications, check_runner). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e duplicate handling boilerplate
The stale-module detector in `_refresh_stale_app_modules` only reloaded
`app.*` modules whose mtime had changed since a previously cached value.
On the very first observation of a module the function silently recorded
the current mtime without reloading, on the assumption that the first
encounter happens during a fresh boot when in-memory and on-disk content
already agree.
That assumption breaks when auto-update (or a manual `git pull`) rewrites
a file BEFORE the first post-update skill invocation. The new file mtime
is then recorded as the baseline while `sys.modules` still holds the
pre-update module, and no later refresh ever notices the divergence.
Reproduction: a Kōan process booted before `5742248`
(`refactor(missions): consolidate project-tag regexes`) pulled the
commit, then received `/list`. The handler does
`from app.utils import PROJECT_NAME_CHARS` and crashed with:
Skill error (core.list): cannot import name 'PROJECT_NAME_CHARS'
from 'app.utils' (.../koan/app/utils.py)
Because the trap is "any new symbol added to any `app.*` module after
the process started", every future addition to `app.utils` (or any
sibling) would re-trigger the same failure on first use after update.
Fix: capture `_PROCESS_START_TIME = time.time()` when `app.skills` is
imported, and treat a first-time observation as stale whenever the
source file's mtime is greater than that timestamp. Clean boots stay
zero-cost (file mtime ≤ process start ⇒ skip the reload, just record
the baseline) while post-update first observations now reload as
intended.
Tests: update the existing "first encounter, no reload" case to stub
`_PROCESS_START_TIME` past the fake file's mtime so it keeps exercising
the old-file branch, and add a new regression test asserting that a
first observation of a file newer than process start triggers
`importlib.reload`. Operators carrying the bad state need a one-time
restart; the fix prevents recurrence for any future `app.*` addition.
`request_restart` previously wrote a single shared `.koan-restart`
marker that both the bridge (`awake.py`) and the runner (`run.py`)
polled. The runner's wrapper-restart cycle is fast: after
`request_restart`, the runner exits with code 42, its wrapper relaunches
it, and the fresh runner's startup wipe (`run.py:785`) deletes the
marker within ~1 s. The bridge polls every `POLL_INTERVAL` (≈3 s) at
`awake.py:778`, so the marker was usually gone before the bridge's next
tick, and the bridge never re-execed.
Observable failure: after `/update`, the runner pulled the new code and
the runner process did restart, but the bridge kept its pre-update
Python interpreter alive. The bridge's `sys.modules['app.utils']`
remained the stale copy, so `/list` raised
`cannot import name 'PROJECT_NAME_CHARS' from 'app.utils'` even after
the upstream stale-module fix (`75bf3e2`) had landed on disk — the
detector itself was in the stale `sys.modules`.
Give each consumer its own marker file (`.koan-restart-bridge`,
`.koan-restart-run`). `request_restart` writes both, plus the legacy
`.koan-restart` for backward compatibility so a pre-upgrade bridge
still polling the old name can pick up the first post-upgrade request
and re-exec into the new code. `check_restart` and `clear_restart`
take an optional `target=` ("bridge" / "run" / None); each consumer
only clears its own marker, so the runner's startup wipe can no longer
silence the bridge's signal. Unknown targets raise `ValueError` to
catch typos at call sites.
Bridge calls in `awake.py` (L754, L755, L778) pass `target="bridge"`.
Runner calls in `run.py` (L734, L785, L854, L856) pass `target="run"`.
The pause-loop check at L734 keeps its existing no-`since` semantics
because the startup clear at L785 already handles stale markers from
a previous incarnation.
Tests:
- `TestPerProcessRestartMarkers` in `test_restart_manager.py` adds 5
cases including a direct race regression
(`test_runner_wrapper_restart_does_not_silence_bridge`).
- `test_uses_atomic_write` now asserts all three markers are written.
- `test_run.py` and `test_restart.py` updated to expect the new
`target=` kwarg and the per-process marker filenames.
`_ensure_providers_loaded` short-circuited the moment `_providers` was
non-empty:
def _ensure_providers_loaded():
if _providers:
return
for module_name in _PROVIDER_MODULES:
__import__(module_name)
The intent ("skip the loop on repeat calls") was fine; the sentinel was
wrong. `_providers` becomes non-empty as a side-effect of *any* code
importing one of the provider modules — including the default startup
path where the bridge imports `app.messaging.telegram` before anything
asks for the full provider list. Once that happens, the very first
`_ensure_providers_loaded` call hits the early return, the `matrix` /
`slack` imports never run, and `_create_provider("matrix")` SystemExits
with `Unknown messaging provider: 'matrix'` even though the module
ships on disk.
Symptom on CI: `test_matrix_provider.py::TestRegistry::test_matrix_registered`
flapped under pytest-xdist depending on whether a sibling test (e.g.
`test_telegram_provider.py`'s module-level `from app.messaging.telegram
import TelegramProvider`) had already populated `_providers` in the
worker before the matrix test ran.
Fix: track loop-completion explicitly with a `_modules_loaded` flag
rather than inferring it from `bool(_providers)`. Same short-circuit
on repeat calls, correct semantics on the first one.
Tests:
- `TestEnsureProvidersLoaded.test_loads_matrix_when_telegram_imported_first`
reproduces the production scenario in a fresh subprocess. Verified
the test fails without the fix:
AssertionError: missing providers after load: {'matrix', 'slack'}
- `test_idempotent_when_called_repeatedly` (also subprocess) guards
against future regressions where repeat calls might drift.
- `test_matrix_provider.py::TestRegistry::test_matrix_registered`
switched to the same subprocess pattern so it no longer flaps
based on cross-file test ordering: `_providers` is process-wide
module state and the `clean_registry` fixture in
`test_messaging_provider.py` can leave the in-process registry out
of sync with `sys.modules`, which no in-process call to
`_ensure_providers_loaded` can recover from (cached modules don't
re-run their `@register_provider` decorators).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace ❌ with 🚦 in the CI-failure outbox notification to avoid confusion with system-level errors. The traffic light emoji signals "CI gate not passing" without implying a crash. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /review skill never surfaced PR mergeable status, so reviewers were told nothing when a PR couldn't be merged due to conflicts (e.g. aio-libs/yarl#1638 — clean review, but the author had no idea conflicts were blocking merge). Fetch the mergeable field in fetch_pr_context and, when it equals "CONFLICTING", prepend a prominent warning block to the posted review body asking the author to rebase and resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-existing SIM105 lint error blocking make lint. Equivalent semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Make the
/reviewskill surface PR merge-conflict state in the posted review comment.Why
The review skill never read the PR's
mergeablefield, so reviewers got a clean technical critique with no mention of conflicts. Real-world hit: aio-libs/yarl#1638 was inCONFLICTINGstate when reviewed — the bot stayed silent about it, leaving the author to discover the merge block on their own.How
fetch_pr_context()now requestsmergeablefromgh pr view --json …and surfaces it ascontext["mergeable"](defaults to"UNKNOWN"for compatibility).run_review()prepends a> ⚠️ **Merge conflicts detected** …blockquote to the finalreview_bodywhenevermergeable == "CONFLICTING", before posting. The notice references the base branch by name so the author knows what to rebase against.Testing
test_review_runner.py::TestRunReview:test_conflict_notice_prepended_for_conflicting_pr— verifies the warning is inserted before## PR Reviewwhenmergeable=CONFLICTING.test_no_conflict_notice_for_mergeable_pr— no warning for healthy PRs.test_rebase_pr.py::TestFetchPrContext::test_extracts_mergeable_statusplus default-UNKNOWNassertion.make lintclean (also collapsed a pre-existing SIM105 incli_exec.py).make test: 12 780 pass; one pre-existing flake (test_git_sync.py::TestGitSyncCLI::test_cli_runs_sync) — passes in isolation.🤖 Generated with Claude Code