⚡ Bolt: Use ThreadPoolExecutor for GitHub PR lookups#74
Conversation
Replaces sequential subprocess calls to `gh pr view` with concurrent execution using `concurrent.futures.ThreadPoolExecutor` when checking the status of multiple PRs. This addresses a significant performance bottleneck during the fan-out and respawn loops. Order is preserved using `executor.map`. Added missing imports and updated `.jules/bolt.md` with performance learnings. Co-authored-by: xbmc4lyfe <273732874+xbmc4lyfe@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR replaces sequential GitHub PR state checks with concurrent ChangesConcurrent GitHub PR State Checks
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87500ca154
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ## 2026-05-31 - Use ThreadPoolExecutor for GitHub PR lookups | ||
| **Learning:** Checking states for multiple PRs using sequential `gh` subprocess calls is a significant performance bottleneck. | ||
| **Action:** Use `concurrent.futures.ThreadPoolExecutor` to perform PR state checks concurrently, preserving order with `executor.map`. |
There was a problem hiding this comment.
Drop generated Jules note from the commit
AGENTS.md says, “Do not add assistant-generated boilerplate to commits or docs.” This new .jules/bolt.md entry is bot-generated learning metadata and is not used by Ralph at runtime, so committing it adds repository noise unrelated to the ThreadPoolExecutor behavior change; please remove it from the patch or keep it as untracked local notes.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
ralph_loop/cli.py (1)
768-772: ⚡ Quick winDeduplicate the open-state check helper.
_check_respawn_pr(Lines 768-772) is byte-for-byte identical to_check_pr(Lines 550-554) and is redefined on every supervisor-loop iteration. Hoist a single helper to module scope and reuse it at both call sites. This also silences the Ruff B023 hint at Line 772, which is a false positive here (pris a parameter andexcis local, so no loop variable is captured).♻️ Proposed shared helper
+def _check_pr_open_state(pr: int) -> Tuple[int, bool, Optional[CommandError]]: + try: + return pr, _pr_is_still_open(pr), None + except CommandError as exc: + return pr, True, exc + + def _filter_to_still_open_prs(pr_numbers: List[int]) -> List[int]:Then drop both inner definitions and pass
_check_pr_open_stateto the twoexecutor.map(...)calls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ralph_loop/cli.py` around lines 768 - 772, Both _check_respawn_pr and _check_pr are identical and redefined inside loops; hoist a single module-level helper (e.g. _check_pr_open_state) that takes (pr: int) -> Tuple[int, bool, Optional[CommandError]] and calls _pr_is_still_open(pr) inside a try/except catching CommandError, returning (pr, bool, None) or (pr, True, exc). Remove the inner definitions of _check_respawn_pr and _check_pr and pass the new module-level _check_pr_open_state into the two executor.map(...) calls that previously used those inner helpers so both call sites reuse the same function and the Ruff B023 warning is eliminated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ralph_loop/cli.py`:
- Around line 550-557: The ThreadPoolExecutor() calls currently use the default
unbounded-ish worker count which can spawn ~32 concurrent gh subprocesses and
cause GitHub throttling; change both places where ThreadPoolExecutor() is
constructed (the one used with _check_pr mapping over pr_numbers and the second
one used in the respawn loop over prs_to_check) to specify a reasonable
max_workers cap (e.g., 6–8) so the executor.map calls for _check_pr run with
limited concurrency and reduce subprocess fan-out and rate-limit noise while
preserving the existing fallback behavior on CommandError.
---
Nitpick comments:
In `@ralph_loop/cli.py`:
- Around line 768-772: Both _check_respawn_pr and _check_pr are identical and
redefined inside loops; hoist a single module-level helper (e.g.
_check_pr_open_state) that takes (pr: int) -> Tuple[int, bool,
Optional[CommandError]] and calls _pr_is_still_open(pr) inside a try/except
catching CommandError, returning (pr, bool, None) or (pr, True, exc). Remove the
inner definitions of _check_respawn_pr and _check_pr and pass the new
module-level _check_pr_open_state into the two executor.map(...) calls that
previously used those inner helpers so both call sites reuse the same function
and the Ruff B023 warning is eliminated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 34a1cbb9-90e5-4c39-9693-e5ec68c1d4ae
📒 Files selected for processing (2)
.jules/bolt.mdralph_loop/cli.py
📜 Review details
🧰 Additional context used
🪛 Ruff (0.15.14)
ralph_loop/cli.py
[warning] 772-772: Function definition does not bind loop variable exc
(B023)
🔇 Additional comments (2)
ralph_loop/cli.py (1)
4-6: LGTM!.jules/bolt.md (1)
1-3: LGTM!
| def _check_pr(pr: int) -> Tuple[int, bool, Optional[CommandError]]: | ||
| try: | ||
| still_open = _pr_is_still_open(pr) | ||
| return pr, _pr_is_still_open(pr), None | ||
| except CommandError as exc: | ||
| return pr, True, exc | ||
|
|
||
| with concurrent.futures.ThreadPoolExecutor() as executor: | ||
| results = executor.map(_check_pr, pr_numbers) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the default max_workers value for concurrent.futures.ThreadPoolExecutor in Python 3.8+?
💡 Result:
For Python 3.8+, the default value of ThreadPoolExecutor’s max_workers (when max_workers=None / omitted) is: min(32, (os.cpu_count or 1) + 4) [1][2].
Citations:
- 1: https://docs.python.org/3/library/concurrent.futures.html
- 2: https://stackoverflow.com/questions/56195679/why-is-threadpoolexecutors-default-max-workers-decided-based-on-the-number-of-c
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the exact relevant code around the cited lines.
sed -n '520,620p' ralph_loop/cli.py | nl -ba | sed -n '1,140p'
# Show the "respawn site" around the claimed line 784 (with some surrounding context).
sed -n '740,830p' ralph_loop/cli.py | nl -ba | sed -n '1,200p'
# Find Python version / supported versions in common project config files.
ls -la
fd -t f 'pyproject.toml' . || true
fd -t f 'setup.cfg' . || true
fd -t f 'tox.ini' . || true
fd -t f 'requirements*.txt' . || true
fd -t f '.github/workflows/*.yml' . || true
fd -t f '.github/workflows/*.yaml' . || trueRepository: Appz4Fun/ralph-loop-close-prs
Length of output: 116
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Relevant code around the snippet in ralph_loop/cli.py
sed -n '520,620p' ralph_loop/cli.py | cat -n
# "Respawn site" area around the claimed line ~784
sed -n '740,830p' ralph_loop/cli.py | cat -n
# Check Python version config
ls -la
fd -t f 'pyproject.toml' . || true
fd -t f 'setup.cfg' . || true
fd -t f 'tox.ini' . || true
fd -t f 'requirements*.txt' . || true
fd -t f '.github/workflows/*.yml' . || true
fd -t f '.github/workflows/*.yaml' . || trueRepository: Appz4Fun/ralph-loop-close-prs
Length of output: 10127
Cap ThreadPoolExecutor concurrency to reduce GitHub throttling / subprocess fan-out
Leaving ThreadPoolExecutor() without max_workers defaults to min(32, (os.cpu_count or 1) + 4), which can spawn up to ~32 concurrent gh pr view subprocesses—both for the initial fan-out PR filtering and again in the respawn loop (prs_to_check), increasing throttling/rate-limit risk and log noise. The “keep PR on CommandError” fallback preserves correctness, but concurrency capping should apply to both call sites.
♻️ Proposed cap
- with concurrent.futures.ThreadPoolExecutor() as executor:
+ with concurrent.futures.ThreadPoolExecutor(
+ max_workers=min(8, len(pr_numbers))
+ ) as executor:
results = executor.map(_check_pr, pr_numbers)Also cap the second ThreadPoolExecutor() used when checking prs_to_check in the respawn section.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ralph_loop/cli.py` around lines 550 - 557, The ThreadPoolExecutor() calls
currently use the default unbounded-ish worker count which can spawn ~32
concurrent gh subprocesses and cause GitHub throttling; change both places where
ThreadPoolExecutor() is constructed (the one used with _check_pr mapping over
pr_numbers and the second one used in the respawn loop over prs_to_check) to
specify a reasonable max_workers cap (e.g., 6–8) so the executor.map calls for
_check_pr run with limited concurrency and reduce subprocess fan-out and
rate-limit noise while preserving the existing fallback behavior on
CommandError.
💡 What: Replaced sequential blocking
ghsubprocess calls with parallel execution usingconcurrent.futures.ThreadPoolExecutorin the_filter_to_still_open_prsand the_fan_out_all_prsrespawn loops.🎯 Why: Running sequential subprocess calls to check the status of multiple PRs represents a significant N+1 performance bottleneck.
📊 Impact: Considerably reduces execution time when checking PR status, providing a significant boost especially during periods where many open PRs need evaluating simultaneously.
🔬 Measurement: Compare total execution time of
_filter_to_still_open_prsor the_fan_out_childrenrespawn loop before and after the change on a test account with many PRs. Execution time will be bound by the slowestghlookup rather than the sum of all lookups.PR created automatically by Jules for task 16881484000797763401 started by @xbmc4lyfe