feat(bench): add opt-in LoCoMo cat5 judge #119
Conversation
📝 Walkthrough (Summary by CodeRabbit)

Adds an optional LLM-based judge for LoCoMo Category‑5, CLI/config flags to enable it, local and LLM-level caching, OpenAI request timeout and JSON parsing helpers, extensive Cat‑5 tests, and documentation/benchmark updates distinguishing judge-on vs judge-off baselines and cost notes.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Runner as LoCoMo Runner
    participant Eval as LoCoMo Evaluator
    participant Cache as Local Memory Cache
    participant OpenAI as OpenAI API
    User->>Runner: run_benchmark(--judge / --judge-model)
    Runner->>Eval: init(config with judge_model)
    Runner->>Eval: evaluate_conversation()
    loop per Cat-5 question
        Eval->>Cache: lookup prepared memories (sample_id)
        alt cache hit
            Cache-->>Eval: return cached memories
        else cache miss
            Eval->>Eval: fetch_evidence_memories(evidence_ids, use_local_cache=false)
            Eval->>Cache: _cache_prepared_memories(sample_id, memories)
            Cache-->>Eval: cached
        end
        Eval->>Eval: _format_memories_for_llm(memories)
        Eval->>OpenAI: judge_complex_reasoning(prompt, timeout=OPENAI_REQUEST_TIMEOUT_SECONDS)
        OpenAI-->>Eval: JSON verdict (verdict, confidence, answer)
        Eval->>Eval: _parse_json_object_response()
        Eval->>Eval: score and record QA result
    end
    Eval-->>Runner: aggregated results (per-category, cat-5 judge state)
    Runner-->>User: benchmark report
```
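The diagram names `_parse_json_object_response` without showing it. A minimal sketch of what such a helper typically does when an LLM reply wraps its JSON verdict in prose or fences; the body is an assumption, not this PR's implementation:

```python
import json
import re

def _parse_json_object_response(text: str) -> dict:
    """Extract the first JSON object from an LLM reply (assumed behavior).

    Tolerates surrounding prose or markdown code fences by grabbing the
    outermost {...} span and parsing only that.
    """
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if not match:
        raise ValueError(f"no JSON object in response: {text[:80]!r}")
    return json.loads(match.group(0))
```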
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/benchmarks/test_locomo.py (1)
1795-1807: ⚠️ Potential issue | 🟡 Minor: Differentiate judge-off from judge-failed in the cat-5 summary.
When judge mode is enabled but every cat-5 row is skipped (for example, missing API key or judge errors), this branch still prints `needs LLM judge`. That makes a broken judge run look the same as judge-off mode.

🛠️ Proposed fix:

```diff
- print(f" {cat5_name:25s}: N/A ({cat5_skipped:3d} skipped, needs LLM judge)")
+ reason = (
+     "judge unavailable/errors"
+     if self.config.judge_model
+     else "needs LLM judge"
+ )
+ print(f" {cat5_name:25s}: N/A ({cat5_skipped:3d} skipped, {reason})")
```
🧹 Nitpick comments (1)
tests/test_locomo_cat5_judge.py (1)
9-21: Prefer importing a shared helper over executing another test file.
Lines 9-21 create a second copy of `tests/benchmarks/test_locomo.py` with its own module-level state. That gets brittle once the benchmark module grows caches, fixtures, or import-time setup, because this suite will no longer be exercising the same module instance pytest collects. Moving `LoCoMoConfig` and `LoCoMoEvaluator` into a normal helper module would make these tests much less fragile; if you keep this pattern, at least register the module in `sys.modules` before `exec_module`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_locomo_cat5_judge.py` around lines 9 - 21, The test currently loads and executes tests/benchmarks/test_locomo.py via _load_locomo_module, creating a distinct module instance and duplicating module-level state; fix by either (preferred) refactoring LoCoMoConfig and LoCoMoEvaluator into a shared helper module and importing that helper in both places, or (if keeping exec_module) register the spec.name in sys.modules before calling spec.loader.exec_module(module) (use the same spec name used in spec_from_file_location) so pytest and this test share the same module object; update _load_locomo_module accordingly to avoid divergent module state.
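A minimal sketch of the `sys.modules` registration the prompt describes, assuming the loader lives in `tests/test_locomo_cat5_judge.py` and keeps the spec name `test_locomo`:

```python
import importlib.util
import sys
from pathlib import Path

def _load_locomo_module():
    """Load tests/benchmarks/test_locomo.py once, sharing state with pytest."""
    path = Path(__file__).parent / "benchmarks" / "test_locomo.py"
    spec = importlib.util.spec_from_file_location("test_locomo", path)
    module = importlib.util.module_from_spec(spec)
    # Register under spec.name *before* exec_module so any later import of
    # the same name resolves to this instance instead of a fresh copy.
    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return module
```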
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/TESTING.md`:
- Around line 232-239: Update the sentence describing how the judge is enabled
to state that both --judge and --judge-model enable category-5 judging;
specifically, change the line that currently says "`--judge` enables the judge
and defaults to `gpt-4o` unless `BENCH_JUDGE_MODEL` or `--judge-model` overrides
it" to something like: "`--judge` and `--judge-model` both enable the judge;
`--judge` defaults to `gpt-4o` unless overridden by `BENCH_JUDGE_MODEL` or
`--judge-model`." Reference the flags `--judge`, `--judge-model` and the
environment variable `BENCH_JUDGE_MODEL` (and the benchmark test
`tests/benchmarks/test_locomo.py` if needed) when making the edit.
In `@README.md`:
- Around line 560-562: There is an extra blank line between the two blockquotes
("Methodology note:" and "History note:") triggering MD028; remove the blank
line so the two lines become a single continuous blockquote (or alternatively
convert both to normal paragraphs) to satisfy the linter while keeping the exact
texts of "Methodology note:" and "History note:" intact.
- Around line 535-540: The README currently documents the full baseline as
"locomo" with the judge enabled using "gpt-4o" but instructs readers to run just
"make bench-eval BENCH=locomo"; update that invocation to explicitly enable the
judge so the run matches the published baseline (for example, change the example
command to include the judge flag or env var that enables gpt-4o judging), and
mention that enabling the judge is required to include category 5 results;
reference the strings "make bench-eval BENCH=locomo", "locomo", "gpt-4o", and
"category 5" when making the change.
In `@tests/benchmarks/test_locomo.py`:
- Around line 1486-1488: The current guard unconditionally skips all category==5
rows when self.config.judge_model is false, hiding deterministic rows with
canonical answers; change the condition in the category==5 block so it only
returns (sets base_result["explanation"]="Skipped: requires LLM judge") when
judge mode is off AND the example lacks a canonical/gold answer (e.g., check
row.get("answer") or row.get("gold") is None/empty) so deterministic category-5
rows are allowed to proceed and be scored; update the if using the existing
symbols category, self.config.judge_model, and base_result to implement this
selective skip.
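Restated as a standalone predicate for clarity; the function name is hypothetical, while the `row` keys and config field come from the prompt above:

```python
def should_skip_cat5(category: int, judge_model: str | None, row: dict) -> bool:
    """Skip a category-5 row only when the judge is off AND no gold answer exists."""
    if category != 5 or judge_model:
        return False  # non-cat-5 rows and judge-on runs are always scored
    # Deterministic cat-5 rows with a canonical/gold answer fall through.
    return not (row.get("answer") or row.get("gold"))

# Judge off: rows without gold answers skip, rows with gold answers still score.
assert should_skip_cat5(5, None, {"question": "q?"}) is True
assert should_skip_cat5(5, None, {"answer": "gold"}) is False
assert should_skip_cat5(5, "gpt-4o", {}) is False
```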
---
Nitpick comments:
In `@tests/test_locomo_cat5_judge.py`:
- Around line 9-21: The test currently loads and executes
tests/benchmarks/test_locomo.py via _load_locomo_module, creating a distinct
module instance and duplicating module-level state; fix by either (preferred)
refactoring LoCoMoConfig and LoCoMoEvaluator into a shared helper module and
importing that helper in both places, or (if keeping exec_module) register the
spec.name in sys.modules before calling spec.loader.exec_module(module) (use the
same spec name used in spec_from_file_location) so pytest and this test share
the same module object; update _load_locomo_module accordingly to avoid
divergent module state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3330e165-dad3-4892-a1c1-b451c41237eb
📒 Files selected for processing (11)
AGENTS.md, README.md, benchmarks/EXPERIMENT_LOG.md, docs/TESTING.md, scripts/bench/restore_and_eval.sh, test-locomo-benchmark.sh, tests/benchmarks/BENCHMARK_2025-11-08.md, tests/benchmarks/BENCHMARK_2025-11-20.md, tests/benchmarks/BENCHMARK_2025-12-02.md, tests/benchmarks/test_locomo.py, tests/test_locomo_cat5_judge.py
- docs/TESTING.md: clarify that --judge-model also enables the judge
- README.md: update benchmark command to include judge for full baseline
- README.md: merge blockquotes to fix MD028 lint warning
- test_locomo.py: allow deterministic cat-5 rows when judge is off
- test_locomo.py: differentiate judge-off vs judge-failed in cat-5 summary
- test_locomo_cat5_judge.py: register module in sys.modules to avoid duplicates
Resolving CodeRabbit comment #2912430608: ✅ Addressed and implemented. Note: Fixed in 48438ce — updated line 238 to clarify both --judge and --judge-model enable the judge. Resolved via CodeRabbit MCP.

Resolving CodeRabbit comment #2912430626: ✅ Addressed and implemented. Note: Fixed in 48438ce — updated benchmark command to include BENCH_JUDGE_MODEL=gpt-4o and CONFIG=baseline for full run. Resolved via CodeRabbit MCP.

Resolving CodeRabbit comment #2912430632: ✅ Addressed and implemented. Note: Fixed in 48438ce — merged blockquotes into single continuous blockquote to fix MD028. Resolved via CodeRabbit MCP.

Resolving CodeRabbit comment #2912430638: ✅ Addressed and implemented. Note: Fixed in 48438ce — added. Resolved via CodeRabbit MCP.
Actionable comments posted: 4
♻️ Duplicate comments (1)
README.md (1)
560-562: ⚠️ Potential issue | 🟡 Minor: Remove the blank line inside this blockquote.
The standalone `>` on Line 561 still triggers MD028 (no-blanks-blockquote). Make these two notes one continuous blockquote or convert them to paragraphs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 560 - 562, The blockquote contains an empty blank line between two quote paragraphs which triggers MD028; remove the solitary ">" blank line so the two notes become one continuous blockquote (or alternatively convert the two quoted lines into normal paragraphs), editing the quoted lines starting with "**Methodology note:**" and "**History note:**" so there is no blank ">" line between them.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/benchmarks/test_locomo.py`:
- Around line 1857-1858: The JSON currently sets "judge_enabled" based solely on
self.config.judge_model which claims the judge ran even if it couldn't (e.g.,
missing OPENAI_API_KEY); change this to expose two flags instead:
"judge_requested": bool(self.config.judge_model) and "judge_available":
bool(self.config.judge_model) and bool(os.getenv("OPENAI_API_KEY")) (or the
runtime check your code uses for OpenAI availability), so callers can tell
whether the judge was requested vs actually available to run; update the dict
where "judge_enabled" and "judge_model" are set and import/use os.getenv or your
existing OpenAI-availability helper (see the sketch below).
- Around line 68-72: The default for judge_model is being locked at import time
by using os.getenv(...) in the field default; change the dataclass field
declaration (judge_model) to use field(default_factory=lambda:
os.getenv("BENCH_JUDGE_MODEL") or None) so the environment is read when a
LoCoMoConfig instance is created (and keep the existing __post_init__ logic that
strips or sets None). Update the import to include dataclasses.field if needed
and ensure the symbol names to edit are judge_model in the dataclass and the
__post_init__ method of LoCoMoConfig.
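Taken together, the two test_locomo.py comments amount to something like the sketch below. The dataclass fields follow the prompts; `summary_flags` is a hypothetical helper added here only to illustrate the requested/available split:

```python
import os
from dataclasses import dataclass, field

@dataclass
class LoCoMoConfig:
    # default_factory reads the env var per instance, not once at import time
    judge_model: str | None = field(
        default_factory=lambda: os.getenv("BENCH_JUDGE_MODEL") or None
    )

    def __post_init__(self) -> None:
        # keep the existing normalization: strip whitespace, empty -> None
        if self.judge_model is not None:
            self.judge_model = self.judge_model.strip() or None

    def summary_flags(self) -> dict:
        # hypothetical helper: judge requested vs. actually able to run
        requested = bool(self.judge_model)
        return {
            "judge_requested": requested,
            "judge_available": requested and bool(os.getenv("OPENAI_API_KEY")),
        }
```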
In `@tests/test_locomo_cat5_judge.py`:
- Around line 229-257: The test
test_cat5_judge_failures_skip_instead_of_marking_wrong fails to stub the recall
step; patch locomo_evaluator._recall_memories_for_qa (the method called by
_evaluate_question) to return the same fixed evidence_memories used elsewhere in
the file so the test does not contact the local AutoMem API; update the test to
monkeypatch or set locomo_evaluator._recall_memories_for_qa = lambda qa,
sample_id, use_local_cache=False: evidence_memories before calling
_evaluate_question, ensuring the judge path runs with the stubbed memories (see the sketch below).
- Around line 30-33: The locomo_evaluator fixture currently inherits
environment-driven judge/OpenAI settings; update the fixture so it explicitly
disables judge mode and clears any OpenAI key before constructing the evaluator:
obtain a LoCoMoConfig instance in the fixture, set the judge-related flag or
field (e.g., BENCH_JUDGE_MODEL or config.use_judge) to a falsy value and ensure
the OpenAI credential (e.g., OPENAI_API_KEY or config.openai_api_key) is
cleared/None, then pass that sanitized config into LoCoMoEvaluator;
alternatively explicitly unset os.environ['OPENAI_API_KEY'] in the fixture
before creating LoCoMoEvaluator so tests run with a deterministic, judge-off
baseline.
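Both test-file comments sketched together. The `locomo_helpers` import is hypothetical (it assumes the shared-helper refactor suggested in the earlier nitpick), and the memory dict shape is an assumption:

```python
import pytest

# Hypothetical shared helper module; see the earlier nitpick about not
# exec()-ing tests/benchmarks/test_locomo.py directly.
from locomo_helpers import LoCoMoConfig, LoCoMoEvaluator

@pytest.fixture
def locomo_evaluator():
    # Deterministic, judge-off baseline regardless of ambient env vars.
    config = LoCoMoConfig()
    config.judge_model = None
    config.use_embedding_similarity = False
    evaluator = LoCoMoEvaluator(config)
    evaluator.openai_client = None  # no OpenAI credential leaks into tests
    return evaluator

def test_cat5_judge_failures_skip_instead_of_marking_wrong(locomo_evaluator, monkeypatch):
    evidence_memories = [{"id": "D10:20", "content": "Jolene watches videos."}]
    # Stub recall so the test never contacts the local AutoMem API; the lambda
    # mirrors the real signature (qa, sample_id, use_local_cache=False).
    monkeypatch.setattr(
        locomo_evaluator,
        "_recall_memories_for_qa",
        lambda _qa, _sample_id, use_local_cache=False: evidence_memories,
    )
    # ...the judge failure is then simulated and _evaluate_question is
    # asserted to skip the row rather than mark it wrong.
```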
---
Duplicate comments:
In `@README.md`:
- Around line 560-562: The blockquote contains an empty blank line between two
quote paragraphs which triggers MD028; remove the solitary ">" blank line so the
two notes become one continuous blockquote (or alternatively convert the two
quoted lines into normal paragraphs), editing the quoted lines starting with
"**Methodology note:**" and "**History note:**" so there is no blank ">" line
between them.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 69fbbc58-4478-4a94-b765-eed6fc9e8e2e
📒 Files selected for processing (4)
README.md, docs/TESTING.md, tests/benchmarks/test_locomo.py, tests/test_locomo_cat5_judge.py
- test_locomo.py: use field(default_factory=...) for judge_model to read env at instance time
- test_locomo.py: split judge_enabled into judge_requested/judge_available
- test_locomo_cat5_judge.py: sanitize fixture to disable judge/OpenAI for deterministic tests
- test_locomo_cat5_judge.py: stub _recall_memories_for_qa in judge failure test
Resolving CodeRabbit comment #2912565229: ✅ Addressed and implemented. Note: Fixed in f10fbfe — changed to field(default_factory=lambda: os.getenv(...)) so env is read at instance creation. Resolved via CodeRabbit MCP.

Resolving CodeRabbit comment #2912565238: ✅ Addressed and implemented. Note: Fixed in f10fbfe — split into judge_requested and judge_available (checks self.openai_client). Resolved via CodeRabbit MCP.

Resolving CodeRabbit comment #2912565245: ✅ Addressed and implemented. Note: Fixed in f10fbfe — fixture now explicitly sets judge_model=None, openai_client=None, use_embedding_similarity=False. Resolved via CodeRabbit MCP.

Resolving CodeRabbit comment #2912565250: ✅ Addressed and implemented. Note: Fixed in f10fbfe — added monkeypatch for _recall_memories_for_qa to return stubbed memories. Resolved via CodeRabbit MCP.
🧹 Nitpick comments (1)
tests/test_locomo_cat5_judge.py (1)
74-89: Unused lambda arguments in stubs are intentional.
The static analysis warnings (ARG005) for unused lambda arguments are false positives in this context. Test stubs must match the real method signatures to work correctly with `monkeypatch.setattr`. Prefixing unused args with `_` is optional but would be more Pythonic.

Optional: prefix unused args

```diff
- lambda question, sample_id, evidence: [_fake_memory("D10:20", "Jolene watches videos.")],
+ lambda _question, _sample_id, _evidence: [_fake_memory("D10:20", "Jolene watches videos.")],
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_locomo_cat5_judge.py` around lines 74 - 89, The lambdas used in the test stubs trigger static analysis for unused arguments; update the monkeypatch.setattr calls that replace locomo_evaluator._recall_memories_for_qa and locomo_evaluator.judge_complex_reasoning so their lambda parameters that are unused are prefixed with an underscore (e.g., question -> _question, sample_id -> _sample_id, evidence -> _evidence, adversarial_answer -> _adversarial_answer, recalled_memories -> _recalled_memories, evidence_dialog_ids -> _evidence_dialog_ids) to silence ARG005 while keeping the signatures identical for monkeypatching.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/test_locomo_cat5_judge.py`:
- Around line 74-89: The lambdas used in the test stubs trigger static analysis
for unused arguments; update the monkeypatch.setattr calls that replace
locomo_evaluator._recall_memories_for_qa and
locomo_evaluator.judge_complex_reasoning so their lambda parameters that are
unused are prefixed with an underscore (e.g., question -> _question, sample_id
-> _sample_id, evidence -> _evidence, adversarial_answer -> _adversarial_answer,
recalled_memories -> _recalled_memories, evidence_dialog_ids ->
_evidence_dialog_ids) to silence ARG005 while keeping the signatures identical
for monkeypatching.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4d9db0a6-9b24-4760-b814-3b136c8fed09
📒 Files selected for processing (2)
tests/benchmarks/test_locomo.py, tests/test_locomo_cat5_judge.py
🤖 I have created a release *beep* *boop*

---

## [0.15.0](v0.14.0...v0.15.0) (2026-03-12)

### Features

* **bench:** add opt-in LoCoMo cat5 judge ([#119](#119)) ([8b7915c](8b7915c))
* **memory:** optimize relationship taxonomy ([#121](#121)) ([605633e](605633e))

### Bug Fixes

* **benchmarks:** correct evaluator bugs, add agent guidelines, establish honest baseline ([#117](#117)) ([d9cbecb](d9cbecb))
* **recall:** priority_ids parameter only boosts relevance ([#79](#79)) ([#125](#125)) ([5d3708c](5d3708c))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
- Opt-in LLM judge for LoCoMo category 5, enabled via `BENCH_JUDGE_MODEL`, `--judge`, and `--judge-model`

Implementation

- `LoCoMoEvaluator` extended with operation-scoped LLM caching

Verification
| Command | Result | Category 5 |
| --- | --- | --- |
| `make test` | 185 passed, 1 skipped, 12 deselected | — |
| `make bench-eval BENCH=locomo-mini CONFIG=baseline` | 89.27% (208/233) | N/A (71 skipped) |
| `BENCH_JUDGE_MODEL=gpt-4o make bench-eval BENCH=locomo-mini CONFIG=baseline` | 89.80% (273/304) | 91.55% (65/71) |
| `BENCH_JUDGE_MODEL=gpt-4o make bench-eval BENCH=locomo CONFIG=baseline` | 87.56% (1739/1986) | 95.74% (427/446), 0 judge skips/errors |

Notes
- Treat `88.24%` as a published reference point rather than a strict apples-to-apples leaderboard claim