
test: consolidate and reuse test fixtures in conftest files #121

Open
wang-boyu wants to merge 5 commits into mesa:main from wang-boyu:refactor/test-mocks

Conversation

@wang-boyu
Member

@wang-boyu wang-boyu commented Mar 1, 2026

This PR is a first attempt to refactor all tests so that:

  • Test fixtures are placed in conftest.py files as much as possible, instead of being duplicated across different test_*.py files.
  • These consolidated fixtures are reused as much as possible, particularly the LLM agent and LLM response fixtures, instead of being recreated in different test cases.
  • No test cases were added or removed.

There are still more ad hoc Mock() objects in the tests, but this PR is already getting too big to review. We can check and refactor them separately in the future.
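As a minimal sketch of the consolidation pattern described above (the names here are illustrative, not the actual fixtures from this PR):

```python
# tests/conftest.py -- pytest auto-discovers conftest.py, so every
# test_*.py file below this directory can request these fixtures by
# name without importing them.
import pytest
from unittest.mock import Mock


def make_mock_agent(step_prompt="Default step prompt"):
    """Shared factory: one pre-configured mock agent for all test files."""
    agent = Mock()
    agent.step_prompt = step_prompt
    # Mock(content=...) sets the .content attribute on the returned stub.
    agent.llm.generate.return_value = Mock(content="mocked completion")
    return agent


@pytest.fixture
def mock_agent():
    """Fixture wrapper so tests can take the agent as a parameter."""
    return make_mock_agent()
```

A test in any sibling test_*.py file then just declares `def test_step(mock_agent): ...` instead of rebuilding the mock inline.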

Summary by CodeRabbit

  • Tests
    • Consolidated test fixtures and utilities into centralized configuration files for improved maintainability and consistency across test suites.
    • Refactored test setup to use shared factory patterns for mock objects, reducing duplication and simplifying test initialization.
    • Updated test environment configuration for more reliable and repeatable testing.

@wang-boyu wang-boyu requested review from EwoutH and sanika-n March 1, 2026 15:09
@wang-boyu wang-boyu added testing Release notes label maintenance Release notes label labels Mar 1, 2026
@coderabbitai

coderabbitai bot commented Mar 1, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@codecov

codecov bot commented Mar 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.43%. Comparing base (21da947) to head (c1b99ad).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #121   +/-   ##
=======================================
  Coverage   89.43%   89.43%           
=======================================
  Files          19       19           
  Lines        1457     1457           
=======================================
  Hits         1303     1303           
  Misses        154      154           

☔ View full report in Codecov by Sentry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_recording/conftest.py (1)

149-161: Consider reusing the temp_dir fixture for consistency.

This fixture creates its own temporary directory via tempfile.TemporaryDirectory() instead of reusing the temp_dir fixture defined above. While functional, reusing temp_dir would reduce duplication and maintain consistency.

♻️ Proposed refactor
 `@pytest.fixture`
-def temp_recording_file(sample_recording_data):
+def temp_recording_file(sample_recording_data, temp_dir):
     """Create temporary recording files for testing."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        json_path = Path(tmpdir) / "test_recording.json"
-        with open(json_path, "w") as f:
-            json.dump(sample_recording_data, f)
+    json_path = temp_dir / "test_recording.json"
+    with open(json_path, "w") as f:
+        json.dump(sample_recording_data, f)
 
-        pkl_path = Path(tmpdir) / "test_recording.pkl"
-        with open(pkl_path, "wb") as f:
-            pickle.dump(sample_recording_data, f)
+    pkl_path = temp_dir / "test_recording.pkl"
+    with open(pkl_path, "wb") as f:
+        pickle.dump(sample_recording_data, f)
 
-        yield json_path, pkl_path
+    yield json_path, pkl_path
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_recording/conftest.py` around lines 149 - 161, The
temp_recording_file fixture should reuse the existing temp_dir fixture instead
of creating its own TemporaryDirectory: update the fixture signature to accept
temp_dir (and keep sample_recording_data), write the JSON and pickle files into
Path(temp_dir) / "test_recording.json" and Path(temp_dir) /
"test_recording.pkl", and yield those two Path objects; this removes duplication
and relies on the temp_dir fixture for lifecycle/cleanup while preserving the
existing behavior in temp_recording_file.
tests/test_reasoning/test_rewoo.py (1)

80-90: Consider extracting repeated agent setup into a local helper/fixture.

The same arrangement block appears in multiple tests; one shared helper would reduce maintenance overhead.

♻️ Proposed refactor
+def _configure_agent_for_plan_tests(mock_agent, step_prompt="Default step prompt"):
+    mock_agent.step_prompt = step_prompt
+    mock_agent.generate_obs.return_value = Observation(
+        step=1, self_state={}, local_state={}
+    )
+    mock_agent.memory = Mock()
+    mock_agent.memory.format_long_term.return_value = "Long term memory"
+    mock_agent.memory.format_short_term.return_value = "Short term memory"
+    mock_agent.memory.add_to_memory = Mock()
+    mock_agent.llm = Mock()
+    mock_agent.tool_manager = Mock()
+    mock_agent.tool_manager.get_all_tools_schema.return_value = {}
+    return mock_agent

Then in each test:

-        mock_agent.step_prompt = "Default step prompt"
-        mock_agent.generate_obs.return_value = Observation(
-            step=1, self_state={}, local_state={}
-        )
-        mock_agent.memory = Mock()
-        mock_agent.memory.format_long_term.return_value = "Long term memory"
-        mock_agent.memory.format_short_term.return_value = "Short term memory"
-        mock_agent.memory.add_to_memory = Mock()
-        mock_agent.llm = Mock()
-        mock_agent.tool_manager = Mock()
-        mock_agent.tool_manager.get_all_tools_schema.return_value = {}
+        mock_agent = _configure_agent_for_plan_tests(mock_agent)

Also applies to: 118-125, 150-157, 198-205, 249-255, 289-295, 325-331

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_reasoning/test_rewoo.py` around lines 80 - 90, Several tests
repeat the same mock_agent setup (setting mock_agent.step_prompt,
mock_agent.generate_obs return value using Observation(...), mock_agent.memory
with format_long_term/format_short_term/add_to_memory, mock_agent.llm, and
mock_agent.tool_manager.get_all_tools_schema). Extract this into a shared helper
or pytest fixture (e.g., make_mock_agent or a `@pytest.fixture` named
mock_agent_setup) that constructs and returns the configured mock_agent with
step_prompt, generate_obs, memory stubs, llm, and tool_manager schema; then
replace the repeated blocks in tests with calls to that helper/fixture to reduce
duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_recording/conftest.py`:
- Around line 49-51: Metadata field total_events is inconsistent with the actual
events list; update the test fixture so they match by either adjusting the
total_events value from 10 to 7 or adding the missing event entries to the
events list (e.g., add events with event_ids continuing from 000008–000010).
Locate and modify the total_events variable and/or the events list in the
conftest fixture (look for the dictionary containing "total_steps",
"total_events", "total_agents" and the corresponding "events" array) so the
count accurately reflects the number of event objects.
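The fix this comment describes can also be made structural: derive the count from the list so the two cannot drift apart. A minimal sketch (field names follow the comment; the actual fixture shape in the PR may differ):

```python
# Hypothetical recording fixture; total_events is computed from the
# events list instead of being hard-coded, so metadata and data agree
# by construction.
events = [
    {"event_id": f"{i:06d}", "step": i}
    for i in range(1, 8)  # seven sample events: 000001 .. 000007
]

sample_recording_data = {
    "metadata": {
        "total_steps": 10,
        "total_agents": 3,
        "total_events": len(events),  # always matches the list above
    },
    "events": events,
}
```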


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75592b7 and 11dff26.

📒 Files selected for processing (16)
  • tests/conftest.py
  • tests/test_async_memory.py
  • tests/test_llm_agent.py
  • tests/test_memory/conftest.py
  • tests/test_memory/memory_utils.py
  • tests/test_memory/test_episodic_memory.py
  • tests/test_memory/test_memory_parent.py
  • tests/test_module_llm.py
  • tests/test_reasoning/test_cot.py
  • tests/test_reasoning/test_react.py
  • tests/test_reasoning/test_reasoning.py
  • tests/test_reasoning/test_rewoo.py
  • tests/test_recording/conftest.py
  • tests/test_recording/test_agent_analysis.py
  • tests/test_recording/test_record_model.py
  • tests/test_recording/test_simulation_recorder.py
💤 Files with no reviewable changes (5)
  • tests/test_memory/memory_utils.py
  • tests/test_llm_agent.py
  • tests/test_recording/test_record_model.py
  • tests/test_recording/test_simulation_recorder.py
  • tests/test_recording/test_agent_analysis.py

@wang-boyu wang-boyu force-pushed the refactor/test-mocks branch from 83752e1 to d3a8349 on March 3, 2026 at 01:11
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@wang-boyu wang-boyu force-pushed the refactor/test-mocks branch from d3a8349 to 55e9ced on March 3, 2026 at 01:13
@wang-boyu wang-boyu force-pushed the refactor/test-mocks branch from 2d9b958 to c1b99ad on March 3, 2026 at 23:01
