fix: use multiple seeds per task in regression baselines #33
Conversation
Walkthrough

The regression test infrastructure was enhanced to generate 5 test cases per task instead of 1, with seeds computed as `seed = (index * 10) + offset`.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
data/memory_tier/regression.json (1)
1-28: ⚠️ Potential issue | 🟠 Major

Same issue: only one of 5 expected seeds is present.
This file should contain seeds 70-74, but only seed=74 remains due to the overwrite bug in `generate_regression_file()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/memory_tier/regression.json` around lines 1-28: the regression.json generation currently overwrote previous seed entries so only seed=74 remains; update the generate_regression_file() logic to accumulate results for seeds 70–74 (or iterate seed range 70..74) and write the full array once (or open the file in append/merge-safe mode) instead of replacing the file on each seed; ensure the function produces a JSON array containing five objects (one per seed) matching the expected structure so regression.json contains seeds 70, 71, 72, 73, 74.

chuck/regression.py (1)
45-52: ⚠️ Potential issue | 🔴 Critical

Baseline files are overwritten on each iteration, losing 4 of 5 seeds per task.
Each call to `baseline_file.write_text()` overwrites the previous content, so only the last seed (offset=4) survives. The manifest also ends up with 5 duplicate entries per task pointing to the same single-entry file.

🐛 Proposed fix: Group entries by task before writing
```diff
 def generate_regression_file(path: Path | None = None) -> Path:
     target = DEFAULT_REGRESSION_PATH if path is None else path
     target.parent.mkdir(parents=True, exist_ok=True)
+    # Group cases by task
+    from collections import defaultdict
+    cases_by_task: dict[str, list[dict[str, Any]]] = defaultdict(list)
+    for entry in _expected_cases():
+        cases_by_task[entry["task"]].append(entry)
+
     manifest = []
-    for entry in _expected_cases():
-        baseline_file = _baseline_path(entry["task"])
+    for task_name, entries in cases_by_task.items():
+        baseline_file = _baseline_path(task_name)
         baseline_file.parent.mkdir(parents=True, exist_ok=True)
-        baseline_file.write_text(json.dumps([entry], indent=2, sort_keys=True) + "\n", encoding="utf-8")
-        manifest.append({"task": entry["task"], "path": str(baseline_file.relative_to(ROOT))})
+        baseline_file.write_text(json.dumps(entries, indent=2, sort_keys=True) + "\n", encoding="utf-8")
+        manifest.append({"task": task_name, "path": str(baseline_file.relative_to(ROOT))})
     target.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
     return target
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chuck/regression.py` around lines 45-52: the loop currently writes each entry into its task baseline file with baseline_file.write_text(), which overwrites prior seeds so only the last entry per task remains and manifest gains duplicate task entries; instead, collect entries from _expected_cases() grouped by entry["task"] (e.g., build a dict mapping task -> list[entry]), then for each task call _baseline_path(task) once, mkdir parent, write the full list of entries for that task with write_text, and append a single manifest entry {"task": task, "path": str(baseline_file.relative_to(ROOT))}; finally write the deduplicated manifest with target.write_text().

data/retrieval_core/regression.json (1)
1-23: ⚠️ Potential issue | 🟠 Major

Only the last seed is preserved; expected 5 entries per task.
This file should contain 5 regression entries (seeds 30-34 for retrieval_core), but only contains seed=34. The earlier seeds were overwritten during generation. See the root cause in `chuck/regression.py`, where `generate_regression_file()` overwrites the file on each iteration instead of collecting all entries per task.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/retrieval_core/regression.json` around lines 1 - 23, generate_regression_file currently overwrites regression.json on each iteration so only the last seed remains; modify generate_regression_file in chuck/regression.py to collect all entries for the task into a list (e.g., accumulate into a local entries or regression_data list rather than writing inside the per-seed loop), populate entries for seeds 30–34, then open/write the JSON file once after the loop (use json.dump on the collected list) so all five regression entries are persisted instead of only seed=34.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@data/regression.json`:
- Around line 2-21: The manifest file contains 5 duplicate entries per task
because generate_regression_file() is overwriting or appending the same path
repeatedly instead of creating a single entry per task that references a file
containing all 5 seeds; update generate_regression_file() so it creates one
manifest entry per task (not one per seed) by aggregating the seeds into the
task's baseline file and writing only a single { "path": "...", "task": "<task>"
} entry per task, ensuring the manifest length matches the test expectation in
tests/test_regression.py (len(entries) == 10) and that _load_expected_by_task()
in comparison.py receives unique task keys (no duplicates to be silently
dropped).
- Around line 1-21: The manifest uses Windows backslashes and duplicated entries
causing POSIX FileNotFoundError and silent task-key overwrites; update the JSON
in data/regression.json to use POSIX-safe forward slashes (e.g.,
"data/io_pipeline/regression.json") and regenerate so there are exactly 10
unique task entries (one per task) instead of repeated duplicates so that
load_regression_file() when doing ROOT / item["path"] finds the file and
_load_expected_by_task() does not lose entries by key overwrites.
---
Outside diff comments:
In `@chuck/regression.py`:
- Around line 45-52: The loop currently writes each entry into its task baseline
file with baseline_file.write_text(), which overwrites prior seeds so only the
last entry per task remains and manifest gains duplicate task entries; instead,
collect entries from _expected_cases() grouped by entry["task"] (e.g., build a
dict mapping task -> list[entry]), then for each task call _baseline_path(task)
once, mkdir parent, write the full list of entries for that task with
write_text, and append a single manifest entry {"task": task, "path":
str(baseline_file.relative_to(ROOT))}; finally write the deduplicated manifest
with target.write_text().
In `@data/memory_tier/regression.json`:
- Around line 1-28: The regression.json generation currently overwrote previous
seed entries so only seed=74 remains; update the generate_regression_file()
logic to accumulate results for seeds 70–74 (or iterate seed range 70..74) and
write the full array once (or open the file in append/merge-safe mode) instead
of replacing the file on each seed; ensure the function produces a JSON array
containing five objects (one per seed) matching the expected structure so
regression.json contains seeds 70,71,72,73,74.
In `@data/retrieval_core/regression.json`:
- Around line 1-23: generate_regression_file currently overwrites
regression.json on each iteration so only the last seed remains; modify
generate_regression_file in chuck/regression.py to collect all entries for the
task into a list (e.g., accumulate into a local entries or regression_data list
rather than writing inside the per-seed loop), populate entries for seeds 30–34,
then open/write the JSON file once after the loop (use json.dump on the
collected list) so all five regression entries are persisted instead of only
seed=34.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1d7275de-f7f9-4c15-93af-9e48d705a67b
📒 Files selected for processing (12)
chuck/regression.py, data/compute_core/regression.json, data/data_encoding/regression.json, data/graph_analytics/regression.json, data/io_pipeline/regression.json, data/memory_index/regression.json, data/memory_tier/regression.json, data/ordering_core/regression.json, data/prime_analytics/regression.json, data/regression.json, data/relational_fusion/regression.json, data/retrieval_core/regression.json
```diff
 [
   {
-    "path": "data/io_pipeline/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
   {
-    "path": "data/ordering_core/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify if the baseline files can be resolved with current paths
cd data
echo "=== Checking if paths resolve correctly ==="
# List actual directory structure
find . -name "regression.json" -type f
echo ""
echo "=== Testing path resolution ==="
# On POSIX, this literal path should NOT exist
if [ -e "data\\io_pipeline\\regression.json" ]; then
  echo "UNEXPECTED: Backslash path exists (likely Windows)"
else
  echo "EXPECTED: Backslash path does not exist on POSIX"
fi
```

Repository: iiitl/chuck

🏁 Script executed:

```shell
cat data/regression.json
```

🏁 Script executed:

```shell
find . -name "*.py" -type f | head -20
```

🏁 Script executed:

```shell
grep -r "load_regression_file" --include="*.py" -B 2 -A 10
```

🏁 Script executed:

```shell
grep -r "test_regression\|_load_expected_by_task" --include="*.py" -B 2 -A 10 | head -100
```

🏁 Script executed:

```shell
grep -r "DEFAULT_REGRESSION_PATH\|test.*regression" --include="*.py" -B 2 -A 5
```

🏁 Script executed:

```shell
grep -r "def test_" --include="*.py" | grep -i regression
```

🏁 Script executed:

```shell
grep -A 5 "DEFAULT_REGRESSION_PATH" chuck/regression.py
```

🏁 Script executed:

```shell
cat tests/test_regression.py
```
Windows-style path separators will break on POSIX systems.
Paths like "data\\io_pipeline\\regression.json" use backslashes which are literal characters (not separators) on Linux/Mac. When load_regression_file() executes ROOT / item["path"], it will attempt to read a file literally named data\io_pipeline\regression.json instead of data/io_pipeline/regression.json, causing FileNotFoundError on POSIX.
Additionally, the manifest should contain 10 entries (one per task), not 50. The current 5 duplicate entries per task are redundant, and when _load_expected_by_task() builds a dict keyed by task name, the duplicates silently collapse to a single entry per task via key overwriting.
Regenerate this file on a POSIX system or use forward slashes (data/io_pipeline/regression.json) which work cross-platform.
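One portable way to guarantee forward slashes in the manifest, regardless of the OS that generates the baselines, is `pathlib`'s `as_posix()`. This is a minimal sketch with hypothetical paths, not the project's actual manifest writer:

```python
from pathlib import Path

def manifest_path(baseline_file: Path, root: Path) -> str:
    """Return a manifest-safe relative path string.

    Path.as_posix() always renders "/" separators, even when the
    underlying Path was built on Windows with backslashes.
    """
    return baseline_file.relative_to(root).as_posix()

root = Path("repo")
baseline = root / "data" / "io_pipeline" / "regression.json"
print(manifest_path(baseline, root))  # data/io_pipeline/regression.json
```

Using `str(path)` instead is what produces OS-dependent separators, since `str()` uses the native separator of the platform that ran the generator.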
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@data/regression.json` around lines 1 - 21, The manifest uses Windows
backslashes and duplicated entries causing POSIX FileNotFoundError and silent
task-key overwrites; update the JSON in data/regression.json to use POSIX-safe
forward slashes (e.g., "data/io_pipeline/regression.json") and regenerate so
there are exactly 10 unique task entries (one per task) instead of repeated
duplicates so that load_regression_file() when doing ROOT / item["path"] finds
the file and _load_expected_by_task() does not lose entries by key overwrites.
```diff
   {
-    "path": "data/io_pipeline/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
   {
-    "path": "data/ordering_core/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
```
Manifest has 50 duplicate entries; should have 10 (one per task).
The manifest contains 5 identical entries per task, but each points to the same baseline file. Per the relevant code snippet from tests/test_regression.py:48-52, the test expects exactly 10 entries:
```python
self.assertEqual(len(entries), 10)
```

Additionally, `_load_expected_by_task()` in comparison.py builds a dict keyed by task name, so duplicates are silently discarded anyway.
After fixing the overwrite bug in generate_regression_file(), the manifest should have one entry per task (pointing to a file containing all 5 seeds).
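The silent collapse can be seen in a minimal illustration (the data here is hypothetical; the real loader lives in comparison.py):

```python
# Five duplicate manifest entries for the same task...
entries = [
    {"task": "io_pipeline", "path": "data/io_pipeline/regression.json"}
    for _ in range(5)
]

# ...collapse to a single key when indexed by task name, because a dict
# keeps only the last value written for each key.
by_task = {entry["task"]: entry["path"] for entry in entries}
print(len(entries), len(by_task))  # 5 1
```

This is why the duplicated manifest "works" at runtime while still failing the `len(entries) == 10` test: the loader hides the duplication, the test does not.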
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@data/regression.json` around lines 2 - 21, The manifest file contains 5
duplicate entries per task because generate_regression_file() is overwriting or
appending the same path repeatedly instead of creating a single entry per task
that references a file containing all 5 seeds; update generate_regression_file()
so it creates one manifest entry per task (not one per seed) by aggregating the
seeds into the task's baseline file and writing only a single { "path": "...",
"task": "<task>" } entry per task, ensuring the manifest length matches the test
expectation in tests/test_regression.py (len(entries) == 10) and that
_load_expected_by_task() in comparison.py receives unique task keys (no
duplicates to be silently dropped).
```diff
   },
   {
-    "path": "data/ordering_core/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
```
```python
for offset in range(SEEDS_PER_TASK):
    seed = (index * 10) + offset
```
Could you please explain your reasoning behind these changes?
```diff
   {
-    "path": "data/relational_fusion/regression.json",
+    "path": "data\\compute_core\\regression.json",
     "task": "compute_core"
```
chuck currently supports Linux, macOS, and WSL. These Windows-style paths must be reverted.
```diff
-    "path": "data/memory_tier/regression.json",
+    "path": "data\\prime_analytics\\regression.json",
     "task": "prime_analytics"
   },
```
Do you realise why each entry is occurring 5 times?
The CI checks are also failing; fix that too.
Resolves #3.
Description
Previously, `_expected_cases()` in `chuck/regression.py` only evaluated a single baseline seed per task (seed = task_index). This left each task with only one data point, which does not cover edge cases for regression testing.

Changes Made
- Updated `_expected_cases()` to generate multiple baseline cases per task (5 seeds per task instead of 1).
- Seeds are computed as `seed = (index * 10) + offset`.
- Regenerated the `regression.json` files using `python -m chuck generate-baselines`.

Summary by CodeRabbit