fix: use multiple seeds per task in regression baselines #33
Conversation
Walkthrough

The regression test infrastructure was enhanced to generate 5 test cases per task instead of 1, with seeds computed as `seed = (index * 10) + offset`.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
data/memory_tier/regression.json (1)
1-28: ⚠️ Potential issue | 🟠 Major

Same issue: only one of 5 expected seeds is present.
This file should contain seeds 70-74, but only seed=74 remains due to the overwrite bug in `generate_regression_file()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/memory_tier/regression.json` around lines 1-28: the regression.json generation currently overwrote previous seed entries so only seed=74 remains; update the generate_regression_file() logic to accumulate results for seeds 70–74 (or iterate seed range 70..74) and write the full array once (or open the file in append/merge-safe mode) instead of replacing the file on each seed; ensure the function produces a JSON array containing five objects (one per seed) matching the expected structure so regression.json contains seeds 70, 71, 72, 73, 74.

chuck/regression.py (1)
45-52: ⚠️ Potential issue | 🔴 Critical

Baseline files are overwritten on each iteration, losing 4 of 5 seeds per task.
Each call to `baseline_file.write_text()` overwrites the previous content, so only the last seed (offset=4) survives. The manifest also ends up with 5 duplicate entries per task pointing to the same single-entry file.

🐛 Proposed fix: Group entries by task before writing
```diff
 def generate_regression_file(path: Path | None = None) -> Path:
     target = DEFAULT_REGRESSION_PATH if path is None else path
     target.parent.mkdir(parents=True, exist_ok=True)
+    # Group cases by task
+    from collections import defaultdict
+    cases_by_task: dict[str, list[dict[str, Any]]] = defaultdict(list)
+    for entry in _expected_cases():
+        cases_by_task[entry["task"]].append(entry)
+
     manifest = []
-    for entry in _expected_cases():
-        baseline_file = _baseline_path(entry["task"])
+    for task_name, entries in cases_by_task.items():
+        baseline_file = _baseline_path(task_name)
         baseline_file.parent.mkdir(parents=True, exist_ok=True)
-        baseline_file.write_text(json.dumps([entry], indent=2, sort_keys=True) + "\n", encoding="utf-8")
-        manifest.append({"task": entry["task"], "path": str(baseline_file.relative_to(ROOT))})
+        baseline_file.write_text(json.dumps(entries, indent=2, sort_keys=True) + "\n", encoding="utf-8")
+        manifest.append({"task": task_name, "path": str(baseline_file.relative_to(ROOT))})
     target.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
     return target
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@chuck/regression.py` around lines 45-52: the loop currently writes each entry into its task baseline file with baseline_file.write_text(), which overwrites prior seeds so only the last entry per task remains and manifest gains duplicate task entries; instead, collect entries from _expected_cases() grouped by entry["task"] (e.g., build a dict mapping task -> list[entry]), then for each task call _baseline_path(task) once, mkdir parent, write the full list of entries for that task with write_text, and append a single manifest entry {"task": task, "path": str(baseline_file.relative_to(ROOT))}; finally write the deduplicated manifest with target.write_text().

data/retrieval_core/regression.json (1)
1-23: ⚠️ Potential issue | 🟠 Major

Only the last seed is preserved; expected 5 entries per task.
This file should contain 5 regression entries (seeds 30-34 for retrieval_core), but only contains seed=34. The earlier seeds were overwritten during generation. See the root cause in `chuck/regression.py`, where `generate_regression_file()` overwrites the file on each iteration instead of collecting all entries per task.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@data/retrieval_core/regression.json` around lines 1 - 23, generate_regression_file currently overwrites regression.json on each iteration so only the last seed remains; modify generate_regression_file in chuck/regression.py to collect all entries for the task into a list (e.g., accumulate into a local entries or regression_data list rather than writing inside the per-seed loop), populate entries for seeds 30–34, then open/write the JSON file once after the loop (use json.dump on the collected list) so all five regression entries are persisted instead of only seed=34.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@data/regression.json`:
- Around line 2-21: The manifest file contains 5 duplicate entries per task
because generate_regression_file() is overwriting or appending the same path
repeatedly instead of creating a single entry per task that references a file
containing all 5 seeds; update generate_regression_file() so it creates one
manifest entry per task (not one per seed) by aggregating the seeds into the
task's baseline file and writing only a single { "path": "...", "task": "<task>"
} entry per task, ensuring the manifest length matches the test expectation in
tests/test_regression.py (len(entries) == 10) and that _load_expected_by_task()
in comparison.py receives unique task keys (no duplicates to be silently
dropped).
- Around line 1-21: The manifest uses Windows backslashes and duplicated entries
causing POSIX FileNotFoundError and silent task-key overwrites; update the JSON
in data/regression.json to use POSIX-safe forward slashes (e.g.,
"data/io_pipeline/regression.json") and regenerate so there are exactly 10
unique task entries (one per task) instead of repeated duplicates so that
load_regression_file() when doing ROOT / item["path"] finds the file and
_load_expected_by_task() does not lose entries by key overwrites.
---
Outside diff comments:
In `@chuck/regression.py`:
- Around line 45-52: The loop currently writes each entry into its task baseline
file with baseline_file.write_text(), which overwrites prior seeds so only the
last entry per task remains and manifest gains duplicate task entries; instead,
collect entries from _expected_cases() grouped by entry["task"] (e.g., build a
dict mapping task -> list[entry]), then for each task call _baseline_path(task)
once, mkdir parent, write the full list of entries for that task with
write_text, and append a single manifest entry {"task": task, "path":
str(baseline_file.relative_to(ROOT))}; finally write the deduplicated manifest
with target.write_text().
In `@data/memory_tier/regression.json`:
- Around line 1-28: The regression.json generation currently overwrote previous
seed entries so only seed=74 remains; update the generate_regression_file()
logic to accumulate results for seeds 70–74 (or iterate seed range 70..74) and
write the full array once (or open the file in append/merge-safe mode) instead
of replacing the file on each seed; ensure the function produces a JSON array
containing five objects (one per seed) matching the expected structure so
regression.json contains seeds 70,71,72,73,74.
In `@data/retrieval_core/regression.json`:
- Around line 1-23: generate_regression_file currently overwrites
regression.json on each iteration so only the last seed remains; modify
generate_regression_file in chuck/regression.py to collect all entries for the
task into a list (e.g., accumulate into a local entries or regression_data list
rather than writing inside the per-seed loop), populate entries for seeds 30–34,
then open/write the JSON file once after the loop (use json.dump on the
collected list) so all five regression entries are persisted instead of only
seed=34.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1d7275de-f7f9-4c15-93af-9e48d705a67b
📒 Files selected for processing (12)
chuck/regression.py, data/compute_core/regression.json, data/data_encoding/regression.json, data/graph_analytics/regression.json, data/io_pipeline/regression.json, data/memory_index/regression.json, data/memory_tier/regression.json, data/ordering_core/regression.json, data/prime_analytics/regression.json, data/regression.json, data/relational_fusion/regression.json, data/retrieval_core/regression.json
```diff
 [
   {
-    "path": "data/io_pipeline/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
   {
-    "path": "data/ordering_core/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Verify if the baseline files can be resolved with current paths
cd data
echo "=== Checking if paths resolve correctly ==="
# List actual directory structure
find . -name "regression.json" -type f
echo ""
echo "=== Testing path resolution ==="
# On POSIX, this literal path should NOT exist
if [ -e "data\\io_pipeline\\regression.json" ]; then
  echo "UNEXPECTED: Backslash path exists (likely Windows)"
else
  echo "EXPECTED: Backslash path does not exist on POSIX"
fi
```

Repository: iiitl/chuck

🏁 Script executed:

```shell
cat data/regression.json
```

🏁 Script executed:

```shell
find . -name "*.py" -type f | head -20
```

🏁 Script executed:

```shell
grep -r "load_regression_file" --include="*.py" -B 2 -A 10
```

🏁 Script executed:

```shell
grep -r "test_regression\|_load_expected_by_task" --include="*.py" -B 2 -A 10 | head -100
```

🏁 Script executed:

```shell
grep -r "DEFAULT_REGRESSION_PATH\|test.*regression" --include="*.py" -B 2 -A 5
```

🏁 Script executed:

```shell
grep -r "def test_" --include="*.py" | grep -i regression
```

🏁 Script executed:

```shell
grep -A 5 "DEFAULT_REGRESSION_PATH" chuck/regression.py
```

🏁 Script executed:

```shell
cat tests/test_regression.py
```
Windows-style path separators will break on POSIX systems.
Paths like "data\\io_pipeline\\regression.json" use backslashes which are literal characters (not separators) on Linux/Mac. When load_regression_file() executes ROOT / item["path"], it will attempt to read a file literally named data\io_pipeline\regression.json instead of data/io_pipeline/regression.json, causing FileNotFoundError on POSIX.
Additionally, the manifest should contain 10 entries (one per task), not 50. The current 5 duplicate entries per task are redundant, and when _load_expected_by_task() builds a dict keyed by task name, the duplicates silently collapse to a single entry per task via key overwriting.
Regenerate this file on a POSIX system or use forward slashes (data/io_pipeline/regression.json) which work cross-platform.
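One portable way to guarantee forward slashes in the manifest, regardless of the OS that generates the baselines, is `pathlib`'s `as_posix()`. This is a minimal sketch with hypothetical paths, not the project's actual manifest writer:

```python
from pathlib import Path

def manifest_path(baseline_file: Path, root: Path) -> str:
    """Return a manifest-safe relative path string.

    Path.as_posix() always renders "/" separators, even when the
    underlying Path was built on Windows with backslashes.
    """
    return baseline_file.relative_to(root).as_posix()

root = Path("repo")
baseline = root / "data" / "io_pipeline" / "regression.json"
print(manifest_path(baseline, root))  # data/io_pipeline/regression.json
```

Using `str(path)` instead is what produces OS-dependent separators, since `str()` uses the native separator of the platform that ran the generator.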
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@data/regression.json` around lines 1 - 21, The manifest uses Windows
backslashes and duplicated entries causing POSIX FileNotFoundError and silent
task-key overwrites; update the JSON in data/regression.json to use POSIX-safe
forward slashes (e.g., "data/io_pipeline/regression.json") and regenerate so
there are exactly 10 unique task entries (one per task) instead of repeated
duplicates so that load_regression_file() when doing ROOT / item["path"] finds
the file and _load_expected_by_task() does not lose entries by key overwrites.
```diff
   {
-    "path": "data/io_pipeline/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
   {
-    "path": "data/ordering_core/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
+    "task": "io_pipeline"
+  },
+  {
+    "path": "data\\io_pipeline\\regression.json",
     "task": "io_pipeline"
   },
```
Manifest has 50 duplicate entries; should have 10 (one per task).
The manifest contains 5 identical entries per task, but each points to the same baseline file. Per the relevant code snippet from tests/test_regression.py:48-52, the test expects exactly 10 entries:
```python
self.assertEqual(len(entries), 10)
```

Additionally, `_load_expected_by_task()` in comparison.py builds a dict keyed by task name, so duplicates are silently discarded anyway.
After fixing the overwrite bug in generate_regression_file(), the manifest should have one entry per task (pointing to a file containing all 5 seeds).
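The silent collapse can be seen in a minimal illustration (the data here is hypothetical; the real loader lives in comparison.py):

```python
# Five duplicate manifest entries for the same task...
entries = [
    {"task": "io_pipeline", "path": "data/io_pipeline/regression.json"}
    for _ in range(5)
]

# ...collapse to a single key when indexed by task name, because a dict
# keeps only the last value written for each key.
by_task = {entry["task"]: entry["path"] for entry in entries}
print(len(entries), len(by_task))  # 5 1
```

This is why the duplicated manifest "works" at runtime while still failing the `len(entries) == 10` test: the loader hides the duplication, the test does not.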
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@data/regression.json` around lines 2 - 21, The manifest file contains 5
duplicate entries per task because generate_regression_file() is overwriting or
appending the same path repeatedly instead of creating a single entry per task
that references a file containing all 5 seeds; update generate_regression_file()
so it creates one manifest entry per task (not one per seed) by aggregating the
seeds into the task's baseline file and writing only a single { "path": "...",
"task": "<task>" } entry per task, ensuring the manifest length matches the test
expectation in tests/test_regression.py (len(entries) == 10) and that
_load_expected_by_task() in comparison.py receives unique task keys (no
duplicates to be silently dropped).
```diff
   },
   {
-    "path": "data/ordering_core/regression.json",
+    "path": "data\\io_pipeline\\regression.json",
```
```python
for offset in range(SEEDS_PER_TASK):
    seed = (index * 10) + offset
```
Could you please explain your reasoning behind these changes?
```diff
   {
-    "path": "data/relational_fusion/regression.json",
+    "path": "data\\compute_core\\regression.json",
     "task": "compute_core"
```
chuck currently supports Linux, macOS, and WSL. These Windows-style paths must be reverted.
```diff
-    "path": "data/memory_tier/regression.json",
+    "path": "data\\prime_analytics\\regression.json",
     "task": "prime_analytics"
   },
```
Do you realise why each entry is occurring 5 times?
The CI checks are also failing; fix that too.
Resolves #3.
Description
Previously, `_expected_cases()` in `chuck/regression.py` only evaluated a single baseline seed per task (seed = task_index). This left each task with only one data point, which does not cover edge cases for regression testing.

Changes Made
- Updated `_expected_cases()` to generate multiple baseline cases per task (5 seeds per task instead of 1).
- Seeds are computed as `seed = (index * 10) + offset`.
- Regenerated the `regression.json` files using `python -m chuck generate-baselines`.

Summary by CodeRabbit