
fix: use multiple seeds per task in regression baselines#33

Open
AviralAv03 wants to merge 1 commit into iiitl:main from AviralAv03:fix/multiple-regression-seeds

Conversation

@AviralAv03

@AviralAv03 AviralAv03 commented Apr 11, 2026

Resolves #3.

Description

Previously, _expected_cases() in chuck/regression.py only evaluated a single baseline seed per task (seed = task_index). This left each task with only one data point, which does not cover edge cases for regression testing.

Changes Made

  • Modified the loop in _expected_cases() to generate multiple baseline cases per task (5 seeds per task instead of 1).
  • Updated the seed calculation to incorporate an offset: seed = (index * 10) + offset.
  • Regenerated the baseline regression.json files using python -m chuck generate-baselines.
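The seed scheme described above can be sketched as follows. This is a minimal illustration: SEEDS_PER_TASK matches the constant named in the review walkthrough, but expected_seeds is a hypothetical helper, not the actual _expected_cases() implementation.

```python
SEEDS_PER_TASK = 5  # constant added in chuck/regression.py per the walkthrough

def expected_seeds(task_index: int) -> list[int]:
    """Return the baseline seeds for one task: (task_index * 10) + offset."""
    return [(task_index * 10) + offset for offset in range(SEEDS_PER_TASK)]

# Task 0 now gets seeds 0-4 instead of only seed 0.
print(expected_seeds(0))  # [0, 1, 2, 3, 4]
```

With this scheme, task 3 (retrieval_core in the review comments) maps to seeds 30-34 and task 7 (memory_tier) to seeds 70-74, matching the ranges the reviewer expects in the fixture files.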

Summary by CodeRabbit

  • Tests
    • Expanded regression test coverage with multiple test cases per task instead of single cases for more comprehensive validation.
    • Updated all regression test baselines and expected results to reflect new test data.
    • Enhanced test data organization with improved path reference formatting.

@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Walkthrough

The regression test infrastructure was enhanced to generate 5 test cases per task instead of 1, with seeds computed as (task_index * 10) + offset. The code logic in chuck/regression.py was updated with an inner loop, and all regression fixture files were regenerated with new seed values and expected outputs.

Changes

Cohort / File(s) Summary
Code Logic Update
chuck/regression.py
Added SEEDS_PER_TASK = 5 constant and modified _expected_cases() to generate 5 regression cases per task. Seed computation changed from simple index to (index * 10) + offset. Each case includes updated seed values while maintaining size, task, and expected value computation.
Regression Fixture Data
data/compute_core/regression.json, data/data_encoding/regression.json, data/graph_analytics/regression.json, data/io_pipeline/regression.json, data/memory_index/regression.json, data/memory_tier/regression.json, data/ordering_core/regression.json, data/prime_analytics/regression.json, data/relational_fusion/regression.json, data/retrieval_core/regression.json
Updated expected test results across all task-specific regression fixture files. Each file regenerated with new seed values (incremented by 10× task index) and corresponding checksums, metrics, and output values recalculated.
Main Regression Manifest
data/regression.json
Updated path separators from POSIX-style to Windows-style (data\\...\\regression.json). Expanded array to include multiple entries per task with duplicate objects for tasks like io_pipeline, ordering_core, retrieval_core, data_encoding, graph_analytics, prime_analytics, memory_tier, memory_index, compute_core, and relational_fusion.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately describes the main change: implementing multiple seeds per task in regression baselines instead of single seeds.
Linked Issues check ✅ Passed The PR fully implements the objective from issue #3: replacing single-seed-per-task with multiple seeds per task using the new seed calculation method.
Out of Scope Changes check ✅ Passed All changes are directly related to the stated objective: modifying the seed generation logic in chuck/regression.py and regenerating regression baseline files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
data/memory_tier/regression.json (1)

1-28: ⚠️ Potential issue | 🟠 Major

Same issue: only one of 5 expected seeds is present.

This file should contain seeds 70-74, but only seed=74 remains due to the overwrite bug in generate_regression_file().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/memory_tier/regression.json` around lines 1 - 28, The regression.json
generation currently overwrote previous seed entries so only seed=74 remains;
update the generate_regression_file() logic to accumulate results for seeds
70–74 (or iterate seed range 70..74) and write the full array once (or open the
file in append/merge-safe mode) instead of replacing the file on each seed;
ensure the function produces a JSON array containing five objects (one per seed)
matching the expected structure so regression.json contains seeds
70,71,72,73,74.
chuck/regression.py (1)

45-52: ⚠️ Potential issue | 🔴 Critical

Baseline files are overwritten on each iteration, losing 4 of 5 seeds per task.

Each call to baseline_file.write_text() overwrites the previous content, so only the last seed (offset=4) survives. The manifest also ends up with 5 duplicate entries per task pointing to the same single-entry file.

🐛 Proposed fix: Group entries by task before writing
 def generate_regression_file(path: Path | None = None) -> Path:
     target = DEFAULT_REGRESSION_PATH if path is None else path
     target.parent.mkdir(parents=True, exist_ok=True)

+    # Group cases by task
+    from collections import defaultdict
+    cases_by_task: dict[str, list[dict[str, Any]]] = defaultdict(list)
+    for entry in _expected_cases():
+        cases_by_task[entry["task"]].append(entry)
+
     manifest = []
-    for entry in _expected_cases():
-        baseline_file = _baseline_path(entry["task"])
+    for task_name, entries in cases_by_task.items():
+        baseline_file = _baseline_path(task_name)
         baseline_file.parent.mkdir(parents=True, exist_ok=True)
-        baseline_file.write_text(json.dumps([entry], indent=2, sort_keys=True) + "\n", encoding="utf-8")
-        manifest.append({"task": entry["task"], "path": str(baseline_file.relative_to(ROOT))})
+        baseline_file.write_text(json.dumps(entries, indent=2, sort_keys=True) + "\n", encoding="utf-8")
+        manifest.append({"task": task_name, "path": str(baseline_file.relative_to(ROOT))})

     target.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8")
     return target
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@chuck/regression.py` around lines 45 - 52, The loop currently writes each
entry into its task baseline file with baseline_file.write_text(), which
overwrites prior seeds so only the last entry per task remains and manifest
gains duplicate task entries; instead, collect entries from _expected_cases()
grouped by entry["task"] (e.g., build a dict mapping task -> list[entry]), then
for each task call _baseline_path(task) once, mkdir parent, write the full list
of entries for that task with write_text, and append a single manifest entry
{"task": task, "path": str(baseline_file.relative_to(ROOT))}; finally write the
deduplicated manifest with target.write_text().
data/retrieval_core/regression.json (1)

1-23: ⚠️ Potential issue | 🟠 Major

Only the last seed is preserved; expected 5 entries per task.

This file should contain 5 regression entries (seeds 30-34 for retrieval_core), but only contains seed=34. The earlier seeds were overwritten during generation. See the root cause in chuck/regression.py where generate_regression_file() overwrites the file on each iteration instead of collecting all entries per task.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/retrieval_core/regression.json` around lines 1 - 23,
generate_regression_file currently overwrites regression.json on each iteration
so only the last seed remains; modify generate_regression_file in
chuck/regression.py to collect all entries for the task into a list (e.g.,
accumulate into a local entries or regression_data list rather than writing
inside the per-seed loop), populate entries for seeds 30–34, then open/write the
JSON file once after the loop (use json.dump on the collected list) so all five
regression entries are persisted instead of only seed=34.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@data/regression.json`:
- Around line 2-21: The manifest file contains 5 duplicate entries per task
because generate_regression_file() is overwriting or appending the same path
repeatedly instead of creating a single entry per task that references a file
containing all 5 seeds; update generate_regression_file() so it creates one
manifest entry per task (not one per seed) by aggregating the seeds into the
task's baseline file and writing only a single { "path": "...", "task": "<task>"
} entry per task, ensuring the manifest length matches the test expectation in
tests/test_regression.py (len(entries) == 10) and that _load_expected_by_task()
in comparison.py receives unique task keys (no duplicates to be silently
dropped).
- Around line 1-21: The manifest uses Windows backslashes and duplicated entries
causing POSIX FileNotFoundError and silent task-key overwrites; update the JSON
in data/regression.json to use POSIX-safe forward slashes (e.g.,
"data/io_pipeline/regression.json") and regenerate so there are exactly 10
unique task entries (one per task) instead of repeated duplicates so that
load_regression_file() when doing ROOT / item["path"] finds the file and
_load_expected_by_task() does not lose entries by key overwrites.

---

Outside diff comments:
In `@chuck/regression.py`:
- Around line 45-52: The loop currently writes each entry into its task baseline
file with baseline_file.write_text(), which overwrites prior seeds so only the
last entry per task remains and manifest gains duplicate task entries; instead,
collect entries from _expected_cases() grouped by entry["task"] (e.g., build a
dict mapping task -> list[entry]), then for each task call _baseline_path(task)
once, mkdir parent, write the full list of entries for that task with
write_text, and append a single manifest entry {"task": task, "path":
str(baseline_file.relative_to(ROOT))}; finally write the deduplicated manifest
with target.write_text().

In `@data/memory_tier/regression.json`:
- Around line 1-28: The regression.json generation currently overwrote previous
seed entries so only seed=74 remains; update the generate_regression_file()
logic to accumulate results for seeds 70–74 (or iterate seed range 70..74) and
write the full array once (or open the file in append/merge-safe mode) instead
of replacing the file on each seed; ensure the function produces a JSON array
containing five objects (one per seed) matching the expected structure so
regression.json contains seeds 70,71,72,73,74.

In `@data/retrieval_core/regression.json`:
- Around line 1-23: generate_regression_file currently overwrites
regression.json on each iteration so only the last seed remains; modify
generate_regression_file in chuck/regression.py to collect all entries for the
task into a list (e.g., accumulate into a local entries or regression_data list
rather than writing inside the per-seed loop), populate entries for seeds 30–34,
then open/write the JSON file once after the loop (use json.dump on the
collected list) so all five regression entries are persisted instead of only
seed=34.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1d7275de-f7f9-4c15-93af-9e48d705a67b

📥 Commits

Reviewing files that changed from the base of the PR and between a302e5f and 8ab3973.

📒 Files selected for processing (12)
  • chuck/regression.py
  • data/compute_core/regression.json
  • data/data_encoding/regression.json
  • data/graph_analytics/regression.json
  • data/io_pipeline/regression.json
  • data/memory_index/regression.json
  • data/memory_tier/regression.json
  • data/ordering_core/regression.json
  • data/prime_analytics/regression.json
  • data/regression.json
  • data/relational_fusion/regression.json
  • data/retrieval_core/regression.json

Comment thread data/regression.json
Comment on lines 1 to +21
[
{
"path": "data/io_pipeline/regression.json",
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data/ordering_core/regression.json",
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify if the baseline files can be resolved with current paths
cd data
echo "=== Checking if paths resolve correctly ==="
# List actual directory structure
find . -name "regression.json" -type f

echo ""
echo "=== Testing path resolution ==="
# On POSIX, this literal path should NOT exist
if [ -e "data\\io_pipeline\\regression.json" ]; then
    echo "UNEXPECTED: Backslash path exists (likely Windows)"
else
    echo "EXPECTED: Backslash path does not exist on POSIX"
fi

Repository: iiitl/chuck

Length of output: 521


🏁 Script executed:

cat data/regression.json

Repository: iiitl/chuck

Length of output: 4537


🏁 Script executed:

find . -name "*.py" -type f | head -20

Repository: iiitl/chuck

Length of output: 753


🏁 Script executed:

grep -r "load_regression_file" --include="*.py" -B 2 -A 10

Repository: iiitl/chuck

Length of output: 2945


🏁 Script executed:

grep -r "test_regression\|_load_expected_by_task" --include="*.py" -B 2 -A 10 | head -100

Repository: iiitl/chuck

Length of output: 1490


🏁 Script executed:

grep -r "DEFAULT_REGRESSION_PATH\|test.*regression" --include="*.py" -B 2 -A 5

Repository: iiitl/chuck

Length of output: 5156


🏁 Script executed:

grep -r "def test_" --include="*.py" | grep -i regression

Repository: iiitl/chuck

Length of output: 268


🏁 Script executed:

grep -A 5 "DEFAULT_REGRESSION_PATH" chuck/regression.py

Repository: iiitl/chuck

Length of output: 819


🏁 Script executed:

cat tests/test_regression.py

Repository: iiitl/chuck

Length of output: 2334


Windows-style path separators will break on POSIX systems.

Paths like "data\\io_pipeline\\regression.json" use backslashes which are literal characters (not separators) on Linux/Mac. When load_regression_file() executes ROOT / item["path"], it will attempt to read a file literally named data\io_pipeline\regression.json instead of data/io_pipeline/regression.json, causing FileNotFoundError on POSIX.

Additionally, the manifest should contain 10 entries (one per task), not 50. The current 5 duplicate entries per task are redundant, and when _load_expected_by_task() builds a dict keyed by task name, the duplicates silently collapse to a single entry per task via key overwriting.

Regenerate this file on a POSIX system or use forward slashes (data/io_pipeline/regression.json) which work cross-platform.
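As a cross-platform safeguard, the manifest paths could also be normalized when loaded. The helper below is an illustrative sketch (not part of chuck) showing one way to coerce backslash paths into POSIX form:

```python
from pathlib import PureWindowsPath

def normalize_manifest_path(path_str: str) -> str:
    # PureWindowsPath accepts both "\\" and "/" as separators, so this maps
    # "data\\io_pipeline\\regression.json" and the forward-slash form to the
    # same POSIX-style string, safe to join with ROOT on Linux/macOS/WSL.
    return PureWindowsPath(path_str).as_posix()

print(normalize_manifest_path("data\\io_pipeline\\regression.json"))
# data/io_pipeline/regression.json
```

Regenerating the manifest on a POSIX system, as suggested above, remains the simpler fix; the helper only makes load_regression_file() tolerant of a bad manifest.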

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/regression.json` around lines 1 - 21, The manifest uses Windows
backslashes and duplicated entries causing POSIX FileNotFoundError and silent
task-key overwrites; update the JSON in data/regression.json to use POSIX-safe
forward slashes (e.g., "data/io_pipeline/regression.json") and regenerate so
there are exactly 10 unique task entries (one per task) instead of repeated
duplicates so that load_regression_file() when doing ROOT / item["path"] finds
the file and _load_expected_by_task() does not lose entries by key overwrites.

Comment thread data/regression.json
Comment on lines 2 to +21
{
"path": "data/io_pipeline/regression.json",
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data/ordering_core/regression.json",
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},
{
"path": "data\\io_pipeline\\regression.json",
"task": "io_pipeline"
},


⚠️ Potential issue | 🟠 Major

Manifest has 50 duplicate entries; should have 10 (one per task).

The manifest contains 5 identical entries per task, but each points to the same baseline file. Per the relevant code snippet from tests/test_regression.py:48-52, the test expects exactly 10 entries:

self.assertEqual(len(entries), 10)

Additionally, _load_expected_by_task() in comparison.py builds a dict keyed by task name, so duplicates are silently discarded anyway.

After fixing the overwrite bug in generate_regression_file(), the manifest should have one entry per task (pointing to a file containing all 5 seeds).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@data/regression.json` around lines 2 - 21, The manifest file contains 5
duplicate entries per task because generate_regression_file() is overwriting or
appending the same path repeatedly instead of creating a single entry per task
that references a file containing all 5 seeds; update generate_regression_file()
so it creates one manifest entry per task (not one per seed) by aggregating the
seeds into the task's baseline file and writing only a single { "path": "...",
"task": "<task>" } entry per task, ensuring the manifest length matches the test
expectation in tests/test_regression.py (len(entries) == 10) and that
_load_expected_by_task() in comparison.py receives unique task keys (no
duplicates to be silently dropped).

Comment thread data/regression.json
},
{
"path": "data/ordering_core/regression.json",
"path": "data\\io_pipeline\\regression.json",
Member


why

Comment thread chuck/regression.py
for offset in range(SEEDS_PER_TASK):

seed = (index * 10) + offset

Member


Could you please explain your reasoning behind these changes?

Comment thread data/regression.json
{
"path": "data/relational_fusion/regression.json",
"path": "data\\compute_core\\regression.json",
"task": "compute_core"
Member


chuck currently supports Linux, macOS, and WSL. These Windows-style paths must be reverted.

Comment thread data/regression.json
"path": "data/memory_tier/regression.json",
"path": "data\\prime_analytics\\regression.json",
"task": "prime_analytics"
},
Member


Do you realise why each entry is occurring 5 times?

@Aaryan-Dadu
Member

The CI checks are also failing, fix that too



Development

Successfully merging this pull request may close these issues.

Single Regression Seed Per Task

2 participants