fix: synthetic dataset generation for ≥200 rows by Copilot · Pull Request #54 · Icar0S/SmartDataTest

Copilot · 2026-03-11T03:39:47Z

Gemini (free tier) reliably generates ≤100 rows per call but returns only ~71–76% of rows for larger requests, exhausting all retries. A second bug caused the exhausted-retry path (no exception) to return [] instead of data, producing "Batch 1 complete: 0/200 total rows".

Changes

`src/synthetic/generator.py`

Auto sub-batching: requests above _LLM_MAX_ROWS_PER_CALL = 100 are split into sequential sub-batches of ≤100 rows. A 200-row request now makes 2 reliable calls instead of 3 failing ones, reducing quota burn.
Fixed empty-return bug: tracks best_rows across retry attempts; when the retry loop exhausts without an exception, fills from the best partial LLM result + mock data rather than returning [].

# Before — silent data loss
return [], logs  # hit when all retries got < 80% rows with no exception

# After — always returns requested count
if best_rows:
    records = self._coerce_types(best_rows, schema)
    ...
    if len(records) < num_rows:
        records.extend(self._generate_mock_data(schema, num_rows - len(records)))
    return records[:num_rows], logs
return self._generate_mock_data(schema, num_rows), logs
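
The auto sub-batching described under `src/synthetic/generator.py` can be illustrated with a minimal sketch. The helper name `split_into_sub_batches` and the module-level constant are hypothetical and are not taken from the PR's diff; only the 100-row cap comes from the description above. The sketch shows how a request might be divided into sequential sub-batch sizes, not how `generate_batch` is actually structured.

```python
from typing import List

# Mirrors the _LLM_MAX_ROWS_PER_CALL = 100 cap named in the PR description.
LLM_MAX_ROWS_PER_CALL = 100


def split_into_sub_batches(num_rows: int, max_rows: int = LLM_MAX_ROWS_PER_CALL) -> List[int]:
    """Return the sizes of the sequential sub-batches for a request of num_rows.

    Hypothetical helper: every sub-batch holds at most max_rows rows,
    and the sizes always sum to num_rows.
    """
    if num_rows <= 0:
        return []
    full_batches, remainder = divmod(num_rows, max_rows)
    sizes = [max_rows] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes


print(split_into_sub_batches(200))  # [100, 100]
print(split_into_sub_batches(100))  # [100]
print(split_into_sub_batches(250))  # [100, 100, 50]
```

Under this scheme a request at the threshold (100 rows) still results in a single call, which matches the "single-call at threshold" case listed in the tests.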

`tests/backend/api/test_synthetic_backend.py`

Added TestGenerateBatchLargeRowHandling with 6 tests covering: sub-batch triggering, single-call at threshold, mock-fill fallback, no-empty-return guarantee, and best-partial-rows retention.
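
The mock-fill fallback, no-empty-return guarantee, and best-partial-rows retention can be demonstrated with a self-contained sketch. Everything here is an assumption made for illustration: `generate_with_fallback`, `fake_llm`, and `fake_mock` are hypothetical names, the retry count of 3 is assumed, and the 80% acceptance threshold is taken from the "< 80% rows" comment in the snippet above. It is not the code or the tests from the PR.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def generate_with_fallback(
    call_llm: Callable[[int], List[Row]],
    make_mock_rows: Callable[[int], List[Row]],
    num_rows: int,
    max_retries: int = 3,       # assumed retry budget
    min_fraction: float = 0.8,  # assumed acceptance threshold
) -> List[Row]:
    """Retry the LLM call, keep the best partial result, top up with mock rows."""
    best_rows: List[Row] = []
    for _ in range(max_retries):
        rows = call_llm(num_rows)
        if len(rows) > len(best_rows):
            best_rows = rows  # retain the largest partial result seen so far
        if len(rows) >= min_fraction * num_rows:
            break  # good enough, stop retrying
    records = list(best_rows[:num_rows])
    if len(records) < num_rows:
        # Fill the shortfall with mock rows instead of returning [].
        records.extend(make_mock_rows(num_rows - len(records)))
    return records


# Simulated LLM that returns 30, then 70, then 50 rows: never reaching 80%.
attempt_sizes = iter([30, 70, 50])


def fake_llm(n: int) -> List[Row]:
    return [{"id": i, "source": "llm"} for i in range(next(attempt_sizes))]


def fake_mock(n: int) -> List[Row]:
    return [{"id": -1, "source": "mock"} for _ in range(n)]


result = generate_with_fallback(fake_llm, fake_mock, 100)
llm_count = sum(1 for row in result if row["source"] == "llm")
print(len(result), llm_count)  # 100 70
```

Even though every attempt falls short of the threshold, the caller receives the full 100 rows: the 70 rows from the best attempt plus 30 mock rows.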

vercel · 2026-03-11T03:39:51Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
data-forge-test	Ready	Preview, Comment	Mar 11, 2026 3:48am

github-actions · 2026-03-11T17:02:11Z

🧪 PR Test Summary

Suite	Status
Backend — Unit	✅ success
Backend — API	✅ success
Backend — Security	✅ success
Frontend — Coverage	✅ success

Coverage thresholds: Statements 80% · Branches 70% · Functions 75% · Lines 80%

github-actions · 2026-03-11T17:03:23Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
1168	983	84%	80%	🟢

New Files

No new covered files...

Modified Files

No covered modified files...

updated for commit: bb66b01 by action🐍

Initial plan

4d68ae0

Copilot AI assigned Copilot and Icar0S Mar 11, 2026

Copilot started work on behalf of Icar0S March 11, 2026 03:39

vercel bot deployed to Preview March 11, 2026 03:40

Fix generate_batch for >=200 rows: auto sub-batch and fix empty-return bug

Co-authored-by: Icar0S <39846852+Icar0S@users.noreply.github.com>

bb66b01

Copilot AI changed the title ~~[WIP] Investigate dataset generation issue for 200 rows or more~~ Fix synthetic dataset generation for ≥200 rows Mar 11, 2026

vercel bot deployed to Preview March 11, 2026 03:48

Copilot finished work on behalf of Icar0S March 11, 2026 03:48

Icar0S changed the title ~~Fix synthetic dataset generation for ≥200 rows~~ fix: synthetic dataset generation for ≥200 rows Mar 11, 2026

Icar0S marked this pull request as ready for review March 11, 2026 23:03

Icar0S merged commit c176a2a into main Mar 11, 2026
10 of 11 checks passed

Icar0S merged 2 commits into main from copilot/investigate-dataset-generation-issue
