DEMO: intentional skill regression for eval pipeline validation (DO NOT MERGE)#61

Closed
saurabhrb wants to merge 1 commit into main from users/saurabhrb/evals-bad-skill-demo-v2

@saurabhrb

Contributor

Purpose

Demo branch for validating the eval pipeline catches skill regressions. DO NOT MERGE.

Recreated from a fresh branch off main (replaces closed PR #58) now that the pipeline default branch is main.

What's regressed

dv-data/SKILL.md replaces CreateMultiple bulk-create guidance with a per-record loop antipattern:

  • Bulk create via list-arg replaced with per-record for loop
  • CreateMultiple references removed
  • Chunking/adaptive helpers removed
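To make the regression concrete, here is a minimal sketch of the two shapes being contrasted. It does not reproduce the content of dv-data/SKILL.md: `FakeClient`, `create_per_record`, `create_bulk`, `chunked`, the table name, and the chunk size of 100 are all hypothetical names and values chosen for illustration, and the fake client only counts requests rather than calling Dataverse. It assumes a client whose `create` accepts either one record or a list of records, with the list form standing in for a single CreateMultiple-style request.

```python
from typing import Iterator


class FakeClient:
    """Hypothetical stand-in for a Dataverse client. It only counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self._next_id = 0

    def create(self, table: str, data):
        # One call is one request, whether it carries one record or a list.
        self.requests += 1
        records = data if isinstance(data, list) else [data]
        ids = []
        for _ in records:
            self._next_id += 1
            ids.append(f"{table}-{self._next_id}")
        return ids if isinstance(data, list) else ids[0]


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def create_per_record(client: FakeClient, table: str, records: list) -> list:
    """The regressed shape: one request per record."""
    return [client.create(table, record) for record in records]


def create_bulk(client: FakeClient, table: str, records: list,
                chunk_size: int = 100) -> list:
    """The bulk shape: one list-arg request per chunk (chunk size is assumed)."""
    ids = []
    for chunk in chunked(records, chunk_size):
        ids.extend(client.create(table, chunk))
    return ids
```

With 250 records, the per-record version issues 250 requests against this fake client, while the chunked bulk version issues 3. The sketch uses fixed-size chunks only; it does not attempt the adaptive helpers mentioned above.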

How it's used

The ADO pipeline DVSkillsPlugin-Evals-PR (32010) runs against this branch. The data_003_skill_contract test asks the agent to report what the skill teaches, and NOT_CONTAINS: assertions catch the regressed content.

Expected result: 2/3 FAIL (data_003 catches the regression; data_001 and data_002 may still pass because the model's prior knowledge can compensate for the regressed guidance).
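As a rough illustration of how a NOT_CONTAINS: assertion can catch regressed content in the agent's report, here is a minimal sketch. It is not the pipeline's actual evaluator: the function name `failed_assertions`, the case-insensitive matching, and the example assertion strings are assumptions made for this sketch, and only the `NOT_CONTAINS:` prefix comes from the description above.

```python
def failed_assertions(response: str, assertions: list[str]) -> list[str]:
    """Return the NOT_CONTAINS assertions that the response violates.

    Each assertion is a string like "NOT_CONTAINS: some text".
    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    failures = []
    haystack = response.lower()
    for assertion in assertions:
        kind, sep, needle = assertion.partition(":")
        if not sep or kind.strip() != "NOT_CONTAINS":
            raise ValueError(f"unsupported assertion: {assertion!r}")
        # The assertion fails when the forbidden text appears in the response.
        if needle.strip().lower() in haystack:
            failures.append(assertion)
    return failures


# Hypothetical assertion strings; the real test's strings are not shown in this PR.
ASSERTIONS = [
    "NOT_CONTAINS: for record in records",
    "NOT_CONTAINS: one record at a time",
]

regressed_report = "The skill says to loop: for record in records, call create."
healthy_report = "The skill says to pass a list so CreateMultiple is used."
```

Under these assumptions, `failed_assertions(regressed_report, ASSERTIONS)` returns the first assertion and `failed_assertions(healthy_report, ASSERTIONS)` returns an empty list. Because the check reads what the agent reports the skill teaches, it is less exposed to the model's prior knowledge than a test that only inspects generated code.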

@saurabhrb

Contributor Author

Closing -- validation harness / demo PR, no longer needed. Real coverage now lives in PR #70 (deterministic eval tests) and PR #68 (Cursor + auth unification).

saurabhrb closed this on Jun 3, 2026
saurabhrb deleted the users/saurabhrb/evals-bad-skill-demo-v2 branch on June 3, 2026 at 03:24