DEMO: intentional skill regression for eval pipeline validation (DO NOT MERGE) by saurabhrb · Pull Request #61 · microsoft/Dataverse-skills

saurabhrb · 2026-05-16T01:24:59Z

Purpose

Demo branch for validating that the eval pipeline catches skill regressions. DO NOT MERGE.

Recreated from a fresh branch off main (replaces closed PR #58) now that the pipeline default branch is main.

What's regressed

dv-data/SKILL.md replaces CreateMultiple bulk-create guidance with a per-record loop antipattern:

Bulk create via list-arg replaced with per-record for loop
CreateMultiple references removed
Chunking/adaptive helpers removed
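To make the difference between the two patterns concrete, here is a minimal sketch that counts round trips. It does not reproduce the contents of dv-data/SKILL.md or any real SDK. `FakeClient`, its `create` signature, the `chunked` helper, and the chunk size of 100 are all hypothetical stand-ins chosen for illustration.

```python
from itertools import count
from typing import Iterator


class FakeClient:
    """Hypothetical stand-in for a Dataverse client; it only counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self._ids = count(1)

    def create(self, table: str, payload):
        # One call equals one simulated round trip, whether the payload is
        # a single record (dict) or a list of records (bulk create).
        self.requests += 1
        if isinstance(payload, list):
            return [f"{table}-{next(self._ids)}" for _ in payload]
        return f"{table}-{next(self._ids)}"


def chunked(records: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def create_per_record(client: FakeClient, table: str, records: list) -> list:
    """The regressed antipattern: one request per record."""
    return [client.create(table, record) for record in records]


def create_bulk(client: FakeClient, table: str, records: list,
                chunk_size: int = 100) -> list:
    """The bulk pattern: pass a list per call, one request per chunk."""
    ids = []
    for chunk in chunked(records, chunk_size):
        ids.extend(client.create(table, chunk))
    return ids
```

With 250 records, the per-record loop in this sketch issues 250 simulated requests, while the chunked bulk version issues 3.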

How it's used

The ADO pipeline DVSkillsPlugin-Evals-PR (32010) runs against this branch. The data_003_skill_contract test asks the agent to report what the skill teaches, and NOT_CONTAINS: assertions catch the regressed content.

Expected result: 2/3 FAIL (data_003 catches the regression; data_001 and data_002 may still pass due to model prior knowledge).
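The assertion mechanism described above can be sketched roughly as follows. Only the `NOT_CONTAINS:` prefix comes from this PR. The `CONTAINS:` counterpart, the case-insensitive matching, the function name `failed_assertions`, and the example strings are assumptions for illustration, not the pipeline's actual format.

```python
def failed_assertions(response: str, assertions: list[str]) -> list[str]:
    """Return the assertions that the agent response violates.

    Each assertion is assumed to look like "KIND: needle", where KIND is
    NOT_CONTAINS (from the PR) or CONTAINS (a hypothetical counterpart).
    """
    failures = []
    haystack = response.lower()
    for assertion in assertions:
        kind, _, needle = assertion.partition(":")
        kind = kind.strip()
        found = needle.strip().lower() in haystack
        if kind == "NOT_CONTAINS":
            if found:
                failures.append(assertion)
        elif kind == "CONTAINS":
            if not found:
                failures.append(assertion)
        else:
            raise ValueError(f"unknown assertion kind: {kind!r}")
    return failures
```

Under these assumptions, a response that repeats the regressed per-record guidance trips the `NOT_CONTAINS:` check, which is how a contract-style test can fail even when the model's prior knowledge lets the behavioral tests pass.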

saurabhrb · 2026-06-03T03:24:06Z

Closing -- validation harness / demo PR, no longer needed. Real coverage now lives in PR #70 (deterministic eval tests) and PR #68 (Cursor + auth unification).

DEMO: break bulk-create guidance (regression test target)

b01b346

saurabhrb closed this Jun 3, 2026

saurabhrb deleted the users/saurabhrb/evals-bad-skill-demo-v2 branch June 3, 2026 03:24

saurabhrb wants to merge 1 commit into main from users/saurabhrb/evals-bad-skill-demo-v2
