feat: add rstack-skill-evaluator skill #49
Open: SoonIter wants to merge 9 commits into main from wip/quantum-magnon-9f41
Commits:
- 455ce49 feat: add rstack-skill-evaluator skill and setup skills-package-manager (SoonIter)
- 202b51b chore: update (SoonIter)
- 41c12e3 chore: update (SoonIter)
- fae12ae fix lint and add rslib benchmark evals (Copilot)
- 7c7d825 fix evaluator skill metadata and rebase onto main (Copilot)
- c1e4ed1 feat: add rstack-skill-evaluator skill and setup skills-package-manager (SoonIter)
- eef3cfe chore: update (SoonIter)
- f7e2ab9 chore: rebase onto main and fix lint (SoonIter)
- 44f2d51 chore: update (SoonIter)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| --- | ||
| name: rstack-skill-evaluator | ||
| description: Benchmark agent skills by generating evaluation cases, comparing skill-guided and baseline runs, and recording the resulting artifacts under skills-test/{skill-name}. | ||
| metadata: | ||
| dependencies: ['skill-creator'] | ||
| --- | ||
|
|
||
| # rstack-skill-evaluator | ||
|
|
||
| Use skill-creator to test the skill. If the user has not said which skill to test, proactively ask them. | ||
|
|
||
| If the user is not using Claude Code, switch to another agent CLI that offers a one-shot flag (run that CLI with `--help` to find it), and confirm the choice with the user. | ||
|
|
||
| **Make sure to generate the test-related files in the "skills-test/{skill-name}" directory.** | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,115 @@ | ||
| { | ||
| "skill_name": "migrate-to-rsbuild", | ||
| "evals": [ | ||
| { | ||
| "id": 1, | ||
| "prompt": "We have an older React app at /Users/me/workspace/legacy-dashboard that still runs on webpack with webpack.config.js, webpack-dev-server, and a few resolve.alias entries for @app and @shared. Please migrate it to Rsbuild with minimal behavior changes, keep the app logic untouched, and tell me how you would verify the migration before removing old config.", | ||
| "expected_output": "A webpack-to-Rsbuild migration plan or implementation that preserves behavior, maps config carefully, and validates before cleanup.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Detects webpack as the source framework from webpack.config.js and webpack-dev-server cues", | ||
| "Follows smallest-change-first migration sequencing instead of large speculative rewrites", | ||
| "Includes explicit validation steps before removing old dependencies or config" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 2, | ||
| "prompt": "I need to move a Vite React app to Rsbuild. The repo has vite.config.ts, a couple of define aliases, and some environment-specific proxy setup. Please handle the migration but do not change business logic unless you absolutely have to, and call out any Vite-specific config deltas I should review manually.", | ||
| "expected_output": "A Vite-to-Rsbuild migration that preserves runtime behavior and highlights Vite-specific mapping differences.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Identifies Vite as the source toolchain from vite.config.ts", | ||
| "Keeps application logic unchanged unless a change is clearly justified", | ||
| "Surfaces Vite-specific migration deltas rather than giving only a generic migration summary" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 3, | ||
| "prompt": "Can you migrate this Create React App project to Rsbuild? It still uses react-scripts, a setupProxy.js file, and some custom env handling. I want a safe baseline migration first, and only after that should we talk about cleaning out the old CRA bits.", | ||
| "expected_output": "A CRA-to-Rsbuild migration that prioritizes a safe baseline and defers cleanup until verification passes.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Detects CRA from react-scripts and related project clues", | ||
| "Preserves the validate-before-cleanup rule", | ||
| "Summarizes follow-up cleanup only after the new setup is verified" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 4, | ||
| "prompt": "This app is built with CRACO on top of CRA. We have craco.config.js with webpack aliases and dev server overrides. Please migrate it to Rsbuild, but stage the work so the baseline migration is green before trying to carry over every custom behavior.", | ||
| "expected_output": "A CRACO-aware migration plan that separates baseline Rsbuild setup from later custom-config mapping.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Recognizes CRACO instead of treating the app as plain CRA", | ||
| "Separates baseline migration from custom behavior migration", | ||
| "Uses an incremental validate-first migration sequence" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 5, | ||
| "prompt": "Please move this Vue CLI application to Rsbuild. The project still has vue.config.js, uses @vue/cli-service, and has a few devServer and transpileDependencies tweaks. Keep current behavior as close as possible and explain how you would validate both dev and build flows.", | ||
| "expected_output": "A Vue CLI to Rsbuild migration with behavior-preserving mapping and clear development plus production verification guidance.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Identifies Vue CLI from vue.config.js or @vue/cli-service clues", | ||
| "Treats behavior preservation as a hard migration constraint", | ||
| "Includes both dev-server and production-build verification steps" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 6, | ||
| "prompt": "I already converted most of this webpack project to Rsbuild, but I am not sure whether the migration is actually complete. Please review the remaining gaps, compare the old setup with the new one, and tell me if it is safe to remove the old webpack packages now.", | ||
| "expected_output": "A migration review that checks completeness, looks for missing mappings, and only approves cleanup after verification.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Treats this as migration validation rather than a fresh migration from scratch", | ||
| "Looks for missing config mappings between the old and new setups", | ||
| "Does not recommend cleanup until the migration is validated" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 7, | ||
| "prompt": "My webpack setup has several custom loaders and plugin hooks, and some of them are pretty weird. Migrate the project to Rsbuild, but do not touch runtime application code unless it is truly necessary and you explain the reason clearly.", | ||
| "expected_output": "A careful migration that respects the no-business-logic-change boundary and treats custom config as something to map incrementally.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Avoids unnecessary runtime code edits", | ||
| "Treats unusual loaders and plugin hooks as custom behavior to map after baseline migration", | ||
| "Explains any unavoidable deviation instead of silently changing behavior" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 8, | ||
| "prompt": "Please migrate this Vite project to Rsbuild and use the right migration guidance for the detected source framework. I do not want a generic bundler migration answer that ignores Vite-specific differences.", | ||
| "expected_output": "A framework-specific Vite migration path grounded in the right migration reference rather than a generic answer.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Selects the Vite-specific migration path", | ||
| "Avoids a one-size-fits-all bundler migration answer", | ||
| "Uses framework-specific guidance to drive the migration" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 9, | ||
| "prompt": "I only care about getting to a working Rsbuild setup first. Please do not delete webpack packages, config files, or helper scripts until the new dev command starts cleanly and the production build passes.", | ||
| "expected_output": "A migration workflow that keeps old artifacts temporarily and sequences cleanup after successful validation.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Preserves old dependencies or config temporarily when needed", | ||
| "Explicitly validates dev and build before cleanup", | ||
| "Summarizes which obsolete artifacts can be removed afterward" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 10, | ||
| "prompt": "Explain what Rsbuild is and list its main features compared with webpack.", | ||
| "expected_output": "This is a general product-explanation prompt, not a migration request, and it should serve as a near-miss negative eval for trigger quality.", | ||
| "files": [], | ||
| "expectations": [ | ||
| "Acts as a should-not-trigger style near-miss rather than a migration workflow request", | ||
| "Helps test whether the skill description is too broad", | ||
| "Shares adjacent vocabulary with migration tasks without actually asking for migration" | ||
| ] | ||
| } | ||
| ] | ||
| } |
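A file of eval definitions like the one above is easy to break by hand (a reused `id`, a dropped key). The following is a minimal sketch of a sanity check, assuming the five keys shown in this file are all required; `validate_evals` and `REQUIRED_KEYS` are hypothetical names, not part of skill-creator or this PR.

```python
import json

# Keys that every eval case in the file above carries (assumed to be mandatory).
REQUIRED_KEYS = {"id", "prompt", "expected_output", "files", "expectations"}


def validate_evals(doc: dict) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = []
    if not isinstance(doc.get("skill_name"), str) or not doc["skill_name"]:
        problems.append("skill_name must be a non-empty string")
    seen_ids = set()
    for index, case in enumerate(doc.get("evals", [])):
        missing = REQUIRED_KEYS - case.keys()
        if missing:
            problems.append(f"evals[{index}] is missing keys: {sorted(missing)}")
            continue
        if case["id"] in seen_ids:
            problems.append(f"evals[{index}] reuses id {case['id']}")
        seen_ids.add(case["id"])
        if not case["expectations"]:
            problems.append(f"evals[{index}] has no expectations")
    return problems


# Trimmed copy of eval 10 from the file above, used only as sample input.
sample = json.loads("""
{
  "skill_name": "migrate-to-rsbuild",
  "evals": [
    {
      "id": 10,
      "prompt": "Explain what Rsbuild is and list its main features compared with webpack.",
      "expected_output": "This is a general product-explanation prompt, not a migration request.",
      "files": [],
      "expectations": ["Helps test whether the skill description is too broad"]
    }
  ]
}
""")

print(validate_evals(sample) or "no problems found")
```

Running a check like this before a benchmark keeps a malformed case from being silently skipped or counted twice.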
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| { | ||
| "skill_name": "migrate-to-rsbuild", | ||
| "evals": [ | ||
| { | ||
| "id": 1, | ||
| "eval_name": "webpack-react-migration", | ||
| "prompt": "Migrate the project at skills-test/migrate-to-rsbuild/test-projects/webpack-react to Rsbuild. Keep the app logic untouched, preserve the resolve aliases (@app and @shared), and make sure the dev server config is mapped appropriately. Do not remove old webpack config or dependencies until you've verified the migration would work. Summarize what you changed and what manual follow-ups remain.", | ||
| "expected_output": "A properly configured Rsbuild project with rsbuild.config.js, updated package.json with @rsbuild/core and @rsbuild/plugin-react, aliases preserved, and old config retained for verification.", | ||
| "files": ["skills-test/migrate-to-rsbuild/test-projects/webpack-react"], | ||
| "assertions": [ | ||
| "Created rsbuild.config.js with proper source/alias configuration", | ||
| "Added @rsbuild/core and @rsbuild/plugin-react to package.json", | ||
| "Preserved @app and @shared aliases from webpack.config.js", | ||
| "Did not delete webpack.config.js before verification", | ||
| "Provided a summary of changes and follow-ups" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 2, | ||
| "eval_name": "vite-react-migration", | ||
| "prompt": "Migrate the project at skills-test/migrate-to-rsbuild/test-projects/vite-react to Rsbuild. Keep the app logic untouched, preserve the resolve aliases (@ and @components), and map the build/output settings appropriately. Do not remove old Vite config or dependencies until you've verified the migration would work. Summarize what you changed and what manual follow-ups remain.", | ||
| "expected_output": "A properly configured Rsbuild project with rsbuild.config.js, updated package.json with @rsbuild/core and @rsbuild/plugin-react, aliases preserved, and old config retained for verification.", | ||
| "files": ["skills-test/migrate-to-rsbuild/test-projects/vite-react"], | ||
| "assertions": [ | ||
| "Created rsbuild.config.js with proper alias configuration", | ||
| "Added @rsbuild/core and @rsbuild/plugin-react to package.json", | ||
| "Preserved @ and @components aliases from vite.config.js", | ||
| "Did not delete vite.config.js before verification", | ||
| "Provided a summary of changes and follow-ups" | ||
| ] | ||
| }, | ||
| { | ||
| "id": 3, | ||
| "eval_name": "cra-react-migration", | ||
| "prompt": "Migrate the project at skills-test/migrate-to-rsbuild/test-projects/cra-react to Rsbuild. Keep the app logic untouched. Follow the official CRA migration guide. Do not remove react-scripts or old config until you've verified the migration would work. Summarize what you changed and what manual follow-ups remain.", | ||
| "expected_output": "A properly configured Rsbuild project with rsbuild.config.js, updated package.json with @rsbuild/core and @rsbuild/plugin-react, and old CRA config retained for verification.", | ||
| "files": ["skills-test/migrate-to-rsbuild/test-projects/cra-react"], | ||
| "assertions": [ | ||
| "Created rsbuild.config.js", | ||
| "Added @rsbuild/core and @rsbuild/plugin-react to package.json", | ||
| "Did not remove react-scripts or old config before verification", | ||
| "Provided a summary of changes and follow-ups" | ||
| ] | ||
| } | ||
| ] | ||
| } |
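Several of the assertions above are plain file-system facts and could be checked mechanically rather than by reading a transcript. The sketch below is one possible way to do that, assuming the migrated project is available as a directory; `check_migration` is a hypothetical helper, and assertions that need judgement (alias preservation, quality of the summary) are deliberately left out.

```python
import json
import tempfile
from pathlib import Path

# Packages that the assertions above expect to find in package.json.
RSBUILD_PACKAGES = {"@rsbuild/core", "@rsbuild/plugin-react"}


def check_migration(project_dir: Path, old_config: str) -> dict[str, bool]:
    """Evaluate the file-level assertions for one migrated project."""
    pkg = json.loads((project_dir / "package.json").read_text(encoding="utf-8"))
    declared = set(pkg.get("dependencies", {})) | set(pkg.get("devDependencies", {}))
    return {
        "Created rsbuild.config.js": (project_dir / "rsbuild.config.js").is_file(),
        "Added @rsbuild/core and @rsbuild/plugin-react": RSBUILD_PACKAGES <= declared,
        f"Did not delete {old_config}": (project_dir / old_config).is_file(),
    }


# Demo on a throwaway directory that imitates a finished webpack migration.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "rsbuild.config.js").write_text("export default {};\n", encoding="utf-8")
    (project / "webpack.config.js").write_text("module.exports = {};\n", encoding="utf-8")
    (project / "package.json").write_text(
        json.dumps({"devDependencies": {"@rsbuild/core": "*", "@rsbuild/plugin-react": "*"}}),
        encoding="utf-8",
    )
    results = check_migration(project, "webpack.config.js")

for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
```

For the CRA case the same idea would apply to `react-scripts`: check that it is still listed in `package.json` instead of checking for a config file.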
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| # migrate-to-rsbuild Skill Evaluation Report | ||
|
|
||
| ## Overview | ||
|
|
||
| - **Date**: 2026-04-16 | ||
| - **Skill**: `migrate-to-rsbuild` | ||
| - **Test cases**: 3 evaluation cases with real project files (webpack, Vite, CRA) | ||
| - **Iteration**: 2 (real-project file manipulations) | ||
|
|
||
| ## Test Cases | ||
|
|
||
| | Eval | Name | Source Framework | Key Requirements | | ||
| | ---- | ----------------------- | ---------------- | ------------------------------------------------------------------ | | ||
| | 1 | webpack-react-migration | webpack + React | Preserve `@app`/`@shared` aliases, keep old config until verified | | ||
| | 2 | vite-react-migration | Vite + React | Preserve `@`/`@components` aliases, keep old config until verified | | ||
| | 3 | cra-react-migration | CRA + React | Follow official CRA guide, keep `react-scripts` until verified | | ||
|
|
||
| ## Benchmark Results | ||
|
|
||
| | Metric | With Skill | Without Skill | Delta | | ||
| | --------- | ---------------- | ---------------- | ------- | | ||
| | Pass Rate | 91.7%            | 93.3%            | -1.7 pp | | ||
| | Time | 198.3s ± 19.7s | 217.9s ± 15.5s | -19.6s | | ||
| | Tokens | 136,056 ± 62,100 | 179,037 ± 27,673 | -42,981 | | ||
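The two pass rates can be reproduced from the per-eval assertion tables in this report, assuming the benchmark averages the per-eval pass rates rather than pooling all assertions (pooling would give 13/14 for both configurations). A minimal sketch under that assumption, with the counts read off the tables by hand:

```python
from statistics import mean

# (passed, total) assertion counts per eval, taken from the per-eval tables.
WITH_SKILL = {"eval-1": (5, 5), "eval-2": (5, 5), "eval-3": (3, 4)}
WITHOUT_SKILL = {"eval-1": (5, 5), "eval-2": (4, 5), "eval-3": (4, 4)}


def mean_pass_rate(results: dict[str, tuple[int, int]]) -> float:
    """Unweighted mean of the per-eval pass rates."""
    return mean(passed / total for passed, total in results.values())


with_rate = mean_pass_rate(WITH_SKILL)
without_rate = mean_pass_rate(WITHOUT_SKILL)
delta_pp = (with_rate - without_rate) * 100  # difference in percentage points

print(f"{with_rate:.1%} vs {without_rate:.1%}, delta {delta_pp:+.1f} pp")
# prints "91.7% vs 93.3%, delta -1.7 pp"
```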
|
|
||
| ## Per-Eval Detailed Results | ||
|
|
||
| ### Eval 1: webpack-react-migration | ||
|
|
||
| | Assertion | With Skill | Without Skill | | ||
| | ---------------------------------------------------- | ---------- | ------------- | | ||
| | Created rsbuild.config.js | PASS | PASS | | ||
| | Added @rsbuild/core and @rsbuild/plugin-react | PASS | PASS | | ||
| | Preserved @app and @shared aliases | PASS | PASS | | ||
| | Did not delete webpack.config.js before verification | PASS | PASS | | ||
| | Provided MIGRATION_SUMMARY.md | PASS | PASS | | ||
|
|
||
| **Observation**: Both configurations performed well. The with-skill run was slightly faster (182.7s vs 207.8s) and used fewer tokens (170,793 vs 217,442). | ||
|
|
||
| ### Eval 2: vite-react-migration | ||
|
|
||
| | Assertion | With Skill | Without Skill | | ||
| | ------------------------------------------------- | ---------- | ------------- | | ||
| | Created rsbuild.config.js | PASS | PASS | | ||
| | Added @rsbuild/core and @rsbuild/plugin-react | PASS | PASS | | ||
| | Preserved @ and @components aliases | PASS | PASS | | ||
| | Did not delete vite.config.js before verification | PASS | PASS | | ||
| | Provided MIGRATION_SUMMARY.md | PASS | FAIL | | ||
|
|
||
| **Observation**: With-skill completed faster (178.9s vs 236.6s) but used more tokens (190,478 vs 159,743). Baseline omitted the migration summary. | ||
|
|
||
| ### Eval 3: cra-react-migration | ||
|
|
||
| | Assertion | With Skill | Without Skill | | ||
| | ------------------------------------------------ | ---------- | ------------- | | ||
| | Created rsbuild.config.js | PASS | PASS | | ||
| | Added @rsbuild/core and @rsbuild/plugin-react | PASS | PASS | | ||
| | Did not remove react-scripts before verification | FAIL | PASS | | ||
| | Provided MIGRATION_SUMMARY.md | PASS | PASS | | ||
|
|
||
| **Observation**: This is the critical failure. The with-skill agent removed `react-scripts` from `package.json` even though the skill explicitly instructs the agent to keep old dependencies until dev/build verification passes. The baseline correctly preserved it. The with-skill run was somewhat slower (233.3s vs 209.3s) but used dramatically fewer tokens (47,896 vs 159,925). | ||
|
|
||
| ## Key Findings | ||
|
|
||
| 1. **Core migration mechanics are solid**: All runs successfully created valid `rsbuild.config.js`, added correct Rsbuild dependencies, and preserved resolve aliases. | ||
|
|
||
| 2. **"Validate before cleanup" is not robust enough**: The most important failure was in evaluation case 3, where the with-skill run prematurely removed `react-scripts`. This indicates the skill's phrasing around keeping old tooling until verification is not strong enough to resist the agent's tendency to clean up eagerly. | ||
|
|
||
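| 3. **Token usage is lower on average with the skill**: With-skill runs averaged ~43K fewer tokens (136,056 vs 179,037), suggesting the skill provides useful structure that reduces exploratory tool use. The spread is wide (±62,100 with the skill), and in eval 2 the with-skill run used more tokens than the baseline. | ||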
|
|
||
| 4. **Baseline quality is high for simple migrations**: Without the skill, agents can still figure out basic migrations, but they occasionally miss structured outputs (e.g., migration summary in eval-2) or skip verification sequencing. | ||
|
|
||
| ## Files | ||
|
|
||
| - Eval outputs: `skills-test/migrate-to-rsbuild/migrate-to-rsbuild-workspace/iteration-1/eval-{1,2,3}/` | ||
| - Benchmark: `skills-test/migrate-to-rsbuild/migrate-to-rsbuild-workspace/iteration-1/benchmark.json` | ||
| <!-- cspell:disable-next-line --> | ||
| - Evaluation definitions: `skills-test/migrate-to-rsbuild/evals/evals.json` |
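The paths listed above follow a regular pattern under `skills-test/{skill-name}`. As a rough sketch, a helper could derive them so that every artifact lands in the same place; `skill_test_layout` is a hypothetical name, and the layout is inferred only from the three paths in this report.

```python
from pathlib import PurePosixPath


def skill_test_layout(skill_name: str, iteration: int, eval_ids: list[int]) -> dict:
    """Build the artifact paths for one benchmark iteration of a skill."""
    root = PurePosixPath("skills-test") / skill_name
    workspace = root / f"{skill_name}-workspace" / f"iteration-{iteration}"
    return {
        "evals": root / "evals" / "evals.json",
        "benchmark": workspace / "benchmark.json",
        "eval_outputs": [workspace / f"eval-{i}" for i in eval_ids],
    }


layout = skill_test_layout("migrate-to-rsbuild", 1, [1, 2, 3])
print(layout["evals"])
print(layout["benchmark"])
for path in layout["eval_outputs"]:
    print(path)
```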
Mark this skill as internal so that it is not indexed by skills.sh.