
feat: Add NL2Query quality test runner#3116

Open
languy wants to merge 23 commits into main from dev/languy/add-nl2query-quality-test

languy commented Jun 3, 2026

Contributor

Introduce a test runner that takes a test spec and a schema as input and produces a Markdown (.md) quality report as output.
For each test spec, the runner makes the same function calls that NL2Query uses.
The report includes statistics on execution time, token counts (input and output), and grading scores.
Each test spec states the purpose of the test, the prompt, and the expected query; an LLM then grades the actual (generated) query against the expected one.
The runner is designed to be lightweight and generic: the user provides the test spec.
The runner must be executed manually by a user who is signed in to GitHub, since it uses GitHub Copilot.

  • Invoked with the runNl2QueryQualityTest command, which is only available in debug mode.
  • User selects LLM model to test
  • User selects LLM model that grades the output
  • User selects number of test runs
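As a rough illustration of the statistics described above, the sketch below groups the results of several runs per test spec and summarizes execution time, input/output tokens, and grades. It is a hypothetical sketch, not the PR's code: the TestSpec and RunResult shapes, their field names, and the helpers summarize and summarizeBySpec are assumptions made for this example, and no model is called.

```typescript
// Hypothetical shape of one test-spec entry: purpose, prompt, and expected query.
interface TestSpec {
  id: string;
  purpose: string;
  prompt: string;
  expectedQuery: string;
}

// Hypothetical record of a single run of a single spec.
interface RunResult {
  specId: string;
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  grade: number; // score assigned by the grading model (scale is an assumption)
}

type Metric = "durationMs" | "inputTokens" | "outputTokens" | "grade";

interface Stats {
  min: number;
  max: number;
  mean: number;
  median: number;
}

type SpecSummary = Record<Metric, Stats>;

const METRICS: Metric[] = ["durationMs", "inputTokens", "outputTokens", "grade"];

// Summary statistics over a non-empty list of numbers.
function summarize(values: number[]): Stats {
  if (values.length === 0) {
    throw new Error("summarize() needs at least one value");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return { min: sorted[0], max: sorted[sorted.length - 1], mean, median };
}

// Group the N runs of each spec and summarize every metric per spec.
function summarizeBySpec(specs: TestSpec[], results: RunResult[]): Map<string, SpecSummary> {
  const out = new Map<string, SpecSummary>();
  for (const spec of specs) {
    const runs = results.filter((r) => r.specId === spec.id);
    if (runs.length === 0) continue; // no successful run recorded for this spec
    const summary = {} as SpecSummary;
    for (const metric of METRICS) {
      summary[metric] = summarize(runs.map((r) => r[metric]));
    }
    out.set(spec.id, summary);
  }
  return out;
}
```

Reporting the median next to the mean keeps a single slow or badly graded run from dominating a spec's numbers when the user chooses several test runs.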

languy changed the title from "feat: NL2Query quality test" to "feat: Add NL2Query quality test runner" Jun 3, 2026
Comment thread src/commands/nl2queryQualityTest.ts Dismissed

github-actions Bot commented Jun 5, 2026

Contributor

🎉 Build Summary

🔗 Source

📦 Package Information

🧪 Test Results

  • Unit Tests: ✅ success
  • Integration Tests: ✅ success

✅ Build Status

All checks completed successfully!

languy marked this pull request as ready for review June 5, 2026 15:21
languy requested a review from a team as a code owner June 5, 2026 15:21
languy marked this pull request as draft June 5, 2026 15:23

github-actions Bot commented Jun 10, 2026

Contributor

🎭 E2E Tests (Playwright + VS Code)

Commit: 2202ab3
Pull Request: #3116 feat: Add NL2Query quality test runner

🧪 Result

  • E2E Tests: ✅ success

📥 Artifacts (run)

Tip: the HTML report artifact contains a self-contained Playwright report.
Download the zip, extract, and open index.html — or run
npx playwright show-report <extracted-dir> for the interactive view.


github-actions Bot commented Jun 10, 2026

Contributor

🔨 Build, Lint & Test

🔗 Source

📦 Package Information

🧪 Test Results

  • Unit Tests: ✅ success
  • Integration Tests (extension host): ✅ success

📥 Artifacts (run)

✅ Build Status

Build and local tests passed. See sibling comments below for E2E and NoSQL integration results.

languy marked this pull request as ready for review June 10, 2026 13:02
languy requested a review from Copilot June 11, 2026 06:17

Copilot AI left a comment

Contributor


Pull request overview

Adds a manual NL2Query quality test runner to help evaluate generateQuery behavior across a user-provided test spec and schema, producing a Markdown report with timing/token/grade statistics. This fits as a dev-only workflow tool for validating NL2Query prompt/pipeline quality during development.

Changes:

  • Adds a dev-only VS Code command (cosmosDB.dev.runNl2QueryQualityTest) to run NL2Query quality tests and generate a Markdown report.
  • Adds documentation and gitignore scaffolding for the quality test suite under test/quality/nl2query/.
  • Wires command visibility via a dev-mode context key (cosmosDB.devMode) and package contributions.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| test/quality/nl2query/README.md | Adds instructions and structure for running the NL2Query quality suite. |
| test/quality/nl2query/.gitignore | Ignores generated reports under results/. |
| src/extension.ts | Registers a dev-only context and dynamically imports the quality test command in Development mode. |
| src/commands/nl2queryQualityTest.ts | Implements the interactive runner, batch grading, and Markdown report generation. |
| package.json | Contributes the new dev command and gates it in the Command Palette with cosmosDB.devMode. |
| package-lock.json | Lockfile update (platform metadata normalization). |
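The overview mentions batch grading and Markdown report generation. One possible approach, sketched here under assumptions, is to send several (expected, actual) query pairs to the grading model in a single prompt, ask for a JSON array of verdicts, validate the reply, and render the verdicts as a Markdown table. The prompt wording, the { id, score, reason } reply shape, and the helper names buildBatchGradingPrompt, parseBatchGrades, and renderGradeRows are hypothetical and not taken from nl2queryQualityTest.ts; the code builds the prompt and parses a reply string but does not call any model.

```typescript
// One generated query to be graded against its expected query.
interface GradingItem {
  id: string;
  expectedQuery: string;
  actualQuery: string;
}

// Hypothetical per-item verdict the grading model is asked to return.
interface Grade {
  id: string;
  score: number;
  reason: string;
}

// Build a single prompt that asks the grader to score every item in the batch.
function buildBatchGradingPrompt(items: GradingItem[]): string {
  const cases = items
    .map((it) => `id: ${it.id}\nexpected: ${it.expectedQuery}\nactual: ${it.actualQuery}`)
    .join("\n---\n");
  return [
    "Grade each actual query against its expected query on a 0-10 scale.",
    'Reply with only a JSON array of {"id": string, "score": number, "reason": string}.',
    cases,
  ].join("\n\n");
}

// Extract the outermost JSON array from the reply (tolerating surrounding prose or a
// Markdown code fence) and keep only well-formed verdicts for ids that were sent.
function parseBatchGrades(reply: string, items: GradingItem[]): Map<string, Grade> {
  const start = reply.indexOf("[");
  const end = reply.lastIndexOf("]");
  if (start === -1 || end < start) throw new Error("grader reply contains no JSON array");
  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (!Array.isArray(parsed)) throw new Error("grader reply is not a JSON array");
  const knownIds = new Set(items.map((it) => it.id));
  const grades = new Map<string, Grade>();
  for (const entry of parsed) {
    if (typeof entry !== "object" || entry === null) continue; // skip null or primitive entries
    const g = entry as Partial<Grade>;
    if (typeof g.id !== "string" || !knownIds.has(g.id)) continue; // unknown or missing id
    if (typeof g.score !== "number" || !Number.isFinite(g.score)) continue;
    grades.set(g.id, { id: g.id, score: g.score, reason: typeof g.reason === "string" ? g.reason : "" });
  }
  return grades;
}

// Render one Markdown table row per item; items the grader skipped show "n/a".
function renderGradeRows(items: GradingItem[], grades: Map<string, Grade>): string {
  const header = "| Test | Score | Reason |\n| --- | --- | --- |";
  const rows = items.map((it) => {
    const g = grades.get(it.id);
    // Escape pipes and flatten newlines so the reason cannot break the table row.
    const reason = g ? g.reason.replace(/\|/g, "\\|").replace(/\r?\n/g, " ") : "";
    return `| ${it.id} | ${g ? g.score : "n/a"} | ${reason} |`;
  });
  return [header, ...rows].join("\n");
}
```

Grading in batches trades fewer grader requests for a longer prompt; validating each entry means a partially malformed reply still yields usable scores, with missing items surfacing as "n/a" in the report rather than failing the whole run.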

Comment thread test/quality/nl2query/README.md Outdated
Comment thread src/commands/nl2queryQualityTest.ts
Comment thread src/commands/nl2queryQualityTest.ts
Comment thread src/commands/nl2queryQualityTest.ts Outdated
Comment thread src/commands/nl2queryQualityTest.ts Outdated
Comment thread src/commands/nl2queryQualityTest.ts Outdated
Comment thread src/commands/nl2queryQualityTest.ts Outdated
Comment thread src/commands/nl2queryQualityTest.ts Outdated
Comment thread src/commands/nl2queryQualityTest.ts
bk201- previously approved these changes Jun 11, 2026
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
languy and others added 3 commits June 11, 2026 17:48
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
languy and others added 5 commits June 11, 2026 17:50
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread src/commands/nl2queryQualityTest.ts Fixed
Comment thread src/commands/nl2queryQualityTest.ts Fixed
languy and others added 2 commits June 11, 2026 18:37
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Labels

None yet

Projects

None yet

4 participants