Certy GH-600 - Evaluation, Debugging and Iteration Lab

A public, fully runnable and tested TypeScript training lab for GH-600 domain 4.0: Evaluation, Debugging and Iteration.

Most agent courses skip evaluation entirely. This one does not. Everything here is deterministic - no network, no API keys, no LLM calls - so the same dataset always produces the same score, and the lab runs identically on your machine and in CI.

What is inside

evals/
  datasets/      issue-triage, code-review and safety cases as JSONL (input + expected)
  rubrics/       the matching rubric for each dataset
  runners/       run-eval.ts (deterministic classifier + scorer) and score-result.ts
  reports/       a sample eval report
traces/          passing, failing and tool-call agent traces (JSON)
debugging/       trace-reader.ts, failure-analysis.md, iteration-log.md
tests/           Node test runner suites for the scorer, trace reader and eval runner
labs/            eight hands-on labs
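
As a rough illustration of the dataset format, the sketch below parses JSONL text into typed cases. The field names input and expected come from the listing above; treating both as strings, and the parseJsonl helper itself, are assumptions made for illustration and may not match the lab's actual files.

```typescript
// Hypothetical shape of one dataset case. The README only says
// "input + expected"; string-typed fields are an assumption.
interface EvalCase {
  input: string;
  expected: string;
}

// Parse JSONL text: one JSON object per non-empty line.
function parseJsonl(text: string): EvalCase[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line, index) => {
      const value = JSON.parse(line) as Partial<EvalCase>;
      if (typeof value.input !== "string" || typeof value.expected !== "string") {
        throw new Error(`Case ${index + 1} is missing "input" or "expected"`);
      }
      return { input: value.input, expected: value.expected };
    });
}

// Illustrative in-memory data, not taken from the lab's datasets.
const sampleJsonl = [
  '{"input": "App crashes on login", "expected": "bug"}',
  '{"input": "Please add dark mode", "expected": "feature"}',
].join("\n");

const parsedCases = parseJsonl(sampleJsonl);
console.log(`${parsedCases.length} cases parsed`);
```

Validating each line at parse time keeps a malformed case from silently skewing the score later.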

Requirements

Node.js 22 or newer (the CI pins Node 22).
npm.

Install

npm install --no-fund --no-audit

Run the evaluation

npm run eval

This reads each JSONL dataset, runs a deterministic rule-based classifier over every case, and scores each prediction against the expected label. It prints a per-case PASS/FAIL breakdown plus the mean score against the threshold, then exits 0. See evals/reports/sample-eval-report.md for example output.
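
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in run-eval.ts: the keyword rules, the labels, the 0.6 threshold and every function name here are assumptions chosen for the example.

```typescript
interface EvalCase {
  input: string;
  expected: string;
}

interface CaseResult extends EvalCase {
  predicted: string;
  pass: boolean;
}

// Hypothetical keyword rules; the first matching rule wins.
const rules: Array<{ label: string; keywords: string[] }> = [
  { label: "bug", keywords: ["crash", "error", "fails"] },
  { label: "feature", keywords: ["add", "support", "please"] },
];

// Deterministic classifier: same input always yields the same label.
function classify(input: string): string {
  const text = input.toLowerCase();
  for (const rule of rules) {
    if (rule.keywords.some((keyword) => text.includes(keyword))) {
      return rule.label;
    }
  }
  return "other";
}

// Score every case (1 for a match, 0 otherwise) and compare the mean to the threshold.
function runEval(cases: EvalCase[], threshold: number) {
  const results: CaseResult[] = cases.map((c) => {
    const predicted = classify(c.input);
    return { ...c, predicted, pass: predicted === c.expected };
  });
  const passed = results.filter((r) => r.pass).length;
  const mean = results.length === 0 ? 0 : passed / results.length;
  return { results, mean, meetsThreshold: mean >= threshold };
}

const report = runEval(
  [
    { input: "App crashes on login", expected: "bug" },
    { input: "Please add dark mode", expected: "feature" },
    { input: "How do I configure the proxy?", expected: "question" },
  ],
  0.6, // assumed threshold for this sketch
);

for (const r of report.results) {
  console.log(`${r.pass ? "PASS" : "FAIL"} expected=${r.expected} predicted=${r.predicted}`);
}
console.log(`mean=${report.mean.toFixed(2)} meetsThreshold=${report.meetsThreshold}`);
```

Because nothing here depends on randomness, time or the network, rerunning it on the same cases gives the same mean, which is the property the lab relies on.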

Run a single dataset:

node --import tsx evals/runners/run-eval.ts safety
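
One way a runner could map that argument to a dataset is sketched below. The dataset names are inferred from the directory listing above, and selectDatasets is a hypothetical helper, not a function the lab is known to export.

```typescript
// Dataset names assumed from the evals/datasets/ description.
const knownDatasets = ["issue-triage", "code-review", "safety"] as const;
type DatasetName = (typeof knownDatasets)[number];

// Hypothetical helper: no argument means "run everything",
// one argument selects a single dataset, anything unknown is rejected.
function selectDatasets(args: string[]): DatasetName[] {
  const requested = args[0];
  if (requested === undefined) {
    return [...knownDatasets];
  }
  const match = knownDatasets.find((name) => name === requested);
  if (match === undefined) {
    throw new Error(`Unknown dataset "${requested}". Expected one of: ${knownDatasets.join(", ")}`);
  }
  return [match];
}

// In a real CLI the argument list would typically be process.argv.slice(2).
console.log(selectDatasets(["safety"]));
console.log(selectDatasets([]));
```

Rejecting unknown names explicitly avoids a run that silently evaluates nothing and still reports success.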

Run the tests

npm test

The suite verifies that the scorer returns correct pass/fail/score values, that the trace reader finds the first failing step in traces/failing-trace.json, and that the eval runner's classifiers score known cases correctly.
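
To make the trace-reader check concrete, here is a minimal sketch of locating the first failing step. The trace shape (a steps array with a status field) and the function name findFirstFailingStep are assumptions for illustration; the actual schema of traces/failing-trace.json and the API of trace-reader.ts may differ.

```typescript
// Hypothetical trace schema for this sketch.
interface TraceStep {
  name: string;
  status: "ok" | "error";
  error?: string;
}

interface Trace {
  id: string;
  steps: TraceStep[];
}

// Return the earliest step whose status is "error", or undefined if all steps passed.
function findFirstFailingStep(trace: Trace): TraceStep | undefined {
  return trace.steps.find((step) => step.status === "error");
}

// Illustrative trace, not the contents of the lab's trace files.
const failingTrace: Trace = {
  id: "example-failing-trace",
  steps: [
    { name: "plan", status: "ok" },
    { name: "call-tool", status: "error", error: "tool returned malformed JSON" },
    { name: "summarise", status: "error", error: "no tool output to summarise" },
  ],
};

const firstFailure = findFirstFailingStep(failingTrace);
console.log(firstFailure ? `first failure: ${firstFailure.name}` : "no failing step");
```

Stopping at the first failure matters when debugging, because later errors in a trace are often consequences of the earliest one.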

Type check

npm run typecheck

Continuous integration

.github/workflows/agent-evals.yml runs install, npm test and npm run typecheck on every push and pull request to main. Dependabot keeps npm and GitHub Actions current.

Start here

Work through labs/README.md - eight labs that take you from authoring a dataset to writing an evidence-backed iteration report.

Links

Certy: https://certy.pro
CertyPro on GitHub: https://github.com/CertyPro
Course content: https://github.com/CertyPro/certy-gh600-course-content

Licence

MIT. See LICENSE.
