Releases: Corbell-AI/evalmonkey

v1.4.0

02 Jun 06:48
1262971

What's Changed

  • feat: regression guard, agent card, recommend command, external datasets by @himmi-01 in #18

Full Changelog: v1.3.0...v1.4.0

v1.3.0

28 May 04:03
0a016be

What's Changed

  • feat: regression guard, agent card, recommend command, external datasets by @himmi-01 in #14

Full Changelog: v1.2.0...v1.3.0

v1.2.0

27 May 03:25
788fcfd

What's Changed

  • Add voice agents support into EvalMonkey by @himmi-01 in #13

Full Changelog: v1.1.1...v1.2.0

v1.1.1

23 May 19:58
16f4505

What's Changed

  • feat: add lightweight coding agent sample app with chaos profiles by @himmi-01 in #9
  • fix: stop test suite hanging + add coding agent demo script by @himmi-01 in #10
  • docs: add real-world benchmark leaderboard for 10 open-source agents by @himmi-01 in #11

Full Changelog: v1.1.0...v1.1.1

v1.1.0

20 May 04:42
7cb5a57

What's Changed

Full Changelog: v1.0.2...v1.1.0

v1.0.2

16 May 22:52
a515c89

What's Changed

  • docs: add web dashboard screenshots to README by @himmi-01 in #5
  • feat: add evalmonkey generate-ci command for easy GitHub Actions setup by @himmi-01 in #6

Full Changelog: v1.0.1...v1.0.2

v1.0.1

09 May 06:22
d91c46d

What's Changed

  • feat: add framework adapters for LangGraph, LlamaIndex, and PydanticAI by @himmi-01 in #4

Full Changelog: v1.0.0...v1.0.1

v1.0.0

06 May 05:56
1ad2b0e

What's Changed

  • feat: evalmonkey web ui and benchmark stability fixes by @himmi-01 in #3

Full Changelog: v0.1.3...v1.0.0

v0.1.3

03 May 20:14
82a5925

  • implement automated eval asset generation and improvement prompts for failed benchmark traces (ea99606)
  • optimize benchmark loading by enabling streaming mode and add testing/inspection utilities (88a8fb5)
  • strip markdown code fences from LLM judge responses (46be866)
  • remove webarena benchmark (179d5ab)

v0.1.2

26 Apr 01:15
