Releases: Corbell-AI/evalmonkey
v1.4.0
02 Jun 06:48
What's Changed
feat: regression guard, agent card, recommend command, external datasets by @himmi-01 in #18
Full Changelog: v1.3.0...v1.4.0
v1.3.0
28 May 04:03
What's Changed
feat: regression guard, agent card, recommend command, external datasets by @himmi-01 in #14
Full Changelog: v1.2.0...v1.3.0
v1.2.0
27 May 03:25
v1.1.1
23 May 19:58
What's Changed
feat: add lightweight coding agent sample app with chaos profiles by @himmi-01 in #9
fix: stop test suite hanging + add coding agent demo script by @himmi-01 in #10
docs: add real-world benchmark leaderboard for 10 open-source agents by @himmi-01 in #11
Full Changelog: v1.1.0...v1.1.1
v1.1.0
20 May 04:42
v1.0.2
16 May 22:52
What's Changed
docs: add web dashboard screenshots to README by @himmi-01 in #5
feat: add evalmonkey generate-ci command for easy GitHub Actions setup by @himmi-01 in #6
Full Changelog: v1.0.1...v1.0.2
v1.0.1
09 May 06:22
What's Changed
feat: add framework adapters for LangGraph, LlamaIndex, and PydanticAI by @himmi-01 in #4
Full Changelog: v1.0.0...v1.0.1
v1.0.0
06 May 05:56
v0.1.3
03 May 20:14
implement automated eval asset generation and improvement prompts for failed benchmark traces ea99606
optimize benchmark loading by enabling streaming mode and add testing/inspection utilities 88a8fb5
strip markdown code fences from LLM judge responses 46be866
remove webarena benchmark 179d5ab
v0.1.2
26 Apr 01:15