Releases · Corbell-AI/evalmonkey

feat: add lightweight coding agent sample app with chaos profiles by @himmi-01 in #9
fix: stop test suite hanging + add coding agent demo script by @himmi-01 in #10
docs: add real-world benchmark leaderboard for 10 open-source agents by @himmi-01 in #11

Full Changelog: v1.1.0...v1.1.1

Contributors

himmi-01

20 May 04:42

himmi-01

v1.1.0

7cb5a57

What's Changed

Add coding agent benchmarks by @himmi-01 in #8

Full Changelog: v1.0.2...v1.1.0

Contributors

himmi-01

16 May 22:52

himmi-01

v1.0.2

a515c89

What's Changed

docs: add web dashboard screenshots to README by @himmi-01 in #5
feat: add evalmonkey generate-ci command for easy GitHub Actions setup by @himmi-01 in #6

Full Changelog: v1.0.1...v1.0.2

Contributors

himmi-01

09 May 06:22

himmi-01

v1.0.1

d91c46d

What's Changed

feat: add framework adapters for LangGraph, LlamaIndex, and PydanticAI by @himmi-01 in #4

Full Changelog: v1.0.0...v1.0.1

Contributors

himmi-01

06 May 05:56

himmi-01

v1.0.0

1ad2b0e

What's Changed

feat: evalmonkey web ui and benchmark stability fixes by @himmi-01 in #3

Full Changelog: v0.1.3...v1.0.0

Contributors

himmi-01

03 May 20:14

himmi-01

v0.1.3

82a5925

implement automated eval asset generation and improvement prompts for failed benchmark traces ea99606
optimize benchmark loading by enabling streaming mode and add testing/inspection utilities 88a8fb5
strip markdown code fences from LLM judge responses 46be866
remove webarena benchmark 179d5ab
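
The fence-stripping fix listed above can be pictured with a short sketch. This is not the code from commit 46be866. The helper name `strip_code_fences`, the regular expression, and the sample JSON verdict are all assumptions made for illustration. It shows one common way to unwrap a judge reply that arrives inside a Markdown code fence before parsing it.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically for readability

# Hypothetical pattern: one wrapping fence with an optional language tag.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return text without a single wrapping Markdown code fence, if present."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


# Illustrative judge reply; the field names are assumed, not evalmonkey's schema.
raw = FENCE + 'json\n{"score": 1, "reason": "ok"}\n' + FENCE
verdict = json.loads(strip_code_fences(raw))
```

Unfenced replies pass through with only surrounding whitespace trimmed, so the helper is safe to apply to every judge response.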

26 Apr 01:15

himmi-01

v0.1.2

2e95a53

add 5 chaos injection profiles and expand benchmark suite to 20 datasets with updated documentation and tests
add five new server-side chaos profiles and corresponding integration tests to evaluate agent resilience.

Releases: Corbell-AI/evalmonkey

v1.4.0
v1.3.0
v1.2.0
v1.1.1
v1.1.0
v1.0.2
v1.0.1
v1.0.0
v0.1.3
v0.1.2