Releases · ServiceNow/eva

What's Changed

Fix GitHub Pages blank page by setting Vite base path to /eva/ by @tara-servicenow in #1
Fix website: audio playback, favicon, title, and GitHub link by @tara-servicenow in #2
Fix small details on website by @tara-servicenow in #3
mobile friendliness by @katstankiewicz in #4
Remove turn-taking expander and rename early results URL by @tara-servicenow in #5
Update README.md by @gabegma in #6
mobile friendliness, architecture diagram and acknowledgements by @katstankiewicz in #7
Cleanup entry points and documentation by @gabegma in #8
Remove "Bench" from CLI logo by @JosephMarinier in #9
README + website updates by @tara-servicenow in #10
Standardize the project name by @JosephMarinier in #11
fix turn_strategy config by @katstankiewicz in #12
Update lock following python version restrictions by @gabegma in #13
Update README and add contribution guidelines, code of conduct and citation by @gabegma in #15
Pass user context in text only mode by @tara-servicenow in #16
Add fallback if end_call tool is not properly called in text flow by @gabegma in #19
Add Deepgram TTS to available services by @tara-servicenow in #22
Update llm set up docs by @tara-servicenow in #23
Add support for OpenAI Realtime model by @katstankiewicz in #18
Judge Validation Datasets by @tara-servicenow in #25
Pr/reasoning logging by @raghavm243512 in #21
Website fixes by @tara-servicenow in #27
Improve contribution guidelines by @gabegma in #31
Reuse src/eva/cli.py in Docker by @JosephMarinier in #32
Update website with 2 new runs (Cohere and Voxtral) by @tara-servicenow in #37
Polish config validation by @JosephMarinier in #36
Update documentation for ElevenLabs User Simulator by @fanny-riols in #39
Improve config.json and output folder name by @gabegma in #24
make model required for all services by @katstankiewicz in #50
Merge develop/0.1.2 into main by @JosephMarinier in #48
Yield content alongside tool calls by @JosephMarinier in #49
Enable more ruff rules by @JosephMarinier in #51
Add analysis app by @gabegma in #40
Update Streamlit to 1.56 and make a few UI improvements by @JosephMarinier in #53
Fix user turn not advancing after interruption by @gabegma in #54
Add Gemini Live by @katstankiewicz in #38
Results analysis app: small improvements by @fanny-riols in #52
Decompose response_speed into response_speed_with_tool_calls and response_speed_no_tool_calls by @fanny-riols in #57
Remove intended assistant turns for s2s and use transcript from user simulator by @gabegma in #58
Add audio analysis support to app by @nhhoang96 in #55
Fix partial metric re-runs by @fanny-riols in #59
Minor bug fixes for analysis app by @gabegma in #61
Fix CI by @JosephMarinier in #66
add support for saving git info from docker container runs by @katstankiewicz in #67
Initial perturbations implementation by @tara-servicenow in #56
filter metrics based on pipeline type by @raghavm243512 in #63
Add new speech fidelity metric for s2s by @gabegma in #60
Support openai realtime framework (separate from pipecat) by @tara-servicenow in #44
metric computation from DB by @raghavm243512 in #62
Add transcript field to user speech fidelity by @tara-servicenow in #75
Pr/configurable turn options by @raghavm243512 in #45
Improve airline agent: enum schemas, policy consistency, same-day fee fix by @gabegma in #73
Pr/rm/reasoning trace by @raghavm243512 in #64
Fix crash in STT/TTS latency calc on TurnMetricsData entries by @gabegma in #72
Adapt judge prompts to be more generic by @gabegma in #71
Penalize no agent response by @tara-servicenow in #70
Add submetrics for more metrics by @gabegma in #65
Improve turn taking by @gabegma in #69
interleave metric and nonblock IO by @raghavm243512 in #76
Stop double sending audio to assistant during perturbations by @tara-servicenow in #78
Log LLM judge calls (prompt, input and output tokens, parameters) by @tara-servicenow in #80
ITSM Domain by @tara-servicenow in #79
Update audio judge model to Gemini 3 Flash due to cost constraints by @tara-servicenow in #84
Make airline instructions more specific on what needs to be explained to the user by @tara-servicenow in #82
Pr/tara/hr domain by @tara-servicenow in #46
Pr/tara/improve user behavioral fidelity by @tara-servicenow in #83
Small improvement to metrics by @gabegma in #81
Improve audio for speech fidelity by @gabegma in #85
Pr/tara/multi domain by @tara-servicenow in #86
Bump version by @tara-servicenow in #87
Add Cohere STT and Voxtral TTS by @katstankiewicz in #33
