Jido Eval

Jido Eval is an experimental Elixir package for evaluating LLM applications in the Jido ecosystem. The current core is a Ragas-like harness for dataset-based evaluation, with structured judge calls made through req_llm and model metadata from llm_db.

Agentic evals are expected to build on this foundation later. This package keeps the basic evaluation layer small: load samples, run metrics, preserve per-metric judge metadata, and return an auditable Jido.Eval.Result.

Installation

Add jido_eval to your dependencies:

def deps do
  [
    {:jido_eval, "~> 0.1.0"}
  ]
end

Then fetch dependencies:

mix deps.get

Jido Eval does not currently ship an Igniter installer because the package has no required config files, migrations, or scaffolding side effects. Add provider keys through req_llm configuration or environment variables.
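
For example, provider keys can be wired from the environment at boot. A minimal sketch, assuming req_llm reads keys from application config (the key names below are illustrative; check the req_llm docs for the canonical names):

import Config

# config/runtime.exs — hypothetical key names, confirm against the req_llm docs
config :req_llm,
  openai_api_key: System.get_env("OPENAI_API_KEY"),
  anthropic_api_key: System.get_env("ANTHROPIC_API_KEY")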

Quick Start

alias Jido.Eval
alias Jido.Eval.Dataset.InMemory
alias Jido.Eval.Sample.SingleTurn

samples = [
  %SingleTurn{
    user_input: "What is the capital of France?",
    retrieved_contexts: ["France's capital is Paris."],
    response: "Paris is the capital of France."
  }
]

{:ok, dataset} = InMemory.new(samples)

{:ok, result} =
  Eval.evaluate(dataset,
    metrics: [:faithfulness, :context_precision],
    judge_model: "openai:gpt-4o",
    judge_opts: [temperature: 0.0]
  )

result.summary_stats

The llm: and llm_opts: options remain accepted as compatibility aliases, but new code should use judge_model: and judge_opts:.
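
For example, an older-style call using the aliases, equivalent to the Quick Start call above:

# Compatibility spelling; prefer judge_model: and judge_opts: in new code
{:ok, result} =
  Eval.evaluate(dataset,
    metrics: [:faithfulness, :context_precision],
    llm: "openai:gpt-4o",
    llm_opts: [temperature: 0.0]
  )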

Built-In Metrics

  • :faithfulness extracts factual statements from a response and checks whether each statement is supported by the retrieved contexts.
  • :context_precision checks whether retrieved contexts are relevant to the input and computes average precision over relevant context positions.

Both built-in metrics use structured req_llm object calls, so result details include schema-validated booleans, judge reasoning, judge-call summaries, token usage, finish reason, provider metadata, latency, and cache-hit state.
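
The scoring code is internal to the package, but the standard Ragas-style reductions described above look roughly like this illustrative sketch (not jido_eval's actual implementation):

# Faithfulness: fraction of extracted statements the judge marked supported
verdicts = [true, true, false]
faithfulness = Enum.count(verdicts, & &1) / length(verdicts)
# => 0.666...

# Context precision: mean precision@k over positions holding relevant contexts
relevance = [true, false, true]

{_hits, precisions} =
  relevance
  |> Enum.with_index(1)
  |> Enum.reduce({0, []}, fn
    {true, k}, {hits, acc} -> {hits + 1, [(hits + 1) / k | acc]}
    {false, _k}, acc -> acc
  end)

context_precision = Enum.sum(precisions) / max(Enum.count(relevance, & &1), 1)
# => (1/1 + 2/3) / 2 ≈ 0.833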

Model Specs

Pass judge models using req_llm model specs:

"openai:gpt-4o"
"anthropic:claude-3-5-sonnet-20241022"
LLMDB.model!("openai:gpt-4o")

Jido Eval deliberately passes model specs straight through to req_llm and llm_db rather than maintaining a parallel model-map layer.
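
Any of these forms can be handed to Eval.evaluate/2. A minimal sketch using the llm_db form, reusing the dataset from the Quick Start:

# Resolve model metadata via llm_db, then pass it as the judge model
model = LLMDB.model!("anthropic:claude-3-5-sonnet-20241022")

{:ok, result} =
  Eval.evaluate(dataset,
    metrics: [:faithfulness],
    judge_model: model
  )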

Live Evals

Live evals are excluded from the default test suite. To run them, create a local .env with provider keys:

OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...

Then run:

mix test --include live_eval

Development

mix setup
mix test
mix quality
mix coveralls
mix docs

mix quality runs the Jido package quality gate: formatting, strict compile, Credo, Dialyzer, and MixDoctor.

License

Apache-2.0. See LICENSE.
