Readers and writers for every input format EVALSIG accepts. Each reader
returns a RunFrame.
from evalsig.io import register_reader, get_reader, available_formats

- register_reader(name, reader) -- add a new format under a short name.
- get_reader(name) -- look one up.
- available_formats() -- list everything currently registered.

The built-in registrations happen at import time: runframe, lm_eval,
inspect, helm, parquet.
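
The registry described above can be pictured with a small, self-contained sketch. It is not EVALSIG's implementation: the module-level dict `_READERS` and the choice to raise `KeyError` on unknown names are assumptions made for illustration, and only the three function names come from the text.

```python
from typing import Callable, Dict, List

# Hypothetical storage; how EVALSIG actually keeps its registry is not documented here.
_READERS: Dict[str, Callable] = {}


def register_reader(name: str, reader: Callable) -> None:
    """Add a reader under a short format name (later registrations overwrite)."""
    _READERS[name] = reader


def get_reader(name: str) -> Callable:
    """Look up a reader by name. Raising KeyError on a miss is an assumption."""
    try:
        return _READERS[name]
    except KeyError:
        raise KeyError(
            f"unknown format {name!r}; available: {available_formats()}"
        ) from None


def available_formats() -> List[str]:
    """List every registered format name, sorted for stable output."""
    return sorted(_READERS)


# A toy reader that just echoes its arguments, standing in for a real one.
register_reader("my_format", lambda path, **kw: {"path": path, **kw})
reader = get_reader("my_format")
```

Registering at import time, as the built-ins do, amounts to calling `register_reader` at module level so the names are available as soon as the package is imported.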
EVALSIG's own JSON format, the canonical exchange shape.
from evalsig.io import read_runframe_json, write_runframe_json
run = read_runframe_json("baseline.json")
write_runframe_json(run, "out.json")

The schema is exported as RUNFRAME_SCHEMA (JSON Schema draft 2020-12).
The validator is lightweight and runs before any RunFrame is
constructed, so bad inputs fail fast with a clear message.
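
As a rough illustration of what "fail fast with a clear message" can look like, here is a minimal hand-written check that runs on the parsed JSON before any object is built. It is a sketch, not the real validator: the function name `validate_runframe_dict` is hypothetical, the field names are borrowed from the RunFrame constructor shown in the custom-reader example on this page, and treating all of them as required is an assumption.

```python
from typing import Any, Dict

# Field names borrowed from the RunFrame constructor on this page;
# treating every one of them as required is an assumption.
REQUIRED_TOP_LEVEL = ("run_id", "model_id", "task_id", "metric_name", "items")


def validate_runframe_dict(doc: Any) -> Dict[str, Any]:
    """Raise ValueError with a specific message; return the input if it passes."""
    if not isinstance(doc, dict):
        raise ValueError(f"expected a JSON object, got {type(doc).__name__}")
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in doc]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    if not isinstance(doc["items"], list):
        raise ValueError("'items' must be a list")
    for i, item in enumerate(doc["items"]):
        if not isinstance(item, dict) or "item_id" not in item or "score" not in item:
            raise ValueError(f"items[{i}] needs 'item_id' and 'score'")
    return doc
```

Because the check names the offending field or index, a malformed file is rejected with a message that points at the problem rather than failing later inside a constructor.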
Reads samples_*.jsonl from EleutherAI's lm-evaluation-harness.
from evalsig.io import read_lm_eval_json

run = read_lm_eval_json(
    "samples_mmlu_2026-05-16.jsonl",
    model_id="claude-x",
    task_id="mmlu",
    metric_name="acc",      # or 'exact_match', 'is_correct', etc.
    cluster_key="subject",  # optional; field on the doc to group by
)

The reader tolerates several variants of the format: a list of dicts, a samples-wrapping dict, or JSONL.
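
One way to accept all three layouts is to try parsing the file as a single JSON document and fall back to line-by-line parsing. The sketch below shows that idea only: `load_sample_records` is a hypothetical helper, and the wrapper key `"samples"` is an assumption about what a samples-wrapping dict looks like.

```python
import json
from typing import Any, Dict, List


def load_sample_records(text: str) -> List[Dict[str, Any]]:
    """Return per-sample dicts from a JSON list, a wrapping dict, or JSONL text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat it as JSONL, one object per non-blank line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and isinstance(data.get("samples"), list):
        # Assumed wrapper key; the real reader may recognise other names.
        return data["samples"]
    if isinstance(data, dict):
        # A one-line JSONL file parses as a single object.
        return [data]
    raise ValueError("unrecognised samples layout")
```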
Reads JSON exports of Inspect AI .eval logs. Run
inspect log export run.eval > run.json first.
from evalsig.io import read_inspect_log

run = read_inspect_log(
    "run.json",
    metric_name="accuracy",
    cluster_key="passage_id",
)

Handles the common score.value shapes ("C"/"I", booleans, numbers).
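
Normalising those shapes to a single numeric score could look like the sketch below. The helper name `score_value_to_float` is hypothetical, and mapping "C" to 1.0 and "I" to 0.0 is an assumption about how correct/incorrect grades are scored; other letter grades are deliberately rejected rather than guessed.

```python
from typing import Union


def score_value_to_float(value: Union[str, bool, int, float]) -> float:
    """Convert a score.value into a float, or raise ValueError if unsupported."""
    # bool must be tested before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        letter = value.strip().upper()
        if letter == "C":  # correct (assumed to score 1.0)
            return 1.0
        if letter == "I":  # incorrect (assumed to score 0.0)
            return 0.0
    raise ValueError(f"unsupported score.value: {value!r}")
```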
Reads HELM's scenario_state.json.
from evalsig.io import read_helm_scenario

run = read_helm_scenario(
    "scenario_state.json",
    metric_name="accuracy",
    cluster_key="category",
)

Pulls result.success (bool) by default and falls back to a numeric
metric in result[metric_name] or result.stats[metric_name].
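
The lookup order just described can be written out as a short sketch. `extract_helm_score` is a hypothetical helper operating on one already-parsed record; the argument name `request_state` and the decision to raise `KeyError` when nothing matches are assumptions.

```python
from typing import Any, Dict


def extract_helm_score(request_state: Dict[str, Any], metric_name: str) -> float:
    """Prefer result.success, then result[metric_name], then result.stats[metric_name]."""
    result = request_state.get("result") or {}
    success = result.get("success")
    if isinstance(success, bool):
        return 1.0 if success else 0.0
    if metric_name in result:
        return float(result[metric_name])
    stats = result.get("stats") or {}
    if metric_name in stats:
        return float(stats[metric_name])
    # Assumed behaviour: fail loudly rather than invent a score.
    raise KeyError(f"no 'success' flag or {metric_name!r} metric in result")
```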
The long-term storage format. One row per (run, item, epoch). Use the
canonical PARQUET_SCHEMA (also exported) when writing your own
ingestion paths.
from evalsig.io import read_runframe_parquet, write_runframe_parquet
write_runframe_parquet(run, "run.parquet")
back = read_runframe_parquet("run.parquet")

If a file holds multiple runs, pass run_id= to disambiguate.
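
The disambiguation rule can be illustrated without any Parquet library by working on rows that have already been loaded as plain dicts. This is a sketch under stated assumptions: `select_run` is a hypothetical helper, and the column names (`run_id`, `item_id`, `epoch`, `score`) are guesses based on the "(run, item, epoch)" description rather than the contents of PARQUET_SCHEMA.

```python
from typing import Any, Dict, List, Optional


def select_run(
    rows: List[Dict[str, Any]], run_id: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Return the rows of one run; require run_id when the file is ambiguous."""
    run_ids = sorted({row["run_id"] for row in rows})
    if run_id is None:
        if len(run_ids) != 1:
            raise ValueError(
                f"file holds {len(run_ids)} runs {run_ids}; pass run_id= to choose one"
            )
        run_id = run_ids[0]
    elif run_id not in run_ids:
        raise ValueError(f"run_id {run_id!r} not found; available: {run_ids}")
    return [row for row in rows if row["run_id"] == run_id]


# Two runs in one file, one row per (run, item, epoch).
rows = [
    {"run_id": "a", "item_id": "q1", "epoch": 0, "score": 1.0},
    {"run_id": "a", "item_id": "q2", "epoch": 0, "score": 0.0},
    {"run_id": "b", "item_id": "q1", "epoch": 0, "score": 1.0},
]
run_a = select_run(rows, run_id="a")
```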
Convenience wrapper for callers that want to import the alignment
helper from evalsig.io.normalize rather than evalsig.compare.compare.
Returns the aligned arrays plus any warning notes.
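
To make "aligned arrays plus warning notes" concrete, the sketch below pairs two runs' scores by item ID and reports anything it had to drop. It is only an illustration of the general idea: `align_scores` is a hypothetical name, and aligning on shared item IDs while dropping the rest is an assumption, not a description of the wrapper's actual behaviour.

```python
from typing import Dict, List, Tuple


def align_scores(
    a: Dict[str, float], b: Dict[str, float]
) -> Tuple[List[float], List[float], List[str]]:
    """Pair scores by shared item ID; return both arrays and human-readable notes."""
    shared = sorted(a.keys() & b.keys())
    notes: List[str] = []
    only_a = len(a) - len(shared)
    only_b = len(b) - len(shared)
    if only_a:
        notes.append(f"{only_a} item(s) only in the first run were dropped")
    if only_b:
        notes.append(f"{only_b} item(s) only in the second run were dropped")
    return [a[k] for k in shared], [b[k] for k in shared], notes


xs, ys, notes = align_scores(
    {"q1": 1.0, "q2": 0.0, "q3": 1.0},
    {"q2": 1.0, "q3": 1.0, "q4": 0.0},
)
```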
Any function that turns a path into a RunFrame is a reader.
from evalsig.io import register_reader
from evalsig.types import RunFrame, ItemResult

def read_my_format(path: str, **kw) -> RunFrame:
    # my_parser is a placeholder for your own parser; it must yield dict-like rows.
    rows = my_parser(path)
    return RunFrame(
        run_id=kw.get("run_id", path),
        model_id=kw.get("model_id", "unknown"),
        task_id=kw.get("task_id", "unknown"),
        metric_name=kw.get("metric_name", "accuracy"),
        items=[
            ItemResult(
                item_id=str(r["id"]),
                score=float(r["score"]),
                cluster_id=r.get("group"),
            )
            for r in rows
        ],
    )

register_reader("my_format", read_my_format)

The CLI's --format will then accept my_format.
- Configuration for format selection on the command line.
- Methodology for the schema rationale.