Local OKA-SR pipeline for building legal motion-to-dismiss casepacks, running OpenAI-compatible model providers, and scoring structured state-recovery output.
如果你没有代码或美国法律背景,想先用中文理解项目目标、文件结构、题目设计、
本地/API 跑法、评分标签和数据安全边界,请读
docs/oka_sr_project_intro_zh.md。
cd oka_sr
python3 -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
oka inventory --source ../case_download/outputs/2024_ip_mtd_relaxed
oka extract
oka mask --provider mock
oka gold --provider mock
oka flip --provider mock
oka build-casepacks
oka eval --providers mock --run-id mock-smoke
oka score --run runs/mock-smoke
oka report --run runs/mock-smoke
oka auditTo merge the 20-case case_new expansion into the local private working
dataset, run:
oka inventory --source ../case_new/mtd_pr_order_casepack
oka extract
oka mask --provider mock
oka gold --provider mock
oka flip --provider mock
oka build-casepacks
oka auditThe current technical baseline attempts 29 source cases after the expansion.
The automated quality gate promotes 16 source cases, producing 64 static
prediction instances and 32 ARC-style world-model episodes. Excluded cases are
listed with machine-readable reasons in reports/quality_promotion_ledger.jsonl.
For local Qwen through Ollama:
brew install ollama
oka ollama start --model qwen3:0.6b --pull
oka eval --providers ollama_qwen3_0_6b --quality-reviewed --run-id qwen3-0_6b-reviewed
oka score --run runs/qwen3-0_6b-reviewed
oka report --run runs/qwen3-0_6b-reviewedThe small local Qwen provider uses deterministic prompt truncation for long
Document Mode inputs so the reviewed subset can run on a laptop-sized model.
Use ollama_qwen3_8b for less aggressive local evaluation.
For a stronger sub-10B Qwen local run:
ollama pull qwen3.5:9b
oka eval --providers ollama_qwen3_5_9b --quality-reviewed --run-id qwen3_5-9b-reviewed
oka score --run runs/qwen3_5-9b-reviewed
oka report --run runs/qwen3_5-9b-reviewedFor the DeepSeek v4-pro cloud baseline, keep the API key in the shell environment only:
export DEEPSEEK_API_KEY="<your-deepseek-api-key>"
oka eval --providers deepseek_v4_pro --quality-reviewed --run-id deepseek_v4_pro-reviewed
oka score --run runs/deepseek_v4_pro-reviewed
oka report --run runs/deepseek_v4_pro-reviewedThe world-model path sits beside the static casepack benchmark. It builds multi-step episodes where a provider requests sanitized materials, declares a legal state, predicts the base outcome, predicts the state-flip transition, and is scored on outcome, world-model recovery, transition correctness, evidence grounding, and action efficiency.
oka wm-build --quality-reviewed
oka wm-eval --providers mock --run-id wm-mock-reviewed
oka wm-score --run runs/wm-mock-reviewed
oka wm-report --run runs/wm-mock-reviewedThe v1 world-model report is a research technical baseline only. It does not
claim final legal prediction capability, and its action-efficiency score is
relative to a reference action baseline rather than a formal human baseline.
The mock provider is a reference smoke baseline that uses private local
casepack answers to verify the runner/scorer pipeline; use Qwen or DeepSeek
providers for actual model behavior.
See docs/arc_world_model_benchmark_zh.md for the Chinese design note.
After oka audit, prefer oka eval --quality-reviewed ... for official-ish
runs. The full casepack file remains diagnostic; the quality-reviewed file
excludes cases whose order/gold/flip did not pass the Codex quality gate.
For DeepSeek:
export DEEPSEEK_API_KEY=...
oka eval --providers deepseek_chatThis repository intentionally excludes raw PDFs, extracted source text, private gold files, hidden answer keys, private casepacks, and raw run outputs. The checked-in public data file is a sanitized reviewed-subset stub:
data/casepacks_public_visible/casepacks_visible_quality_reviewed_safe.jsonl
Use the local private data/ directory to reproduce full Document Mode runs.
Release candidates are built from allowlisted public-safe files only. The GitHub package contains source code, tests, docs, safe public casepacks, public world-model episodes, selected reports, checksums, and a release manifest. The Hugging Face package contains dataset files, schema notes, metrics, checksums, and a dataset card.
oka release-build --target github --output dist/github
oka release-build --target huggingface --output dist/huggingface
oka release-verify --target github --input dist/github
oka release-verify --target huggingface --input dist/huggingfaceThe release verifier rejects raw case folders, raw PDFs, extracted text, private gold, hidden answer keys, local source paths, raw logs, identifier-like public IDs, judge/counsel signature names, and obvious case/docket URL leakage. The public-safe Document materials are release-safe digest views rather than long source-text clones; private full-text runs remain local.