OKA-SR

Local OKA-SR pipeline for building legal motion-to-dismiss casepacks, running OpenAI-compatible model providers, and scoring structured state-recovery output.

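If you have no programming or U.S. legal background and want a Chinese-language introduction to the project's goals, file structure, task design, local and API run modes, scoring labels, and data-safety boundaries, read docs/oka_sr_project_intro_zh.md.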

Quick Start

cd oka_sr
python3 -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"

oka inventory --source ../case_download/outputs/2024_ip_mtd_relaxed
oka extract
oka mask --provider mock
oka gold --provider mock
oka flip --provider mock
oka build-casepacks
oka eval --providers mock --run-id mock-smoke
oka score --run runs/mock-smoke
oka report --run runs/mock-smoke
oka audit
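If you want to script the Quick Start instead of typing each command, the following is a minimal sketch that builds the same stage sequence as argv lists and runs them in order, stopping at the first failing stage. The helper names build_quick_start and run_all are hypothetical and not part of the oka CLI; only the oka subcommands and flags shown above are used.

```python
import shlex
import subprocess
from typing import Callable, Sequence

def build_quick_start(source: str, run_id: str) -> list[list[str]]:
    """Return the Quick Start commands as argv lists (nothing is executed here).

    Stage order is copied from the README; build_quick_start is a HYPOTHETICAL helper.
    """
    run_dir = f"runs/{run_id}"
    return [
        ["oka", "inventory", "--source", source],
        ["oka", "extract"],
        ["oka", "mask", "--provider", "mock"],
        ["oka", "gold", "--provider", "mock"],
        ["oka", "flip", "--provider", "mock"],
        ["oka", "build-casepacks"],
        ["oka", "eval", "--providers", "mock", "--run-id", run_id],
        ["oka", "score", "--run", run_dir],
        ["oka", "report", "--run", run_dir],
        ["oka", "audit"],
    ]

def run_all(commands: Sequence[list[str]], runner: Callable = subprocess.run) -> None:
    """Run each stage in order; check=True makes subprocess.run raise on the first failure."""
    for argv in commands:
        print("+", shlex.join(argv))  # echo the command, shell-quoted, before running it
        runner(argv, check=True)
```

Usage: run_all(build_quick_start("../case_download/outputs/2024_ip_mtd_relaxed", "mock-smoke")). The runner parameter exists so the sequence can be checked without invoking oka.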

To merge the 20-case case_new expansion into the local private working dataset, run:

oka inventory --source ../case_new/mtd_pr_order_casepack
oka extract
oka mask --provider mock
oka gold --provider mock
oka flip --provider mock
oka build-casepacks
oka audit

After the expansion, the current technical baseline attempts 29 source cases. The automated quality gate promotes 16 of them, producing 64 static prediction instances and 32 ARC-style world-model episodes. Excluded cases are listed, with machine-readable reasons, in reports/quality_promotion_ledger.jsonl.
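To get a quick overview of why cases were excluded, a small script can tally the ledger. This is a minimal sketch that assumes each JSONL record carries hypothetical "promoted" (bool) and "reasons" (list of strings) fields; check the actual ledger schema before relying on it.

```python
import json
from collections import Counter
from typing import Iterable

def summarize_ledger(lines: Iterable[str]) -> dict:
    """Count promoted vs. excluded cases and tally exclusion reasons.

    ASSUMPTION: records use HYPOTHETICAL fields "promoted" and "reasons".
    """
    total = promoted = 0
    reasons: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        total += 1
        if record.get("promoted"):
            promoted += 1
        else:
            reasons.update(record.get("reasons", []))
    return {"total": total, "promoted": promoted, "excluded": total - promoted, "reasons": reasons}
```

Usage: open reports/quality_promotion_ledger.jsonl with encoding="utf-8" and pass the file handle to summarize_ledger.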

For local Qwen through Ollama:

brew install ollama
oka ollama start --model qwen3:0.6b --pull
oka eval --providers ollama_qwen3_0_6b --quality-reviewed --run-id qwen3-0_6b-reviewed
oka score --run runs/qwen3-0_6b-reviewed
oka report --run runs/qwen3-0_6b-reviewed

The small local Qwen provider (ollama_qwen3_0_6b) applies deterministic prompt truncation to long Document Mode inputs so that the reviewed subset can run on a laptop-sized model. Use the ollama_qwen3_8b provider for a local evaluation with less aggressive truncation.
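"Deterministic" here means the same input and budget always yield the same prompt, so runs are repeatable. The README does not say which strategy the provider uses; the sketch below shows one common option, head-and-tail character truncation, purely as an illustration. The function name and marker text are hypothetical.

```python
def truncate_head_tail(text: str, max_chars: int, marker: str = "\n[...truncated...]\n") -> str:
    """Deterministically shorten text by keeping its beginning and end.

    HYPOTHETICAL strategy for illustration; the oka provider may truncate differently.
    """
    if len(text) <= max_chars:
        return text
    budget = max_chars - len(marker)
    if budget <= 0:
        return text[:max_chars]  # budget too small for the marker: plain prefix cut
    head = budget // 2 + budget % 2  # the head gets the extra character when budget is odd
    tail = budget - head
    # Guard tail == 0, because text[-0:] would return the whole string.
    return text[:head] + marker + (text[-tail:] if tail else "")
```

Keeping both ends preserves the opening (caption, parties, motion) and the closing (ruling language) of a document, which a plain prefix cut would drop.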

For a stronger sub-10B Qwen local run:

ollama pull qwen3.5:9b
oka eval --providers ollama_qwen3_5_9b --quality-reviewed --run-id qwen3_5-9b-reviewed
oka score --run runs/qwen3_5-9b-reviewed
oka report --run runs/qwen3_5-9b-reviewed

For the DeepSeek v4-pro cloud baseline, keep the API key in the shell environment only:

export DEEPSEEK_API_KEY="<your-deepseek-api-key>"
oka eval --providers deepseek_v4_pro --quality-reviewed --run-id deepseek_v4_pro-reviewed
oka score --run runs/deepseek_v4_pro-reviewed
oka report --run runs/deepseek_v4_pro-reviewed
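Keeping the key in the shell environment means code should read it from os.environ at request time and fail loudly if it is missing, rather than falling back to a key stored in a file. This is a minimal sketch assuming the standard bearer-token header used by OpenAI-compatible endpoints; the helper names are hypothetical and not part of oka.

```python
import os

def deepseek_headers(env_var: str = "DEEPSEEK_API_KEY") -> dict[str, str]:
    """Build request headers from a key held only in the environment.

    Raises instead of reading a key from disk. HYPOTHETICAL helper.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it in your shell first")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}

def redact(key: str) -> str:
    """Return a log-safe form of the key that shows at most its last four characters."""
    return "****" + key[-4:] if len(key) > 4 else "****"
```

Logging redact(key) instead of the key keeps it out of run outputs and reports.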

ARC-Style World Model Episodes

The world-model path sits beside the static casepack benchmark. It builds multi-step episodes where a provider requests sanitized materials, declares a legal state, predicts the base outcome, predicts the state-flip transition, and is scored on outcome, world-model recovery, transition correctness, evidence grounding, and action efficiency.

oka wm-build --quality-reviewed
oka wm-eval --providers mock --run-id wm-mock-reviewed
oka wm-score --run runs/wm-mock-reviewed
oka wm-report --run runs/wm-mock-reviewed
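Each episode is scored on five components: outcome, world-model recovery, transition correctness, evidence grounding, and action efficiency. The README does not state how they are combined, so the sketch below assumes each component lies in [0, 1] and uses a weighted mean with equal default weights. Both the field names and the weighting are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class EpisodeScore:
    """Per-episode component scores, each ASSUMED to lie in [0, 1]."""
    outcome: float
    world_model_recovery: float
    transition: float
    evidence_grounding: float
    action_efficiency: float

def aggregate(score: EpisodeScore, weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of the five components (HYPOTHETICAL; equal weights by default)."""
    names = [f.name for f in fields(score)]
    weights = weights or {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    return sum(weights[name] * getattr(score, name) for name in names) / total
```

Keeping the components separate until the final step makes it possible to report each one alongside any combined number.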

The v1 world-model report is a research technical baseline only. It does not claim final legal prediction capability, and its action-efficiency score is relative to a reference action baseline rather than a formal human baseline. The mock provider is a reference smoke baseline that uses private local casepack answers to verify the runner/scorer pipeline; use Qwen or DeepSeek providers for actual model behavior.
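"Relative to a reference action baseline" can be read as comparing the number of actions a provider takes against a reference count for the same episode. The formula below is a hypothetical illustration of that idea, not the scorer's actual definition: full credit at or below the reference, then decaying as reference / used.

```python
def action_efficiency(actions_used: int, reference_actions: int) -> float:
    """Score action count against a reference baseline (not a human baseline).

    HYPOTHETICAL formula: 1.0 when actions_used <= reference_actions,
    otherwise reference_actions / actions_used.
    """
    if reference_actions <= 0:
        raise ValueError("reference_actions must be positive")
    if actions_used < 0:
        raise ValueError("actions_used cannot be negative")
    if actions_used <= reference_actions:
        return 1.0
    return reference_actions / actions_used
```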

See docs/arc_world_model_benchmark_zh.md for the Chinese design note.

After running oka audit, prefer oka eval --quality-reviewed ... for runs you intend to report. The full casepack file remains available for diagnostics; the quality-reviewed file excludes cases whose order, gold, or flip outputs did not pass the Codex quality gate.
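Conceptually, the quality-reviewed file is the full casepack file restricted to promoted cases. As an illustration only, the sketch below shows that filtering step over JSONL lines, assuming each record has a hypothetical "case_id" field; oka --quality-reviewed selects the reviewed file for you, so this is not needed for normal runs.

```python
import json
from typing import Iterable, Iterator

def filter_reviewed(lines: Iterable[str], promoted_ids: set[str], id_field: str = "case_id") -> Iterator[dict]:
    """Yield only records whose case passed the quality gate.

    ASSUMPTION: records carry a HYPOTHETICAL "case_id" field.
    """
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        record = json.loads(line)
        if record.get(id_field) in promoted_ids:
            yield record
```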

For DeepSeek:

export DEEPSEEK_API_KEY=...
oka eval --providers deepseek_chat

Repository Data Policy

This repository intentionally excludes raw PDFs, extracted source text, private gold files, hidden answer keys, private casepacks, and raw run outputs. The checked-in public data file is a sanitized reviewed-subset stub:

data/casepacks_public_visible/casepacks_visible_quality_reviewed_safe.jsonl

Use the local private data/ directory to reproduce full Document Mode runs.

Public Release Candidates

Release candidates are built from allowlisted public-safe files only. The GitHub package contains source code, tests, docs, safe public casepacks, public world-model episodes, selected reports, checksums, and a release manifest. The Hugging Face package contains dataset files, schema notes, metrics, checksums, and a dataset card.

oka release-build --target github --output dist/github
oka release-build --target huggingface --output dist/huggingface
oka release-verify --target github --input dist/github
oka release-verify --target huggingface --input dist/huggingface
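Both release packages ship checksums. The exact manifest format oka writes is not described here, so the following is a minimal sketch of the general technique: compute a SHA-256 digest for every file under a release directory, keyed by POSIX-style relative path, and compare against a saved manifest. Function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large dataset files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path (POSIX style) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing, unexpected, or whose digest changed."""
    current = checksum_manifest(root)
    return sorted(k for k in set(current) | set(manifest) if current.get(k) != manifest.get(k))
```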

The release verifier rejects raw case folders, raw PDFs, extracted text, private gold, hidden answer keys, local source paths, raw logs, identifier-like public IDs, judge/counsel signature names, and obvious case/docket URL leakage. The public-safe Document materials are release-safe digest views rather than long source-text clones; private full-text runs remain local.
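The rejection rules above combine path checks (file types, directory names) with content checks (local paths, URLs). The sketch below illustrates that two-part structure with a few hypothetical rules; the real verifier's patterns and directory names are not documented here and will differ.

```python
import re
from pathlib import PurePosixPath

# HYPOTHETICAL rules for illustration only; the oka verifier's actual checks may differ.
FORBIDDEN_SUFFIXES = {".pdf", ".log"}
FORBIDDEN_DIR_NAMES = {"raw", "private_gold", "hidden_answers"}
URL_PATTERN = re.compile(r"https?://\S*(docket|case)\S*", re.IGNORECASE)
ABS_PATH_PATTERN = re.compile(r"(/Users/|/home/|[A-Za-z]:\\)")

def path_violations(rel_path: str) -> list[str]:
    """Flag forbidden file types and files inside forbidden directories."""
    path = PurePosixPath(rel_path)
    problems = []
    if path.suffix.lower() in FORBIDDEN_SUFFIXES:
        problems.append(f"forbidden file type: {path.suffix}")
    if FORBIDDEN_DIR_NAMES & set(path.parts[:-1]):
        problems.append("inside a forbidden directory")
    return problems

def text_violations(text: str) -> list[str]:
    """Flag case/docket URLs and absolute local paths in file contents."""
    problems = []
    if URL_PATTERN.search(text):
        problems.append("case/docket URL")
    if ABS_PATH_PATTERN.search(text):
        problems.append("local source path")
    return problems
```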
