Add model-agnostic analysis, response-cleaning, prompt/version tracking, incremental runs, and Chinese prompts by Huberyky · Pull Request #1 · Huberyky/GABRIEL

Huberyky · 2026-03-18T03:03:14Z

Support non-OpenAI models served through OpenAI‑compatible endpoints by restoring cost estimates and stripping vendor reasoning traces, so that downstream JSON parsing stays robust.
Provide built‑in reliability, validation, and robustness analytics so that multi‑run outputs yield the standard measurement diagnostics used in social‑science research.
Improve reproducibility and incremental workflows by binding prompt versions to outputs, reporting parse success rates, and supporting prompt language selection (e.g., Chinese templates).

Added a new analysis module gabriel.analysis exposing reliability, validate, and robustness helpers for Krippendorff's α, ICC, Pearson/Spearman, MAE, F1/Cohen's κ, stratified reports, and bootstrap CIs.
Introduced gabriel.utils.model_utils with strip_reasoning_tags, prompt_hash, write_run_metadata, and load_incremental_cache to strip vendor CoT traces (e.g., <think>...</think>), compute prompt hashes, and manage incremental reuse.
Extended the core tasks (rate, classify, extract) to accept prompt_language and incremental flags. Each task now writes prompt‑hash metadata to run_metadata.json, sanitizes raw responses with strip_reasoning_tags before parsing, produces a per‑task parse report (*_parse_report.csv), and, when incremental=True, attempts to read and merge previously saved results.
Added Chinese Jinja2 prompt templates (ratings_prompt_zh.jinja2, classification_prompt_zh.jinja2, extraction_prompt_zh.jinja2) and wired template selection by prompt_language.
Added pricing entries for DeepSeek / Qwen families to MODEL_PRICING so cost estimates remain available when using OpenAI‑compatible base_urls.
Plumbing: exported the new top‑level functions (gabriel.reliability, gabriel.validate, gabriel.robustness) and added unit tests and supporting utilities.
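
For reviewers who want a concrete picture of the sanitization and hashing helpers described above, here is a minimal sketch. It is not the code in this PR: the regular expression, the 12-character digest length, and the choice of SHA-256 are all assumptions made for illustration, and the real gabriel.utils.model_utils functions may take different arguments or handle more tag variants.

```python
import hashlib
import re

# Assumed pattern: one or more <think>...</think> blocks, matched
# case-insensitively and across newlines. Other vendor tags are not handled.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_reasoning_tags(text: str) -> str:
    """Remove <think>...</think> blocks, then trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()


def prompt_hash(prompt: str, length: int = 12) -> str:
    """Return a short, stable hex digest identifying the prompt text.

    The `length` parameter is hypothetical; it only truncates the digest.
    """
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:length]


# A response from a model that emits its reasoning before the JSON payload.
raw = '<think>The text seems positive.</think>\n{"sentiment": 4}'
cleaned = strip_reasoning_tags(raw)
print(cleaned)
print(prompt_hash("Rate the sentiment of the text."))
```

Stripping the trace before parsing means the JSON decoder only ever sees the payload, and storing the digest next to the outputs makes it possible to tell later whether two runs used the same prompt text.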

Ran targeted unit tests with pytest -q tests/test_analysis_extensions.py tests/test_imports.py tests/test_discover_exports.py; all tests passed.
Performed static sanity checks: python -m py_compile $(rg --files src/gabriel -g '*.py' | tr '\n' ' ') succeeded (no syntax errors).
Verified template and runtime integration by running the new analysis and response‑sanitization flows in the task code paths (parse reports were created and the incremental merge logic was exercised during tests).
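
The parse reports mentioned above can be pictured with a small sketch like the following. The function name parse_report, the row fields (index, parsed, error), and the idea of returning a single success rate are assumptions for illustration only; the actual layout of *_parse_report.csv is not shown in this PR description.

```python
import json


def parse_report(responses):
    """Try to parse each response string as JSON and summarize the outcome.

    Returns a list of per-response rows and the overall success rate.
    The row fields are hypothetical, not the PR's CSV schema.
    """
    rows = []
    for i, text in enumerate(responses):
        try:
            json.loads(text)
            rows.append({"index": i, "parsed": True, "error": ""})
        except json.JSONDecodeError as exc:
            # Keep the decoder's message so failures can be inspected later.
            rows.append({"index": i, "parsed": False, "error": exc.msg})
    n_ok = sum(row["parsed"] for row in rows)
    rate = n_ok / len(rows) if rows else 0.0
    return rows, rate


# Two well-formed responses, one free-text reply, one truncated object.
responses = ['{"label": "a"}', "not json", '{"label": "b"}', '{"label": ']
rows, rate = parse_report(responses)
print(f"parse success rate: {rate:.0%}")
```

Rows like these could be written out with csv.DictWriter to produce a per-task report, which makes a drop in parse success visible when switching to a model with a different output style.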

Add model-agnostic analysis and tracking tools

b4c3dcc

Huberyky added the codex label Mar 18, 2026 — with ChatGPT Codex Connector
