Flagship correctness fixes, real escalation metric, lean install by ChunkyTortoise · Pull Request #17 · ChunkyTortoise/docextract

ChunkyTortoise · 2026-05-18T03:44:04Z

Summary

A focused, conflict-free, code-only change (single commit rebased onto current main):

reranker.py — stop mutating the caller's SearchResult objects in place (this corrupted the original retrieval scores still used upstream in agentic_rag); return model_copy'd results so rerank is pure w.r.t. its inputs.
model_ab_test.py — the standard error was the Welch (unpooled) SE of the mean difference but was named pooled_se. Renamed se_diff; math unchanged.
agentic_rag.py — dedup used chunk_id or doc_id, which collapses distinct chunks when chunk_id is a falsy-but-valid value (""). Now chunk_id is not None, extracted to a _dedup_key helper.
metrics.py — escalation_rate was a hardcoded 0.12 (also surfaced in the Quality Monitor dashboard). Now computed from ExtractedRecord.needs_review via a single count(*) FILTER (WHERE ...) query.
Lean install — extract-msg (Outlook .msg) pulls red-black-tree-mod, which needs a C toolchain and broke pip install on clean environments. Made opt-in via requirements-msg.txt; the code already degrades gracefully (HAS_EXTRACT_MSG).

Scope note

This branch originally also carried a doc-honesty pass and a self-verifying repo_stats.py/CI mechanism. While it was in progress, main independently shipped a more extensive parallel pass (hiring narrative, metric-provenance labelling, offline-replay eval gate). That work is now redundant, so it was dropped rather than force a conflicted merge — leaving only the uniquely-valuable, non-conflicting code fixes. Verified that none of main's recent work is reverted.

Validation

1,104 unit tests pass on top of current main (includes main's own new tests), 0 failures, ruff clean.
The new count(*) FILTER aggregate compiles to valid PostgreSQL.
Clean-venv pip install -r requirements_ci.txt exits 0.
Integration suite (Postgres/Redis) not run in the authoring env — covered in CI.

Test plan

CI green: full pytest incl. integration (test_business_metrics.py now asserts the computed rate), ruff, eval-gate
Sanity-check /metrics/business and /metrics/quality-trend return a real computed escalation_rate
Confirm .eml extraction still works with extract-msg absent; .msg returns the graceful error

https://claude.ai/code/session_016hjhwGkKkA7duB6cjzfQ2y

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fcc4ee4a05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-18T03:47:24Z

+def _dedup_key(r: SearchResult) -> str:
+    # Fall back to doc_id only when chunk_id is genuinely absent (None);
+    # a falsy-but-valid chunk_id ("") must not collapse distinct chunks.
+    return r.chunk_id if r.chunk_id is not None else r.doc_id


Preserve doc identity for empty chunk IDs

When a tool returns chunk_id="" (which the new comment explicitly treats as a valid falsy chunk id), this key becomes the same empty string for every result, so _merge_results/_deduplicate will drop all but the first such result even when they come from different doc_ids. Use a composite key that still includes doc_id when chunk_id is present but empty, or normalize empty strings to absent if they are not actually valid.

Useful? React with 👍 / 👎.

Rebased onto current main as a clean, conflict-free, code-only change. The branch's earlier doc-honesty and self-verifying-stats work was dropped: main independently shipped a more extensive parallel pass (hiring narrative + metric provenance + offline-replay eval gate), so that work is now redundant. Only the uniquely-valuable, non-conflicting changes remain: - reranker: return model_copy'd results instead of mutating the caller's SearchResult objects in place (corrupted upstream retrieval scores). - model_ab_test: rename pooled_se -> se_diff; it is the Welch (unpooled) SE of the mean difference. Math unchanged. - agentic_rag: dedup via _dedup_key using `is not None` so a falsy-but- valid chunk_id ("") no longer collapses distinct chunks. - metrics: escalation_rate was a hardcoded 0.12; now computed from ExtractedRecord.needs_review in a single count(*) FILTER (...) query. - Outlook .msg support made opt-in (requirements-msg.txt): extract-msg pulls red-black-tree-mod, which needs a C toolchain and broke `pip install` on clean envs. Code already degrades gracefully. - test_reranker uses real SearchResult (required by the no-mutation fix). Verified: 1,104 unit tests pass on top of main, ruff clean, the FILTER aggregate compiles to valid PostgreSQL. https://claude.ai/code/session_016hjhwGkKkA7duB6cjzfQ2y

chatgpt-codex-connector Bot reviewed May 18, 2026

View reviewed changes

ChunkyTortoise force-pushed the claude/improve-repo-quality-DXamB branch from fcc4ee4 to 2541de1 Compare May 18, 2026 03:49

ChunkyTortoise changed the title ~~Repo quality: honest metrics, flagship correctness fixes, lean install~~ Flagship correctness fixes, real escalation metric, lean install May 18, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flagship correctness fixes, real escalation metric, lean install#17

Flagship correctness fixes, real escalation metric, lean install#17
ChunkyTortoise wants to merge 1 commit into
mainfrom
claude/improve-repo-quality-DXamB

ChunkyTortoise commented May 18, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ChunkyTortoise commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Scope note

Validation

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ChunkyTortoise commented May 18, 2026 •

edited

Loading