feat: add T-I-F reliability helpers (TifScore, evaluate_tif) by ferhimedamine · Pull Request #126 · Dakera-AI/dakera-py

ferhimedamine · 2026-06-12T22:54:54Z

Summary

Part of T-I-F RFC Phase 3 — adds type-safe T-I-F reliability helpers so developers don't have to hand-roll metadata.reliability parsing.

Changes:

TifScore dataclass in models.py with truth/indeterminacy/falsity proportions, feedback_count, classification property, from_feedback_history() and from_metadata() classmethods
evaluate_tif(memory_id) convenience method on both DakeraClient and AsyncDakeraClient
TifScore exported from __init__.py
20 unit tests in tests/test_tif.py covering all edge cases and classification thresholds

T-I-F computation:

upvote/positive → truth
downvote/negative → falsity
flag → indeterminacy
No feedback → {truth=0.0, indeterminacy=1.0, falsity=0.0}

Classification thresholds:

falsity >= 0.50 → surface_contradiction
indeterminacy >= 0.50 → ask_clarification
truth >= 0.70 → confident_reuse
else → verify_before_use
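
As a rough illustration of the mapping and thresholds listed above, here is a minimal sketch. It is not the SDK's `TifScore`: the class name `TifSketch`, the `from_signals` method, and the choice to ignore unknown signal names are all assumptions made for this example. It covers only the raw proportions described in this summary.

```python
from dataclasses import dataclass
from typing import Iterable

# Signal names come from the PR description; this lookup table is hypothetical.
_SIGNAL_TO_BUCKET = {
    "upvote": "truth",
    "positive": "truth",
    "downvote": "falsity",
    "negative": "falsity",
    "flag": "indeterminacy",
}


@dataclass(frozen=True)
class TifSketch:
    """Illustrative stand-in for a T-I-F score (not the real TifScore)."""

    truth: float
    indeterminacy: float
    falsity: float
    feedback_count: int

    @classmethod
    def from_signals(cls, signals: Iterable[str]) -> "TifSketch":
        counts = {"truth": 0, "indeterminacy": 0, "falsity": 0}
        for signal in signals:
            bucket = _SIGNAL_TO_BUCKET.get(signal)
            if bucket is not None:  # assumption: unknown signals are skipped
                counts[bucket] += 1
        total = sum(counts.values())
        if total == 0:
            # No feedback: fully indeterminate, as stated in the description.
            return cls(0.0, 1.0, 0.0, 0)
        return cls(
            counts["truth"] / total,
            counts["indeterminacy"] / total,
            counts["falsity"] / total,
            total,
        )

    @property
    def classification(self) -> str:
        # Order matters: falsity is checked before indeterminacy.
        if self.falsity >= 0.50:
            return "surface_contradiction"
        if self.indeterminacy >= 0.50:
            return "ask_clarification"
        if self.truth >= 0.70:
            return "confident_reuse"
        return "verify_before_use"
```

Because falsity is tested first, a history that is half downvotes and half flags classifies as surface_contradiction rather than ask_clarification.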

Related PRs (all 4 SDKs batch)

dakera-py (this PR)
dakera-js: feat/tif-reliability-helpers
dakera-rs: feat/tif-reliability-helpers
dakera-go: feat/tif-reliability-helpers

🤖 Generated with Claude Code

Reviewed-by: Jean-Sébastien Beaulieu (@SeCuReDmE-main-dev) — T-I-F contract parity review

Implements Phase 3 of the T-I-F RFC (dakera-deploy#161).

- Add TifScore dataclass to models.py with truth/indeterminacy/falsity proportions, feedback_count, classification property, and from_feedback_history/from_metadata classmethods
- Add evaluate_tif() to DakeraClient and AsyncDakeraClient
- Export TifScore from __init__.py
- Add 20 unit tests covering all edge cases and classification thresholds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix I001: move TifScore to correct alphabetical position in __init__.py import block
- Fix E501: break long line in models.py from_feedback_history() ternary
- Fix I001: add blank line between third-party/first-party imports in test_tif.py
- Fix E501: break long _make_history() calls in test_tif.py lines 57 and 100
- Update CHANGELOG.md and README.md with TifScore / evaluate_tif documentation

Part of T-I-F RFC Phase 3 (DAK-6562).

Co-Authored-By: Paperclip <noreply@paperclip.ing>

…t imports

async_client.py and client.py both had TifScore placed between FeedbackSignal and FilterDict (wrong: T > F). Ruff I001 flags this on the merge commit where the full import block is visible. Move TifScore to between TextUpsertResponse and TtlStatsResponse (correct: T-i is between T-e-x-t and T-t-l alphabetically).

Part of T-I-F RFC Phase 3 (DAK-6562).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
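
The ordering argument in this commit message can be checked with a plain string comparison. This is only a sketch: the helper `is_alphabetical` is hypothetical, and Python's default string ordering is a simplification of Ruff's import-sorting rules, which take more into account than shown here.

```python
from typing import List


def is_alphabetical(names: List[str]) -> bool:
    """True if the names are already in plain lexicographic order."""
    return names == sorted(names)


# Placement described as wrong: TifScore between FeedbackSignal and FilterDict.
wrong = ["FeedbackSignal", "TifScore", "FilterDict"]

# Placement described as correct: TifScore between TextUpsertResponse and TtlStatsResponse.
fixed = ["FeedbackSignal", "FilterDict", "TextUpsertResponse", "TifScore", "TtlStatsResponse"]

print(is_alphabetical(wrong), is_alphabetical(fixed))
```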

SeCuReDmE-main-dev · 2026-06-12T23:59:19Z

Phase 3 SDK review note from the RFC side.

This is clean and useful ergonomically: TifScore, evaluate_tif(), metadata parsing, sync/async support, and green CI are all strong.

The one thing I would hold before stabilizing Phase 3 is cross-language parity with the MCP PR and the other SDKs. Right now this Python helper appears to compute raw feedback proportions only, while dakera-mcp#123 adds a thin-evidence base indeterminacy rule when feedback_count < 3. Also, no feedback here is truth=0, indeterminacy=1, falsity=0, which classifies as ask_clarification, while MCP currently returns verify_before_use for no feedback.

I recommend adding shared golden vectors before merge/release, for example:

[]
[upvote]
[upvote, upvote]
[upvote, upvote, upvote]
[downvote, downvote]
[flag, flag]
[upvote x8, downvote x1, flag x1]
[downvote x3, flag x3]

For each vector, Python, JS, Rust, Go, and MCP should return the same truth, indeterminacy, falsity, classification, and feedback_count.
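
One way such a parity check could be wired up is sketched below. The vectors are the eight listed above; everything else (the `Result` tuple shape, the `Evaluator` type, and the `find_mismatches` helper) is hypothetical and stands in for whatever harness the projects actually share. The sketch compares implementations against each other and does not assert specific expected values.

```python
from typing import Callable, Dict, List, Tuple

# The eight vectors proposed in the review comment.
GOLDEN_VECTORS: List[List[str]] = [
    [],
    ["upvote"],
    ["upvote"] * 2,
    ["upvote"] * 3,
    ["downvote"] * 2,
    ["flag"] * 2,
    ["upvote"] * 8 + ["downvote"] + ["flag"],
    ["downvote"] * 3 + ["flag"] * 3,
]

# Hypothetical result shape:
# (truth, indeterminacy, falsity, classification, feedback_count)
Result = Tuple[float, float, float, str, int]
Evaluator = Callable[[List[str]], Result]


def find_mismatches(evaluators: Dict[str, Evaluator], tol: float = 1e-9) -> List[str]:
    """Compare every evaluator against the first one on each golden vector."""
    names = list(evaluators)
    problems: List[str] = []
    if not names:
        return problems
    ref_name = names[0]
    for vector in GOLDEN_VECTORS:
        ref = evaluators[ref_name](vector)
        for name in names[1:]:
            got = evaluators[name](vector)
            # Floats are compared with a tolerance; label and count exactly.
            floats_match = all(abs(a - b) <= tol for a, b in zip(ref[:3], got[:3]))
            if not floats_match or ref[3:] != got[3:]:
                problems.append(f"{name} != {ref_name} for {vector}: {got} vs {ref}")
    return problems
```

In practice each evaluator would wrap one implementation (for example, a subprocess call into the JS or Go helper), and an empty result list would indicate that the contract is aligned on these vectors.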

Once that contract is aligned, this PR looks like the right Python-side shape for Phase 3.

…6566)

Aligns Python SDK with MCP canonical T-I-F v1 contract:

- Inject base indeterminacy when feedback_count < 3 to prevent false confidence from sparse signals
- Normalise T+I+F to 1.0 after adding base indeterminacy
- Add 8 golden vector tests matching MCP/JS/Rust/Go
- Add 3 thin-evidence unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
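
The thin-evidence rule in this commit message might look roughly like the sketch below. The threshold of 3 comes from the commit message; the base value `BASE_INDETERMINACY = 0.3` is a placeholder invented for this example, since the actual constant is not stated in this PR. The additive form and the function name `apply_thin_evidence` are likewise assumptions.

```python
from typing import Tuple

THIN_EVIDENCE_THRESHOLD = 3  # from the commit message: applies when feedback_count < 3
BASE_INDETERMINACY = 0.3     # ASSUMPTION: placeholder value, not taken from the PR


def apply_thin_evidence(
    truth: float, indeterminacy: float, falsity: float, feedback_count: int
) -> Tuple[float, float, float]:
    """Add base indeterminacy for sparse feedback, then renormalise to sum to 1.0."""
    if feedback_count < THIN_EVIDENCE_THRESHOLD:
        indeterminacy += BASE_INDETERMINACY
    total = truth + indeterminacy + falsity
    if total == 0:
        # Defensive fallback: treat an all-zero input as fully indeterminate.
        return 0.0, 1.0, 0.0
    return truth / total, indeterminacy / total, falsity / total
```

Under these assumptions a single upvote no longer yields truth = 1.0, while a history with three or more signals passes through unchanged because its proportions already sum to 1.0.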

SeCuReDmE-main-dev · 2026-06-13T00:50:36Z

@ferhimedamine final Phase 3 review complete from my side for the Python SDK PR.

I rechecked the current PR state and the DAK-6566 parity fixes. The previous blockers are resolved:

CI is green.
PR is mergeable.
no-feedback now maps to ask_clarification.
thin-evidence base indeterminacy is present.
the 8 golden vectors are present and cover the canonical T-I-F v1 contract.
3 downvote + 3 flag correctly prioritizes surface_contradiction.
metadata reliability parsing remains backward compatible with Phase 1 / Phase 2.

This is reviewed from my side. No further requested changes from me.

ferhimedamine marked this pull request as ready for review June 12, 2026 23:27

Platform Bot and others added 3 commits June 12, 2026 23:28

ci: re-trigger CI checks after lint fixes

6035536

Co-Authored-By: Paperclip <noreply@paperclip.ing>

ferhimedamine force-pushed the feat/tif-reliability-helpers branch from 7d37559 to 6035536 Compare June 12, 2026 23:29

ferhimedamine mentioned this pull request Jun 13, 2026

feat(mcp): add dakera_tif_evaluate tool for T-I-F reliability scoring Dakera-AI/dakera-mcp#123

Merged

SeCuReDmE-main-dev mentioned this pull request Jun 13, 2026

RFC: T-I-F (Truth-Indeterminacy-Falsity) Decision Provenance Layer Dakera-AI/dakera-deploy#161

Closed

ferhimedamine added the auto-merge Auto-merge when CI passes label Jun 13, 2026

ferhimedamine merged commit 33b4ab2 into main Jun 13, 2026
5 checks passed

ferhimedamine merged 5 commits into main from feat/tif-reliability-helpers
