feat: add T-I-F reliability helpers (TifScore, evaluate_tif) #126
Implements Phase 3 of the T-I-F RFC (dakera-deploy#161).

- Add TifScore dataclass to models.py with truth/indeterminacy/falsity proportions, feedback_count, classification property, and from_feedback_history/from_metadata classmethods
- Add evaluate_tif() to DakeraClient and AsyncDakeraClient
- Export TifScore from __init__.py
- Add 20 unit tests covering all edge cases and classification thresholds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix I001: move TifScore to correct alphabetical position in __init__.py import block
- Fix E501: break long line in models.py from_feedback_history() ternary
- Fix I001: add blank line between third-party/first-party imports in test_tif.py
- Fix E501: break long _make_history() calls in test_tif.py lines 57 and 100
- Update CHANGELOG.md and README.md with TifScore / evaluate_tif documentation

Part of T-I-F RFC Phase 3 (DAK-6562).
Co-Authored-By: Paperclip <noreply@paperclip.ing>
7d37559 to 6035536
…t imports

async_client.py and client.py both had TifScore placed between FeedbackSignal and FilterDict (wrong: T > F). Ruff I001 flags this on the merge commit where the full import block is visible.

Move TifScore to between TextUpsertResponse and TtlStatsResponse (correct: T-i is between T-e-x-t and T-t-l alphabetically).

Part of T-I-F RFC Phase 3 (DAK-6562).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
Phase 3 SDK review note from the RFC side.

This is clean and useful ergonomically:

The one thing I would hold before stabilizing Phase 3 is cross-language parity with the MCP PR and the other SDKs. Right now this Python helper appears to compute raw feedback proportions only, while

I recommend adding shared golden vectors before merge/release, for example:

For each vector, Python, JS, Rust, Go, and MCP should return the same

Once that contract is aligned, this PR looks like the right Python-side shape for Phase 3.
…6566)

Aligns Python SDK with MCP canonical T-I-F v1 contract:

- Inject base indeterminacy when feedback_count < 3 to prevent false confidence from sparse signals
- Normalise T+I+F to 1.0 after adding base indeterminacy
- Add 8 golden vector tests matching MCP/JS/Rust/Go
- Add 3 thin-evidence unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
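As a rough illustration of the thin-evidence rule described in this commit, the adjustment might look like the sketch below. The function name `apply_thin_evidence` and the `BASE_INDETERMINACY` value are hypothetical: the commit message gives only the `feedback_count < 3` condition and the normalisation step, not the amount injected or the SDK's actual implementation.

```python
# Sketch only: names and the injected amount are assumptions, not the SDK's code.
THIN_EVIDENCE_THRESHOLD = 3   # from the commit message: feedback_count < 3
BASE_INDETERMINACY = 0.3      # hypothetical value chosen for illustration


def apply_thin_evidence(truth, indeterminacy, falsity, feedback_count):
    """Return (T, I, F), injecting base indeterminacy when feedback is sparse."""
    if feedback_count >= THIN_EVIDENCE_THRESHOLD:
        # Enough evidence: leave the raw proportions untouched.
        return truth, indeterminacy, falsity

    # Sparse evidence: add base indeterminacy, then renormalise so T+I+F == 1.0.
    indeterminacy += BASE_INDETERMINACY
    total = truth + indeterminacy + falsity
    return truth / total, indeterminacy / total, falsity / total


if __name__ == "__main__":
    # A single upvote no longer yields truth == 1.0.
    print(apply_thin_evidence(1.0, 0.0, 0.0, feedback_count=1))
```

Under these assumptions a lone upvote can never produce a truth proportion of 1.0, which is the "false confidence from sparse signals" the commit aims to prevent.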
@ferhimedamine final Phase 3 review complete from my side for the Python SDK PR. I rechecked the current PR state and the DAK-6566 parity fixes. The previous blockers are resolved:
This is reviewed from my side. No further requested changes from me.
Summary
Part of T-I-F RFC Phase 3. Adds type-safe T-I-F reliability helpers so developers don't have to hand-roll `metadata.reliability` parsing.

Changes:

- `TifScore` dataclass in `models.py` with `truth`/`indeterminacy`/`falsity` proportions, `feedback_count`, `classification` property, `from_feedback_history()` and `from_metadata()` classmethods
- `evaluate_tif(memory_id)` convenience method on both `DakeraClient` and `AsyncDakeraClient`
- `TifScore` exported from `__init__.py`
- `tests/test_tif.py` covering all edge cases and classification thresholds

T-I-F computation:

- `upvote`/`positive` → truth
- `downvote`/`negative` → falsity
- `flag` → indeterminacy
- `{truth=0.0, indeterminacy=1.0, falsity=0.0}`

Classification thresholds:

- `falsity >= 0.50` → `surface_contradiction`
- `indeterminacy >= 0.50` → `ask_clarification`
- `truth >= 0.70` → `confident_reuse`
- `verify_before_use`

Related PRs (all 4 SDKs batch)
🤖 Generated with Claude Code
Reviewed-by: Jean-Sébastien Beaulieu (@SeCuReDmE-main-dev) — T-I-F contract parity review
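
For readers skimming the thread, the signal mapping and classification thresholds from the Summary can be sketched roughly as follows. This is not the code in `models.py`. The class name `TifScoreSketch`, the `from_signals()` constructor, the plain-string signal format, the top-to-bottom evaluation order of the thresholds, the treatment of `verify_before_use` as the fallback, and the use of the all-indeterminate score for empty input are all assumptions made for illustration. The sketch computes raw proportions only and omits the thin-evidence adjustment added in the later parity commit.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TifScoreSketch:
    """Illustrative stand-in for TifScore; not the SDK's actual class."""
    truth: float
    indeterminacy: float
    falsity: float
    feedback_count: int

    @classmethod
    def from_signals(cls, signals: Iterable[str]) -> "TifScoreSketch":
        # Assumed input: an iterable of signal names; unknown names are ignored.
        signals = list(signals)
        t = sum(s in ("upvote", "positive") for s in signals)
        f = sum(s in ("downvote", "negative") for s in signals)
        i = sum(s == "flag" for s in signals)
        n = t + i + f
        if n == 0:
            # Assumed default when there is no usable feedback.
            return cls(truth=0.0, indeterminacy=1.0, falsity=0.0, feedback_count=0)
        return cls(truth=t / n, indeterminacy=i / n, falsity=f / n, feedback_count=n)

    @property
    def classification(self) -> str:
        # Thresholds checked in the order listed in the PR description (assumed).
        if self.falsity >= 0.50:
            return "surface_contradiction"
        if self.indeterminacy >= 0.50:
            return "ask_clarification"
        if self.truth >= 0.70:
            return "confident_reuse"
        return "verify_before_use"  # assumed fallback


if __name__ == "__main__":
    score = TifScoreSketch.from_signals(["upvote", "upvote", "upvote", "downvote"])
    print(score, score.classification)
```

Checking falsity first means a memory with evenly split votes surfaces as a contradiction rather than being asked about or reused, which matches the order in which the thresholds are listed.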