feat: add T-I-F reliability helpers (TifScore, evaluate_tif) #126

Merged
ferhimedamine merged 5 commits into main from feat/tif-reliability-helpers on Jun 13, 2026
Conversation

@ferhimedamine

@ferhimedamine ferhimedamine commented Jun 12, 2026


Summary

Part of T-I-F RFC Phase 3 — adds type-safe T-I-F reliability helpers so developers don't have to hand-roll metadata.reliability parsing.

Changes:

  • TifScore dataclass in models.py with truth/indeterminacy/falsity proportions, feedback_count, classification property, from_feedback_history() and from_metadata() classmethods
  • evaluate_tif(memory_id) convenience method on both DakeraClient and AsyncDakeraClient
  • TifScore exported from __init__.py
  • 20 unit tests in tests/test_tif.py covering all edge cases and classification thresholds

T-I-F computation:

  • upvote/positive → truth
  • downvote/negative → falsity
  • flag → indeterminacy
  • No feedback → {truth=0.0, indeterminacy=1.0, falsity=0.0}
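A minimal sketch of the raw mapping listed above, for readers who want to see the arithmetic. It is not the SDK's code: the function name `raw_tif`, the tuple return shape, and the choice to ignore unrecognised signals are assumptions made for illustration, and it covers only the raw proportions described in this summary.

```python
# Signal names come from the mapping in this PR description.
TRUTH_SIGNALS = {"upvote", "positive"}
FALSITY_SIGNALS = {"downvote", "negative"}
INDETERMINACY_SIGNALS = {"flag"}


def raw_tif(signals):
    """Return (truth, indeterminacy, falsity, feedback_count) as raw proportions.

    `signals` is an iterable of signal-name strings (hypothetical input shape).
    """
    t = i = f = 0
    for signal in signals:
        if signal in TRUTH_SIGNALS:
            t += 1
        elif signal in FALSITY_SIGNALS:
            f += 1
        elif signal in INDETERMINACY_SIGNALS:
            i += 1
        # ASSUMPTION: anything else is ignored; the PR does not say how
        # unknown signals are handled.
    total = t + i + f
    if total == 0:
        # No feedback: fully indeterminate, as stated in the list above.
        return (0.0, 1.0, 0.0, 0)
    return (t / total, i / total, f / total, total)


print(raw_tif(["upvote"] * 8 + ["downvote", "flag"]))  # (0.8, 0.1, 0.1, 10)
```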

Classification thresholds:

  • falsity >= 0.50 → surface_contradiction
  • indeterminacy >= 0.50 → ask_clarification
  • truth >= 0.70 → confident_reuse
  • else → verify_before_use
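The thresholds read as a first-match-wins chain, so the order matters when two conditions hold at once. The sketch below illustrates that, assuming the checks run in the order listed; the standalone function `classify` is hypothetical (the PR exposes this as a `classification` property on `TifScore`).

```python
def classify(truth, indeterminacy, falsity):
    """Map T-I-F proportions to a classification label.

    ASSUMPTION: checks are evaluated top to bottom in the order the PR lists
    them, so falsity takes priority over indeterminacy.
    """
    if falsity >= 0.50:
        return "surface_contradiction"
    if indeterminacy >= 0.50:
        return "ask_clarification"
    if truth >= 0.70:
        return "confident_reuse"
    return "verify_before_use"


# Falsity and indeterminacy both at 0.50: the falsity rule wins.
print(classify(0.0, 0.5, 0.5))  # surface_contradiction
```

With this ordering, an even split between downvotes and flags surfaces the contradiction rather than asking for clarification.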

Related PRs (all 4 SDKs batch)

  • dakera-py (this PR)
  • dakera-js: feat/tif-reliability-helpers
  • dakera-rs: feat/tif-reliability-helpers
  • dakera-go: feat/tif-reliability-helpers

🤖 Generated with Claude Code


Reviewed-by: Jean-Sébastien Beaulieu (@SeCuReDmE-main-dev) — T-I-F contract parity review

@ferhimedamine ferhimedamine marked this pull request as ready for review June 12, 2026 23:27
Platform Bot and others added 3 commits June 12, 2026 23:28
Implements Phase 3 of the T-I-F RFC (dakera-deploy#161).

- Add TifScore dataclass to models.py with truth/indeterminacy/falsity
  proportions, feedback_count, classification property, and
  from_feedback_history/from_metadata classmethods
- Add evaluate_tif() to DakeraClient and AsyncDakeraClient
- Export TifScore from __init__.py
- Add 20 unit tests covering all edge cases and classification thresholds

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix I001: move TifScore to correct alphabetical position in __init__.py import block
- Fix E501: break long line in models.py from_feedback_history() ternary
- Fix I001: add blank line between third-party/first-party imports in test_tif.py
- Fix E501: break long _make_history() calls in test_tif.py lines 57 and 100
- Update CHANGELOG.md and README.md with TifScore / evaluate_tif documentation

Part of T-I-F RFC Phase 3 (DAK-6562).
Co-Authored-By: Paperclip <noreply@paperclip.ing>
@ferhimedamine ferhimedamine force-pushed the feat/tif-reliability-helpers branch from 7d37559 to 6035536 Compare June 12, 2026 23:29
…t imports

async_client.py and client.py both had TifScore placed between
FeedbackSignal and FilterDict (wrong: T > F). Ruff I001 flags this on
the merge commit where the full import block is visible. Move TifScore
to between TextUpsertResponse and TtlStatsResponse (correct: T-i is
between T-e-x-t and T-t-l alphabetically).

Part of T-I-F RFC Phase 3 (DAK-6562).

Co-Authored-By: Paperclip <noreply@paperclip.ing>
@SeCuReDmE-main-dev


Phase 3 SDK review note from the RFC side.

This is clean and useful ergonomically: TifScore, evaluate_tif(), metadata parsing, sync/async support, and green CI are all strong.

The one thing I would hold before stabilizing Phase 3 is cross-language parity with the MCP PR and the other SDKs. Right now this Python helper appears to compute raw feedback proportions only, while dakera-mcp#123 adds a thin-evidence base indeterminacy rule when feedback_count < 3. Also, no feedback here is truth=0, indeterminacy=1, falsity=0, which classifies as ask_clarification, while MCP currently returns verify_before_use for no feedback.

I recommend adding shared golden vectors before merge/release, for example:

[]
[upvote]
[upvote, upvote]
[upvote, upvote, upvote]
[downvote, downvote]
[flag, flag]
[upvote x8, downvote x1, flag x1]
[downvote x3, flag x3]

For each vector, Python, JS, Rust, Go, and MCP should return the same truth, indeterminacy, falsity, classification, and feedback_count.
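One way such a parity check could be wired up is sketched below. Everything here is hypothetical scaffolding: `find_mismatches`, the dict-shaped results, and the idea of wrapping each SDK behind a Python callable are assumptions for illustration, not part of any of the linked PRs. The inputs are the eight vectors listed above, expanded into signal lists.

```python
GOLDEN_INPUTS = [
    [],
    ["upvote"],
    ["upvote"] * 2,
    ["upvote"] * 3,
    ["downvote"] * 2,
    ["flag"] * 2,
    ["upvote"] * 8 + ["downvote"] + ["flag"],
    ["downvote"] * 3 + ["flag"] * 3,
]

FIELDS = ("truth", "indeterminacy", "falsity", "classification", "feedback_count")


def find_mismatches(implementations, inputs=GOLDEN_INPUTS, tol=1e-9):
    """Compare every implementation against the first one, field by field.

    `implementations` maps a label (e.g. "python", "mcp") to a callable that
    takes a list of signals and returns a dict with the keys in FIELDS.
    Returns a list of (signals, field, reference_label, other_label) tuples.
    """
    if len(implementations) < 2:
        return []
    labels = list(implementations)
    ref_label = labels[0]
    mismatches = []
    for signals in inputs:
        ref = implementations[ref_label](signals)
        for label in labels[1:]:
            other = implementations[label](signals)
            for field in FIELDS:
                a, b = ref[field], other[field]
                if isinstance(a, float) and isinstance(b, (int, float)):
                    same = abs(a - b) <= tol  # tolerate float rounding
                else:
                    same = a == b
                if not same:
                    mismatches.append((tuple(signals), field, ref_label, label))
    return mismatches
```

An empty result means all wrapped implementations agree on every vector; any entry names the exact vector and field where two of them diverge.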

Once that contract is aligned, this PR looks like the right Python-side shape for Phase 3.

…6566)

Aligns Python SDK with MCP canonical T-I-F v1 contract:
- Inject base indeterminacy when feedback_count < 3 to prevent
  false confidence from sparse signals
- Normalise T+I+F to 1.0 after adding base indeterminacy
- Add 8 golden vector tests matching MCP/JS/Rust/Go
- Add 3 thin-evidence unit tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
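For illustration, the thin-evidence step described in this commit might look roughly like the sketch below. The `feedback_count < 3` cutoff and the renormalisation come from the commit message; the amount of base indeterminacy (`BASE_INDETERMINACY = 0.2`), the additive form of the rule, and the function name are placeholders, since the PR conversation does not state them.

```python
THIN_EVIDENCE_THRESHOLD = 3  # from the commit message: feedback_count < 3
BASE_INDETERMINACY = 0.2     # ASSUMPTION: placeholder value, not stated in this PR


def apply_thin_evidence(truth, indeterminacy, falsity, feedback_count):
    """Add base indeterminacy for sparse feedback, then renormalise to 1.0.

    ASSUMPTION: the base amount is added to indeterminacy; the canonical
    contract may define the rule differently.
    """
    if feedback_count >= THIN_EVIDENCE_THRESHOLD:
        return truth, indeterminacy, falsity
    indeterminacy += BASE_INDETERMINACY
    total = truth + indeterminacy + falsity  # always > 0 after the addition
    return truth / total, indeterminacy / total, falsity / total


# A single upvote no longer yields truth == 1.0.
t, i, f = apply_thin_evidence(1.0, 0.0, 0.0, feedback_count=1)
print(t < 1.0, i > 0.0)  # True True
```

Under these assumptions the no-feedback case stays at full indeterminacy after normalisation, and scores with three or more feedback items pass through unchanged.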
@SeCuReDmE-main-dev


@ferhimedamine final Phase 3 review complete from my side for the Python SDK PR.

I rechecked the current PR state and the DAK-6566 parity fixes. The previous blockers are resolved:

  • CI is green.
  • PR is mergeable.
  • no-feedback now maps to ask_clarification.
  • thin-evidence base indeterminacy is present.
  • the 8 golden vectors are present and cover the canonical T-I-F v1 contract.
  • 3 downvote + 3 flag correctly prioritizes surface_contradiction.
  • metadata reliability parsing remains backward compatible with Phase 1 / Phase 2.

This is reviewed from my side. No further requested changes from me.

@ferhimedamine ferhimedamine added the auto-merge Auto-merge when CI passes label Jun 13, 2026
@ferhimedamine ferhimedamine merged commit 33b4ab2 into main Jun 13, 2026
5 checks passed

Labels

auto-merge Auto-merge when CI passes


2 participants