Provider leaderboard over-attributes request cost to billed hedge losers #1291

Description

@Brisbanehuang

Summary

The provider cost leaderboard can attribute the full request cost to a hedge loser provider when hedge loser billing appends a hedge_loser_billed entry as the last provider_chain entry.

This is a provider-level reporting issue. The request total itself is not necessarily overbilled, but the provider leaderboard / usage_ledger.final_provider_id attribution becomes wrong.

What happens

For a hedged request:

  1. Provider A is selected first and later becomes a billed hedge loser.
  2. Provider B wins and returns the response to the client.
  3. The request row stores total cost as:
     message_request.cost_usd = winnerCost + SUM(hedge_losers[].costUsd)
  4. The provider chain ends with an entry like:
     { "id": <provider A id>, "reason": "hedge_loser_billed" }
  5. fn_upsert_usage_ledger() derives usage_ledger.final_provider_id from the last provider_chain entry:
     v_final_provider_id := (NEW.provider_chain -> -1 ->> 'id')::integer;
  6. Provider leaderboard groups by usage_ledger.final_provider_id and sums usage_ledger.cost_usd.

Result: Provider A, the loser, receives the entire request cost in the provider leaderboard, even though only hedge_losers[].costUsd belongs to Provider A.
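To make the failure mode concrete, here is a minimal TypeScript sketch of the last-entry rule. It is not the project's code: the ChainEntry shape, the lastEntryProviderId helper, and the numeric provider ids are assumptions for illustration; only the reason strings and the "take the last entry" behavior come from this report.

```typescript
// Hypothetical shape: only `id` and `reason` are taken from the issue text.
interface ChainEntry {
  id: number;
  reason: string;
}

// Mirrors the trigger expression `provider_chain -> -1 ->> 'id'`:
// whatever entry is last decides the provider, regardless of its reason.
function lastEntryProviderId(chain: ChainEntry[]): number | null {
  return chain.length > 0 ? chain[chain.length - 1].id : null;
}

const PROVIDER_A = 1; // billed hedge loser (illustrative id)
const PROVIDER_B = 2; // winner that served the response (illustrative id)

const providerChain: ChainEntry[] = [
  { id: PROVIDER_B, reason: "hedge_winner" },
  { id: PROVIDER_A, reason: "hedge_loser_billed" }, // appended after the winner
];

// Resolves to Provider A, so the whole request cost is grouped under the loser.
const finalProviderId = lastEntryProviderId(providerChain);
console.log(finalProviderId === PROVIDER_A);
```

Because the leaderboard sums usage_ledger.cost_usd per final_provider_id, this single lookup is enough to move the winner's share onto Provider A.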

Why this looks wrong

A hedge_loser_billed entry is a terminal accounting/audit record. It does not identify the provider that served the successful response. Treating the last chain entry as final_provider_id is therefore no longer reliable once hedge loser billing appends loser records after the winner.

The affected code paths appear to be:

  • src/lib/ledger-backfill/trigger.sql: fn_upsert_usage_ledger() derives final_provider_id from provider_chain -> -1.
  • src/lib/ledger-backfill/service.ts: backfill uses the same last-chain-entry rule.
  • src/repository/leaderboard.ts: provider leaderboard groups by usage_ledger.final_provider_id and sums usage_ledger.cost_usd.

Minimal reproduction shape

A single request row with:

provider_id = Provider B  -- winner / request row provider
cost_usd = 0.54           -- winner + loser total
hedge_losers = [
  { providerId: Provider A, costUsd: "0.03", attemptNumber: 1 }
]
provider_chain last entry =
  { id: Provider A, reason: "hedge_loser_billed" }

Current ledger result:

usage_ledger.final_provider_id = Provider A
usage_ledger.cost_usd = 0.54

Expected attribution for provider-level cost reporting:

Provider B: winner portion, 0.51 (0.54 - 0.03)
Provider A: hedge loser portion, 0.03

or, at minimum, final_provider_id should continue to point to the actual winning/serving provider rather than the trailing hedge_loser_billed audit entry.
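The minimum fix described above (keeping final_provider_id on the serving provider) could look roughly like the sketch below. It assumes that hedge_loser_billed and hedge_loser_cancelled are the only reasons that can trail the serving entry; that assumption, the ChainEntry shape, and the servingProviderId name are hypothetical and would need checking against the real provider_chain contents.

```typescript
// Hypothetical shape: only `id` and `reason` are taken from the issue text.
interface ChainEntry {
  id: number;
  reason: string;
}

// Assumed to be the complete set of trailing loser records.
const LOSER_REASONS = new Set<string>([
  "hedge_loser_billed",
  "hedge_loser_cancelled",
]);

// Walk the chain from the end and return the first entry that is not a
// loser record, i.e. the provider that actually served the response.
function servingProviderId(chain: ChainEntry[]): number | null {
  for (let i = chain.length - 1; i >= 0; i--) {
    if (!LOSER_REASONS.has(chain[i].reason)) {
      return chain[i].id;
    }
  }
  return null; // empty chain, or a chain made only of loser records
}

const exampleChain: ChainEntry[] = [
  { id: 2, reason: "hedge_winner" },
  { id: 1, reason: "hedge_loser_billed" },
];

// Resolves to provider 2 (the winner) instead of provider 1 (the loser).
console.log(servingProviderId(exampleChain));
```

An allow-list of success reasons (hedge_winner, request_success, retry_success) would be the stricter alternative; the deny-list here only changes behavior for chains that end in loser records.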

Suggested directions

Possible fixes:

  1. When deriving final_provider_id, ignore trailing hedge_loser_billed / hedge_loser_cancelled entries and use the winning provider entry, for example hedge_winner, request_success, or retry_success.
  2. For provider cost leaderboard, split hedge loser costs explicitly by expanding message_request.hedge_losers, so provider cost becomes:
    • winner provider: message_request.cost_usd - SUM(hedge_losers[].costUsd)
    • each loser provider: its own hedge_losers[].costUsd
  3. Longer term, consider a per-provider ledger table/rows for hedge billing rather than one request-level row with a single final_provider_id.

Option 1 fixes winner attribution, but option 2 is needed if the provider leaderboard should include billed loser costs under the loser provider as well.
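As a rough illustration of option 2, the sketch below splits one request's cost across providers. The RequestRow and HedgeLoser types, the attributeCostMicros helper, and the choice to work in integer micro-dollars (to avoid floating-point drift when subtracting) are all assumptions; only the field names providerId, costUsd, attemptNumber and the subtraction rule come from this report. It also assumes costs arrive as decimal strings and that the request row's provider is the winner.

```typescript
// Hypothetical shapes modeled on the reproduction above.
interface HedgeLoser {
  providerId: number;
  costUsd: string; // decimal string, as in hedge_losers[].costUsd
  attemptNumber: number;
}

interface RequestRow {
  providerId: number; // winner / request row provider
  costUsd: string; // winner + loser total
  hedgeLosers: HedgeLoser[];
}

const MICROS_PER_USD = 1_000_000;

// Convert a decimal USD string to integer micro-dollars.
const toMicros = (usd: string): number =>
  Math.round(Number(usd) * MICROS_PER_USD);

// Returns providerId -> attributed cost in micro-dollars for one request.
function attributeCostMicros(row: RequestRow): Map<number, number> {
  const perProvider = new Map<number, number>();
  let loserTotal = 0;
  for (const loser of row.hedgeLosers) {
    const micros = toMicros(loser.costUsd);
    loserTotal += micros;
    perProvider.set(
      loser.providerId,
      (perProvider.get(loser.providerId) ?? 0) + micros,
    );
  }
  // Winner share: request total minus everything billed to losers.
  const winnerMicros = toMicros(row.costUsd) - loserTotal;
  perProvider.set(
    row.providerId,
    (perProvider.get(row.providerId) ?? 0) + winnerMicros,
  );
  return perProvider;
}

const shares = attributeCostMicros({
  providerId: 2, // Provider B
  costUsd: "0.54",
  hedgeLosers: [{ providerId: 1, costUsd: "0.03", attemptNumber: 1 }],
});

console.log(shares.get(2), shares.get(1)); // 510000 30000
```

Summing these per-provider shares across requests keeps the grand total equal to SUM(message_request.cost_usd) while moving each loser's portion to the provider that incurred it.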

Impact

Provider cost leaderboard and provider-level cost summaries can be significantly inflated for providers that frequently appear as billed hedge losers. The total user/request cost can still be correct; the bug is in provider attribution.
