
fix: buffer non-streaming response body for token reconciliation (closes #12) #20

Merged

levleontiev merged 3 commits into main from
fix/issue-12-non-streaming-token-reconciliation on Mar 12, 2026

Conversation

@levleontiev
Contributor

Summary

  • streaming.lua: add _reconcile_non_streaming() that buffers up to 1 MiB of upstream response body, extracts usage.total_tokens via cost_extractor, and calls llm_limiter.reconcile() to refund unused pessimistic TPM/TPD reservations
  • streaming.lua: extend body_filter() to handle non-streaming path (ctx present but active=false) — buffer chunks, trigger reconcile on EOF
  • streaming.lua: initialise body_buffer field in init_stream()
  • streaming_spec.lua: add mock_cjson_safe setup + 4 new Gherkin scenarios (full body, chunked delivery, missing usage field, no-context passthrough)
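The reconcile step described above can be sketched in plain Lua. This is an illustrative shape only, not the module's actual code: `reconcile_non_streaming`, its parameters, and the injected `decode`/`reconcile` callbacks are assumed names, with the decoder following the `cjson.safe` convention of returning `nil` on bad JSON.

```lua
-- Hypothetical sketch of the non-streaming reconcile step (names assumed).
local MAX_BODY = 1024 * 1024  -- buffer at most 1 MiB of response body

-- body:            full buffered response body (string)
-- reserved_tokens: pessimistic TPM/TPD reservation made at request time
-- decode:          cjson.safe-style decoder (returns nil on bad JSON)
-- reconcile:       callback that refunds unused tokens to the limiter
local function reconcile_non_streaming(body, reserved_tokens, decode, reconcile)
  if #body > MAX_BODY then
    return nil, "body exceeds buffer cap"
  end
  local parsed = decode(body)
  if type(parsed) ~= "table" or type(parsed.usage) ~= "table" then
    return nil, "no usage field"   -- never refund without real usage data
  end
  local actual = tonumber(parsed.usage.total_tokens)
  if not actual then
    return nil, "total_tokens missing"
  end
  local unused = reserved_tokens - actual
  if unused > 0 then
    reconcile(unused)              -- refund the over-reserved portion
  end
  return actual
end
```

Note the early return when `usage` is absent: refunding nothing in that case is what the "missing usage field" scenario asserts, since guessing a refund could over-credit the budget.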

Test plan

  • busted spec/unit/streaming_spec.lua — 18 scenarios pass (4 new: F015-NS-1..4)
  • busted spec/unit/ spec/integration/ — 485 total, 0 failures
  • E2E: non-streaming LLM request via reverse_proxy mode shows token refund in metrics after response

🤖 Generated with Claude Code

oai-codex and others added 2 commits on March 12, 2026 at 10:58

…#12)

Without a body_filter phase for non-streaming responses, cost_extractor
was never called and pessimistic TPM/TPD reservations were never refunded,
causing token budgets to drain faster than actual usage.

Changes:
- streaming.lua: import cost_extractor; add _reconcile_non_streaming() that
  buffers up to 1 MiB of response body per spec, extracts usage.total_tokens,
  and calls llm_limiter.reconcile() to refund unused tokens
- streaming.lua: body_filter() now handles the non-streaming path (active=false
  but key present) by buffering chunks and reconciling on EOF
- streaming.lua: init_stream() initialises body_buffer field
- streaming_spec.lua: install mock_cjson_safe; add 4 new scenarios covering
  full-body extraction, chunked delivery, missing usage field (no over-refund),
  and no-context passthrough
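The chunk-buffering branch of `body_filter()` can be sketched as follows. In OpenResty the chunk and EOF flag arrive via `ngx.arg[1]` / `ngx.arg[2]`; here they are plain parameters so the logic is testable outside nginx, and `on_body_chunk`/`on_complete` are illustrative names, not the module's API.

```lua
-- Illustrative sketch of the non-streaming body_filter branch (names assumed).
local function on_body_chunk(ctx, chunk, eof, on_complete)
  if not ctx or ctx.active then
    return  -- no LLM context, or streaming path (handled by the SSE parser)
  end
  ctx.body_buffer = ctx.body_buffer or {}
  if chunk and chunk ~= "" then
    ctx.body_buffer[#ctx.body_buffer + 1] = chunk  -- table append avoids O(n^2) concat
  end
  if eof then
    on_complete(table.concat(ctx.body_buffer))     -- full body: reconcile now
    ctx.body_buffer = nil
  end
end
```

Accumulating chunks in a table and calling `table.concat` once at EOF is the idiomatic Lua pattern for this; repeated string concatenation would copy the buffer on every chunk.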

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add mock LLM backend (nginx serving fixed usage JSON), a new edge service
in reverse_proxy mode with token_bucket_llm policy, and three e2e scenarios:

- pass-through: response body contains usage.total_tokens=50
- metric: fairvisor_token_reservation_unused_total emitted after reconcile
- refund: 5 consecutive requests succeed (reconciled ~50 tokens each vs
  pessimistic 1000, so budget is not exhausted)

Test runs as part of the nightly full e2e suite (pytest tests/e2e).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


…ng e2e

1. bundle_loader: validate_config was called on a shallow copy of
   algorithm_config, so computed fields (_tpm_bucket_config, applied
   defaults, etc.) were never written back to rule.algorithm_config.
   After validation, merge all non-algorithm keys back so request-time
   code sees the normalised config.

2. rule_engine: the final "allow" decision built at line 882 omitted
   rule_name, and limit_result.key was never set for token_bucket_llm.
   decision_api.lua:933 then fell through to a fallback that concatenated
   a nil rule_name, causing a 500. Fix: track last_allow_rule_name, set
   limit_result.key = counter_key for allowed LLM checks, and include
   rule_name in the final allow decision.
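The merge-back in point 1 amounts to a small loop. A minimal sketch, with assumed names (`merge_validated`, and the keys shown in the test): after validate_config() has normalised the shallow copy, every key it added or rewrote is copied onto the rule's live table so later request-time lookups see computed fields like _tpm_bucket_config.

```lua
-- Minimal sketch of the post-validation merge-back (names assumed).
local function merge_validated(algorithm_config, validated_copy)
  for k, v in pairs(validated_copy) do
    algorithm_config[k] = v  -- propagate normalised / computed fields
  end
  return algorithm_config
end
```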

These bugs were latent and exposed by the new e2e test suite added for
issue #12 / Feature 015.  All 3 e2e reconciliation tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@levleontiev levleontiev merged commit 7fe8163 into main Mar 12, 2026
7 checks passed
@levleontiev levleontiev deleted the fix/issue-12-non-streaming-token-reconciliation branch March 12, 2026 12:57