fix: buffer non-streaming response body for token reconciliation (closes #12) #20
Merged
levleontiev merged 3 commits into main on Mar 12, 2026
#12)

Without a body_filter phase for non-streaming responses, cost_extractor was never called and pessimistic TPM/TPD reservations were never refunded, causing token budgets to drain faster than actual usage.

Changes:
- streaming.lua: import cost_extractor; add _reconcile_non_streaming() that buffers up to 1 MiB of response body per spec, extracts usage.total_tokens, and calls llm_limiter.reconcile() to refund unused tokens
- streaming.lua: body_filter() now handles the non-streaming path (active=false but key present) by buffering chunks and reconciling on EOF
- streaming.lua: init_stream() initialises body_buffer field
- streaming_spec.lua: install mock_cjson_safe; add 4 new scenarios covering full-body extraction, chunked delivery, missing usage field (no over-refund), and no-context passthrough

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
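The non-streaming path described in this commit can be modelled roughly as follows. This is a minimal Python sketch of the logic, not the Lua code in streaming.lua: new_ctx, extract_total_tokens, and the refund callback are hypothetical stand-ins for the request context, cost_extractor, and llm_limiter.reconcile(). It also assumes that a body larger than the 1 MiB cap is left unreconciled, which the commit message does not state.

```python
import json

MAX_BUFFER_BYTES = 1024 * 1024  # 1 MiB cap, as stated in the commit message


def new_ctx(reserved_tokens):
    """Per-request state; the field names are illustrative, not the project's."""
    return {"reserved": reserved_tokens, "body_buffer": [],
            "buffered": 0, "truncated": False}


def extract_total_tokens(body):
    """Return usage.total_tokens, or None if the body is not JSON or lacks it."""
    try:
        doc = json.loads(body)
    except ValueError:  # covers JSON syntax errors and undecodable bytes
        return None
    usage = doc.get("usage") if isinstance(doc, dict) else None
    total = usage.get("total_tokens") if isinstance(usage, dict) else None
    if isinstance(total, int) and not isinstance(total, bool) and total >= 0:
        return total
    return None


def body_filter(ctx, chunk, eof, refund):
    """Buffer chunks, reconcile once on EOF, and always pass the chunk through."""
    if ctx is None:
        return chunk  # no limiter context: pure passthrough
    if not ctx["truncated"]:
        if ctx["buffered"] + len(chunk) > MAX_BUFFER_BYTES:
            # Assumption: over-cap bodies are not reconciled at all.
            ctx["truncated"] = True
            ctx["body_buffer"].clear()
        else:
            ctx["body_buffer"].append(chunk)
            ctx["buffered"] += len(chunk)
    if eof and not ctx["truncated"]:
        actual = extract_total_tokens(b"".join(ctx["body_buffer"]))
        # Refund only when usage is known, so a missing field never over-refunds.
        if actual is not None and actual < ctx["reserved"]:
            refund(ctx["reserved"] - actual)
    return chunk
```

With a reservation of 1000 tokens and a body reporting usage.total_tokens = 50, the refund callback receives 950 regardless of how the body is split into chunks. A body without a usage field triggers no refund.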
Add mock LLM backend (nginx serving fixed usage JSON), a new edge service in reverse_proxy mode with token_bucket_llm policy, and three e2e scenarios:
- pass-through: response body contains usage.total_tokens=50
- metric: fairvisor_token_reservation_unused_total emitted after reconcile
- refund: 5 consecutive requests succeed (reconciled ~50 tokens each vs pessimistic 1000, so budget is not exhausted)

Test runs as part of the nightly full e2e suite (pytest tests/e2e).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
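The arithmetic behind the refund scenario can be illustrated with a simple counter. This is a Python sketch under stated assumptions: the budget of 2000 tokens is a hypothetical value (the commit does not give the configured limit), bucket refill over time is ignored, and admitted() is an illustrative helper, not part of the test suite.

```python
def admitted(capacity, reserve, actual, requests, reconcile):
    """Count how many requests fit when each one reserves `reserve` tokens up front.

    Returns (number of admitted requests, tokens left in the budget).
    """
    remaining = capacity
    ok = 0
    for _ in range(requests):
        if remaining < reserve:
            break  # the pessimistic reservation no longer fits
        remaining -= reserve
        if reconcile:
            remaining += reserve - actual  # refund the unused part
        ok += 1
    return ok, remaining


# Hypothetical budget of 2000 tokens, pessimistic reservation of 1000,
# actual usage of 50 tokens per request (the value served by the mock backend).
without_refund = admitted(2000, 1000, 50, 5, reconcile=False)
with_refund = admitted(2000, 1000, 50, 5, reconcile=True)
```

Under these assumptions only 2 of the 5 requests are admitted without reconciliation, while all 5 are admitted with it, leaving 1750 tokens in the budget.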
…ng e2e

1. bundle_loader: validate_config was called on a shallow copy of algorithm_config, so computed fields (_tpm_bucket_config, default defaults etc.) were never written back to rule.algorithm_config. After validation, merge all non-algorithm keys back so request-time code sees the normalised config.
2. rule_engine: the final "allow" decision built at line 882 omitted rule_name, and limit_result.key was never set for token_bucket_llm. decision_api.lua:933 fell through to the fallback which concatenated nil rule_name, causing a 500. Fix: track last_allow_rule_name, set limit_result.key = counter_key for allowed LLM checks, and include rule_name in the final allow decision.

These bugs were latent and exposed by the new e2e test suite added for issue #12 / Feature 015. All 3 e2e reconciliation tests now pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
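The first bug, validating a shallow copy and discarding the computed fields, can be reproduced in a few lines. This is a Python sketch rather than the Lua in bundle_loader: validate_config here is a hypothetical validator, the tokens_per_minute and burst fields are invented for illustration, and reading "non-algorithm keys" as "every key except the algorithm key added to the copy" is an assumption.

```python
def validate_config(cfg):
    """Hypothetical validator that normalises cfg in place with computed fields."""
    cfg.setdefault("burst", cfg["tokens_per_minute"])
    cfg["_tpm_bucket_config"] = {
        "rate": cfg["tokens_per_minute"] / 60,  # tokens per second
        "burst": cfg["burst"],
    }


def _validation_copy(rule):
    """Shallow copy of algorithm_config plus the algorithm name, for validation."""
    copy = dict(rule["algorithm_config"])
    copy["algorithm"] = rule["algorithm"]
    return copy


def load_rule_buggy(rule):
    # Computed fields land on the copy and are thrown away.
    validate_config(_validation_copy(rule))
    return rule


def load_rule_fixed(rule):
    copy = _validation_copy(rule)
    validate_config(copy)
    # Merge everything except the helper key back into the stored config.
    for key, value in copy.items():
        if key != "algorithm":
            rule["algorithm_config"][key] = value
    return rule
```

In the buggy variant, request-time code reading rule["algorithm_config"]["_tpm_bucket_config"] finds nothing. After the merge, the normalised values are visible on the rule itself.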
Summary
- streaming.lua: add _reconcile_non_streaming() that buffers up to 1 MiB of upstream response body, extracts usage.total_tokens via cost_extractor, and calls llm_limiter.reconcile() to refund unused pessimistic TPM/TPD reservations
- streaming.lua: extend body_filter() to handle the non-streaming path (ctx present but active=false): buffer chunks, trigger reconcile on EOF
- streaming.lua: initialise body_buffer field in init_stream()
- streaming_spec.lua: add mock_cjson_safe setup + 4 new Gherkin scenarios (full body, chunked delivery, missing usage field, no-context passthrough)

Test plan
- busted spec/unit/streaming_spec.lua: 18 scenarios pass (4 new: F015-NS-1..4)
- busted spec/unit/ spec/integration/: 485 total, 0 failures

🤖 Generated with Claude Code