fix: avoid inflated debug TPS #848
Closed
7Sageer wants to merge 1 commit into
🦋 Changeset detected
Latest commit: 4a667de
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Superseded by #849, which keeps decode-TPS semantics and only skips TPS when the stream window is too short to measure, instead of redefining TPS over the full response window.
Related Issue
No linked issue. Fixes a reproducible debug-mode reporting issue where short tool-call streams can display unrealistically high TPS values.
Problem
Debug timing computed TPS from the stream window after the first chunk. For tool-call steps that stream their output in only a few milliseconds while still reporting total output tokens, debug mode could show rates like tens of thousands of tokens per second.
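To make the inflation concrete, here is a minimal sketch of the arithmetic. `decodeTps` is a hypothetical helper written for illustration, not the project's actual function, and the token count and timings are made-up numbers.

```typescript
// Hypothetical helper (not the project's code): decode-only TPS, measured
// over the stream window that follows the first chunk.
function decodeTps(outputTokens: number, streamMs: number): number {
  // Multiply before dividing so small windows do not lose precision.
  return (outputTokens * 1000) / streamMs;
}

// Illustrative numbers only: a tool-call step reports 40 output tokens,
// but every chunk arrives within 1 ms of the first one.
const inflatedTps = decodeTps(40, 1);
console.log(inflatedTps); // 40000
```

The token count is real, but the denominator covers only a sliver of the time the model actually spent producing the output, so the rate is not meaningful.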
What changed
Compute debug TPS over the full model response window (TTFT + stream duration) while still showing the raw stream duration in the diagnostic text. Added a regression test for a 1 ms stream window and included a patch changeset.
Validation
pnpm --filter @moonshot-ai/kimi-code exec vitest run test/utils/usage/debug-timing.test.ts
git diff --check
pnpm -w run build:packages
pnpm --filter @moonshot-ai/vis-server run build
pnpm --filter @moonshot-ai/kimi-code run typecheck (fails with existing TS2307: Cannot find module '@moonshot-ai/vis-server/start' at src/cli/sub/vis.ts:134:49)
Checklist
gen-changesets skill, or this PR needs no changeset.
gen-docs skill, or this PR needs no doc update.
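The computation described under What changed can be sketched roughly as follows. `fullWindowTps` and its parameter names are assumptions made for illustration; the actual implementation in this PR may be structured differently, and the zero-window fallback is a choice of this sketch rather than something the description states.

```typescript
// Hypothetical helper (not the PR's code): TPS over the full model response
// window, i.e. time to first token (TTFT) plus the stream duration.
function fullWindowTps(
  outputTokens: number,
  ttftMs: number,
  streamMs: number,
): number {
  const windowMs = ttftMs + streamMs;
  // Assumption of this sketch: report 0 rather than dividing by zero.
  if (windowMs <= 0) return 0;
  return (outputTokens * 1000) / windowMs;
}

// Illustrative numbers only: 40 output tokens, 799 ms TTFT, 1 ms stream.
// The window is 800 ms, so the rate is 50 tokens per second.
console.log(fullWindowTps(40, 799, 1)); // 50
```

With the same 1 ms stream window, the rate stays in a plausible range because the denominator now includes the wait before the first chunk.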