Skip to content

Approx usage for interrupted streaming requests#55

Open
KeremTurgutlu wants to merge 1 commit into
mainfrom
interrupted-usage
Open

Approx usage for interrupted streaming requests#55
KeremTurgutlu wants to merge 1 commit into
mainfrom
interrupted-usage

Conversation

@KeremTurgutlu

@KeremTurgutlu KeremTurgutlu commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Streaming wrappers now build an interrupted Completion when a stream is cancelled or closed before provider usage is returned. The wrapper estimates prompt/output tokens, assumes 80% of input tokens were cached, normalizes the synthetic usage through the provider’s existing norm_usage, and tracks it with the normal AsyncChat usage accounting.

Providers can now register approx_raw_usage hooks, so approximate usage keeps the same provider-shaped raw usage and cost path as real responses. This adds hooks for OpenAI Responses, OpenAI Chat, Anthropic, and Gemini.

This lets callers such as Solveit show and log approximate token usage/cost for interrupted prompts, including cancellations before the first streamed token.

interrupted_usage_half.mov

@KeremTurgutlu KeremTurgutlu self-assigned this Jun 19, 2026
@KeremTurgutlu KeremTurgutlu added the enhancement New feature or request label Jun 19, 2026
@KeremTurgutlu KeremTurgutlu marked this pull request as draft June 19, 2026 17:04
Comment thread fastllm/types.py Outdated
Comment on lines +154 to +159
def approx_text_tokens(s): return (len(s or '') + 2)//3

def approx_obj_tokens(o):
try: s = json.dumps(obj2dict(o), ensure_ascii=False, default=str)
except Exception: s = str(o)
return approx_text_tokens(s)

@KeremTurgutlu KeremTurgutlu Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jph00 Should we instead use the tiktoken based estimator from solveit?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably overkill IMO

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why json.dumps instead of str here btw?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Comment thread fastllm/chat.py
api_name=api_name,
vendor_name=vendor_name,
usage=usage)
chat._track(self.value)

@KeremTurgutlu KeremTurgutlu Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cancelled request exits AsyncChat._call while yielding chunks (async for chunk in res: yield chunk # exits here) and never reaches the rest of the code:

# AsyncChat._call()
...
if stream:
    if self.prefill: yield _mk_prefill(self.prefill)
    res = astream_with_complete(self, res, postproc=postproc)
    async for chunk in res: yield chunk # exits here
    res = res.value

So we manually call chat._track(self.value) here to set c.use for the interrupted request.

Comment thread fastllm/acomplete.py
def mk_client(model=None, vendor_name=None, api_name=None, api_key=None, base_url=None, xtra_hdrs=None,
timeout=httpx.Timeout(connect=30, read=300, write=30, pool=10)):
# %% ../nbs/06_acomplete.ipynb #c714601e
def resolve_api_vendor(model=None, vendor_name=None, api_name=None, api_key=None, base_url=None):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factored this out to be able to use it during interrupted Completion construction.

Comment thread fastllm/chat.py
yield postproc(chunk)
self.value = chunk
except (GeneratorExit, asyncio.CancelledError):
api_name,vendor_name,*_ = resolve_api_vendor(chat.model, chat.vendor_name, chat.api_name, chat.api_key, chat.base_url)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api_name and vendor_name are inferred in acomplete inside mk_client and not stored in AsyncChat, so we resolve them here using the new helper.

@KeremTurgutlu KeremTurgutlu marked this pull request as ready for review June 27, 2026 10:13
@AnswerDotAI AnswerDotAI deleted a comment from KeremTurgutlu Jun 29, 2026
Comment thread fastllm/types.py Outdated
FinishReason = str_enum('finish_reason', 'stop', 'tool_calls', 'length', 'content_filter', 'interrupted')

# %% ../nbs/00_types.ipynb #c5a88e6f
def approx_text_tokens(s): return (len(s or '') + 2)//3

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be len((s or '').split()... ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then *3/2 ?

@KeremTurgutlu KeremTurgutlu Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some heuristic AI came up with, I couldn't find our pre-tiktoken estimator in the git history. If you have that available that would be awesome. Was it something like:

def str_tokens(s): return int(len(s)/3.4) + 1

from https://github.com/AnswerDotAI/solveit/blob/b3d4b09dbef1f6a7437ca1c79a81d796f9ac50ed/00_db.ipynb ?

@KeremTurgutlu

KeremTurgutlu commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

@jph00 I've simplified the token approx logic down to approx_str_tokens (from solveit history) which works both for objects like chat.turn_msgs and strings like chat.turn_sysp:

def approx_str_tokens(o): return int(len(str(o))/3.4) + 1

@KeremTurgutlu KeremTurgutlu requested a review from jph00 June 29, 2026 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants