Extension does not work correctly with "thinking" models

If using e.g. Ollama with locally running Gemma4, the response from the LLM is quite often stripped in the middle.

Reason: the hard-coded max_token limit is too low. This kind of LLMs produces "thinking" tokens, which, being counted together with response tokens, exhaust the tokens' limit causing the generation to stop with "length" as the stop reason.

Suggested solution: make max_tokens configurable so that user can set the limit higher if this occurs. See also #65 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extension does not work correctly with "thinking" models #66

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Extension does not work correctly with "thinking" models #66

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions