
Add Presence Penalty support to create_generator #319

Open — AirRunner wants to merge 2 commits into lmstudio-ai:main from AirRunner:feat/presence-penalty

Conversation

@AirRunner

mlx_lm already implements make_presence_penalty in sample_utils.py, but mlx_engine did not expose it through create_generator. This makes it impossible for callers to apply a presence penalty, which is useful for reducing token repetition in long generations.

Changes

Added presence_penalty and presence_context_size to create_generator, _sequential_generation, and _batched_generation, following the existing repetition_penalty / repetition_context_size interface. Both default to None.
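The parameter threading described above can be sketched as follows. This is a minimal, hypothetical illustration of the interface shape only; the real `create_generator` and `_sequential_generation` in mlx_engine take many more parameters, and the echo-style return here is purely for demonstration. A default of `None` means "no penalty applied".

```python
# Hypothetical sketch of the new parameter threading; not the actual
# mlx_engine implementation.
def _sequential_generation(prompt, *, presence_penalty=None,
                           presence_context_size=None, **kwargs):
    # In mlx_engine this would build logits processors and stream tokens;
    # here we just echo the settings to show they arrive intact.
    return {
        "presence_penalty": presence_penalty,
        "presence_context_size": presence_context_size,
    }

def create_generator(prompt, *, presence_penalty=None,
                     presence_context_size=None, **kwargs):
    # The new parameters are forwarded unchanged, mirroring how
    # repetition_penalty / repetition_context_size already flow through.
    return _sequential_generation(
        prompt,
        presence_penalty=presence_penalty,
        presence_context_size=presence_context_size,
        **kwargs,
    )
```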

Refactor: TokenPenaltyProcessor

The existing RepetitionPenaltyProcessor was a custom wrapper whose sole purpose was to prepend cached prefix tokens to the penalty window (since logits processors only receive tokens generated in the current turn, not those already in the KV cache). This logic is now extracted into a generic TokenPenaltyProcessor that works with any mlx_lm penalty function.
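The idea can be sketched roughly as below, assuming the mlx_lm logits-processor convention of `fn(tokens, logits) -> logits`. The class and parameter names follow the PR description, but the body is an illustrative reconstruction, not the exact mlx_engine code (which operates on `mx.array` logits rather than plain lists).

```python
# Illustrative sketch of a generic, KV-cache-aware penalty wrapper.
class TokenPenaltyProcessor:
    def __init__(self, penalty_fn, cached_prompt_tokens):
        # penalty_fn: any mlx_lm-style penalty, e.g. the result of
        # make_repetition_penalty(...) or make_presence_penalty(...).
        self.penalty_fn = penalty_fn
        # Tokens already held in the KV cache. Logits processors only
        # receive tokens generated in the current turn, so these must be
        # prepended to the penalty window manually.
        self.cached_prompt_tokens = list(cached_prompt_tokens)

    def __call__(self, tokens, logits):
        # Prepend cached tokens so the penalty sees the full history.
        return self.penalty_fn(self.cached_prompt_tokens + list(tokens), logits)
```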

Testing

Added test_presence_penalty_applies mirroring the existing test_repetition_penalty_applies.


This PR exposes presence_penalty at the Python level. To make it available in the LM Studio UI, the Node bridge needs to map presencePenalty to presence_penalty in the kwargs passed to create_generator (analogous to how llm.prediction.llama.presencePenalty is already handled for the llama.cpp backend).

See lmstudio-bug-tracker#1604, lmstudio-bug-tracker#1842.

Adds presence_penalty and presence_context_size parameters to create_generator, _sequential_generation, and _batched_generation, mirroring the existing repetition_penalty interface.

Internally refactors RepetitionPenaltyProcessor into a generic TokenPenaltyProcessor that handles KV cache awareness for any mlx_lm penalty function (repetition, presence, frequency). This removes the now-redundant RepetitionPenaltyProcessor class and setup_repetition_penalty helper.

The Node bridge in LM Studio needs to map presencePenalty to presence_penalty in kwargs to expose this via the UI.
@github-actions

github-actions Bot commented Apr 25, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@AirRunner
Author

I have read the CLA Document and I hereby sign the CLA

@github-actions github-actions Bot added the "CLA signed" label Apr 25, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bbcfc3295d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread: mlx_engine/utils/generation_helpers.py (Outdated)

    if repetition_penalty and repetition_penalty != 0.0:
        logits_processors.append(
            TokenPenaltyProcessor(
                make_repetition_penalty(repetition_penalty, repetition_context_size or 20),

P2 Badge Preserve explicit zero context size values

Using repetition_context_size or 20 silently rewrites an explicit 0 to 20, so callers cannot pass 0 through to mlx_lm anymore (the same pattern is repeated for presence_context_size). This is a behavioral regression from the previous implementation, where provided values were forwarded verbatim, and it can change generation outputs in experiments that intentionally set context size to zero. Please switch to an explicit None check (e.g. 20 if repetition_context_size is None else repetition_context_size) so only missing values default.

Useful? React with 👍 / 👎.
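The behavioral difference the review points out can be shown in plain Python, independent of mlx_lm (the function names below are illustrative):

```python
def context_size_truthy(context_size):
    # Buggy pattern: `or` treats an explicit 0 the same as a missing value,
    # because 0 is falsy.
    return context_size or 20

def context_size_explicit(context_size):
    # Fixed pattern: only a missing (None) value falls back to the default.
    return 20 if context_size is None else context_size

context_size_truthy(0)      # → 20 (explicit 0 silently rewritten)
context_size_explicit(0)    # → 0  (explicit 0 preserved)
context_size_explicit(None) # → 20 (only None defaults)
```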

@AirRunner
Author

Fixed in 655d788 using an explicit is not None check.

(In practice, a context size of 0 makes no semantic sense for a penalty, but the explicit check is still cleaner.)

@reneleonhardt

Thank you very much for your work, everyone is waiting for presence_penalty to enjoy the amazing agentic coding results of Qwen 3.6 27b!

Is there a human here to start the workflows and review?

https://unsloth.ai/docs/models/qwen3.6#qwen3.6-27b
https://www.reddit.com/r/LocalLLaMA/comments/1strodp/qwen_36_27b_makes_huge_gains_in_agency_on/

@AirRunner
Author

Is there a human here?

@reneleonhardt haha what times we live in 😅


Labels

CLA signed: Indicates that all contributors have signed

2 participants