Add Presence Penalty support to create_generator (#319)
Adds presence_penalty and presence_context_size parameters to create_generator, _sequential_generation, and _batched_generation, mirroring the existing repetition_penalty interface. Internally refactors RepetitionPenaltyProcessor into a generic TokenPenaltyProcessor that handles KV cache awareness for any mlx_lm penalty function (repetition, presence, frequency). This removes the now-redundant RepetitionPenaltyProcessor class and setup_repetition_penalty helper. The Node bridge in LM Studio needs to map presencePenalty to presence_penalty in kwargs to expose this via the UI.
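The refactor at the heart of this could be sketched as follows — a minimal, hypothetical version assuming mlx_lm's logits-processor call signature of `(tokens, logits) -> logits`, with the penalty function stubbed so the sketch runs standalone (the real closures come from `mlx_lm.sample_utils`):

```python
class TokenPenaltyProcessor:
    """Sketch of a generic, KV-cache-aware penalty wrapper (names assumed).

    Logits processors only see tokens generated in the current turn; tokens
    already held in the KV cache would otherwise escape the penalty window.
    This wrapper prepends those cached prefix tokens before delegating to
    any mlx_lm-style penalty function (repetition, presence, frequency).
    """

    def __init__(self, penalty_fn, cached_prefix_tokens=()):
        self.penalty_fn = penalty_fn  # e.g. result of make_repetition_penalty(...)
        self.prefix = list(cached_prefix_tokens)

    def __call__(self, tokens, logits):
        # Extend the visible history with the cached prefix, then delegate.
        return self.penalty_fn(self.prefix + list(tokens), logits)


# Stand-in penalty function: subtract 1 from every token id seen in history.
def stub_penalty(tokens, logits):
    return [l - 1 if i in tokens else l for i, l in enumerate(logits)]

proc = TokenPenaltyProcessor(stub_penalty, cached_prefix_tokens=[0])
print(proc([2], [3, 5, 7]))  # → [2, 5, 6]: token 0 (cached) and 2 (new) penalized
```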
All contributors have signed the CLA ✍️ ✅

I have read the CLA Document and I hereby sign the CLA
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bbcfc3295d
```python
if repetition_penalty and repetition_penalty != 0.0:
    logits_processors.append(
        TokenPenaltyProcessor(
            make_repetition_penalty(repetition_penalty, repetition_context_size or 20),
```
Preserve explicit zero context size values
Using repetition_context_size or 20 silently rewrites an explicit 0 to 20, so callers cannot pass 0 through to mlx_lm anymore (the same pattern is repeated for presence_context_size). This is a behavioral regression from the previous implementation, where provided values were forwarded verbatim, and it can change generation outputs in experiments that intentionally set context size to zero. Please switch to an explicit None check (e.g. 20 if repetition_context_size is None else repetition_context_size) so only missing values default.
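The pitfall is that Python's `or` falls back for every falsy value, not just `None`:

```python
explicit_zero = 0  # caller intentionally requests a zero-token penalty window

# Buggy: `or` treats 0 (and any other falsy value) as "missing"
buggy = explicit_zero or 20          # → 20, silently overriding the 0

# Fixed: only an absent (None) value triggers the default
fixed = 20 if explicit_zero is None else explicit_zero  # → 0
```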
Fixed in 655d788 using an explicit is not None check.
(In practice a context size of 0 makes little semantic sense for a penalty, but the explicit check is still cleaner.)
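With stand-ins for the mlx_lm factory functions, the corrected wiring would look roughly like this (function names follow the diff above; the bodies here are stubs for illustration, not the real closures):

```python
def make_repetition_penalty(penalty, context_size=20):
    return ("repetition", penalty, context_size)  # stand-in for the real closure

def make_presence_penalty(penalty, context_size=20):
    return ("presence", penalty, context_size)    # stand-in

def build_penalty_processors(repetition_penalty=None, repetition_context_size=None,
                             presence_penalty=None, presence_context_size=None):
    processors = []
    if repetition_penalty and repetition_penalty != 0.0:
        # Explicit None check: an intentional context size of 0 passes through
        size = 20 if repetition_context_size is None else repetition_context_size
        processors.append(make_repetition_penalty(repetition_penalty, size))
    if presence_penalty and presence_penalty != 0.0:
        size = 20 if presence_context_size is None else presence_context_size
        processors.append(make_presence_penalty(presence_penalty, size))
    return processors

print(build_penalty_processors(repetition_penalty=1.1, repetition_context_size=0))
# → [('repetition', 1.1, 0)]
```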
Thank you very much for your work, everyone is waiting for this. Is there a human here to start the workflows and review? https://unsloth.ai/docs/models/qwen3.6#qwen3.6-27b
@reneleonhardt haha what times we live in 😅
`mlx_lm` already implements `make_presence_penalty` in `sample_utils.py`, but `mlx_engine` did not expose it through `create_generator`. This makes it impossible for callers to apply a presence penalty, which is useful for reducing token repetition in long generations.

Changes
Added `presence_penalty` and `presence_context_size` to `create_generator`, `_sequential_generation`, and `_batched_generation`, following the existing `repetition_penalty`/`repetition_context_size` interface. Both default to `None`.

Refactor: `TokenPenaltyProcessor`
The existing `RepetitionPenaltyProcessor` was a custom wrapper whose sole purpose was to prepend cached prefix tokens to the penalty window (since logits processors only receive tokens generated in the current turn, not those already in the KV cache). This logic is now extracted into a generic `TokenPenaltyProcessor` that works with any `mlx_lm` penalty function.

Testing
Added `test_presence_penalty_applies`, mirroring the existing `test_repetition_penalty_applies`.

This PR exposes `presence_penalty` at the Python level. To make it available in the LM Studio UI, the Node bridge needs to map `presencePenalty` to `presence_penalty` in the kwargs passed to `create_generator` (analogous to how `llm.prediction.llama.presencePenalty` is already handled for the llama.cpp backend).

See lmstudio-bug-tracker#1604, lmstudio-bug-tracker#1842.
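As background on what such a test asserts: a presence penalty subtracts a flat amount from the logit of every token that has appeared at least once, regardless of how often — which is what distinguishes it from a frequency penalty. A toy version, unrelated to the actual mlx_lm implementation details:

```python
def apply_presence_penalty(logits, generated_tokens, penalty):
    # Presence penalty: each token that has appeared is penalized exactly once,
    # no matter how many times it occurred (unlike a frequency penalty).
    out = list(logits)
    for tok in set(generated_tokens):
        out[tok] -= penalty
    return out

logits = [1.0, 1.0, 1.0, 1.0]
penalized = apply_presence_penalty(logits, [1, 2, 2], penalty=0.5)
# token 2 appeared twice but is penalized once: presence, not frequency
print(penalized)  # → [1.0, 0.5, 0.5, 1.0]
```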