Bug
src/workers/continuum-core/src/ai/openai_adapter.rs:619-625 — the POST body sent to local DMR (Docker Model Runner) includes only model, messages, temperature, max_tokens, and stream. No frequency_penalty or presence_penalty. DMR therefore runs with OpenAI's defaults of 0.0, meaning ZERO repetition pressure during sampling.
Meanwhile, llamacpp_adapter.rs:391 and llamacpp_scheduler.rs:648 set sampling.repeat_penalty = 1.1 for the in-process Mac path. So the platforms diverge: Mac in-process has a mild penalty, Linux DMR has none.
Symptom: Linux/CUDA chat shows personas verbatim-echoing each other's prior responses — e.g. multiple personas in turn outputting the same "Sentinel: dev/build-feature ... please continue to support the team and provide any necessary updates" block. Conversation degrades into a self-reinforcing echo loop within 3-4 turns.
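The two penalty schemes are not numerically interchangeable, which is why "0.0" is so damaging here. A minimal sketch of both (the formulas follow OpenAI's documented additive penalty and llama.cpp's multiplicative repeat penalty; this is illustrative, not the adapter code):

```python
def openai_penalized(logit: float, count: int,
                     frequency_penalty: float, presence_penalty: float) -> float:
    """Additive penalty, per the OpenAI sampling docs:
    logit - count*frequency_penalty - (presence_penalty if already seen)."""
    return logit - count * frequency_penalty - (presence_penalty if count > 0 else 0.0)

def llamacpp_penalized(logit: float, seen: bool, repeat_penalty: float) -> float:
    """Multiplicative penalty, per llama.cpp's repetition sampler:
    positive logits are divided by the penalty, negative ones multiplied."""
    if not seen:
        return logit
    return logit / repeat_penalty if logit > 0 else logit * repeat_penalty

# With the 0.0/0.0 defaults DMR currently receives, a token repeated five
# times keeps its full logit -- zero repetition pressure:
assert openai_penalized(2.0, count=5, frequency_penalty=0.0, presence_penalty=0.0) == 2.0

# llama.cpp's 1.1 shaves ~9% off a repeated positive logit on the Mac path:
assert abs(llamacpp_penalized(2.0, seen=True, repeat_penalty=1.1) - 2.0 / 1.1) < 1e-9
```

Repeated high-probability spans (the echoed persona blocks) sail through DMR sampling untouched, while the Mac path at least dampens them.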
Repro
# Linux/CUDA Carl
./jtag collaboration/chat/send --room=general --message='introduce yourselves'
./jtag collaboration/chat/export --room=general --limit=20
Look for verbatim repetition of multi-paragraph text across personas.
Fix shape
Add to openai_adapter.rs:619 body construction:
"frequency_penalty": 0.4,
"presence_penalty": 0.4,
Defaults chosen to match the perceived mildness of llamacpp's repeat_penalty=1.1 — strong enough to break verbatim repetition without forcing weird vocabulary.
Should also evaluate whether to bump llamacpp_adapter's default to 1.15 if echo loops persist on Mac after the Linux gap closes.
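For reference, the patched request body would carry the two new fields alongside the existing five. A sketch (the field names are per the OpenAI chat-completions wire format that DMR speaks; model/temperature/max_tokens values here are illustrative, not what the adapter actually sends):

```python
import json

# Hypothetical post-fix DMR POST body; the real construction lives in
# openai_adapter.rs:619. Only frequency_penalty/presence_penalty are new.
body = {
    "model": "some-local-model",        # illustrative placeholder
    "messages": [{"role": "user", "content": "introduce yourselves"}],
    "temperature": 0.7,                 # illustrative
    "max_tokens": 512,                  # illustrative
    "stream": True,
    "frequency_penalty": 0.4,           # new: scales with repeat count
    "presence_penalty": 0.4,            # new: flat cost once a token has appeared
}
payload = json.dumps(body)
assert '"frequency_penalty": 0.4' in payload
assert '"presence_penalty": 0.4' in payload
```

Since DMR is OpenAI-compatible, no other adapter changes should be needed — the fields just need to be serialized into the existing body.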
Linked
PR #950 — coherent personas merge gate (RULE 1: Mac vs Linux paths must converge in behavior).