feat: add selectable context window (200K/1M) in model picker #156
rbinar wants to merge 3 commits into
Adds a "Context Window" dropdown in the Copilot Chat model configuration panel, letting users switch between 200K and 1M token context windows per model.
- Combine reasoningEffort and contextSize into a single configurationSchema so both controls appear in the model picker dropdown (non-public VS Code API, same mechanism used by Copilot for thinking effort).
- maxInputTokens raised from 656K to 1,000,000 to support 1M mode.
- maxOutputTokens set to 128,000, a conservative middle ground that keeps the displayed context window reasonable (200K input + 128K output = 328K, 1M input + 128K output = ~1.1M) without sacrificing DeepSeek's actual generation headroom. This does NOT affect API-level max_tokens, which is controlled by a separate VS Code setting.
- Sync dropdown choice back to workspace setting (deepseek-copilot.contextSize) after each request so the value persists across sessions.
- Both DeepSeek V4 Flash and Pro supported.
- English and Chinese i18n for dropdown labels and descriptions.
Why: the default 656K input window was too small for long sessions, while always reserving 1M adds latency and cost. Giving users control lets them pick the right trade-off between speed/cost (200K) and capacity (1M).
Force-pushed from 6ab12af to 73b591d
|
Hi @rbinar, thanks for working on this. The implementation direction looks aligned with the model configuration mechanism VS Code/Copilot exposes for context size selection.

I have one concern about the rationale for choosing 200K as the secondary context size. Since the DeepSeek API already provides a 1M context window at the same base price, it would be helpful to understand what evidence led to 200K specifically: is it based on DeepSeek guidance, Copilot harness behavior, latency measurements, task-quality benchmarks, or observed failure modes with larger contexts?

There is also a cost-related concern here. A major part of DeepSeek’s cost efficiency comes from prefix caching. If reporting a 200K context window causes Copilot to compact/summarize earlier, the request prefix may change more often and previously cacheable context could be lost. That may silently increase the user’s effective API cost, even if the nominal 200K vs 1M pricing is the same.

Could we add some measurements or explanation comparing 200K vs 1M, for example task success rate, compaction frequency, latency, input/cache-hit tokens, and effective cost? Without that, it is hard to evaluate whether 200K is the right trade-off or whether it may accidentally make long-running agent sessions worse.

One more compatibility note: this extension currently supports VS Code as low as 1.116. Since context-size model configuration is relatively new and the behavior may differ across VS Code versions, we probably need additional forward/backward compatibility testing around 1.116+ to make sure the dropdown, request options, and fallback behavior all work as expected. |
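One way to make the requested "effective cost" comparison concrete is to total cache-hit, cache-miss, and output tokens per run and price them separately. The sketch below is only an illustration: the `UsageSample` and `PricePerMillion` shapes and both function names are hypothetical, and prices are passed in as parameters rather than taken from any DeepSeek price list.

```typescript
// Hypothetical shapes for illustration; not the extension's or DeepSeek's actual types.
interface UsageSample {
  cacheHitTokens: number;   // input tokens served from the prefix cache
  cacheMissTokens: number;  // input tokens that had to be processed fresh
  outputTokens: number;
}

interface PricePerMillion {
  cacheHit: number;
  cacheMiss: number;
  output: number;
}

// Total cost of a run, in the same currency unit as the supplied prices.
function effectiveCost(samples: UsageSample[], price: PricePerMillion): number {
  let weighted = 0;
  for (const s of samples) {
    weighted +=
      s.cacheHitTokens * price.cacheHit +
      s.cacheMissTokens * price.cacheMiss +
      s.outputTokens * price.output;
  }
  return weighted / 1_000_000;
}

// Share of input tokens that were cache hits (0 when there is no input).
function cacheHitRate(samples: UsageSample[]): number {
  let hit = 0;
  let miss = 0;
  for (const s of samples) {
    hit += s.cacheHitTokens;
    miss += s.cacheMissTokens;
  }
  const total = hit + miss;
  return total === 0 ? 0 : hit / total;
}
```

Running both functions over the per-request usage of a 200K session and a 1M session on the same task would show whether earlier compaction lowers the hit rate enough to raise the total cost.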
Context Rot Analysis: Why 200K Outperforms 1M
The "Lost-at-the-Edges" Effect
The benchmark reveals a specific degradation pattern in the 1M context window: the model does not uniformly forget information. Instead, accuracy varies dramatically by needle position:
What This Means
The Root Cause: Attention Dilution
In a 1M-token context, the attention mechanism must distribute its limited capacity across ~1,000,000 tokens. The filler content (config documentation, log excerpts, JSON schemas) competes with the actual needles for attention weight. At 200K tokens, the signal-to-noise ratio is 5× higher: the needle stands out more clearly because there is less irrelevant content around it. The head position is particularly vulnerable in 1M because:
Practical Implication for Copilot
Coding sessions place critical context at the beginning (project structure, file layout, imports) and require the model to reference it throughout. If the 1M window loses track of head-positioned information 42.9% of the time, this could manifest as:
200K avoids this by keeping the context dense enough that even early information remains within the model's effective attention radius.
Benchmark Methodology & Full Results
Hypothesis
Methodology
3-tier benchmark (
Model:
Results Summary
Tier 1: Per-Run Cache Data
1M (no compaction)
200K (compaction at 180K tokens)
Tier 2: Per-Item Accuracy
Focused (200K): 19/21 correct (90.5%)
Full (1M): 15/21 correct (71.4%)
Key Findings
Recommendation
Caveats
|
|
One more important piece of context: we previously fixed this exact accounting issue in #71. That PR changed the metadata from
This PR changes the metadata to:
which reports roughly 1.128M total context to VS Code/Copilot.

Could you clarify the source for this new split and whether DeepSeek’s 1M limit is input-only or input+output combined? If the official limit is still 1M total, then this seems to partially revert the fix from #71 and may cause Copilot to over-budget before compaction.

I think the context-size selector should preserve correct total-context accounting unless we have a clear source or test showing that |
VS Code/Copilot derives the displayed context window from maxInputTokens + maxOutputTokens. The selector reported 1,000,000 + 128,000 ≈ 1.128M for the 1M option, partially reverting the accounting fixed in Vizards#71. Restore the model default to 655,360 + 393,216 (= 1,048,576, DeepSeek's official combined input+output limit) and map each selectable window to an input/output split that sums to the advertised total (200K → 125,000 + 75,000), preserving the same 5:3 reservation ratio.
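The accounting described in this commit message can be sketched roughly as follows. This is a hedged illustration, not the code from the PR: the `ContextSize` keys `"200k"` and `"1m"`, the `ContextWindow` type, the `displayedTotal` helper, and the signature given to `resolveContextWindow` are all assumptions; only the token numbers come from the message above.

```typescript
// Hypothetical sketch; names and signatures are illustrative, not the extension's actual code.
type ContextSize = "200k" | "1m";

interface ContextWindow {
  maxInputTokens: number;
  maxOutputTokens: number;
}

// Splits quoted in the commit message; both keep a 5:3 input:output ratio.
const CONTEXT_WINDOWS: Record<ContextSize, ContextWindow> = {
  "1m": { maxInputTokens: 655_360, maxOutputTokens: 393_216 },
  "200k": { maxInputTokens: 125_000, maxOutputTokens: 75_000 },
};

// Unknown or missing selections fall back to the 1M default.
function resolveContextWindow(size?: string): ContextWindow {
  const key: ContextSize = size === "200k" ? "200k" : "1m";
  return CONTEXT_WINDOWS[key];
}

// VS Code/Copilot is described as displaying input + output as the context window.
function displayedTotal(w: ContextWindow): number {
  return w.maxInputTokens + w.maxOutputTokens;
}
```

With this mapping, the 200K option reports exactly 200,000 tokens and the default reports 1,048,576, so neither option advertises more than the combined limit.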
|
Thanks @Vizards — agreed, and you're right that it partially reverted #71. Fixed in ed441a6. The root cause was treating the dropdown value as
On the input-only vs combined question: I don't have a public DeepSeek source that splits the 1M into separate input/output budgets, so I deliberately kept #71's assumption that 1M is the total (combined) window rather than introduce a larger total. The base model metadata is back to |
|
Addressing the 1.116+ forward/backward compatibility concern from your first comment.

tl;dr: the fallback is safe on every supported version. The worst case is the dropdown not rendering, which silently degrades to the existing 1M default with no regression.

Behaviour at each layer
Why this is already-validated territory
This is the same mechanism the existing
Net result
Every access to the non-public fields is optional-chained against unknown shapes, so there's nothing to guard beyond what's already in place. The fallback path is the safe default at every layer. |
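The optional-chaining fallback described in this comment might look roughly like the sketch below. The field name `modelConfiguration`, the accepted values `"200k"` and `"1m"`, and the function name `readContextSize` are assumptions made for illustration; the real non-public fields may be shaped differently.

```typescript
// Hypothetical field names; the real non-public request-option fields may differ.
const DEFAULT_CONTEXT_SIZE = "1m";
const KNOWN_CONTEXT_SIZES = new Set(["200k", "1m"]);

// Reads the selected context size from an options object of unknown shape.
// Any missing field, wrong type, or unrecognised value yields the 1M default.
function readContextSize(options: unknown): string {
  const raw = (
    options as { modelConfiguration?: { contextSize?: unknown } } | null | undefined
  )?.modelConfiguration?.contextSize;
  return typeof raw === "string" && KNOWN_CONTEXT_SIZES.has(raw)
    ? raw
    : DEFAULT_CONTEXT_SIZE;
}
```

On a VS Code version that never populates the field, every call takes the default branch, which matches the "silently degrades to the existing 1M default" behaviour claimed above.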
Resolve conflict in src/provider/models.ts: main added isBYOK: true to toChatInfo return, while this branch replaced individual maxInputTokens/maxOutputTokens with ...resolveContextWindow(). Keep both.
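A minimal sketch of the "keep both" resolution, assuming a trimmed-down `toChatInfo` and a stand-in `resolveContextWindow` (the real signatures and remaining fields in src/provider/models.ts are not shown in this thread):

```typescript
// Stand-in for the branch's helper; the real signature is not shown in the thread.
function resolveContextWindow(): { maxInputTokens: number; maxOutputTokens: number } {
  return { maxInputTokens: 655_360, maxOutputTokens: 393_216 };
}

// Hypothetical, trimmed-down toChatInfo that keeps both sides of the conflict.
function toChatInfo(id: string) {
  return {
    id,
    isBYOK: true,              // added on main
    ...resolveContextWindow(), // replaces the individual token fields on this branch
  };
}
```

Because the two sides touch different keys, the spread and the `isBYOK` flag can coexist without one overriding the other.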
What
Adds a "Context Window" dropdown in the Copilot Chat model configuration panel, letting users switch between 200K and 1M token context windows per model — just like the existing thinking effort dropdown.
Changes
- Combine reasoningEffort and contextSize into a single configurationSchema so both controls appear in the model picker
- maxInputTokens + maxOutputTokens:
  - 1M: 655,360 + 393,216 = 1,048,576, DeepSeek's official combined limit; this is byte-for-byte the split from "Fix DeepSeek V4 reported context window" #71, so the default path keeps the corrected total accounting
  - 200K: 125,000 + 75,000 = 200,000, the same 5:3 input:output reservation, scaled down
- 655,360 / 393,216, so only selecting 200K changes the split
- Sync the dropdown choice back to the deepseek-copilot.contextSize workspace setting after each request, persisting across sessions
Why
Previously the extension reported a single fixed context window with no way to choose. This gives users control over the speed/cost-vs-capacity trade-off (see the context-rot benchmark in the thread for why a focused 200K window can outperform 1M on retrieval), while keeping the reported total within DeepSeek's real 1M limit and preserving the #71 accounting fix.