Add Foundation Model (on-device LLM) smart assist integration #833

Andrew Carter (andrewroycarter) wants to merge 2 commits into develop from
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
```swift
private var lastUserResponse: (prompt: String, response: String)?

/// Proactively reset the session after this many turns to avoid context window exhaustion.
private let maxTurnsBeforeReset = 8
```
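The proactive turn-count reset can be sketched in isolation. This is a minimal illustration, not the PR's actual implementation; `ResettableSession` and `TurnCounter` are hypothetical stand-ins for the real session type:

```swift
import Foundation

/// Hypothetical stand-in for the real on-device session type.
protocol ResettableSession {
    func reset()
}

/// Counts exchanges and proactively resets the session before the
/// context window is exhausted.
final class TurnCounter {
    private let maxTurnsBeforeReset: Int
    private var turns = 0

    init(maxTurnsBeforeReset: Int = 8) {
        self.maxTurnsBeforeReset = maxTurnsBeforeReset
    }

    /// Call after each exchange; returns true when the session was reset.
    func recordTurn(session: ResettableSession) -> Bool {
        turns += 1
        guard turns >= maxTurnsBeforeReset else { return false }
        session.reset()
        turns = 0
        return true
    }
}
```

The tradeoff the review comments below discuss is that a hard reset like this discards context the user may still need mid-conversation.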
This is the only thing that I think could be improved. It would be weird to have it hear a context-needing question and be like, uh, what?
Maybe it would be better to summarize the context instead.
This is an interesting problem! I think it quickly turns into a situation where we need multiple signals to make that decision.
What if we asked the model to include a session topic string with each response? So if we encode the timing of each exchange, as well as the context we're already providing, it can reply with everything it already returns, plus an identifier we can use to track when it decides it's likely a new session. That way, we can purge context when the ID changes, avoiding any manual heuristics for those decisions.
Compaction will still be a thing to address, but I think we can do so in a reasonably concise way that's proportional to the problem.
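The topic-identifier idea could look roughly like this. All names here are hypothetical; the sketch only illustrates purging accumulated context when the model reports a new topic ID with its reply:

```swift
import Foundation

/// One prompt/response exchange, carrying the session-topic identifier
/// the model is asked to return alongside each reply.
struct AssistExchange {
    let prompt: String
    let response: String
    let topicID: String
}

/// Accumulates conversation context, dropping it whenever the
/// model-reported topic ID changes (i.e. the model decided this is
/// likely a new session), instead of relying on manual heuristics.
final class SessionContext {
    private(set) var exchanges: [AssistExchange] = []
    private var currentTopicID: String?

    func record(_ exchange: AssistExchange) {
        if let current = currentTopicID, current != exchange.topicID {
            exchanges.removeAll()
        }
        currentTopicID = exchange.topicID
        exchanges.append(exchange)
    }
}
```

Compaction would still be layered on top of this, since a long-running single topic can also exhaust the context window.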
Summary
Changes by Chris Stroud
- `FoundationModelService`: a new `SmartAssistService` implementation backed by Apple's on-device `FoundationModels` framework (iOS 26+)
- `SmartAssistService` protocol and `SmartAssistServiceError`: unify the local and cloud backends behind a common interface
- `SmartAssistServiceCoordinator`: implements a priority waterfall that tries Foundation Models first, falls back to the cloud API, then falls back to the existing CoreML classifier if both fail
- `SmartAssistResponse`: a `@Generable` struct used for structured output from the local model
- Integration into `ListeningResponseViewController` and `ListeningResponseContentViewController`
- A `foundationModelsSupported` property added to `ListenModeFeatureConfiguration`

Changes by Andrew Carter
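The priority waterfall can be sketched minimally as below, assuming a synchronous throwing protocol for illustration (the real `SmartAssistService` presumably has an async, richer signature):

```swift
import Foundation

enum SmartAssistServiceError: Error {
    case unavailable
}

/// Common interface both backends adopt (signature is illustrative).
protocol SmartAssistService {
    func responses(for prompt: String) throws -> [String]
}

/// Tries each backend in priority order; the first one that succeeds
/// wins. Configured as [foundationModels, cloudAPI, coreMLClassifier].
final class SmartAssistServiceCoordinator {
    private let services: [SmartAssistService]

    init(services: [SmartAssistService]) {
        self.services = services
    }

    func responses(for prompt: String) throws -> [String] {
        for service in services {
            if let result = try? service.responses(for: prompt) {
                return result
            }
        }
        throw SmartAssistServiceError.unavailable
    }
}
```

Keeping the fallback order in an array means the CoreML classifier never needs to know it is a fallback; it is just the last service in the list.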
- Updated the `FoundationModelService` system prompt and `SmartAssistResponse` generation guides to match the instructions used by the existing cloud API, so both backends produce consistent responses.

The cloud API uses the following system prompt:
The original `FoundationModelService` instructions diverged from this: they capped responses at 3–5, allowed up to 10 words per response, and had no guidance on opinions, yes/no responses, or unclear prompts. The `@Guide` on `SmartAssistResponse.responses` also enforced `.maximumCount(5)` at the schema level, which would have overridden the instructions. Both have been updated to match the API's behaviour.

Test plan
🤖 Generated with Claude Code