
Add Foundation Model (on-device LLM) smart assist integration#833

Open
Andrew Carter (andrewroycarter) wants to merge 2 commits into develop from feature/foundation-model-integration

Conversation

@andrewroycarter
Contributor

Summary

Changes by Chris Stroud

  • Adds FoundationModelService — a new SmartAssistService implementation backed by Apple's on-device FoundationModels framework (iOS 26+)
  • Adds SmartAssistService protocol and SmartAssistServiceError to unify the local and cloud backends behind a common interface
  • Adds SmartAssistServiceCoordinator which implements a priority waterfall: try Foundation Models first, fall back to the cloud API, then fall back to the existing CoreML classifier if both fail
  • Adds SmartAssistResponse — a @Generable struct used for structured output from the local model
  • Wires the coordinator into ListeningResponseViewController and ListeningResponseContentViewController
  • Adds foundationModelsSupported property to ListenModeFeatureConfiguration
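The summary above describes the new abstraction but the PR diff itself isn't shown here, so the following is a minimal sketch of what the protocol and coordinator waterfall might look like; the type names come from the PR, while the method signature, `isAvailable` property, and error cases are assumptions.

```swift
// Sketch only — the real protocol shape and signatures may differ.
protocol SmartAssistService {
    var isAvailable: Bool { get }
    func responses(for prompt: String) async throws -> [String]
}

enum SmartAssistServiceError: Error {
    case unavailable
    case generationFailed(underlying: Error)
}

/// Priority waterfall: on-device Foundation Models → cloud API → CoreML classifier.
final class SmartAssistServiceCoordinator {
    private let services: [SmartAssistService]

    /// Services are ordered by priority, e.g. [foundationModelService, cloudService, coreMLService].
    init(services: [SmartAssistService]) {
        self.services = services
    }

    func responses(for prompt: String) async throws -> [String] {
        var lastError: Error = SmartAssistServiceError.unavailable
        for service in services where service.isAvailable {
            do {
                return try await service.responses(for: prompt)
            } catch {
                lastError = error // fall through to the next backend
            }
        }
        throw lastError
    }
}
```

The key design property is that callers like `ListeningResponseViewController` depend only on the coordinator, so backends can be reordered or removed without touching the view layer.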

Changes by Andrew Carter

  • Aligned the FoundationModelService system prompt and SmartAssistResponse generation guides to match the instructions used by the existing cloud API, so both backends produce consistent responses.

The cloud API uses the following system prompt:

You are helping someone with communication assistance needs have a conversation.

* You will be given a transcribed prompt from a speaker the user is talking to.
* You will call the show_responses function with responses to the speaker's prompt.
* The maximum number of responses is 14. Prefer as few as possible. Do not always use 14.
* Prefer responses that do not reflect opinions.
* If an opinionated response is needed, include varying opinions.
* Responses should be short. Prefer 1-3 words if possible.
* If the prompt isn't clear, give one option in the responses that requests the speaker to reword the prompt.
* If a response can be 'yes' or 'no', include simple 'yes' and 'no' responses in addition to the others.

The original FoundationModelService instructions diverged from this — it capped responses at 3–5, allowed up to 10 words per response, and had no guidance on opinions, yes/no responses, or unclear prompts. The @Guide on SmartAssistResponse.responses also enforced .maximumCount(5) at the schema level, which would have overridden the instructions. Both have been updated to match the API's behaviour.
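Since the PR states only that `.maximumCount(5)` was raised to match the API's limit of 14, here is a hedged sketch of what the updated `@Generable` struct might look like; the property name `responses` and the `.maximumCount` guide come from the PR text, but the exact guide description is an assumption.

```swift
import FoundationModels

// Sketch of the realigned structured-output type. The guide text here
// paraphrases the cloud system prompt; the actual wording may differ.
@Generable
struct SmartAssistResponse {
    @Guide(
        description: "Short suggested replies (prefer 1-3 words). Prefer as few as possible; avoid opinions unless needed.",
        .maximumCount(14)
    )
    var responses: [String]
}
```

Keeping the schema-level constraint in sync with the instructions matters because, as noted above, a schema guide like `.maximumCount` is enforced during generation and silently wins over conflicting prose instructions.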

Test plan

  • Build and run on an iOS 26 device/simulator with Foundation Models available — verify smart assist produces responses consistent with the cloud API (short, non-opinionated, yes/no included where appropriate, up to 14 responses)
  • Verify fallback to cloud API when Foundation Models are unavailable
  • Verify fallback to CoreML classifier when both Foundation Models and cloud API are unavailable
  • Verify smart assist disabled path still routes directly to the CoreML classifier

🤖 Generated with Claude Code

private var lastUserResponse: (prompt: String, response: String)?

/// Proactively reset the session after this many turns to avoid context window exhaustion.
private let maxTurnsBeforeReset = 8
Contributor Author


This is the only thing that I think could be improved. Would be weird to have it hear a context-needing question and be like, uh, what?

Maybe good to summarize the context instead.

Collaborator


This is an interesting problem! I think it quickly turns into a situation where we need multiple signals to make that decision.

What if we asked the model to include a session topic string with each response? So if we encode the timing of each exchange, as well as the context we're already providing, it can reply with everything it already returns, plus an identifier we can use to track when it decides it's likely a new session. That way, we can purge context when the ID changes, avoiding any manual heuristics for those decisions.

Compaction will still be a thing to address, but I think we can do so in a reasonably concise way that's proportional to the problem.
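The session-topic idea above could be sketched roughly as follows; the type and property names are hypothetical, since this is a proposal rather than code in the PR.

```swift
import FoundationModels

// Hypothetical sketch of the proposed approach: the model returns a stable
// topic identifier with each turn, and a change in that identifier signals
// that the conversation has moved on and accumulated context can be purged.
@Generable
struct SmartAssistResponseWithTopic {
    @Guide(description: "Suggested replies to the speaker's prompt.", .maximumCount(14))
    var responses: [String]

    @Guide(description: "A short identifier for the current conversation topic. Reuse the previous identifier unless the topic has clearly changed.")
    var sessionTopic: String
}

// In the service, a topic change (rather than a fixed turn count) would
// trigger the reset:
//
// if generated.sessionTopic != lastSessionTopic {
//     resetSession()
// }
// lastSessionTopic = generated.sessionTopic
```

This would replace the fixed `maxTurnsBeforeReset` heuristic with a signal the model itself provides, at the cost of trusting the model to detect topic boundaries reliably.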
