
Add Foundation Model (on-device LLM) smart assist integration#833

Open
Andrew Carter (andrewroycarter) wants to merge 2 commits into develop from feature/foundation-model-integration

Conversation

@andrewroycarter
Contributor

Summary

Changes by Chris Stroud

  • Adds FoundationModelService — a new SmartAssistService implementation backed by Apple's on-device FoundationModels framework (iOS 26+)
  • Adds SmartAssistService protocol and SmartAssistServiceError to unify the local and cloud backends behind a common interface
  • Adds SmartAssistServiceCoordinator which implements a priority waterfall: try Foundation Models first, fall back to the cloud API, then fall back to the existing CoreML classifier if both fail
  • Adds SmartAssistResponse — a @Generable struct used for structured output from the local model
  • Wires the coordinator into ListeningResponseViewController and ListeningResponseContentViewController
  • Adds foundationModelsSupported property to ListenModeFeatureConfiguration
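The summary above describes the new abstraction but the PR diff itself isn't shown here, so the following is a minimal sketch of what the protocol and coordinator waterfall might look like; the type names come from the PR, while the method signature, `isAvailable` property, and error cases are assumptions.

```swift
// Sketch only — the real protocol shape and signatures may differ.
protocol SmartAssistService {
    var isAvailable: Bool { get }
    func responses(for prompt: String) async throws -> [String]
}

enum SmartAssistServiceError: Error {
    case unavailable
    case generationFailed(underlying: Error)
}

/// Priority waterfall: on-device Foundation Models → cloud API → CoreML classifier.
final class SmartAssistServiceCoordinator {
    private let services: [SmartAssistService]

    /// Services are ordered by priority, e.g. [foundationModelService, cloudService, coreMLService].
    init(services: [SmartAssistService]) {
        self.services = services
    }

    func responses(for prompt: String) async throws -> [String] {
        var lastError: Error = SmartAssistServiceError.unavailable
        for service in services where service.isAvailable {
            do {
                return try await service.responses(for: prompt)
            } catch {
                lastError = error // fall through to the next backend
            }
        }
        throw lastError
    }
}
```

The key design property is that callers like `ListeningResponseViewController` depend only on the coordinator, so backends can be reordered or removed without touching the view layer.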

Changes by Andrew Carter

  • Aligned the FoundationModelService system prompt and SmartAssistResponse generation guides to match the instructions used by the existing cloud API, so both backends produce consistent responses.

The cloud API uses the following system prompt:

You are helping someone with communication assistance needs have a conversation.

* You will be given a transcribed prompt from a speaker the user is talking to.
* You will call the show_responses function with responses to the speaker's prompt.
* The maximum number of responses is 14. Prefer as few as possible. Do not always use 14.
* Prefer responses that do not reflect opinions.
* If an opinionated response is needed, include varying opinions.
* Responses should be short. Prefer 1-3 words if possible.
* If the prompt isn't clear, give one option in the responses that requests the speaker to reword the prompt.
* If a response can be 'yes' or 'no', include simple 'yes' and 'no' responses in addition to the others.

The original FoundationModelService instructions diverged from this — it capped responses at 3–5, allowed up to 10 words per response, and had no guidance on opinions, yes/no responses, or unclear prompts. The @Guide on SmartAssistResponse.responses also enforced .maximumCount(5) at the schema level, which would have overridden the instructions. Both have been updated to match the API's behaviour.
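Since the PR states only that `.maximumCount(5)` was raised to match the API's limit of 14, here is a hedged sketch of what the updated `@Generable` struct might look like; the property name `responses` and the `.maximumCount` guide come from the PR text, but the exact guide description is an assumption.

```swift
import FoundationModels

// Sketch of the realigned structured-output type. The guide text here
// paraphrases the cloud system prompt; the actual wording may differ.
@Generable
struct SmartAssistResponse {
    @Guide(
        description: "Short suggested replies (prefer 1-3 words). Prefer as few as possible; avoid opinions unless needed.",
        .maximumCount(14)
    )
    var responses: [String]
}
```

Keeping the schema-level constraint in sync with the instructions matters because, as noted above, a schema guide like `.maximumCount` is enforced during generation and silently wins over conflicting prose instructions.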

Test plan

  • Build and run on an iOS 26 device/simulator with Foundation Models available — verify smart assist produces responses consistent with the cloud API (short, non-opinionated, yes/no included where appropriate, up to 14 responses)
  • Verify fallback to cloud API when Foundation Models are unavailable
  • Verify fallback to CoreML classifier when both Foundation Models and cloud API are unavailable
  • Verify smart assist disabled path still routes directly to the CoreML classifier

🤖 Generated with Claude Code

private var lastUserResponse: (prompt: String, response: String)?

/// Proactively reset the session after this many turns to avoid context window exhaustion.
private let maxTurnsBeforeReset = 8
Contributor Author


This is the only thing that I think could be improved. Would be weird to have it hear a context-needing question and be like, uh, what?

Maybe good to summarize the context instead.

Collaborator


This is an interesting problem! I think it quickly turns into a situation where we need multiple signals to make that decision.

What if we asked the model to include a session topic string with each response? So if we encode the timing of each exchange, as well as the context we're already providing, it can reply with everything it already returns, plus an identifier we can use to track when it decides it's likely a new session. That way, we can purge context when the ID changes, avoiding any manual heuristics for those decisions.

Compaction will still be a thing to address, but I think we can do so in a reasonably concise way that's proportional to the problem.
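The session-topic idea above could be sketched roughly as follows; the type and property names are hypothetical, since this is a proposal rather than code in the PR.

```swift
import FoundationModels

// Hypothetical sketch of the proposed approach: the model returns a stable
// topic identifier with each turn, and a change in that identifier signals
// that the conversation has moved on and accumulated context can be purged.
@Generable
struct SmartAssistResponseWithTopic {
    @Guide(description: "Suggested replies to the speaker's prompt.", .maximumCount(14))
    var responses: [String]

    @Guide(description: "A short identifier for the current conversation topic. Reuse the previous identifier unless the topic has clearly changed.")
    var sessionTopic: String
}

// In the service, a topic change (rather than a fixed turn count) would
// trigger the reset:
//
// if generated.sessionTopic != lastSessionTopic {
//     resetSession()
// }
// lastSessionTopic = generated.sessionTopic
```

This would replace the fixed `maxTurnsBeforeReset` heuristic with a signal the model itself provides, at the cost of trusting the model to detect topic boundaries reliably.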
