Unified Swift 6.2 SDK for local and cloud LLM inference
Conduit gives you a single Swift-native API that targets Anthropic, OpenRouter, Ollama, MLX, HuggingFace, and Apple’s Foundation Models without rewriting your prompt pipeline. Every provider conforms to TextGenerator, so switching between Claude 4.5, GPT-5.2 on OpenRouter, Ollama-hosted Llama 3, and local MLX comes down to swapping one initializer.
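Because provider choice is just a type parameter, a single generic helper can drive any backend. A minimal sketch, mirroring the run helper shown later in the Quick Start (it assumes an async context, valid credentials, and that TextGenerator exposes an associated ModelID as that helper does):

```swift
import Conduit

// One generic helper drives any backend; only the initializer at the call site changes.
func summarize<P: TextGenerator>(with provider: P, model: P.ModelID) async throws -> String {
    try await provider.generate(
        "Summarize Swift structured concurrency in two sentences.",
        model: model,
        config: .default
    )
}

// Cloud and local inference share the identical call path.
let cloud = try await summarize(with: AnthropicProvider(apiKey: "sk-ant-..."), model: .claudeSonnet45)
let local = try await summarize(with: MLXProvider(), model: .llama3_2_1B)
```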
- One API, Many Providers — Switch between local (MLX), cloud (Anthropic, OpenAI, HuggingFace, OpenRouter), and system (Foundation Models) with minimal code changes
- Download Models from HuggingFace — Download any model from HuggingFace Hub for local MLX inference with progress tracking
- Type-Safe Structured Output — Generate Swift types directly from LLM responses with the `@Generable` macro
- Privacy-First Options — Run models entirely on-device with MLX, Ollama, or Foundation Models
- Swift 6.2 Concurrency — Built from the ground up with actors, Sendable types, and AsyncSequence
- Features
- Installation
- Quick Start
- Providers
- MLXProvider
- HuggingFaceProvider
- Foundation Models
- Anthropic Claude
- OpenAI Provider (OpenAI, OpenRouter, Ollama, Azure)
- Model Management
- Streaming
- Structured Output
- Tool Calling
- ChatSession
- Documentation
- Contributing
- Community
- License
| Capability | MLX | HuggingFace | Anthropic | OpenAI | Foundation Models |
|---|---|---|---|---|---|
| Text Generation | ✓ | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ | ✓ |
| Structured Output | ✓ | ✓ | ✓ | ✓ | ✓ |
| Tool Calling | — | — | ✓ | ✓ | — |
| Vision | — | — | ✓ | ✓ | — |
| Extended Thinking | — | — | ✓ | — | — |
| Embeddings | — | ✓ | — | ✓ | — |
| Transcription | — | ✓ | — | ✓ | — |
| Image Generation | — | ✓ | — | ✓ | — |
| Token Counting | ✓ | — | — | ✓* | — |
| Offline | ✓ | — | — | —** | ✓ |
| Privacy | ✓ | — | — | —** | ✓ |
*Estimated token counting
**Offline/privacy available when using the Ollama local endpoint
Add Conduit to your Package.swift:
dependencies: [
.package(url: "https://github.com/christopherkarani/Conduit", from: "0.3.0")
]
Then add "Conduit" to your target's dependencies.
To use on-device MLX inference, enable the MLX trait:
dependencies: [
.package(url: "https://github.com/christopherkarani/Conduit", from: "0.3.0", traits: ["MLX"])
]
Note: MLX requires Apple Silicon with Metal GPU. Without the trait, only cloud providers are available.
| Platform | Status | Available Providers |
|---|---|---|
| macOS 14+ | Full | MLX, Anthropic, OpenAI, HuggingFace, Foundation Models |
| iOS 17+ | Full | MLX, Anthropic, OpenAI, HuggingFace, Foundation Models |
| visionOS 1+ | Full | MLX, Anthropic, OpenAI, HuggingFace, Foundation Models |
| Linux | Partial | Anthropic, OpenAI, HuggingFace |
Conduit supports Linux for server-side Swift deployments. Build normally with Swift 6.2+:
swift build
swift test
By default, MLX dependencies are excluded (no trait enabled). This makes Conduit Linux-compatible out of the box.
- MLX Provider: Requires Apple Silicon with Metal GPU (not available on Linux)
- Foundation Models: Requires iOS 26+/macOS 26+ (not available on Linux)
- Image Generation: `GeneratedImage.image` returns `nil` (use `data` or `save(to:)`)
- Keychain: Token storage falls back to environment variables (see the sketch after this list)
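On Linux, with no Keychain available, read tokens from the environment and pass them to the provider explicitly. A minimal sketch using the HuggingFace provider (assumes HF_TOKEN is exported and an async context):

```swift
import Foundation
import Conduit

// Keychain is unavailable on Linux, so read the token from the environment
// and hand it to the provider explicitly.
if let token = ProcessInfo.processInfo.environment["HF_TOKEN"] {
    let provider = HuggingFaceProvider(token: token)
    let embedding = try await provider.embed(
        "Server-side Swift on Linux",
        model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
    )
    print("Dimensions: \(embedding.dimensions)")
}
```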
For local LLM inference on Linux, use Ollama via the OpenAI provider:
# Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2
import Conduit
// Use Ollama for local inference on Linux
let provider = OpenAIProvider(endpoint: .ollama, apiKey: nil)
let response = try await provider.generate(
"Hello from Linux!",
model: .ollama("llama3.2"),
config: .default
)
import Conduit
enum ExampleError: Error {
case missingAPIKey(String)
}
Local inference on Apple Silicon. Zero network traffic, complete privacy.
Best for: Privacy-sensitive apps, offline functionality, consistent latency
// Default configuration
let provider = MLXProvider()
// Optimized for M1 devices
let provider = MLXProvider(configuration: .m1Optimized)
// Full control
let config = MLXConfiguration.default
.memoryLimit(.gigabytes(8))
.withQuantizedKVCache(bits: 4)
let provider = MLXProvider(configuration: config)
Configuration Presets:
| Preset | Memory | Use Case |
|---|---|---|
| `.default` | Auto | Balanced performance |
| `.m1Optimized` | 6 GB | M1 MacBooks, base iPads |
| `.mProOptimized` | 12 GB | M1/M2 Pro, Max chips |
| `.memoryEfficient` | 4 GB | Constrained devices |
| `.highPerformance` | 16+ GB | M2/M3 Max, Ultra |
Warmup for Fast First Response:
let provider = MLXProvider()
// Warm up Metal shaders before first generation
try await provider.warmUp(model: .llama3_2_1B, maxTokens: 5)
// Now first response is fast
let response = try await provider.generate("Hello", model: .llama3_2_1B)Cloud inference via HuggingFace Inference API. Access hundreds of models.
Best for: Large models, embeddings, transcription, image generation, model variety
Setup:
export HF_TOKEN=hf_your_token_here
// Auto-detects HF_TOKEN from environment
let provider = HuggingFaceProvider()
// Or provide token explicitly
let provider = HuggingFaceProvider(token: "hf_...")
// Custom configuration
let config = HFConfiguration.default.timeout(120)
let provider = HuggingFaceProvider(configuration: config)
Embeddings:
let provider = HuggingFaceProvider()
let embedding = try await provider.embed(
"Conduit makes LLM inference easy",
model: .huggingFace("sentence-transformers/all-MiniLM-L6-v2")
)
print("Dimensions: \(embedding.dimensions)")
print("Vector: \(embedding.vector)")
// Similarity comparison
let other = try await provider.embed("AI frameworks for Swift", model: /* ... */)
let similarity = embedding.cosineSimilarity(with: other)
Transcription:
let provider = HuggingFaceProvider()
let result = try await provider.transcribe(
audioURL: audioFileURL,
model: .huggingFace("openai/whisper-large-v3"),
config: .detailed
)
print(result.text)
for segment in result.segments {
print("\(segment.startTime)s: \(segment.text)")
}
Image Generation:
let provider = HuggingFaceProvider()
// Simple text-to-image with defaults
let result = try await provider.textToImage(
"A cat wearing a top hat, digital art",
model: .huggingFace("stabilityai/stable-diffusion-3")
)
// Use directly in SwiftUI
result.image // SwiftUI Image (cross-platform)
// With configuration presets
let result = try await provider.textToImage(
"Mountain landscape at sunset, photorealistic",
model: .huggingFace("stabilityai/stable-diffusion-xl-base-1.0"),
config: .highQuality.width(1024).height(768)
)
// Available presets: .default, .highQuality, .fast, .square512, .square1024, .landscape, .portrait
// Save to file
try result.save(to: URL.documentsDirectory.appending(path: "landscape.png"))
// Save to Photos library (iOS only, requires NSPhotoLibraryAddUsageDescription)
try await result.saveToPhotos()
// Access raw data if needed
let data = result.data
System-integrated on-device AI. Zero setup, managed by the OS.
Best for: iOS 26+ apps, system integration, no model downloads
if #available(iOS 26.0, *) {
let provider = FoundationModelsProvider()
let response = try await provider.generate(
"What can you help me with?",
model: .foundationModels,
config: .default
)
print(response)
}
Conduit includes first-class support for Anthropic's Claude models via the Anthropic API.
Best for: Advanced reasoning, vision tasks, extended thinking, production applications
Setup:
export ANTHROPIC_API_KEY=sk-ant-api-03-...
import Conduit
// Simple generation
let provider = AnthropicProvider(apiKey: "sk-ant-...")
let response = try await provider.generate(
"Explain quantum computing",
model: .claudeSonnet45,
config: .default.maxTokens(500)
)
// Streaming
for try await chunk in provider.stream(
"Write a poem about Swift",
model: .claude3Haiku,
config: .default
) {
print(chunk, terminator: "")
}
Available Models:
| Model | ID | Best For |
|---|---|---|
| Claude Opus 4.5 | `.claudeOpus45` | Most capable, complex reasoning |
| Claude Sonnet 4.5 | `.claudeSonnet45` | Balanced performance and speed |
| Claude 3.5 Sonnet | `.claude35Sonnet` | Fast, high-quality responses |
| Claude 3 Haiku | `.claude3Haiku` | Fastest, most cost-effective |
Features:
- Text generation (streaming and non-streaming)
- Multi-turn conversations with context (see the sketch after this list)
- Vision support (multimodal image+text)
- Extended thinking mode for complex reasoning
- Comprehensive error handling
- Environment variable support (ANTHROPIC_API_KEY)
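A minimal multi-turn sketch using the Messages builder. Note that `Message.assistant` is an assumption for carrying the model's prior reply; this README only shows `.system` and `.user`, so adapt to Conduit's actual message API:

```swift
let provider = AnthropicProvider(apiKey: "sk-ant-...")

// Turn 1
let opening = Messages {
    Message.system("You are a concise Swift mentor.")
    Message.user("What is an actor?")
}
let first = try await provider.generate(messages: opening, model: .claudeSonnet45, config: .default)

// Turn 2: resend the earlier exchange so the model keeps context.
// Message.assistant(...) is assumed here, not confirmed by this README.
let followUp = Messages {
    Message.system("You are a concise Swift mentor.")
    Message.user("What is an actor?")
    Message.assistant(first.text)
    Message.user("How does an actor differ from a class?")
}
let second = try await provider.generate(messages: followUp, model: .claudeSonnet45, config: .default)
print(second.text)
```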
Vision Example:
let messages = Messages {
Message.user([
.text("What's in this image?"),
.image(base64Data: imageData, mimeType: "image/jpeg")
])
}
let result = try await provider.generate(
messages: messages,
model: .claudeSonnet45,
config: .default
)
Extended Thinking:
var config = AnthropicConfiguration.standard(apiKey: "sk-ant-...")
config.thinkingConfig = .standard
let provider = AnthropicProvider(configuration: config)
let result = try await provider.generate(
"Solve this complex problem...",
model: .claudeOpus45,
config: .default
)
Get your API key at: https://console.anthropic.com/
Conduit includes a powerful OpenAI-compatible provider that works with multiple backends through a unified interface.
Supported Backends:
- OpenAI — Official GPT-4, DALL-E, Whisper APIs
- OpenRouter — Aggregator with access to OpenAI, Anthropic, Google, and 100+ models
- Ollama — Local inference server for offline/privacy use
- Azure OpenAI — Microsoft's enterprise OpenAI service
- Custom — Any OpenAI-compatible endpoint
import Conduit
// Simple usage
let provider = OpenAIProvider(apiKey: "sk-...")
let response = try await provider.generate("Hello", model: .gpt4o)
// Streaming
for try await chunk in provider.stream("Tell me a story", model: .gpt4oMini) {
print(chunk, terminator: "")
}
Available Models:
| Model | ID | Best For |
|---|---|---|
| GPT-4o | `.gpt4o` | Latest multimodal flagship |
| GPT-4o Mini | `.gpt4oMini` | Fast, cost-effective |
| GPT-4 Turbo | `.gpt4Turbo` | Vision + function calling |
| o1 | `.o1` | Complex reasoning |
| o1 Mini | `.o1Mini` | Fast reasoning |
| o3 Mini | `.o3Mini` | Latest mini reasoning |
Setup:
export OPENAI_API_KEY=sk-...
Access 100+ models from OpenAI, Anthropic, Google, Mistral, and more through a single API.
// Simple usage
let provider = OpenAIProvider(endpoint: .openRouter, apiKey: "sk-or-...")
let response = try await provider.generate(
"Explain quantum computing",
model: .openRouter("anthropic/claude-3-opus")
)
// With routing configuration
let config = OpenAIConfiguration(
endpoint: .openRouter,
authentication: .bearer("sk-or-..."),
openRouterConfig: OpenRouterRoutingConfig(
providers: [.anthropic, .openai], // Prefer these providers
fallbacks: true, // Auto-fallback on failure
routeByLatency: true // Route to fastest provider
)
)
let provider = OpenAIProvider(configuration: config)
Popular OpenRouter Models:
// OpenAI via OpenRouter
.openRouter("openai/gpt-4-turbo")
// Anthropic via OpenRouter
.openRouter("anthropic/claude-3-opus")
.claudeOpus // Convenience alias
.claudeSonnet
.claudeHaiku
// Google via OpenRouter
.openRouter("google/gemini-pro-1.5")
.geminiPro15 // Convenience alias
// Meta via OpenRouter
.openRouter("meta-llama/llama-3.1-70b-instruct")
.llama31B70B // Convenience alias
// Mistral via OpenRouter
.openRouter("mistralai/mixtral-8x7b-instruct")Setup:
export OPENROUTER_API_KEY=sk-or-...
Get your API key at: https://openrouter.ai/keys
Run LLMs locally on your machine with complete privacy—no API key required.
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull a model
ollama pull llama3.2
// Default localhost:11434
let provider = OpenAIProvider(endpoint: .ollama())
let response = try await provider.generate(
"Hello from local inference!",
model: .ollama("llama3.2")
)
// Custom host/port
let provider = OpenAIProvider(endpoint: .ollama(host: "192.168.1.100", port: 11434))
// With Ollama-specific configuration
let config = OpenAIConfiguration(
endpoint: .ollama(),
authentication: .none,
ollamaConfig: OllamaConfiguration(
keepAlive: "30m", // Keep model in memory
pullOnMissing: true, // Auto-download models
numGPU: 35 // GPU layers to use
)
)
let provider = OpenAIProvider(configuration: config)
Popular Ollama Models:
.ollamaLlama32 // Llama 3.2 (default size)
.ollamaLlama32B3B // Llama 3.2 3B
.ollamaLlama32B1B // Llama 3.2 1B
.ollamaMistral // Mistral 7B
.ollamaCodeLlama // CodeLlama 7B
.ollamaPhi3 // Phi-3
.ollamaGemma2 // Gemma 2
.ollamaQwen25 // Qwen 2.5
.ollamaDeepseekCoder // DeepSeek Coder
// Any Ollama model by name
.ollama("llama3.2:3b")
.ollama("codellama:7b-instruct")Ollama Configuration Presets:
OllamaConfiguration.default // Standard settings
OllamaConfiguration.lowMemory // For constrained systems
OllamaConfiguration.interactive // Longer keep-alive for chat
OllamaConfiguration.batch // Unload immediately after use
OllamaConfiguration.alwaysOn     // Keep model loaded indefinitely
Microsoft's enterprise Azure-hosted OpenAI service with compliance and security features.
let provider = OpenAIProvider(
endpoint: .azure(
resource: "my-resource",
deployment: "gpt-4",
apiVersion: "2024-02-15-preview"
),
apiKey: "azure-api-key"
)
let response = try await provider.generate(
"Hello from Azure",
model: .azure(deployment: "gpt-4")
)
Use any OpenAI-compatible API endpoint (self-hosted, proxy servers, etc.):
let provider = OpenAIProvider(
endpoint: .custom(URL(string: "https://my-proxy.com/v1")!),
apiKey: "custom-key"
)
Image Generation (DALL-E):
let provider = OpenAIProvider(apiKey: "sk-...")
let image = try await provider.textToImage(
"A cat astronaut on the moon",
model: .dallE3,
config: .highQuality.size(.square1024)
)
// Use in SwiftUI
image.image
// Save to file
try image.save(to: URL.documentsDirectory.appending(path: "cat.png"))
Embeddings:
let provider = OpenAIProvider(apiKey: "sk-...")
let embedding = try await provider.embed(
"Conduit makes LLM inference easy",
model: .textEmbedding3Small
)
print("Dimensions: \(embedding.dimensions)")Capability Detection:
let provider = OpenAIProvider(endpoint: .openRouter, apiKey: "...")
let caps = await provider.capabilities
if caps.contains(.imageGeneration) {
// DALL-E available
}
if caps.supports(.functionCalling) {
// Tool calling available
}
Access 200+ models from OpenAI, Anthropic, Google, Meta, and more through a single unified API.
Best for: Model flexibility, provider redundancy, cost optimization, trying different models
Setup:
export OPENROUTER_API_KEY=sk-or-...
import Conduit
// Simple - minimal configuration
let provider = OpenAIProvider(openRouterKey: "sk-or-...")
let response = try await provider.generate(
"Hello",
model: .openRouter("anthropic/claude-3-opus"),
config: .default
)
// Optimized for Claude models
let provider = OpenAIProvider.forClaude(apiKey: "sk-or-...")
let response = try await provider.generate(
"Hello",
model: .claudeOpus, // Predefined model constant
config: .default
)
// Fastest available provider
let provider = OpenAIProvider.fastest(apiKey: "sk-or-...")
Advanced Routing:
// Configure provider preferences and fallbacks
let config = OpenAIConfiguration.openRouter(apiKey: "sk-or-...")
.preferring(.anthropic, .openai)
.routeByLatency()
let provider = OpenAIProvider(configuration: config)
// Full control with OpenRouterRoutingConfig
let routing = OpenRouterRoutingConfig(
providers: [.anthropic, .openai],
fallbacks: true,
routeByLatency: true,
dataCollection: .deny // Privacy control
)
let config = OpenAIConfiguration.openRouter(apiKey: "sk-or-...")
.routing(routing)
Streaming:
for try await chunk in provider.stream(
"Write a story",
model: .openRouter("meta-llama/llama-3.1-70b-instruct"),
config: .default
) {
print(chunk.text, terminator: "")
}
Model Format:
OpenRouter uses provider/model format:
- `openai/gpt-4-turbo`
- `anthropic/claude-3-opus`
- `google/gemini-pro-1.5`
- `meta-llama/llama-3.1-70b-instruct`
Predefined Model Constants:
| Constant | Model |
|---|---|
| `.claudeOpus` | anthropic/claude-3-opus |
| `.claudeSonnet` | anthropic/claude-3-sonnet |
| `.geminiPro15` | google/gemini-pro-1.5 |
| `.llama31B70B` | meta-llama/llama-3.1-70b-instruct |
| `.mixtral8x7B` | mistralai/mixtral-8x7b-instruct |
Features:
- 200+ models from 20+ providers
- Automatic fallbacks on provider failure
- Latency-based routing
- Provider preference ordering
- Data collection controls (privacy)
- Streaming support
- Tool/function calling
Get your API key at: https://openrouter.ai/
Conduit requires explicit model selection—no magic auto-detection:
// MLX models (local)
.mlx("mlx-community/Llama-3.2-1B-Instruct-4bit")
.llama3_2_1B // Convenience alias
.phi4
.qwen2_5_3B
// HuggingFace models (cloud)
.huggingFace("meta-llama/Llama-3.1-70B-Instruct")
.huggingFace("sentence-transformers/all-MiniLM-L6-v2")
// Anthropic models (cloud)
.claudeOpus45
.claudeSonnet45
.claude35Sonnet
.claude3Haiku
// OpenAI models (cloud)
.gpt4o
.gpt4oMini
.gpt4Turbo
.o1
.o3Mini
// OpenRouter models (cloud aggregator)
.openRouter("anthropic/claude-3-opus")
.openRouter("google/gemini-pro-1.5")
.openRouter("meta-llama/llama-3.1-70b-instruct")
// Ollama models (local)
.ollama("llama3.2")
.ollamaLlama32
.ollamaMistral
.ollamaCodeLlama
// Azure OpenAI (enterprise cloud)
.azure(deployment: "my-gpt4-deployment")
// Foundation Models (iOS 26+)
.foundationModels
Control generation behavior with presets or custom settings:
// Presets
.default // temperature: 0.7, topP: 0.9
.creative // temperature: 1.0, topP: 0.95
.precise // temperature: 0.3, topP: 0.8
.code // temperature: 0.2, topP: 0.9
// Custom
let config = GenerateConfig(
temperature: 0.8,
maxTokens: 500,
topP: 0.9,
stopSequences: ["END"]
)
// Fluent API
let config = GenerateConfig.default
.temperature(0.8)
.maxTokens(500)
Build conversations with the Messages result builder:
let messages = Messages {
Message.system("You are a Swift expert.")
Message.user("What are actors?")
}
let result = try await provider.generate(
messages: messages,
model: .llama3_2_1B,
config: .default
)
print(result.text)
print("Tokens: \(result.usage.totalTokens)")Real-time token streaming with AsyncSequence:
// Simple text streaming
for try await text in provider.stream("Tell me a joke", model: .llama3_2_1B) {
print(text, terminator: "")
}
// With metadata
let stream = provider.streamWithMetadata(
messages: messages,
model: .llama3_2_1B,
config: .default
)
for try await chunk in stream {
print(chunk.text, terminator: "")
if let tokensPerSecond = chunk.tokensPerSecond {
// Track performance
}
if let reason = chunk.finishReason {
print("\nFinished: \(reason)")
    }
}
// Shared helper for the multi-provider example below: read an API key from the environment.
func requireAPIKey(_ name: String) throws -> String {
    guard let key = ProcessInfo.processInfo.environment[name] else {
        throw ExampleError.missingAPIKey(name)
    }
    return key
}
let prompt = "Plan a three-day SwiftUI sprint for a side project with daily goals."
func run<P: TextGenerator>(provider: P, model: P.ModelID) async throws -> String {
try await provider.generate(prompt, model: model, config: .creative)
}
func gatherPlans() async throws {
let anthropic = AnthropicProvider(apiKey: try requireAPIKey("ANTHROPIC_API_KEY"))
let openRouter = OpenAIProvider.forOpenRouter(
apiKey: try requireAPIKey("OPENROUTER_API_KEY"),
preferring: [.anthropic, .openai]
)
let ollama = OpenAIProvider(endpoint: .ollama, apiKey: nil)
let mlx = MLXProvider()
let plans: [(label: String, job: () async throws -> String)] = [
("Claude Opus 4.5", { try await run(provider: anthropic, model: .claudeOpus45) }),
("OpenRouter GPT-5.2", { try await run(provider: openRouter, model: .openRouter("openai/gpt-5.2-opus")) }),
("Ollama Llama3.2", { try await run(provider: ollama, model: .ollamaLlama32) }),
("MLX Llama3.2 1B", { try await run(provider: mlx, model: .llama3_2_1b) })
]
for plan in plans {
let text = try await plan.job()
print("\(plan.label): \(text)\n")
}
}
Every provider call goes through the same run helper, so you can mix local and cloud backends (a private MLX answer alongside a Claude-derived reasoning path) without duplicating prompts.
Want a single line of code that just works? Conduit keeps it simple:
import Conduit
let provider = MLXProvider()
let quickWins = try await provider.generate(
"Explain how `async let` differs from `Task` in Swift.",
    model: .llama3_2_1B
)
print(quickWins)
Enable the MLX trait in Package.swift when targeting Apple Silicon, or switch to AnthropicProvider, OpenAIProvider, or HuggingFaceProvider for cloud-ready inference.
Conduit provides a comprehensive model management system for downloading models from HuggingFace Hub and managing local storage.
Download any model from HuggingFace Hub for local MLX inference:
let manager = ModelManager.shared
// Download a pre-configured model
let url = try await manager.download(.llama3_2_1B) { progress in
print("Downloading: \(progress.percentComplete)%")
if let speed = progress.formattedSpeed {
print("Speed: \(speed)")
}
if let eta = progress.formattedETA {
print("ETA: \(eta)")
}
}
// Download any HuggingFace model by repository ID
let customModel = ModelIdentifier.mlx("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
let url = try await manager.download(customModel)
Finding Models:
Browse the mlx-community organization on HuggingFace for 4-bit quantized models optimized for Apple Silicon. Any model with MLX-compatible weights can be downloaded.
Validate model compatibility before downloading to avoid wasted bandwidth:
do {
// Validates MLX compatibility, estimates size, then downloads
let url = try await manager.downloadValidated(.llama3_2_1B) { progress in
print("Progress: \(progress.percentComplete)%")
}
} catch AIError.incompatibleModel(let model, let reasons) {
print("Cannot download \(model.rawValue):")
for reason in reasons {
print(" - \(reason)")
}
}
Check download size before committing:
if let size = await manager.estimateDownloadSize(.llama3_2_1B) {
print("Download size: \(size.formatted)") // e.g., "2.1 GB"
// Check available storage
let available = try FileManager.default.availableCapacity(forUsage: .opportunistic)
if available < size.bytes {
print("Warning: Insufficient storage space")
}
}
The DownloadTask is @Observable for seamless SwiftUI integration:
struct ModelDownloadView: View {
@State private var downloadTask: DownloadTask?
var body: some View {
if let task = downloadTask {
VStack {
ProgressView(value: task.progress.fractionCompleted)
Text("\(task.progress.percentComplete)%")
if let speed = task.progress.formattedSpeed {
Text(speed)
}
Button("Cancel") { task.cancel() }
}
} else {
Button("Download") {
Task {
downloadTask = await ModelManager.shared.downloadTask(for: .llama3_2_1B)
}
}
}
}
}
let manager = ModelManager.shared
// Check if model is cached
if await manager.isCached(.llama3_2_1B) {
print("Model ready")
}
// Get local path for cached model
if let path = await manager.localPath(for: .llama3_2_1B) {
print("Model at: \(path)")
}
// List all cached models
let cached = try await manager.cachedModels()
for model in cached {
print("\(model.identifier.displayName): \(model.size.formatted)")
}
// Cache size
let size = await manager.cacheSize()
print("Cache size: \(size.formatted)")
// Evict least-recently-used models to fit storage limit
try await manager.evictToFit(maxSize: .gigabytes(30))
// Remove specific model
try await manager.delete(.llama3_2_1B)
// Clear entire cache
try await manager.clearCache()
Discover available models with metadata:
// Get all known models
let allModels = ModelRegistry.allModels
// Filter by provider
let mlxModels = ModelRegistry.models(for: .mlx)
let cloudModels = ModelRegistry.models(for: .huggingFace)
// Filter by capability
let embeddingModels = ModelRegistry.models(with: .embeddings)
let reasoningModels = ModelRegistry.models(with: .reasoning)
// Get recommended models
let recommended = ModelRegistry.recommendedModels()
// Look up model info
if let info = ModelRegistry.info(for: .llama3_2_1B) {
print("Name: \(info.name)")
print("Size: \(info.size.displayName)")
print("Context: \(info.contextWindow) tokens")
print("Disk: \(info.diskSize?.formatted ?? "N/A")")
}
Storage Location:
- MLX models: `~/Library/Caches/Conduit/Models/mlx/`
- HuggingFace models: `~/Library/Caches/Conduit/Models/huggingface/`
Manage context windows with precise token counts:
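A sketch of what that can look like, combining the ModelRegistry context-window lookup shown above with a token count. The `countTokens(for:model:)` call is an assumption used for illustration only, not a confirmed Conduit API; check the documentation for the actual entry point:

```swift
let provider = MLXProvider()
let history = Messages {
    Message.system("You are a helpful assistant.")
    Message.user(String(repeating: "lots of prior conversation ", count: 2_000))
}

if let info = ModelRegistry.info(for: .llama3_2_1B) {
    // Hypothetical token-counting call — verify the real API in Conduit's docs.
    let used = try await provider.countTokens(for: history, model: .llama3_2_1B)
    if used > info.contextWindow {
        // Trim or summarize older turns before calling generate(...)
    }
}
```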
dependencies: [
.package(url: "https://github.com/christopherkarani/Conduit", from: "2.0.0", traits: ["MLX"])
]Then add "Conduit" to your target dependencies.
Documentation-driven examples are covered by Tests/ConduitTests/DocumentationExamplesTests.swift. Run
the tests that keep the README code working:
swift test --filter DocumentationExamplesTests
| Platform | Available Providers |
|---|---|
| macOS 14+ | MLX, Anthropic, OpenRouter/OpenAI, HuggingFace, Foundation Models |
| iOS 17+ / visionOS 1+ | MLX, Anthropic, OpenRouter/OpenAI, HuggingFace, Foundation Models |
| Linux | Anthropic, OpenRouter/OpenAI, HuggingFace, Ollama |
MLX runs on Apple Silicon only; Linux builds exclude MLX by default. Cloud providers require network connectivity, whereas MLX and Ollama (local server mode) work offline.
- Protocol-first: everything conforms to `TextGenerator`, `TranscriptionGenerator`, or `EmbeddingGenerator`, so your app code stays provider-agnostic.
- Explicit model selection: choose `AnthropicModelID`, `OpenAIModelID`, or `ModelIdentifier` symbols so there is no guesswork about which model is in use.
- Streaming + structured output: shared helpers for chunk streaming, structured response macros (`@Generable`), and tool execution keep advanced scenarios consistent across providers (see the sketch after this list).
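The @Generable macro referenced above powers type-safe structured output. This README does not show the exact call for requesting a typed result, so the `returning:` parameter below is an assumption — read it as a sketch of the shape rather than the confirmed API:

```swift
import Conduit

// A Swift type the model fills in directly via the @Generable macro.
@Generable
struct Recipe {
    let title: String
    let ingredients: [String]
    let minutes: Int
}

let provider = AnthropicProvider(apiKey: "sk-ant-...")
// `returning:` is a hypothetical parameter name for structured output.
let recipe: Recipe = try await provider.generate(
    "Give me a quick weeknight pasta recipe.",
    returning: Recipe.self,
    model: .claudeSonnet45,
    config: .default
)
print(recipe.title)
```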
Comprehensive guides are available in the docs folder:
| Guide | Description |
|---|---|
| Getting Started | Installation, setup, and first generation |
| Providers | Detailed guides for each provider |
| Structured Output | Type-safe responses with @Generable |
| Tool Calling | Define and execute LLM-invokable tools |
| Streaming | Real-time token streaming patterns |
| ChatSession | Stateful conversation management |
| Model Management | Download, cache, and manage models |
| Error Handling | Handle errors gracefully |
| Architecture | Design principles and internals |
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
Please ensure your code:
- Follows existing code style and conventions
- Includes tests for new functionality
- Updates documentation as needed
- Maintains backward compatibility
- GitHub Discussions — Ask questions, share ideas
- GitHub Issues — Report bugs, request features
MIT License — see LICENSE for details.
