A production-ready, cross-platform AI chat client built with Flutter. It unifies local on-device LLM inference (Android) with cloud API access, giving users full control over how their models run.
Image generation tested on Moto G71 (Snapdragon), Oneplus 10r (Mediatek), Pixel 6A (Tensor), Poco F1 (Snapdragon), Samsung s23 (Snapdragon) 4 steps fast
Generated on pixel 6 with 20 step
- Local Inference on Android — Download and run GGUF models directly on your phone using GPU-accelerated inference (Vulkan). No internet required after download.
- Cloud API Fallback — Seamlessly switch to OpenAI, Anthropic, Google Gemini, or Kimi (Moonshot AI) when you need more power or are on unsupported platforms.
- Multimodal Chat — Send text and images in conversations. Vision support works with both local models (Qwen2-VL) and cloud providers.
- Persistent Sessions — All chats, tasks, and settings are stored locally via Hive. Nothing leaves your device unless you explicitly choose cloud mode.
- Background Services — Firebase Cloud Messaging integration for push updates and background task handling.
- Smart Auto-Configuration — On first launch, the app detects your device's RAM and recommends optimal context size and token limits automatically.
- Task Management — A dedicated task view for structured AI-assisted workflows alongside free-form chat.
- Framework: Flutter 3.x (Dart >=3.3.0)
- State Management: GetX
- Local Storage: Hive
- Networking: Dio +
package:http - Background Execution:
flutter_background_service+flutter_local_notifications - Push Notifications: Firebase Core + Firebase Messaging
┌─────────────────────────────────────────────────────────────┐
│ UI Layer │
│ ChatView / TaskView / ModelView / SettingsView │
└──────────────────────────┬──────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────┐
│ Controllers (GetX) │
│ ChatController · TaskController · ModelController │
│ SettingsController · HomeController │
└──────────────────────────┬──────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────┐
│ Services │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ InferenceService│ │ CloudService │ │DownloadSvc │ │
│ │ (local GGUF) │ │ (OpenAI/Claude/ │ │ (model dl) │ │
│ │ │ │ Gemini/Kimi) │ │ │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ HiveService │ │ DeviceInfoSvc │ │ExecutionSvc │ │
│ │ ( persistence) │ │ (RAM/GPU tier) │ │ (bg tasks) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
The app uses llama_flutter_android, a custom Flutter plugin wrapping llama.cpp for ARM64 devices. At runtime it:
- Detects GPU capabilities via Vulkan to determine offload layers.
- Selects thread count based on device tier (ultra / high / mid / low).
- Loads the GGUF model with progress streaming.
- Generates tokens via
generateChat()with native chat-template support (ChatML, Llama-3, Gemma, Phi). - Falls back to manual prompt construction if native templates fail.
Idle detection (5s) and hard timeouts (180s) keep the UX responsive even on underpowered hardware.
CloudService normalizes four different API shapes into a single interface:
- OpenAI — standard
/v1/chat/completions - Anthropic — Messages API with separate system param
- Google Gemini —
generateContentwith inline image base64 - Kimi — OpenAI-compatible endpoint from Moonshot AI
API keys are stored in Hive and never transmitted anywhere except to the provider's endpoint.
Local inference is conditionally compiled:
- Android →
inference_android.dart(full llama.cpp engine) - Web →
inference_stub.dart(cloud-only, local coming soon) - iOS →
inference_android.dart(full llama.cpp engine via Metal GPU)
The InferenceService exposes supportsLocalInference so the UI can hide local-model UI on unsupported platforms.
| Platform | Local Inference | Cloud APIs | Notes |
|---|---|---|---|
| Android | ✅ Yes | ✅ Yes | CPU offload via NEON; minSdk 28 |
| iOS | ✅ Yes | ✅ Yes | Metal GPU acceleration |
| Web | ❌ No | ✅ Yes | Cloud-only (local coming soon) |
The iPad release is distributed as a standalone ZIP package for sideloading. Download the latest PrivateLM-iOS.zip from the Releases page, extract it, and install the .ipa via AltStore, Sideloadly, or Xcode. iPhone support is experimental — iPad is the recommended iOS target due to RAM requirements for local models.
- Flutter SDK >=3.3.0
- Android SDK (API 26+)
- JDK 17
- NDK (bundled with Android SDK)
flutter pub get
cd android
./gradlew assembleDebug # or assembleReleaseFor release builds you should configure your own signing in android/app/build.gradle.kts:
buildTypes {
release {
signingConfig = signingConfigs.getByName("release")
isMinifyEnabled = true
isShrinkResources = true
proguardFiles(
getDefaultProguardFile("proguard-android-optimize.txt"),
"proguard-rules.pro"
)
}
}flutter pub get
cd ios
pod install
flutter build iosflutter pub get
flutter build web --releaseMIT — see LICENSE for details.