PrivateLM

A production-ready, cross-platform AI chat client built with Flutter. It unifies local on-device LLM inference (Android) with cloud API access, giving users full control over how their models run.

Image generation tested on Moto G71 (Snapdragon), Oneplus 10r (Mediatek), Pixel 6A (Tensor), Poco F1 (Snapdragon), Samsung s23 (Snapdragon) 4 steps fast

Generated on pixel 6 with 20 step

What It Does

Local Inference on Android — Download and run GGUF models directly on your phone using GPU-accelerated inference (Vulkan). No internet required after download.
Cloud API Fallback — Seamlessly switch to OpenAI, Anthropic, Google Gemini, or Kimi (Moonshot AI) when you need more power or are on unsupported platforms.
Multimodal Chat — Send text and images in conversations. Vision support works with both local models (Qwen2-VL) and cloud providers.
Persistent Sessions — All chats, tasks, and settings are stored locally via Hive. Nothing leaves your device unless you explicitly choose cloud mode.
Background Services — Firebase Cloud Messaging integration for push updates and background task handling.
Smart Auto-Configuration — On first launch, the app detects your device's RAM and recommends optimal context size and token limits automatically.
Task Management — A dedicated task view for structured AI-assisted workflows alongside free-form chat.

Technical Architecture

Stack

Framework: Flutter 3.x (Dart >=3.3.0)
State Management: GetX
Local Storage: Hive
Networking: Dio + package:http
Background Execution: flutter_background_service + flutter_local_notifications
Push Notifications: Firebase Core + Firebase Messaging

Inference Pipeline

┌─────────────────────────────────────────────────────────────┐
│                        UI Layer                              │
│   ChatView / TaskView / ModelView / SettingsView            │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                    Controllers (GetX)                        │
│   ChatController · TaskController · ModelController         │
│   SettingsController · HomeController                       │
└──────────────────────────┬──────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────┐
│                      Services                                │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────┐ │
│  │ InferenceService│  │  CloudService   │  │DownloadSvc  │ │
│  │  (local GGUF)   │  │ (OpenAI/Claude/ │  │ (model dl)  │ │
│  │                 │  │  Gemini/Kimi)   │  │             │ │
│  └─────────────────┘  └─────────────────┘  └─────────────┘ │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────┐ │
│  │  HiveService    │  │ DeviceInfoSvc   │  │ExecutionSvc │ │
│  │  ( persistence) │  │  (RAM/GPU tier) │  │ (bg tasks)  │ │
│  └─────────────────┘  └─────────────────┘  └─────────────┘ │
└─────────────────────────────────────────────────────────────┘

Local Inference (Android)

The app uses llama_flutter_android, a custom Flutter plugin wrapping llama.cpp for ARM64 devices. At runtime it:

Detects GPU capabilities via Vulkan to determine offload layers.
Selects thread count based on device tier (ultra / high / mid / low).
Loads the GGUF model with progress streaming.
Generates tokens via generateChat() with native chat-template support (ChatML, Llama-3, Gemma, Phi).
Falls back to manual prompt construction if native templates fail.

Idle detection (5s) and hard timeouts (180s) keep the UX responsive even on underpowered hardware.

Cloud Inference

CloudService normalizes four different API shapes into a single interface:

OpenAI — standard /v1/chat/completions
Anthropic — Messages API with separate system param
Google Gemini — generateContent with inline image base64
Kimi — OpenAI-compatible endpoint from Moonshot AI

API keys are stored in Hive and never transmitted anywhere except to the provider's endpoint.

Cross-Platform Abstraction

Local inference is conditionally compiled:

Android → inference_android.dart (full llama.cpp engine)
Web → inference_stub.dart (cloud-only, local coming soon)
iOS → inference_android.dart (full llama.cpp engine via Metal GPU)

The InferenceService exposes supportsLocalInference so the UI can hide local-model UI on unsupported platforms.

Supported Platforms

Platform	Local Inference	Cloud APIs	Notes
Android	✅ Yes	✅ Yes	CPU offload via NEON; minSdk 28
iOS	✅ Yes	✅ Yes	Metal GPU acceleration
Web	❌ No	✅ Yes	Cloud-only (local coming soon)

iOS / iPad

The iPad release is distributed as a standalone ZIP package for sideloading. Download the latest PrivateLM-iOS.zip from the Releases page, extract it, and install the .ipa via AltStore, Sideloadly, or Xcode. iPhone support is experimental — iPad is the recommended iOS target due to RAM requirements for local models.

Build Configuration

Prerequisites

Flutter SDK >=3.3.0
Android SDK (API 26+)
JDK 17
NDK (bundled with Android SDK)

Android

flutter pub get
cd android
./gradlew assembleDebug   # or assembleRelease

For release builds you should configure your own signing in android/app/build.gradle.kts:

buildTypes {
    release {
        signingConfig = signingConfigs.getByName("release")
        isMinifyEnabled = true
        isShrinkResources = true
        proguardFiles(
            getDefaultProguardFile("proguard-android-optimize.txt"),
            "proguard-rules.pro"
        )
    }
}

iOS

flutter pub get
cd ios
pod install
flutter build ios

Web

flutter pub get
flutter build web --release

License

MIT — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
android		android
assets		assets
ios		ios
lib		lib
local_plugins		local_plugins
test		test
web		web
.firebaserc		.firebaserc
.gitignore		.gitignore
.metadata		.metadata
IMG_2390.png		IMG_2390.png
LICENSE		LICENSE
PrivateLM.png		PrivateLM.png
README.md		README.md
analysis_options.yaml		analysis_options.yaml
firebase.json		firebase.json
pubspec.lock		pubspec.lock
pubspec.yaml		pubspec.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PrivateLM

What It Does

Technical Architecture

Stack

Inference Pipeline

Local Inference (Android)

Cloud Inference

Cross-Platform Abstraction

Supported Platforms

iOS / iPad

Build Configuration

Prerequisites

Android

iOS

Web

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PrivateLM

What It Does

Technical Architecture

Stack

Inference Pipeline

Local Inference (Android)

Cloud Inference

Cross-Platform Abstraction

Supported Platforms

iOS / iPad

Build Configuration

Prerequisites

Android

iOS

Web

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages