RAY AI is a high-performance, private, and secure offline AI assistant for Android. It leverages llama.cpp and ONNX Runtime to provide state-of-the-art LLM inference directly on your device without requiring an internet connection.
- 100% Offline Inference: No data leaves your device. Private by design.
- llama.cpp Integration: High-performance C++ backend optimized for mobile ARM processors.
- Dual Engine Support: Supports both GGUF (via llama.cpp) and ONNX models.
- Modern Copilot UI: Commercial-grade Material 3 design with smooth animations.
- Model Management: Download and manage lightweight models optimized for mobile (TinyLlama, Qwen, etc.).
- Real-time Streaming: Watch responses generate token-by-token.
- Secure Storage: Models and sensitive data are protected with hardware-backed encryption.
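The real-time streaming feature above follows a common token-callback pattern between the inference engine and the UI. The sketch below is illustrative only: `TokenCallback` and `generate` are assumed names, and the mock token loop stands in for the actual JNI call into llama.cpp.

```java
// Hypothetical sketch of token-by-token streaming; TokenCallback and
// generate() are illustrative names, not RAY AI's actual API.
public class StreamingDemo {

    // The UI layer implements this to receive tokens as they are produced.
    interface TokenCallback {
        void onToken(String token);
        void onComplete(String fullResponse);
    }

    // Stand-in for the JNI-backed engine; a real build would invoke a
    // native llama.cpp method here instead of iterating a fixed array.
    static void generate(String prompt, TokenCallback cb) {
        String[] tokens = {"Hello", ", ", "world", "!"};
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t);
            cb.onToken(t); // delivered token-by-token, as in the streaming UI
        }
        cb.onComplete(sb.toString());
    }

    public static void main(String[] args) {
        generate("Hi", new TokenCallback() {
            @Override public void onToken(String token) {
                System.out.print(token); // → Hello, world!
            }
            @Override public void onComplete(String full) {
                System.out.println();
                System.out.println("done: " + full.length() + " chars"); // → done: 13 chars
            }
        });
    }
}
```

In the real app the callback would be marshalled onto the Android main thread before touching views; that detail is omitted here for brevity.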
Here's how RAY AI looks in action. Screenshots, from left to right, top to bottom: Chat · Model management · Thinking → Response · Complete answer streaming.
- Language: Java / C++ (JNI)
- UI Framework: Android Material Components 3
- Inference Engines:
- llama.cpp (Native C++)
- ONNX Runtime
- Build System: CMake & Gradle
- Min SDK: API level 24 (Android 7.0, Nougat)
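The CMake + Gradle split typically means Gradle drives the Android build while a `CMakeLists.txt` compiles the native engine and JNI bridge. The fragment below is a hedged sketch: the subdirectory path, source file, and target names are assumptions, not the project's actual build files.

```cmake
# Illustrative CMakeLists.txt fragment; paths and target names are assumed.
cmake_minimum_required(VERSION 3.22.1)
project(rayai)

# Build llama.cpp as a subproject (location is an assumption).
add_subdirectory(llama.cpp)

# JNI bridge library, loaded from Java via System.loadLibrary("rayai").
add_library(rayai SHARED jni_bridge.cpp)

# "llama" is llama.cpp's library target; android/log are NDK system libs.
target_link_libraries(rayai llama android log)
```

Gradle points at this file through `externalNativeBuild { cmake { path "..." } }` in the module's build script.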
- Android Studio Koala or newer
- Android NDK 26.1.10909125+
- CMake 3.22.1+
- Clone the repository:

  ```shell
  git clone https://github.com/Rohithdgrr/working-RAY-AI.git
  ```
- Open the project in Android Studio.
- Sync project with Gradle files.
- Build and run on a physical device (arm64-v8a recommended).
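The Android Studio steps above can also be done from the command line. The Gradle task and output path below are standard Android defaults, assumed rather than confirmed by this repository, and the commands require an installed SDK/NDK and a connected device:

```shell
# Command-line equivalent of the build steps above (assumes SDK/NDK are set up).
cd working-RAY-AI

# Build a debug APK; assembleDebug is the standard Android Gradle task.
./gradlew assembleDebug

# Install on a connected device (arm64-v8a recommended).
adb install -r app/build/outputs/apk/debug/app-debug.apk
```
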
- Architecture - Deep dive into JNI and engine integration.
- Installation Guide - Detailed build instructions and troubleshooting.
- User Guide - How to use the app and manage models.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.



