Skip to content

Modassir2/desktop-assistant

Repository files navigation

💻 Desktop Assistant Agent

A lightweight, terminal-based (TUI) desktop assistant designed for speed and efficiency. It features a beautifully formatted command-line interface and runs completely locally using LM Studio.

✨ Key Features

  • 📟 Rich Terminal UI: Clean, pretty-printed console output for easy reading and structured data display.
  • 🏠 100% Local AI Support: Tested and optimized in local OpenAI endpoints (LM Studio, ollama, llama.cpp/llama-server.exe).
  • ⌨️ Keyboard Driven: Quick text-based commands without leaving your terminal environment.
  • Lightweight: No heavy GUI overhead, running directly in your shell with minimal system usage.

🚀 Getting Started

Model Selection

  • For Local LLM: There are many open source LLMs that can be used. While selecting a LLM for this assistant look for its UI Grounding benchmarks (like screenspot, osworld) to be 75%+ for best results. I would recommend pick the largest model from the Qwen3vl series that can fit on your computer, but don't go lower than 4b! If your own a low end harware then the best model for you is the Qwen3vl 4b with Q4_K_M quantization. Get a Q8_0 quantized mmproj and not the BF16. ( Hugginface Qwen/Qwen3-VL-4B-Instruct-GGUF). Then get ggml-org/llama.cpp release best for your GPU or mac. Run the following code in terminal in the folder with llama.cpp, Qwen3vl 8b Q8_0 and mmproj Q8_0 file-

    .\llama.cpp-b8931\llama-server.exe --model .\Qwen3-VL-4B-Instruct-Q4_K_M.gguf --mmproj .\mmproj-Qwen3-VL-4B-Instruct-q8_0.gguf --flash-attn on --threads <your-cpu-core-count> --ctx-size 8192 --host 0.0.0.0 --port 8002 --parallel 1 -ctv q8_0 -ctk q8_0 --threads-batch 2x<cpu_core_count>

    Note: Add --flash-attn on if you have nvidia GPU for better performance Now the local server will be running at "http://127.0.0.1:8002" and requires no api key.

  • For Cloud LLM: This has been tested to work on the lowest 4b parameter model, thus almost all closed source AI models will work given that it supports images and tool calling/agentic tasks and is good enough at UI grounding.

Installation

  1. Clone the repository:

    git clone https://github.com/Modassir2/desktop-assistant
    cd desktop-assistant
  2. Install dependencies:

    pip install -r requirements.txt
  3. Setup config.json: Open config.json All the settings are obvious but here are the explanations anyway-

    • base_url This sets the base url to look for the OpenAI endpoint, some examples are: Google Gemini (via AI Studio): "https://generativelanguage.googleapis.com/v1beta/openai/"
    • api_key Set the api key for paid AI platforms. For local LLMs usually out any placeholder text as it is not used.
    • image_tokens Sets how many tokens the AI uses to see/process a single image.This is used to calculate and turnacate the history as it grows more than max_tokens . You can ignore and leave to default 1105.
    • max_tokens Set The context limit supported by your api endpoint. At least 8192 tokens is recommended and is what I used to test it.
    • last_n_images The more the number of images, the more the processing time, especially for local LLMs, this this setting allows how many last n images can the AI see in history. Recommnded at least 3. Keep a very large number to set it to infinity (Not-recommnded).
    • width The width of your monitor's resolution, used to calculate the x coordinates to click at from the AI output.
    • height The height of your monitor's resolution, used to calculate the y coordinate to click at from the AI ouutput.
    • dev_mode This currently does nothing. Just ignore this
    • monitor If your own more than one monitor setup, then set the monitor number that the AI will see. The monitor number can be seen in windows+I (Open settings) > System > Display Then click on the "Identify" button. The number shown on each monitor is the monitor's number.
  4. Run the assistant: Now finally-

    python run.py

🛠️ Tech Stack

  • Language: Python
  • Libraries used for Pretty Printing: textual, rich, colorma

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


*Created by Modassir <3

About

A Desktop Agent that can work on any OpenAI compatible API endpoint including local LLMs too.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors