This project lets you interact with DeepSeek LLM models, run locally with Ollama, for query processing without relying on cloud services. DeepSeek models provide a local solution for reasoning tasks such as math, code generation, and general-knowledge question answering.
- Run DeepSeek LLM models locally using Ollama
- Query processing with DeepSeek's first-generation reasoning models
- Support for multiple DeepSeek models with varying parameter sizes (1.5B, 7B, 8B, 14B, 32B, 70B)
Before running the project, you need to set up a Python virtual environment and install required dependencies.
In your terminal, navigate to the project directory and create a virtual environment:
```shell
python3 -m venv deepseek-env
```
Activate the environment:
```shell
source deepseek-env/bin/activate
```
Install the necessary dependencies:
```shell
pip install requests
```
DeepSeek models require the Ollama server for local model processing. Follow these steps to start the server.
Follow the instructions from Ollama's official website to install Ollama for your system.
Once installed, use the following command to start the server:
```shell
ollama serve
```
This starts the local Ollama server, which listens on 127.0.0.1:11434 by default.
You can use any available DeepSeek model (e.g., deepseek-r1:1.5b) by running the following:
```shell
ollama run deepseek-r1:1.5b
```
Once the environment and server are set up, you can start interacting with the models.
To query the local LLM, run:
```shell
python3 LocalLLMQuery.py
```
The script will prompt you for a question and return the model's response.
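The contents of `LocalLLMQuery.py` are not shown here, but a script like it can be sketched against Ollama's local HTTP API (the `/api/generate` endpoint on the default `127.0.0.1:11434`). The function names below are illustrative, not the project's actual code:

```python
import requests

# Ollama's default local generate endpoint
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Assemble the JSON body for a non-streaming generate request."""
    return {"model": model, "prompt": prompt, "stream": False}

def query_model(prompt: str, model: str = "deepseek-r1:1.5b") -> str:
    """Send the prompt to the local Ollama server and return the response text."""
    resp = requests.post(OLLAMA_URL, json=build_payload(model, prompt), timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]
```

A wrapper around `query_model(input("Enter your question: "))` reproduces the interactive behavior described above; the server must be running for the request to succeed.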
After completing the interaction, you can stop the Ollama server with the following command:
```shell
ollama stop
```
DeepSeek-R1 is the first generation of reasoning models, with performance comparable to OpenAI's models. It includes dense models distilled from DeepSeek-R1 based on Llama and Qwen, in a range of parameter sizes to accommodate different use cases:
- 1.5b - Suitable for general tasks and smaller systems
- 7b - Handles more complex queries
- 8b - High-performing model for reasoning tasks
- 14b - Powerful for advanced tasks
- 32b - High capability for demanding workloads; requires substantial hardware
- 70b - The largest variant, for tasks that need maximum model capability
Here’s how you can run a specific model:
```shell
ollama run deepseek-r1:1.5b
```
You can replace 1.5b with any of the other model sizes as per your requirements.
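If you parametrize the model size in your own code, it can help to validate the size before building the Ollama model tag. The helper below is a hypothetical sketch (not part of the project), using the sizes listed in this README and Ollama's `model:size` tag convention:

```python
# Parameter sizes listed in this README
DEEPSEEK_SIZES = {"1.5b", "7b", "8b", "14b", "32b", "70b"}

def deepseek_tag(size: str) -> str:
    """Return the full Ollama model tag for a given DeepSeek-R1 size."""
    size = size.lower()
    if size not in DEEPSEEK_SIZES:
        raise ValueError(f"Unknown DeepSeek-R1 size: {size!r}; choose from {sorted(DEEPSEEK_SIZES)}")
    return f"deepseek-r1:{size}"
```

For example, `deepseek_tag("7b")` yields `deepseek-r1:7b`, which can be passed to `ollama run` or to the API payload.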
DeepSeek's models can be fine-tuned for various applications, and their performance is comparable to OpenAI models for reasoning tasks across math, code, and general knowledge. The distilled variants based on Qwen and Llama are:
- DeepSeek-R1-Distill-Qwen-1.5B
- DeepSeek-R1-Distill-Qwen-7B
- DeepSeek-R1-Distill-Llama-8B
- DeepSeek-R1-Distill-Qwen-14B
- DeepSeek-R1-Distill-Qwen-32B
- DeepSeek-R1-Distill-Llama-70B
Each of these models performs exceptionally well in benchmarks like math reasoning, coding, and logical reasoning.
- `ollama serve`: Starts the local server where the models are loaded and ready to serve queries. Keep the server running while you interact with the models.
- `ollama stop`: Stops the server after you finish interacting with the models. This ensures resources are released properly.
- Start the server:

  ```shell
  ollama serve
  ```

- Stop the server:

  ```shell
  ollama stop
  ```
The model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows modifications and derivative works, including distillation for training other LLMs.
For any issues or inquiries, feel free to raise a ticket or email us at [s.bidwai2000@gmail.com](mailto:s.bidwai2000@gmail.com).
Ensure the `ollama` service is running on your local machine before executing any queries. You can access the models directly from Ollama without needing cloud servers, giving you a local, efficient, and customizable environment for your LLM-based tasks.
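To confirm the server is reachable before querying, you can probe Ollama's root endpoint, which answers with an HTTP 200 ("Ollama is running") when the server is up. This is a standard-library-only sketch; the function names are illustrative:

```python
import urllib.request
import urllib.error

def ollama_base_url(host: str = "127.0.0.1", port: int = 11434) -> str:
    """Build the base URL for the local Ollama server (default 127.0.0.1:11434)."""
    return f"http://{host}:{port}"

def ollama_is_running(host: str = "127.0.0.1", port: int = 11434) -> bool:
    """Return True if the local Ollama server answers on its root endpoint."""
    try:
        with urllib.request.urlopen(ollama_base_url(host, port), timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `ollama_is_running()` before `query_model`-style requests lets your script fail fast with a clear message instead of a connection error.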