A multi-agent coding framework that runs entirely on local hardware with 6GB GPU support. Uses quantized GGUF models via llama-cpp-python for LLM inference and ChromaDB + sentence-transformers for local RAG (Retrieval-Augmented Generation).
No API keys. No cloud. Everything runs on your machine.
```
User Request
      │
      ▼
┌──────────────────────────────────────────┐
│             AgentOrchestrator            │
│  • Detects intent (code/debug/docs)      │
│  • Fetches RAG context from codebase     │
│  • Routes to agent pipeline              │
└────────────────────┬─────────────────────┘
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
   ┌─────────┐ ┌───────────┐ ┌──────────┐
   │  Coder  │ │ Debugger  │ │ DocAgent │
   │  Agent  │ │  Agent    │ │          │
   └────┬────┘ └─────┬─────┘ └──────────┘
        │            │
        ▼            ▼
   ┌──────────────────────┐
   │    ReviewerAgent     │
   │  (auto quality gate) │
   └──────────────────────┘
              │
              ▼
        Final Output

RAG Stack (CPU):
  ChromaDB ──► bge-small-en ──► Your Codebase
```
| Component | VRAM | Notes |
|---|---|---|
| DeepSeek Coder 6.7B (28 layers) | ~4.2GB | Primary LLM |
| KV Cache (4096 ctx) | ~0.8GB | Context window |
| Overhead / buffers | ~0.5GB | CUDA runtime |
| Total | ~5.5GB | Leaves 0.5GB headroom |
| Embedding model | CPU only | bge-small-en, 133MB RAM |
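The budget in the table can be sanity-checked with quick arithmetic. The sketch below is illustrative only (it assumes weight VRAM scales linearly with offloaded layers and KV cache scales linearly with context length, using the table's approximate numbers, not measured values):

```python
# Rough VRAM estimate when offloading n of 28 transformer layers.
# Constants mirror the table above; real usage varies by model and backend.
WEIGHTS_GB_ALL_LAYERS = 4.2   # DeepSeek Coder 6.7B Q4_K_M, all 28 layers
TOTAL_LAYERS = 28
KV_CACHE_GB_AT_4096 = 0.8     # roughly halves at n_ctx=2048
OVERHEAD_GB = 0.5             # CUDA runtime / buffers

def estimate_vram_gb(n_gpu_layers: int, n_ctx: int = 4096) -> float:
    """Linear-scaling VRAM estimate in GB (illustrative, not measured)."""
    weights = WEIGHTS_GB_ALL_LAYERS * n_gpu_layers / TOTAL_LAYERS
    kv = KV_CACHE_GB_AT_4096 * n_ctx / 4096
    return round(weights + kv + OVERHEAD_GB, 2)

print(estimate_vram_gb(28))        # → 5.5, matches the table total
print(estimate_vram_gb(20, 2048))  # → 3.9, a safer budget for 6GB cards
```

This is why the troubleshooting advice of dropping `n_gpu_layers` to 20 and `n_ctx` to 2048 frees well over a gigabyte on a 6GB card.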
```bash
git clone <this-repo>
cd local_agent_framework
pip install -e .
```

```bash
# For NVIDIA GPU (CUDA 12.x):
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python --force-reinstall

# CPU-only (slower but works anywhere):
pip install llama-cpp-python
```

```bash
# Download the default model (DeepSeek Coder 6.7B Q4_K_M, ~4GB)
lagent download-model

# Or download a specific model:
lagent download-model \
  --repo TheBloke/CodeLlama-7B-Instruct-GGUF \
  --file codellama-7b-instruct.Q4_K_M.gguf
```

```bash
lagent gpu-check
```

```bash
# Interactive chat mode
lagent chat

# Single task
lagent run "Write a Python async HTTP client with retry logic"

# Index your codebase for RAG
lagent index ./my_project

# Debug an error (paste traceback)
lagent run "Fix: AttributeError: 'NoneType' object has no attribute 'split'"

# Generate documentation
lagent run "Write docstrings for all functions in auth.py" --lang python

# Custom pipeline
lagent run "Refactor this to use async/await" --pipeline coder,reviewer

# Save output to a file
lagent run "Create a FastAPI CRUD app" --output api.py
```

```python
from local_agent_framework import AgentOrchestrator, RAGPipeline

# Basic usage
orchestrator = AgentOrchestrator()
result = orchestrator.run("Write a Redis cache decorator")
print(result.content)

# With RAG (index your project first)
rag = RAGPipeline()
rag.index_directory("./my_project")

orchestrator = AgentOrchestrator(rag=rag)
result = orchestrator.run(
    "Add rate limiting to the existing API endpoints",
    language="python"
)

# Print extracted code blocks
for code in result.code_blocks:
    print(code)
```

Routes requests to the appropriate pipeline automatically:
- Bug/error keywords → `debugger` → `reviewer`
- Code generation → `coder` → `reviewer`
- Documentation → `doc`
- Explanation → `coder` (explain mode)
- Refactor → `coder` → `reviewer`
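The routing rules above amount to keyword matching over the request text. A minimal sketch of the idea (illustrative only, not the framework's actual detection logic, and with an invented keyword list):

```python
# Illustrative keyword-based intent router (hypothetical keywords,
# not the framework's real implementation).
def route(request: str) -> list[str]:
    text = request.lower()
    if any(k in text for k in ("error", "traceback", "fix:", "bug", "exception")):
        return ["debugger", "reviewer"]
    if any(k in text for k in ("docstring", "document", "readme", "comment")):
        return ["doc"]
    if any(k in text for k in ("explain", "what does", "how does")):
        return ["coder"]  # explain mode
    # Default: code generation / refactoring
    return ["coder", "reviewer"]

print(route("Fix: AttributeError: 'NoneType' object has no attribute 'split'"))
# → ['debugger', 'reviewer']
print(route("Write docstrings for all functions in auth.py"))
# → ['doc']
```

The `--pipeline` flag exists precisely to override this automatic choice when the keyword heuristic guesses wrong.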
Writes, refactors, explains, and converts code. Follows language-specific best practices, adds type hints, docstrings, and error handling.
Reviews code for:
- 🔴 Critical: Bugs, security vulnerabilities, data loss risks
- 🟡 Warnings: Performance issues, bad practices
- 🔵 Suggestions: Style, readability improvements
Returns: `APPROVED` | `NEEDS_CHANGES` | `REJECTED`
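Callers can gate a pipeline on the reviewer's verdict. Assuming the verdict token appears verbatim in the reviewer's output text (an assumption about the format, not a documented guarantee), extraction is a simple substring scan:

```python
# Illustrative verdict extraction from reviewer output (assumed format).
def review_verdict(review_text: str) -> str:
    """Return the first verdict token found in the review, or UNKNOWN."""
    for verdict in ("NEEDS_CHANGES", "REJECTED", "APPROVED"):
        if verdict in review_text:
            return verdict
    return "UNKNOWN"

print(review_verdict("Verdict: APPROVED - no critical issues found."))
# → APPROVED
```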
Given a traceback + code, identifies root cause and provides a precise fix with explanation.
Generates: docstrings (Google/NumPy style), README files, API documentation, inline comments.
```python
from local_agent_framework import RAGPipeline

rag = RAGPipeline()

# Index a directory
stats = rag.index_directory(
    "./my_project",
    recursive=True,
    exclude_dirs=[".git", "node_modules", "venv"],
    force=False,  # Skip files already indexed
)
# → {"files_indexed": 47, "chunks_added": 312, "files_skipped": 3}

# Manual retrieval
results = rag.retrieve("how does user authentication work?", top_k=5)
for r in results:
    print(f"[{r['score']:.2f}] {r['metadata']['file_name']}")
    print(r['content'])

# Stats
print(rag.stats())
# → {"total_chunks": 312, "collection_name": "codebase", ...}
```

Supported file types: `.py` `.js` `.ts` `.jsx` `.tsx` `.java` `.go` `.rs` `.cpp` `.c` `.h` `.cs` `.rb` `.php` `.md` `.txt` `.yaml` `.yml` `.json` `.toml` `.sh` `.sql`
Generate a config file:

```bash
lagent show-config --save config.yaml
```

Edit `config.yaml`:

```yaml
model:
  model_name: deepseek-coder-6.7b-instruct.Q4_K_M.gguf
  n_gpu_layers: 28    # Reduce if OOM errors (try 20-24)
  n_ctx: 4096         # Reduce to 2048 to save VRAM
  temperature: 0.1    # Low = deterministic code

rag:
  top_k: 5
  chunk_size: 1000
  min_relevance_score: 0.3
```
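The `chunk_size` setting controls how files are split before embedding. A rough sketch of fixed-size character chunking with overlap (illustrative only; the `overlap` parameter is an assumption, and the real chunker may split on code structure instead):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks with overlap (illustrative)."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "x" * 2500
chunks = chunk_text(doc)
print(len(chunks))     # → 3
print(len(chunks[0]))  # → 1000
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; `min_relevance_score` then discards chunks whose similarity falls below the threshold.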
Use custom config:

```bash
lagent run "..." --config ./my_config.yaml
```

| Model | Size | VRAM | Best For |
|---|---|---|---|
| DeepSeek Coder 6.7B Q4_K_M | 4.0GB | ~4.5GB | Code generation (recommended) |
| CodeLlama 7B Q4_K_M | 4.1GB | ~4.5GB | Code + instruction following |
| Mistral 7B Q4_K_M | 4.1GB | ~4.5GB | General coding tasks |
| Phi-3 Mini 3.8B Q4 | 2.3GB | ~2.8GB | Fast responses, lighter tasks |
For 8GB+ VRAM: use CodeLlama 13B or DeepSeek Coder 33B (Q4_K_S)
```python
from local_agent_framework import AgentOrchestrator
from local_agent_framework.agents.base import BaseAgent, AgentRole, AgentTask, AgentResult

class TestWriterAgent(BaseAgent):
    @property
    def role(self):
        return AgentRole.CODER  # Reuse existing role enum or extend it

    @property
    def system_prompt(self):
        return """You are a test engineer. Write comprehensive pytest test suites.
Always include: unit tests, edge cases, fixtures, and mocks where appropriate."""

    def run(self, task: AgentTask) -> AgentResult:
        prompt = self._build_task_prompt(task)
        response = self._generate(prompt)
        return AgentResult(
            agent_name=self.name,
            task_id=task.task_id,
            success=True,
            content=response,
        )

# Register and use
orchestrator = AgentOrchestrator()
orchestrator.add_agent("test_writer", TestWriterAgent(orchestrator.model_loader))
result = orchestrator.run(
    "Write tests for my UserAuth class",
    pipeline=["test_writer"]
)
```

CUDA out of memory:
```yaml
# In config.yaml, reduce GPU layers:
model:
  n_gpu_layers: 20   # or 16
  n_ctx: 2048        # Reduce context window
```

Model not found:

```bash
lagent download-model

# Check the models directory:
ls ~/.local_agent_framework/models/
```

Slow inference:

```bash
# Use a smaller/faster model:
lagent download-model --repo microsoft/Phi-3-mini-4k-instruct-gguf --file Phi-3-mini-4k-instruct-q4.gguf
```

Poor code quality:

- Increase `max_tokens` in config
- Lower `temperature` (try 0.05)
- Enable auto-review: `AgentOrchestrator(auto_review=True)`
MIT