kwierman/LocalCodingAgents
🤖 Local Agent Framework

A multi-agent coding framework that runs entirely on local hardware and fits in 6GB of GPU VRAM. It uses quantized GGUF models via llama-cpp-python for LLM inference, and ChromaDB + sentence-transformers for local RAG (Retrieval-Augmented Generation).

No API keys. No cloud. Everything runs on your machine.


Architecture

User Request
     │
     ▼
┌────────────────────────────────────────┐
│          AgentOrchestrator             │
│  • Detects intent (code/debug/docs)    │
│  • Fetches RAG context from codebase   │
│  • Routes to agent pipeline            │
└───────────────┬────────────────────────┘
                │
     ┌──────────┼──────────────────┐
     ▼          ▼                  ▼
┌────────┐ ┌──────────┐ ┌──────────────┐
│ Coder  │ │ Debugger │ │   DocAgent   │
│ Agent  │ │  Agent   │ │              │
└───┬────┘ └────┬─────┘ └──────────────┘
    │           │
    ▼           ▼
┌────────────────────┐
│   ReviewerAgent    │
│ (auto quality gate)│
└────────────────────┘
    │
    ▼
Final Output

RAG Stack (CPU):
  ChromaDB ──► bge-small-en ──► Your Codebase

GPU Memory Layout (6GB)

| Component | VRAM | Notes |
|---|---|---|
| DeepSeek Coder 6.7B (28 layers) | ~4.2GB | Primary LLM |
| KV cache (4096 ctx) | ~0.8GB | Context window |
| Overhead / buffers | ~0.5GB | CUDA runtime |
| **Total** | **~5.5GB** | Leaves 0.5GB headroom |
| Embedding model | CPU only | bge-small-en, 133MB RAM |
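
The budget above can be sanity-checked with rough arithmetic. The per-layer cost below is back-calculated from the table (~4.2GB / 28 layers); it is an illustration for tuning `n_gpu_layers`, not a number exposed by the framework:

```python
def vram_budget_gb(n_gpu_layers: int, kv_cache_gb: float = 0.8,
                   overhead_gb: float = 0.5) -> float:
    """Estimate total VRAM use for a given number of offloaded layers."""
    gb_per_layer = 4.2 / 28  # ~0.15GB/layer for DeepSeek Coder 6.7B Q4_K_M
    return n_gpu_layers * gb_per_layer + kv_cache_gb + overhead_gb

print(round(vram_budget_gb(28), 1))  # 5.5 -- matches the table's total
print(round(vram_budget_gb(20), 1))  # 4.3 -- a safer setting if you hit OOM
```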

Installation

1. Install the package

git clone <this-repo>
cd local_agent_framework
pip install -e .

2. Install llama-cpp-python with CUDA support

# For NVIDIA GPU (CUDA 12.x):
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python --force-reinstall

# CPU-only (slower but works anywhere):
pip install llama-cpp-python

3. Download a model

# Download default model (DeepSeek Coder 6.7B Q4_K_M, ~4GB)
lagent download-model

# Or download a specific model:
lagent download-model \
  --repo TheBloke/CodeLlama-7B-Instruct-GGUF \
  --file codellama-7b-instruct.Q4_K_M.gguf

4. Verify setup

lagent gpu-check

Quick Start

CLI

# Interactive chat mode
lagent chat

# Single task
lagent run "Write a Python async HTTP client with retry logic"

# Index your codebase for RAG
lagent index ./my_project

# Debug an error (paste traceback)
lagent run "Fix: AttributeError: 'NoneType' object has no attribute 'split'"

# Generate documentation
lagent run "Write docstrings for all functions in auth.py" --lang python

# Custom pipeline
lagent run "Refactor this to use async/await" --pipeline coder,reviewer

# Save output to file
lagent run "Create a FastAPI CRUD app" --output api.py

Python API

from local_agent_framework import AgentOrchestrator, RAGPipeline

# Basic usage
orchestrator = AgentOrchestrator()
result = orchestrator.run("Write a Redis cache decorator")
print(result.content)

# With RAG (index your project first)
rag = RAGPipeline()
rag.index_directory("./my_project")

orchestrator = AgentOrchestrator(rag=rag)
result = orchestrator.run(
    "Add rate limiting to the existing API endpoints",
    language="python"
)

# Print extracted code blocks
for code in result.code_blocks:
    print(code)
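
The `code_blocks` attribute suggests the result parses fenced blocks out of the model's reply. A minimal sketch of that kind of extraction (the regex and helper name are illustrative, not the framework's actual implementation):

```python
import re

def extract_code_blocks(text: str) -> list[str]:
    """Pull fenced ``` blocks out of LLM output (language tag optional)."""
    pattern = r"```[a-zA-Z0-9_+-]*\n(.*?)```"
    return [m.strip() for m in re.findall(pattern, text, re.DOTALL)]

reply = "Here you go:\n```python\nprint('hi')\n```\nDone."
print(extract_code_blocks(reply))  # ["print('hi')"]
```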

Agents

🧠 OrchestratorAgent

Routes requests to the appropriate pipeline automatically:

  • Bug/error keywords β†’ debugger β†’ reviewer
  • Code generation β†’ coder β†’ reviewer
  • Documentation β†’ doc
  • Explanation β†’ coder (explain mode)
  • Refactor β†’ coder β†’ reviewer

💻 CoderAgent

Writes, refactors, explains, and converts code. Follows language-specific best practices, adds type hints, docstrings, and error handling.

πŸ” ReviewerAgent

Reviews code for:

  • πŸ”΄ Critical: Bugs, security vulnerabilities, data loss risks
  • 🟑 Warnings: Performance issues, bad practices
  • πŸ”΅ Suggestions: Style, readability improvements

Returns: APPROVED | NEEDS_CHANGES | REJECTED
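
Since the verdict is one of three fixed tokens, a caller can recover it from the review text. The helper below is a sketch of that idea, not part of the framework's public API:

```python
def parse_verdict(review: str) -> str:
    """Find the reviewer's verdict token in its output, defaulting to NEEDS_CHANGES."""
    for verdict in ("APPROVED", "NEEDS_CHANGES", "REJECTED"):
        if verdict in review:
            return verdict
    return "NEEDS_CHANGES"  # be conservative if the model omitted a verdict

print(parse_verdict("Looks solid overall. Verdict: APPROVED"))  # APPROVED
```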

πŸ› DebuggerAgent

Given a traceback + code, identifies root cause and provides a precise fix with explanation.
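
A stdlib-only sketch of how the key facts could be pulled out of a pasted traceback (the framework's own parsing is not shown in this README; the function and field names are assumptions):

```python
import re

def parse_traceback(tb: str) -> dict:
    """Extract the innermost frame and the exception line from a traceback string."""
    frames = re.findall(r'File "(.+?)", line (\d+)', tb)
    exc_type, _, message = tb.strip().splitlines()[-1].partition(": ")
    file, line = frames[-1] if frames else (None, None)
    return {"file": file, "line": int(line) if line else None,
            "exception": exc_type, "message": message}

tb = '''Traceback (most recent call last):
  File "auth.py", line 42, in login
    token = header.split(" ")[1]
AttributeError: 'NoneType' object has no attribute 'split'
'''
info = parse_traceback(tb)
print(info["exception"], "at", info["file"], "line", info["line"])
# AttributeError at auth.py line 42
```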

📚 DocAgent

Generates: docstrings (Google/NumPy style), README files, API documentation, inline comments.


RAG Pipeline

from local_agent_framework import RAGPipeline

rag = RAGPipeline()

# Index a directory
stats = rag.index_directory(
    "./my_project",
    recursive=True,
    exclude_dirs=[".git", "node_modules", "venv"],
    force=False,  # Skip files already indexed
)
# → {"files_indexed": 47, "chunks_added": 312, "files_skipped": 3}

# Manual retrieval
results = rag.retrieve("how does user authentication work?", top_k=5)
for r in results:
    print(f"[{r['score']:.2f}] {r['metadata']['file_name']}")
    print(r['content'])

# Stats
print(rag.stats())
# → {"total_chunks": 312, "collection_name": "codebase", ...}

Supported file types: .py .js .ts .jsx .tsx .java .go .rs .cpp .c .h .cs .rb .php .md .txt .yaml .yml .json .toml .sh .sql
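
Indexing hinges on splitting files into pieces near `chunk_size` characters before embedding. A plausible line-preserving splitter (the overlap and boundary behavior are assumptions, not the framework's exact algorithm):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into ~chunk_size-character pieces on line boundaries."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > chunk_size:
            chunks.append(current)
            current = current[-overlap:]  # carry a tail for context continuity
        current += line
    if current:
        chunks.append(current)
    return chunks

doc = "\n".join(f"line {i}" for i in range(200))
pieces = chunk_text(doc, chunk_size=300)
print(f"{len(pieces)} chunks, max {max(len(p) for p in pieces)} chars")
```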


Configuration

Generate a config file:

lagent show-config --save config.yaml

Edit config.yaml:

model:
  model_name: deepseek-coder-6.7b-instruct.Q4_K_M.gguf
  n_gpu_layers: 28   # Reduce if OOM errors (try 20-24)
  n_ctx: 4096        # Reduce to 2048 to save VRAM
  temperature: 0.1   # Low = deterministic code

rag:
  top_k: 5
  chunk_size: 1000
  min_relevance_score: 0.3

Use custom config:

lagent run "..." --config ./my_config.yaml
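
Values in `config.yaml` presumably overlay the built-in defaults rather than replace whole sections. A minimal deep-merge sketch (the default values come from this README; the merge behavior is an assumption):

```python
DEFAULTS = {
    "model": {"n_gpu_layers": 28, "n_ctx": 4096, "temperature": 0.1},
    "rag": {"top_k": 5, "chunk_size": 1000, "min_relevance_score": 0.3},
}

def merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay user config values on top of the defaults."""
    out = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

user = {"model": {"n_gpu_layers": 20, "n_ctx": 2048}}  # e.g. parsed from config.yaml
cfg = merge(DEFAULTS, user)
print(cfg["model"])  # {'n_gpu_layers': 20, 'n_ctx': 2048, 'temperature': 0.1}
```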

Recommended Models for 6GB VRAM

| Model | Size | VRAM | Best For |
|---|---|---|---|
| DeepSeek Coder 6.7B Q4_K_M ⭐ | 4.0GB | ~4.5GB | Code generation (recommended) |
| CodeLlama 7B Q4_K_M | 4.1GB | ~4.5GB | Code + instruction following |
| Mistral 7B Q4_K_M | 4.1GB | ~4.5GB | General coding tasks |
| Phi-3 Mini 3.8B Q4 | 2.3GB | ~2.8GB | Fast responses, lighter tasks |

For 8GB+ VRAM: use CodeLlama 13B or DeepSeek Coder 33B (Q4_K_S)


Custom Agents

from local_agent_framework import AgentOrchestrator
from local_agent_framework.agents.base import BaseAgent, AgentRole, AgentTask, AgentResult

class TestWriterAgent(BaseAgent):
    @property
    def role(self):
        return AgentRole.CODER  # Reuse existing role enum or extend it
    
    @property
    def system_prompt(self):
        return """You are a test engineer. Write comprehensive pytest test suites.
Always include: unit tests, edge cases, fixtures, and mocks where appropriate."""
    
    def run(self, task: AgentTask) -> AgentResult:
        prompt = self._build_task_prompt(task)
        response = self._generate(prompt)
        return AgentResult(
            agent_name=self.name,
            task_id=task.task_id,
            success=True,
            content=response,
        )

# Register and use
orchestrator = AgentOrchestrator()
orchestrator.add_agent("test_writer", TestWriterAgent(orchestrator.model_loader))

result = orchestrator.run(
    "Write tests for my UserAuth class",
    pipeline=["test_writer"]
)

Troubleshooting

CUDA out of memory:

# In config.yaml, reduce GPU layers:
model:
  n_gpu_layers: 20  # or 16
  n_ctx: 2048       # Reduce context window

Model not found:

lagent download-model
# Check models dir:
ls ~/.local_agent_framework/models/

Slow inference:

# Use a smaller/faster model:
lagent download-model --repo microsoft/Phi-3-mini-4k-instruct-gguf --file Phi-3-mini-4k-instruct-q4.gguf

Poor code quality:

  • Increase max_tokens in config
  • Lower temperature (try 0.05)
  • Enable auto-review: AgentOrchestrator(auto_review=True)

License

MIT
