AI-powered code completion and editing for VS Code using local LLM backends
Features • Installation • Configuration • Models • Contributing
Collama is a VS Code extension that uses local LLM backends for code completions, refactoring suggestions, and documentation generation, all running privately on your machine with no external API calls. It supports Ollama and OpenAI-compatible APIs (currently verified only with vLLM).
Status: This project is under heavy active development. Expect occasional strange output. If you have ideas to improve the quality, let me know and contribute!
Code Completion
- Inline, multiline, and multiblock (more of a "fun" feature) suggestions
- Uses currently opened tabs as context
Code Edits
- Generate docstrings and documentation
- Extract functions and refactor code
- Simplify complex code
- Fix syntax errors
- Manual instructions
Chat Interface
- Chat with multiple sessions - organize conversations by topic
- Send selected code or files to chat as context
- Context is automatically attached to messages with file references and line numbers
- Real-time context usage bar showing token consumption vs model's context window
- Automatic context trimming — when conversations exceed the context window, older messages are removed from the LLM context while remaining visible in the chat
- Visual indicators for messages no longer included in the LLM context
Commit Messages
- AI-generated conventional commit messages from staged changes
- Analyzes git diff to create meaningful commit descriptions
- Accessible via command palette or Source Control view
Context Management
- Automatic detection of the model's context window size (Ollama and OpenAI)
- Prompts that exceed the context window are blocked with a clear notification
- Open editor tabs are used as additional context for completions, with smart pruning to fit the context window
Requirements
- VS Code 1.107.0 or higher
- Ollama running locally (or accessible on your network), or any OpenAI-compatible API
- A supported code model (see Models)
Installation
Install the extension from the marketplace or build the .vsix yourself. You also need an Ollama instance running locally or on your network.
See this link for how to install Ollama, or this link for the Docker image.
Alternatively, you can use vLLM (tested). Point the endpoint settings to your server; the backend is auto-detected. If authentication is required, see Bearer Tokens.
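For example, a minimal settings.json pointing both endpoints at a vLLM server could look like this (the host is a placeholder; vLLM listens on port 8000 by default):

```jsonc
// settings.json: hypothetical vLLM setup, replace the host with your own
{
  "collama.apiEndpointCompletion": "http://my-vllm-host:8000",
  "collama.apiEndpointInstruct": "http://my-vllm-host:8000"
}
```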
Configure Collama via VS Code Settings (Preferences → Settings, search "collama"):
| Setting | Type | Default | Description |
|---|---|---|---|
| `collama.apiEndpointCompletion` | string | `http://127.0.0.1:11434` | Endpoint for code auto-completion |
| `collama.apiEndpointInstruct` | string | `http://127.0.0.1:11434` | Endpoint for code edits/chat |
| `collama.apiCompletionModel` | string | `qwen2.5-coder:3b` | Model for code completions |
| `collama.apiInstructionModel` | string | `qwen2.5-coder:3b-instruct` | Model for code edits (use instruct/base variant) |
| `collama.autoComplete` | boolean | `true` | Enable auto-suggestions |
| `collama.suggestMode` | string | `inline` | Suggestion style: `inline`, `multiline`, or `multiblock` |
| `collama.suggestDelay` | number | `1500` | Delay (ms) before requesting completion |
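As a concrete sketch, the same configuration expressed in settings.json (the values shown are the defaults from the table above):

```jsonc
// settings.json: the defaults from the table above, spelled out
{
  "collama.apiEndpointCompletion": "http://127.0.0.1:11434",
  "collama.apiEndpointInstruct": "http://127.0.0.1:11434",
  "collama.apiCompletionModel": "qwen2.5-coder:3b",
  "collama.apiInstructionModel": "qwen2.5-coder:3b-instruct",
  "collama.autoComplete": true,
  "collama.suggestMode": "inline",
  "collama.suggestDelay": 1500
}
```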
If your API endpoints require authentication (e.g. vLLM with --api-key, or a reverse proxy), you can securely store bearer tokens using VS Code's encrypted credential storage. Tokens are sent as Authorization: Bearer <token> headers with every request.
To set a token:
- Open the Command Palette (`Ctrl+Shift+P` / `Cmd+Shift+P`)
- Run one of these commands:
  - `collama: Set Bearer Token (Completion)` for the completion endpoint
  - `collama: Set Bearer Token (Instruct)` for the instruct/chat endpoint
- Enter your token in the password input (characters will be masked)
- The token is stored encrypted in your system's credential manager
Note: Tokens are stored securely and never appear in plain text in your settings or configuration files. To clear a token, run the same command and leave the input empty.
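Conceptually, every request to the backend then carries the header; for an OpenAI-compatible endpoint the request shape looks roughly like this (host and path are illustrative):

```
POST /v1/completions HTTP/1.1
Host: my-vllm-host:8000
Authorization: Bearer <token>
```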
Collama is tested primarily with the Qwen Coder series and performs best with specialized code models:
- For completion: any Qwen Coder model > 3B recommended; `qwen2.5-coder:7b` offers stable quality
- For edits/chat: any instruct or thinking model recommended; `gpt-oss:20b` offers stable quality
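One possible pairing in settings.json, using the models recommended above:

```jsonc
// settings.json: one possible pairing of the recommended models
{
  "collama.apiCompletionModel": "qwen2.5-coder:7b",
  "collama.apiInstructionModel": "gpt-oss:20b"
}
```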
| Model | Tested Sizes | FIM Support | Status | Notes |
|---|---|---|---|---|
| qwen2.5-coder | 1.5B, 3B, 7B | ✅ | Stable | Recommended for most use cases |
| qwen3-coder | 30B | ✅ | Stable | Excellent quality, higher resource usage |
| starcoder | — | — | Untested | May work; contributions welcome |
| starcoder2 | 3B | ✅ | Stable | Improved over v1 |
| codellama | 7B, 13B | ✅ | Limited | Limited file context support; FIM is ok |
| codeqwen | — | — | Untested | May work; contributions welcome |
Note: Models are tested primarily with quantization level q4. Results may vary with other quantization levels.
Note: ChatML format is not supported, which means only true FIM models will work for autocomplete!
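For context, a FIM (fill-in-the-middle) prompt interleaves the code before and after the cursor with special tokens; here is a sketch using the Qwen-style token names (exact tokens differ per model family):

```
<|fim_prefix|>def add(a, b):
    <|fim_suffix|>

print(add(1, 2))<|fim_middle|>
```

The model then generates the missing middle (here, the function body); chat-tuned models that only speak ChatML cannot follow this format.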
Using code completions:
- Trigger Completion: Use `editor.action.inlineSuggest.trigger` (default keybinding varies by OS)
  - Set custom keybinding: `Alt+S` or `Ctrl+NumPad 1` (example; see the sketch after this list)
- Auto-Trigger: Completions trigger automatically after 1.5 seconds of inactivity (configurable via `suggestDelay`)
- Accept: Press `Tab` to accept a suggestion, `Esc` to dismiss
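A custom keybinding goes in keybindings.json; a minimal sketch (the `alt+s` chord is just an example, pick any free combination):

```jsonc
// keybindings.json: example chord to trigger an inline suggestion on demand
[
  {
    "key": "alt+s",
    "command": "editor.action.inlineSuggest.trigger",
    "when": "editorTextFocus"
  }
]
```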
To apply a code edit:
- Select code in the editor
- Right-click → collama (on selection) and choose:
- Write Docstring - Generate documentation
- Extract Functions - Refactor into separate functions
- Simplify Code - Improve readability and efficiency
- Fix Syntax - Correct syntax errors
- Edit (manual) - Custom AI-assisted edits
To use the chat:
- Open the Chat view in the sidebar (Collama icon)
- Right-click on selected code in editor → Send to Chat
- Type your message - the context is automatically attached with:
- File name and path
- Line number references
- Selected code or full file content
- Monitor token usage with the real-time context bar (shows usage vs. model's max context)
- Create multiple chat sessions to organize conversations by topic
To generate a commit message:
- Stage your changes in Git (using the Source Control view or `git add`)
- Open the Command Palette (`Ctrl+Shift+P` / `Cmd+Shift+P`)
- Run `collama: Generate Commit Message`
- The AI analyzes your staged diff and generates a conventional commit message
- Review and edit the message in the Source Control input box before committing
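Conventional commit messages follow the `type(scope): description` pattern, so a generated message might look roughly like this (purely illustrative output):

```
feat(parser): handle empty input without throwing

Add a guard clause for zero-length buffers and extend the unit tests.
```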
Contributions are welcome! Here's how you can help:
- Report Issues: Open an issue
- Submit PRs: Fork, create a feature branch, and submit a pull request

