---
import DocsLayout from '../../layouts/DocsLayout.astro';
---

<DocsLayout title="Open Source Models | AgentCrew" lang="en">
  <div class="docs-prose">
    <h1>Open Source Models</h1>

    <p>
      AgentCrew includes built-in support for running open source models locally
      via <a href="https://ollama.com" target="_blank" rel="noopener">Ollama</a>.
      When you select <strong>Ollama</strong> as your model provider, AgentCrew
      automatically manages the entire lifecycle: starting the Ollama container,
      pulling models, warming them up, and stopping the container when no teams
      need it anymore.
    </p>

    <p>
      This means you can run AI agent teams entirely on your own hardware, with
      no external API keys required and full data privacy.
    </p>

    <h2>How It Works</h2>

    <h3>Shared Infrastructure</h3>

    <p>
      Unlike team containers (which are isolated per team), Ollama runs as
      <strong>shared infrastructure</strong>. A single
      <code>agentcrew-ollama</code> container serves all teams that use the
      Ollama provider. This avoids duplicating large model files and reduces
      resource usage.
    </p>

    <ul>
      <li>
        <strong>Reference counting</strong>: AgentCrew tracks how many teams are
        using Ollama. The container starts when the first Ollama team deploys and
        stops when the last one is removed.
      </li>
      <li>
        <strong>Persistent storage</strong>: Downloaded models are stored in a
        Docker volume (<code>agentcrew-ollama-models</code>) that persists even
        when the container stops. Models only need to be downloaded once.
      </li>
      <li>
        <strong>Multi-network</strong>: The Ollama container connects to each
        team's Docker network, so agent containers can reach it via DNS
        (<code>agentcrew-ollama:11434</code>).
      </li>
    </ul>
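    <p>
      The reference-counting behavior described above can be sketched as follows.
      This is an illustration only: the class and method names
      (<code>OllamaManager</code>, <code>acquire</code>, <code>release</code>) are
      hypothetical and not AgentCrew's actual API.
    </p>

```python
# Hypothetical sketch of reference counting for the shared Ollama container.
# Names are illustrative, not AgentCrew's real internals.
class OllamaManager:
    def __init__(self):
        self.ref_count = 0

    def acquire(self):
        """Called when a team using the Ollama provider is deployed."""
        if self.ref_count == 0:
            self.start_container()  # first team: start agentcrew-ollama
        self.ref_count += 1

    def release(self):
        """Called when an Ollama team is stopped or removed."""
        self.ref_count -= 1
        if self.ref_count == 0:
            self.stop_container()  # last team gone: stop the container

    def start_container(self):
        print("starting agentcrew-ollama")

    def stop_container(self):
        # The agentcrew-ollama-models volume is preserved across stops.
        print("stopping agentcrew-ollama (models volume preserved)")
```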

    <h3>Automatic Lifecycle</h3>

    <p>
      When you deploy a team with the Ollama provider, AgentCrew automatically:
    </p>

    <ol>
      <li>Starts the <code>agentcrew-ollama</code> container (or reuses it if already running).</li>
      <li>Connects it to the team's Docker network.</li>
      <li>Pulls the selected model (if not already downloaded).</li>
      <li>Warms up the model by loading weights into RAM, avoiding cold-start delays on the first message.</li>
      <li>Deploys the team's agent containers with <code>OLLAMA_BASE_URL</code> pre-configured.</li>
    </ol>

    <p>
      When you stop a team, AgentCrew disconnects Ollama from that team's
      network and decrements the reference count. If no other teams are using
      Ollama, the container is stopped (but the volume with downloaded models
      is preserved).
    </p>

    <h2>GPU Support</h2>

    <p>
      AgentCrew automatically detects NVIDIA GPUs on the host machine. If
      <code>nvidia-smi</code> is found in the system PATH, GPU passthrough is
      enabled for the Ollama container, giving models access to all available
      GPUs for dramatically faster inference.
    </p>

    <p>
      No manual configuration is needed. If a GPU is available, it will be used
      automatically. You can verify GPU status via the
      <a href="#status-endpoint">status endpoint</a>.
    </p>
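    <p>
      The detection step amounts to a PATH lookup for <code>nvidia-smi</code>.
      A minimal sketch, assuming the container is launched with
      <code>docker run</code> and that <code>--gpus all</code> is used for
      passthrough (the exact flags AgentCrew passes may differ):
    </p>

```python
# Sketch of GPU auto-detection: if nvidia-smi is on PATH, enable GPU
# passthrough. The docker arguments below are an assumption for illustration.
import shutil

def gpu_available() -> bool:
    """Return True if nvidia-smi is found in the system PATH."""
    return shutil.which("nvidia-smi") is not None

def ollama_run_args() -> list:
    """Build hypothetical `docker run` arguments for the Ollama container."""
    args = ["docker", "run", "-d", "--name", "agentcrew-ollama",
            "-v", "agentcrew-ollama-models:/root/.ollama"]
    if gpu_available():
        args += ["--gpus", "all"]  # expose all NVIDIA GPUs to the container
    return args + ["ollama/ollama"]
```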

    <h2>Using Ollama in AgentCrew</h2>

    <h3>Creating a Team</h3>

    <ol>
      <li>In the team creation wizard, select <strong>OpenCode</strong> as the provider.</li>
      <li>Choose <strong>Ollama</strong> as the model provider.</li>
      <li>
        Select a model for your agents. The default model is
        <code>qwen3:4b</code>, but you can use any model available in the
        <a href="https://ollama.com/library" target="_blank" rel="noopener">Ollama model library</a>.
      </li>
      <li>Configure your agents as usual. All agents in the team will use the selected Ollama model provider.</li>
    </ol>

    <h3>Model Format</h3>

    <p>
      When specifying agent models, use the <code>ollama/</code> prefix followed
      by the model name and an optional tag:
    </p>

    <ul>
      <li><code>ollama/qwen3:4b</code></li>
      <li><code>ollama/llama3.1:8b</code></li>
      <li><code>ollama/codellama:13b</code></li>
      <li><code>ollama/mistral:7b</code></li>
      <li><code>ollama/devstral</code></li>
    </ul>

    <p>
      You can also use <code>inherit</code> to let the agent use the team's
      default model.
    </p>
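    <p>
      A small helper illustrating this format: <code>ollama/&lt;name&gt;[:&lt;tag&gt;]</code>,
      or the literal <code>inherit</code>. This validator is an illustration, not
      AgentCrew's actual parsing code; note that Ollama defaults a missing tag to
      <code>latest</code>.
    </p>

```python
# Illustrative parser for the "ollama/<name>[:<tag>]" model spec format.
# Not AgentCrew's real code.
def parse_model(spec):
    """Split an agent model spec into (name, tag); None means inherit."""
    if spec == "inherit":
        return None
    if not spec.startswith("ollama/"):
        raise ValueError("not an Ollama model spec: %r" % spec)
    rest = spec[len("ollama/"):]
    name, _, tag = rest.partition(":")
    return (name, tag or "latest")  # Ollama treats a missing tag as 'latest'
```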

    <h3>Model Provider Constraint</h3>

    <p>
      When a team's model provider is set to <strong>Ollama</strong>, all agents
      in that team must use Ollama models. You cannot mix providers within a
      single OpenCode team (e.g., one agent using Ollama and another using
      OpenAI). This constraint ensures consistent runtime behavior, since all
      agents share the same container environment.
    </p>

    <p>
      If you change the model provider on an existing team, all agent model
      selections are automatically reset to <code>inherit</code>.
    </p>

    <h2 id="status-endpoint">Status Endpoint</h2>

    <p>
      You can check the current state of the Ollama infrastructure via the API:
    </p>

    <pre><code>GET /api/ollama/status</code></pre>

    <p>Response example:</p>

    <pre><code>{"{"}
  "running": true,
  "container_id": "abc123...",
  "models_pulled": ["qwen3:4b", "codellama:13b"],
  "ref_count": 2,
  "gpu_available": true
{"}"}</code></pre>

    <table>
      <thead>
        <tr>
          <th>Field</th>
          <th>Description</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td><code>running</code></td>
          <td>Whether the Ollama container is currently running.</td>
        </tr>
        <tr>
          <td><code>container_id</code></td>
          <td>Docker container ID (empty if not running).</td>
        </tr>
        <tr>
          <td><code>models_pulled</code></td>
          <td>List of models already downloaded and available.</td>
        </tr>
        <tr>
          <td><code>ref_count</code></td>
          <td>Number of active teams using Ollama.</td>
        </tr>
        <tr>
          <td><code>gpu_available</code></td>
          <td>Whether NVIDIA GPU passthrough is available.</td>
        </tr>
      </tbody>
    </table>
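    <p>
      A minimal client sketch for this endpoint, using only the Python standard
      library. The base URL is an assumption; point it at wherever your
      AgentCrew instance is served.
    </p>

```python
# Sketch of fetching and summarizing the Ollama status response.
# The base URL is a placeholder assumption, not a documented default.
import json
from urllib.request import urlopen

def fetch_status(base_url="http://localhost:3000"):
    """GET /api/ollama/status and decode the JSON body."""
    with urlopen(base_url + "/api/ollama/status") as resp:
        return json.load(resp)

def summarize(status):
    """Render the documented status fields as a one-line summary."""
    state = "running" if status["running"] else "stopped"
    models = ", ".join(status["models_pulled"]) or "none"
    return ("Ollama %s, %d team(s) attached, models: %s, GPU: %s"
            % (state, status["ref_count"], models, status["gpu_available"]))
```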

    <h2>Requirements</h2>

    <ul>
      <li><strong>Docker</strong>: Ollama runs as a Docker container, so Docker must be available on the host.</li>
      <li><strong>Disk space</strong>: Models range from ~2 GB (small 4B-parameter models) to ~10+ GB (larger 13B+ models). The persistent volume stores all downloaded models.</li>
      <li><strong>RAM</strong>: Models are loaded into RAM (or VRAM if a GPU is available). Ensure your host has enough memory for the selected model size.</li>
      <li><strong>GPU (optional)</strong>: an NVIDIA GPU with <code>nvidia-smi</code> and the <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html" target="_blank" rel="noopener">NVIDIA Container Toolkit</a> installed, for GPU acceleration.</li>
    </ul>
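    <p>
      As a very rough sizing rule of thumb (an assumption, not an AgentCrew
      formula): a 4-bit quantized model needs on the order of half a gigabyte
      per billion parameters, plus some fixed overhead for context. Actual
      footprints vary by model and quantization.
    </p>

```python
# Rough rule-of-thumb memory estimate for a quantized model.
# The 0.5 GB-per-billion-parameters figure and 0.5 GB overhead are
# assumptions for illustration; real usage varies.
def estimated_memory_gb(params_billions,
                        gb_per_b_params=0.5,
                        overhead_gb=0.5):
    return round(params_billions * gb_per_b_params + overhead_gb, 1)
```

    <p>
      For example, this estimates roughly 2&ndash;3 GB for a 4B model and
      around 7 GB for a 13B model, broadly in line with the ranges listed above.
    </p>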

    <h2>Next Steps</h2>

    <ul>
      <li>
        <a href="/docs/providers">Providers</a>: Learn about all supported
        providers and how they compare.
      </li>
      <li>
        <a href="/docs/configuration">Configuration</a>: Review environment
        variables and application settings.
      </li>
      <li>
        <a href="/docs/architecture">Architecture</a>: Understand how containers,
        sidecars, and networking work together.
      </li>
    </ul>
  </div>
</DocsLayout>