A production-grade agentic customer support system built with Claude AI and the Model Context Protocol (MCP), designed for the Claude Architect Certification - Scenario 1.
Project Status: ✓ Complete - All tests passing, 100% FCR achieved (target: ≥80%)
- Overview
- Architecture
- Key Learnings
- Implementation Details
- Test Results
- Setup & Usage
- Project Structure
This project implements an autonomous customer support agent that handles high-ambiguity support requests (returns, billing disputes, account issues) with minimal human intervention. The agent achieves ≥80% first-contact resolution (FCR) by intelligently orchestrating tool calls while knowing when to escalate complex cases.
The agent handles three primary customer support scenarios:
- Refund Processing - Verifies customer identity, checks order eligibility, processes refunds up to $150 autonomously
- Escalation Management - Detects trigger words ("lawyer", "unacceptable", "fraud") and policy exceptions, routes to appropriate human agents
- Cross-Account Protection - Prevents customers from accessing other customers' orders, escalates security concerns immediately
| Metric | Target | Achieved | Status |
|---|---|---|---|
| First-Contact Resolution | ≥80% | 100% | ✓ PASS |
| Scenario Test Coverage | 5 scenarios | 5/5 passed | ✓ PASS |
| Security (No Code Leakage) | 0 leaks | 0 leaks | ✓ PASS |
| Tool Prerequisite Enforcement | 100% | 28/28 tests pass | ✓ PASS |
The system uses a 3-layer architecture with programmatic enforcement of business rules:
┌─────────────────────────────────────────────────────────────┐
│ API Layer (FastAPI) │
│ - POST /api/chat (single turn) │
│ - GET /api/chat/stream (streaming SSE) │
│ - GET /api/health (readiness probe) │
└───────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────▼─────────────────────────────────────┐
│ Agent Layer (Orchestrator) │
│ - Manages conversation state & history │
│ - Implements agentic loop (call → tool use → continue) │
│ - Calls Claude API with MCP tool schema │
│ - Streams events (text, tool calls, results, errors) │
└───────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────▼─────────────────────────────────────┐
│ MCP Layer (Tools + Middleware) │
│ Tools: │
│ 1. get_customer (READ - no side effects) │
│ 2. lookup_order (READ - requires customer_id) │
│ 3. process_refund (WRITE - requires eligibility) │
│ 4. escalate_to_human (ESCALATE - anytime) │
│ │
│ Middleware (Enforcement Layer): │
│ - Prerequisites: Enforces call ordering at code level │
│ - Idempotency: Prevents duplicate refunds (UUID v4) │
│ - Error Handling: Never leaks internal codes to users │
└───────────────────────┬─────────────────────────────────────┘
│
┌───────────────────────▼─────────────────────────────────────┐
│ Backend Layer (Stub APIs) │
│ - CRM: get_customer_by_email() │
│ - Orders: get_order_by_id() │
│ - Payments: process_refund_internal() │
│ - Tickets: create_escalation_ticket() │
│ (Returns fixture data - no real API calls in demo) │
└─────────────────────────────────────────────────────────────┘
What I Learned:
- Agents need a loop-based architecture that can handle multi-turn tool use
- The orchestrator must manage both conversation history and session state
- Streaming events (text, tool calls, results) provides real-time feedback to users
Implementation Pattern:
async def _agent_loop(self) -> AsyncIterator[AgentEvent]:
"""Core agentic loop: call Claude → process tools → continue if needed"""
while True:
response = self.client.messages.create(...)
for block in response.content:
if block.type == "text":
yield TextEvent(content=block.text)
elif block.type == "tool_use":
yield ToolCallEvent(...)
result = await self._handle_tool_call(block.name, block.input)
yield ToolResultEvent(...)
if no_tool_use:
break # DoneKey Insight: The agent must continue calling Claude until it stops requesting tools. This creates a natural conversation flow where the agent can chain multiple operations (e.g., get_customer → lookup_order → process_refund) in a single user turn.
What I Learned:
- Tool prerequisites must be enforced in code, not just prompts
- Tools should be designed with clear responsibilities (READ vs WRITE vs ESCALATE)
- MCP provides a standardized protocol for tool integration with Claude
Middleware Pattern:
def check_prerequisites(tool_name: ToolName, session: SessionState):
"""Enforces tool ordering at the code level"""
if tool_name == ToolName.PROCESS_REFUND:
if session.customer_id is None:
raise PrerequisiteError("get_customer must be called first")
if not session.refund_eligible:
raise PrerequisiteError("lookup_order must return refund_eligible=True")Why This Matters:
- Prompts alone are insufficient - Claude can hallucinate or skip steps under ambiguous conditions
- Code enforcement provides guarantees - A refund cannot be processed without verification
- Clear error messages guide Claude to fix its approach (not exposed to users)
Tool Design Principles:
- READ tools (get_customer, lookup_order) have no side effects → safe to retry
- WRITE tools (process_refund) are irreversible → require idempotency keys
- ESCALATE tools (escalate_to_human) can be called anytime → no prerequisites except customer_id
What I Learned:
- Session state is critical for multi-turn conversations and prerequisite checks
- Idempotency keys prevent duplicate operations when retrying WRITE tools
- Error handling must never leak internal codes to end users
Idempotency Implementation:
# Generate ONCE per order_id, reuse on retry
if session.current_order_id != order_id:
# New order - generate new key
session.idempotency_key = str(uuid4())
session.current_order_id = order_id
else:
# Same order - reuse existing key
passError Sanitization:
# NEVER expose these to users
INTERNAL_CODES = ["LIMIT_EXCEEDED", "OWNERSHIP_MISMATCH", "$150", "NOT_FOUND"]
# Instead, use natural language
"I apologize, but this refund requires manager approval due to the amount."Key Insight: Reliability in production requires defense in depth:
- Prerequisite middleware prevents invalid tool sequences
- Idempotency prevents duplicate refunds even if Claude retries
- Error handlers sanitize all responses before they reach the user
The system enforces these rules programmatically (not just in prompts):
-
get_customer MUST be called first before lookup_order or process_refund
- Verifies customer identity
- Retrieves customer_id needed for subsequent calls
-
lookup_order MUST be called before process_refund
- Confirms order exists and belongs to customer
- Returns refund_eligible flag
-
process_refund requires refund_eligible=True
- If False, agent must explain and potentially escalate
- Cannot be bypassed (enforced in code)
-
OWNERSHIP_MISMATCH triggers immediate P1 escalation
- Security concern - never retry
- Example: Customer A tries to access Customer B's order
The agent escalates in these scenarios (detected by classifier + system prompt):
| Trigger | Priority | Reason |
|---|---|---|
| Customer says "lawyer", "fraud", "sue" | P1 | Legal/security concern |
| OWNERSHIP_MISMATCH from lookup_order | P1 | Potential account takeover |
| Customer requests human/manager | P1 | Honor user preference |
| Refund > $150 (LIMIT_EXCEEDED) | P2 | Requires supervisor approval |
| Tool failure after 2 retries | P2 | System issue |
| 2+ failed clarification rounds | P3 | Complexity beyond agent |
The system prompt (backend/prompts/system_prompt.py) includes:
- Role definition - Customer support agent with FCR goal
- Tool descriptions - What each tool does and when to use it
- Policy constraints - Refund limits, return windows (env vars)
- Prerequisite rules - Reinforces middleware enforcement
- Escalation triggers - When and how to escalate
- Few-shot examples - Demonstrates correct tool sequencing
- Prerequisites (14 tests): Validates middleware blocks invalid tool sequences
- Idempotency (7 tests): Verifies UUID generation and reuse logic
- Error Handlers (7 tests): Confirms no internal code leakage
- Refund Flow (3 tests): End-to-end happy path with real Claude API
- Escalation (3 tests): Trigger word detection, OWNERSHIP_MISMATCH, human requests
- Cross-Account (2 tests): Security - blocks access to other customers' orders
Latest benchmark run (2026-03-30):
FCR BENCHMARK
Target: ≥80% first-contact resolution
═══════════════════════════════════════════════════════════
✓ PASS | Damaged item refund → auto_resolved
✓ PASS | Distress trigger escalation → escalated
✓ PASS | Human request honored → escalated
✓ PASS | Ineligible order explained → escalated
✓ PASS | Cross-account blocked → escalated
═══════════════════════════════════════════════════════════
Scenarios passed: 5/5
FCR rate: 100% (1/1 eligible scenarios)
FCR target (≥80%): ✓ ACHIEVED
Internal code leaks: 0 (✓ NONE)
Key Achievement: 100% of auto-resolvable scenarios were resolved in first contact, with zero internal code leakage.
- Python 3.11+
- Anthropic API key (Claude Sonnet 4)
- Optional:
uvfor faster dependency installation
# Clone the repository
git clone https://github.com/yourusername/customer-support-agent.git
cd customer-support-agent/backend
# Install dependencies (option 1: pip)
pip install -e .
# Install dependencies (option 2: uv - faster)
uv pip install -e .
# Set up environment variables
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY# .env file
ANTHROPIC_API_KEY=sk-ant-... # Required: Your Anthropic API key
REFUND_LIMIT=150.0 # Optional: Max autonomous refund amount
RETURN_WINDOW_DAYS=30 # Optional: Return eligibility window# Start FastAPI server with hot reload
uvicorn backend.api.main:app --reload --port 8000
# Server available at:
# - API: http://localhost:8000/api/chat
# - Health: http://localhost:8000/api/health
# - Docs: http://localhost:8000/docs# Run all tests with verbose output
pytest backend/tests/ -v
# Run specific test suites
pytest backend/tests/unit/ -v # Unit tests (middleware)
pytest backend/tests/integration/ -v # Integration tests (end-to-end)
pytest backend/tests/evals/ -v # Evaluation benchmarks
# Run a single test
pytest backend/tests/unit/test_prerequisites.py::test_lookup_requires_customer -v
# Run FCR benchmark
python -m backend.tests.evals.fcr_benchmark# Single-turn chat request
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{
"message": "Hi, email sarah@example.com. My blender arrived broken, order ORD-8842. I want a refund.",
"session_id": "test-session-123"
}'
# Streaming response (SSE)
curl -N http://localhost:8000/api/chat/stream?session_id=test-session-456customer-support-agent/
├── backend/
│ ├── agent/ # Agent orchestration layer
│ │ ├── orchestrator.py # Main agentic loop (349 lines)
│ │ ├── session.py # Session management & FCR tracking
│ │ └── classifier.py # Pre-processing for escalation triggers
│ │
│ ├── api/ # FastAPI layer
│ │ ├── main.py # App initialization, CORS, routes
│ │ ├── schemas.py # Pydantic request/response models
│ │ └── routes/
│ │ ├── chat.py # POST /api/chat, GET /api/chat/stream
│ │ └── health.py # GET /api/health
│ │
│ ├── mcp_layer/ # MCP tools and middleware
│ │ ├── mcp_server.py # FastMCP server instance
│ │ ├── session_storage.py # In-memory session state
│ │ ├── middleware/
│ │ │ ├── prerequisites.py # Tool ordering enforcement (88 lines)
│ │ │ └── idempotency.py # Duplicate refund prevention (40 lines)
│ │ └── tools/
│ │ ├── get_customer.py # Customer lookup (102 lines)
│ │ ├── lookup_order.py # Order verification (90 lines)
│ │ ├── process_refund.py # Refund execution (129 lines)
│ │ └── escalate_to_human.py # Escalation (97 lines)
│ │
│ ├── backends/ # Stub implementations (fixture data)
│ │ ├── crm.py # Customer data (51 lines)
│ │ ├── orders.py # Order data (66 lines)
│ │ ├── payments.py # Refund processing (66 lines)
│ │ └── tickets.py # Escalation tickets (56 lines)
│ │
│ ├── prompts/ # Agent behavior definition
│ │ ├── system_prompt.py # Core system prompt (143 lines)
│ │ └── few_shot_examples.py # Example conversations (40 lines)
│ │
│ ├── types/
│ │ └── models.py # Pydantic models & enums (271 lines)
│ │
│ └── tests/
│ ├── unit/ # Middleware tests (28 tests)
│ ├── integration/ # End-to-end flows (8 tests)
│ └── evals/ # FCR benchmark (5 scenarios)
│ ├── fcr_benchmark.py
│ └── latest_benchmark.json
│
├── CLAUDE.md # Project instructions for Claude Code
├── README.md # This file
├── .env.example # Environment template
└── pyproject.toml # Python dependencies
Total Lines of Code: ~1,850 Python files (implementation + tests)
This project demonstrates mastery of the three core domains required for Scenario 1:
- Implemented: Agentic loop in
orchestrator.pythat handles multi-turn tool use - Demonstrated: Event streaming (TextEvent, ToolCallEvent, ToolResultEvent, ErrorEvent)
- Proven: Agent continues calling Claude until task completion, chaining operations seamlessly
- Implemented: 4 MCP tools with clear READ/WRITE/ESCALATE semantics
- Demonstrated: Prerequisite middleware enforces tool ordering at code level
- Proven: 28/28 unit tests pass, validating prerequisite enforcement and idempotency
- Implemented: Session state tracking for multi-turn conversations
- Demonstrated: Idempotency keys prevent duplicate refunds on retry
- Proven: 100% FCR rate achieved, 0 internal code leaks in evaluation
- FCR Target: ≥80% → Achieved: 100%
- Test Coverage: 5/5 scenarios passed
- Security: Zero internal code leakage
- Reliability: All 36 tests passing (28 unit + 8 integration)
When the Python Agent SDK becomes available, this implementation can be simplified:
# Current implementation (manual tool routing)
orchestrator = Orchestrator(session_id)
async for event in orchestrator.run(user_message):
yield event
# Future implementation (Agent SDK with native MCP)
agent = Agent(
model="claude-sonnet-4-20250514",
system_prompt=build_system_prompt(),
mcp_servers=[{
"type": "stdio",
"command": "python",
"args": ["-m", "backend.mcp_layer.mcp_server"]
}]
)
async for event in agent.run(user_message):
yield event # Tools automatically discovered and executedThe Agent SDK will handle:
- Automatic tool discovery from MCP servers
- Tool execution and result feeding back to Claude
- Built-in retry logic and context management
- Session state (our prerequisites middleware hooks in via MCP)
MIT License - See LICENSE file for details.
For questions about this implementation or the Claude Architect Certification:
- Repository: github.com/g3ram/customer-support-agent
- Issues: github.com/g3ram/customer-support-agent/issues
Built with: Claude Sonnet 4 | FastAPI | Model Context Protocol (MCP) | Python 3.11