Skip to content

G3Ram/customer-support-agent

Repository files navigation

Customer Support Resolution Agent

A production-grade agentic customer support system built with Claude AI and the Model Context Protocol (MCP), designed for the Claude Architect Certification - Scenario 1.

Project Status: ✓ Complete - All tests passing, 100% FCR achieved (target: ≥80%)


Table of Contents


Overview

This project implements an autonomous customer support agent that handles high-ambiguity support requests (returns, billing disputes, account issues) with minimal human intervention. The agent achieves ≥80% first-contact resolution (FCR) by intelligently orchestrating tool calls while knowing when to escalate complex cases.

What This Agent Does

The agent handles three primary customer support scenarios:

  1. Refund Processing - Verifies customer identity, checks order eligibility, processes refunds up to $150 autonomously
  2. Escalation Management - Detects trigger words ("lawyer", "unacceptable", "fraud") and policy exceptions, routes to appropriate human agents
  3. Cross-Account Protection - Prevents customers from accessing other customers' orders, escalates security concerns immediately

Target Metrics (Scenario 1 Requirements)

Metric Target Achieved Status
First-Contact Resolution ≥80% 100% ✓ PASS
Scenario Test Coverage 5 scenarios 5/5 passed ✓ PASS
Security (No Code Leakage) 0 leaks 0 leaks ✓ PASS
Tool Prerequisite Enforcement 100% 28/28 tests pass ✓ PASS

Architecture

The system uses a 3-layer architecture with programmatic enforcement of business rules:

┌─────────────────────────────────────────────────────────────┐
│                      API Layer (FastAPI)                     │
│  - POST /api/chat (single turn)                             │
│  - GET /api/chat/stream (streaming SSE)                     │
│  - GET /api/health (readiness probe)                        │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                   Agent Layer (Orchestrator)                 │
│  - Manages conversation state & history                      │
│  - Implements agentic loop (call → tool use → continue)     │
│  - Calls Claude API with MCP tool schema                    │
│  - Streams events (text, tool calls, results, errors)       │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                      MCP Layer (Tools + Middleware)          │
│  Tools:                                                      │
│    1. get_customer       (READ - no side effects)           │
│    2. lookup_order       (READ - requires customer_id)      │
│    3. process_refund     (WRITE - requires eligibility)     │
│    4. escalate_to_human  (ESCALATE - anytime)               │
│                                                              │
│  Middleware (Enforcement Layer):                            │
│    - Prerequisites: Enforces call ordering at code level    │
│    - Idempotency: Prevents duplicate refunds (UUID v4)      │
│    - Error Handling: Never leaks internal codes to users    │
└───────────────────────┬─────────────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────────────┐
│                   Backend Layer (Stub APIs)                  │
│  - CRM: get_customer_by_email()                             │
│  - Orders: get_order_by_id()                                │
│  - Payments: process_refund_internal()                      │
│  - Tickets: create_escalation_ticket()                      │
│  (Returns fixture data - no real API calls in demo)         │
└─────────────────────────────────────────────────────────────┘

Key Learnings

1. Agentic Architecture & Orchestration

What I Learned:

  • Agents need a loop-based architecture that can handle multi-turn tool use
  • The orchestrator must manage both conversation history and session state
  • Streaming events (text, tool calls, results) provides real-time feedback to users

Implementation Pattern:

async def _agent_loop(self) -> AsyncIterator[AgentEvent]:
    """Core agentic loop: call Claude → process tools → continue if needed"""
    while True:
        response = self.client.messages.create(...)

        for block in response.content:
            if block.type == "text":
                yield TextEvent(content=block.text)
            elif block.type == "tool_use":
                yield ToolCallEvent(...)
                result = await self._handle_tool_call(block.name, block.input)
                yield ToolResultEvent(...)

        if no_tool_use:
            break  # Done

Key Insight: The agent must continue calling Claude until it stops requesting tools. This creates a natural conversation flow where the agent can chain multiple operations (e.g., get_customer → lookup_order → process_refund) in a single user turn.


2. Tool Design & MCP Integration

What I Learned:

  • Tool prerequisites must be enforced in code, not just prompts
  • Tools should be designed with clear responsibilities (READ vs WRITE vs ESCALATE)
  • MCP provides a standardized protocol for tool integration with Claude

Middleware Pattern:

def check_prerequisites(tool_name: ToolName, session: SessionState):
    """Enforces tool ordering at the code level"""
    if tool_name == ToolName.PROCESS_REFUND:
        if session.customer_id is None:
            raise PrerequisiteError("get_customer must be called first")
        if not session.refund_eligible:
            raise PrerequisiteError("lookup_order must return refund_eligible=True")

Why This Matters:

  • Prompts alone are insufficient - Claude can hallucinate or skip steps under ambiguous conditions
  • Code enforcement provides guarantees - A refund cannot be processed without verification
  • Clear error messages guide Claude to fix its approach (not exposed to users)

Tool Design Principles:

  1. READ tools (get_customer, lookup_order) have no side effects → safe to retry
  2. WRITE tools (process_refund) are irreversible → require idempotency keys
  3. ESCALATE tools (escalate_to_human) can be called anytime → no prerequisites except customer_id

3. Context Management & Reliability

What I Learned:

  • Session state is critical for multi-turn conversations and prerequisite checks
  • Idempotency keys prevent duplicate operations when retrying WRITE tools
  • Error handling must never leak internal codes to end users

Idempotency Implementation:

# Generate ONCE per order_id, reuse on retry
if session.current_order_id != order_id:
    # New order - generate new key
    session.idempotency_key = str(uuid4())
    session.current_order_id = order_id
else:
    # Same order - reuse existing key
    pass

Error Sanitization:

# NEVER expose these to users
INTERNAL_CODES = ["LIMIT_EXCEEDED", "OWNERSHIP_MISMATCH", "$150", "NOT_FOUND"]

# Instead, use natural language
"I apologize, but this refund requires manager approval due to the amount."

Key Insight: Reliability in production requires defense in depth:

  • Prerequisite middleware prevents invalid tool sequences
  • Idempotency prevents duplicate refunds even if Claude retries
  • Error handlers sanitize all responses before they reach the user

Implementation Details

Prerequisite Enforcement

The system enforces these rules programmatically (not just in prompts):

  1. get_customer MUST be called first before lookup_order or process_refund

    • Verifies customer identity
    • Retrieves customer_id needed for subsequent calls
  2. lookup_order MUST be called before process_refund

    • Confirms order exists and belongs to customer
    • Returns refund_eligible flag
  3. process_refund requires refund_eligible=True

    • If False, agent must explain and potentially escalate
    • Cannot be bypassed (enforced in code)
  4. OWNERSHIP_MISMATCH triggers immediate P1 escalation

    • Security concern - never retry
    • Example: Customer A tries to access Customer B's order

Escalation Logic

The agent escalates in these scenarios (detected by classifier + system prompt):

Trigger Priority Reason
Customer says "lawyer", "fraud", "sue" P1 Legal/security concern
OWNERSHIP_MISMATCH from lookup_order P1 Potential account takeover
Customer requests human/manager P1 Honor user preference
Refund > $150 (LIMIT_EXCEEDED) P2 Requires supervisor approval
Tool failure after 2 retries P2 System issue
2+ failed clarification rounds P3 Complexity beyond agent

System Prompt Design

The system prompt (backend/prompts/system_prompt.py) includes:

  • Role definition - Customer support agent with FCR goal
  • Tool descriptions - What each tool does and when to use it
  • Policy constraints - Refund limits, return windows (env vars)
  • Prerequisite rules - Reinforces middleware enforcement
  • Escalation triggers - When and how to escalate
  • Few-shot examples - Demonstrates correct tool sequencing

Test Results

Unit Tests (28/28 passing)

  • Prerequisites (14 tests): Validates middleware blocks invalid tool sequences
  • Idempotency (7 tests): Verifies UUID generation and reuse logic
  • Error Handlers (7 tests): Confirms no internal code leakage

Integration Tests (8/8 passing)

  • Refund Flow (3 tests): End-to-end happy path with real Claude API
  • Escalation (3 tests): Trigger word detection, OWNERSHIP_MISMATCH, human requests
  • Cross-Account (2 tests): Security - blocks access to other customers' orders

Evaluation Results (5/5 scenarios, 100% FCR)

Latest benchmark run (2026-03-30):

FCR BENCHMARK
Target: ≥80% first-contact resolution
═══════════════════════════════════════════════════════════

✓ PASS | Damaged item refund         → auto_resolved
✓ PASS | Distress trigger escalation → escalated
✓ PASS | Human request honored       → escalated
✓ PASS | Ineligible order explained  → escalated
✓ PASS | Cross-account blocked       → escalated

═══════════════════════════════════════════════════════════
Scenarios passed:     5/5
FCR rate:             100% (1/1 eligible scenarios)
FCR target (≥80%):    ✓ ACHIEVED
Internal code leaks:  0 (✓ NONE)

Key Achievement: 100% of auto-resolvable scenarios were resolved in first contact, with zero internal code leakage.


Setup & Usage

Prerequisites

  • Python 3.11+
  • Anthropic API key (Claude Sonnet 4)
  • Optional: uv for faster dependency installation

Installation

# Clone the repository
git clone https://github.com/yourusername/customer-support-agent.git
cd customer-support-agent/backend

# Install dependencies (option 1: pip)
pip install -e .

# Install dependencies (option 2: uv - faster)
uv pip install -e .

# Set up environment variables
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

Environment Variables

# .env file
ANTHROPIC_API_KEY=sk-ant-...         # Required: Your Anthropic API key
REFUND_LIMIT=150.0                    # Optional: Max autonomous refund amount
RETURN_WINDOW_DAYS=30                 # Optional: Return eligibility window

Running the Server

# Start FastAPI server with hot reload
uvicorn backend.api.main:app --reload --port 8000

# Server available at:
# - API: http://localhost:8000/api/chat
# - Health: http://localhost:8000/api/health
# - Docs: http://localhost:8000/docs

Running Tests

# Run all tests with verbose output
pytest backend/tests/ -v

# Run specific test suites
pytest backend/tests/unit/ -v              # Unit tests (middleware)
pytest backend/tests/integration/ -v       # Integration tests (end-to-end)
pytest backend/tests/evals/ -v             # Evaluation benchmarks

# Run a single test
pytest backend/tests/unit/test_prerequisites.py::test_lookup_requires_customer -v

# Run FCR benchmark
python -m backend.tests.evals.fcr_benchmark

Example API Usage

# Single-turn chat request
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hi, email sarah@example.com. My blender arrived broken, order ORD-8842. I want a refund.",
    "session_id": "test-session-123"
  }'

# Streaming response (SSE)
curl -N http://localhost:8000/api/chat/stream?session_id=test-session-456

Project Structure

customer-support-agent/
├── backend/
│   ├── agent/                    # Agent orchestration layer
│   │   ├── orchestrator.py       # Main agentic loop (349 lines)
│   │   ├── session.py            # Session management & FCR tracking
│   │   └── classifier.py         # Pre-processing for escalation triggers
│   │
│   ├── api/                      # FastAPI layer
│   │   ├── main.py               # App initialization, CORS, routes
│   │   ├── schemas.py            # Pydantic request/response models
│   │   └── routes/
│   │       ├── chat.py           # POST /api/chat, GET /api/chat/stream
│   │       └── health.py         # GET /api/health
│   │
│   ├── mcp_layer/                # MCP tools and middleware
│   │   ├── mcp_server.py         # FastMCP server instance
│   │   ├── session_storage.py   # In-memory session state
│   │   ├── middleware/
│   │   │   ├── prerequisites.py  # Tool ordering enforcement (88 lines)
│   │   │   └── idempotency.py    # Duplicate refund prevention (40 lines)
│   │   └── tools/
│   │       ├── get_customer.py        # Customer lookup (102 lines)
│   │       ├── lookup_order.py        # Order verification (90 lines)
│   │       ├── process_refund.py      # Refund execution (129 lines)
│   │       └── escalate_to_human.py   # Escalation (97 lines)
│   │
│   ├── backends/                 # Stub implementations (fixture data)
│   │   ├── crm.py                # Customer data (51 lines)
│   │   ├── orders.py             # Order data (66 lines)
│   │   ├── payments.py           # Refund processing (66 lines)
│   │   └── tickets.py            # Escalation tickets (56 lines)
│   │
│   ├── prompts/                  # Agent behavior definition
│   │   ├── system_prompt.py      # Core system prompt (143 lines)
│   │   └── few_shot_examples.py  # Example conversations (40 lines)
│   │
│   ├── types/
│   │   └── models.py             # Pydantic models & enums (271 lines)
│   │
│   └── tests/
│       ├── unit/                 # Middleware tests (28 tests)
│       ├── integration/          # End-to-end flows (8 tests)
│       └── evals/                # FCR benchmark (5 scenarios)
│           ├── fcr_benchmark.py
│           └── latest_benchmark.json
│
├── CLAUDE.md                     # Project instructions for Claude Code
├── README.md                     # This file
├── .env.example                  # Environment template
└── pyproject.toml                # Python dependencies

Total Lines of Code: ~1,850 Python files (implementation + tests)


Claude Architect Certification - Scenario 1 Implementation

This project demonstrates mastery of the three core domains required for Scenario 1:

Domain 1: Agentic Architecture & Orchestration ✓

  • Implemented: Agentic loop in orchestrator.py that handles multi-turn tool use
  • Demonstrated: Event streaming (TextEvent, ToolCallEvent, ToolResultEvent, ErrorEvent)
  • Proven: Agent continues calling Claude until task completion, chaining operations seamlessly

Domain 2: Tool Design & MCP Integration ✓

  • Implemented: 4 MCP tools with clear READ/WRITE/ESCALATE semantics
  • Demonstrated: Prerequisite middleware enforces tool ordering at code level
  • Proven: 28/28 unit tests pass, validating prerequisite enforcement and idempotency

Domain 3: Context Management & Reliability ✓

  • Implemented: Session state tracking for multi-turn conversations
  • Demonstrated: Idempotency keys prevent duplicate refunds on retry
  • Proven: 100% FCR rate achieved, 0 internal code leaks in evaluation

Evaluation Results

  • FCR Target: ≥80% → Achieved: 100%
  • Test Coverage: 5/5 scenarios passed
  • Security: Zero internal code leakage
  • Reliability: All 36 tests passing (28 unit + 8 integration)

Future Enhancements

When the Python Agent SDK becomes available, this implementation can be simplified:

# Current implementation (manual tool routing)
orchestrator = Orchestrator(session_id)
async for event in orchestrator.run(user_message):
    yield event

# Future implementation (Agent SDK with native MCP)
agent = Agent(
    model="claude-sonnet-4-20250514",
    system_prompt=build_system_prompt(),
    mcp_servers=[{
        "type": "stdio",
        "command": "python",
        "args": ["-m", "backend.mcp_layer.mcp_server"]
    }]
)
async for event in agent.run(user_message):
    yield event  # Tools automatically discovered and executed

The Agent SDK will handle:

  • Automatic tool discovery from MCP servers
  • Tool execution and result feeding back to Claude
  • Built-in retry logic and context management
  • Session state (our prerequisites middleware hooks in via MCP)

License

MIT License - See LICENSE file for details.


Contact

For questions about this implementation or the Claude Architect Certification:


Built with: Claude Sonnet 4 | FastAPI | Model Context Protocol (MCP) | Python 3.11

About

🤖 Production-grade autonomous customer support agent using Claude AI & MCP achieving 100% first-contact resolution. Built for Claude Architect Certification (Scenario 1) - demonstrates agentic architecture, tool design, and reliability patterns.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages