
Building a Production Customer Support Agent with Claude

How to build a production customer support agent with Claude — architecture, tool use, escalation logic, cost controls, and real deployment patterns.


A production Claude customer support agent routes queries to the right model, searches a knowledge base via tool use, creates tickets in your helpdesk, and hands off to a human when confidence drops — all in a single agentic loop. The minimum viable stack costs roughly $0.002 per resolved ticket using Claude Haiku 4.5 for tier-1 queries, rising to ~$0.018 for complex cases that require Sonnet 4.5. This guide covers every layer from system prompt to deployment. For foundational Agent SDK concepts, see the Claude Agent SDK Guide.


How a Production CS Agent Differs from a Demo

Most demo agents consist of a system prompt and a single messages.create() call. That works for showcasing intent; it fails in production for five reasons:

| Layer | Demo gap | Production requirement |
|---|---|---|
| Intent detection | None — Claude guesses | Explicit classifier routes query type |
| Knowledge retrieval | Claude's training data | Tool call to live KB / vector store |
| Ticket lifecycle | Not modeled | Tool calls to create, update, escalate |
| Escalation | Never happens | Explicit triggers: low confidence, emotion, policy edge cases |
| Cost | Single model for everything | Model routing: Haiku → Sonnet → human |

A production agent is a stateful loop, not a one-shot prompt. Each turn may call tools, update internal state, and decide whether to continue or exit to a human queue.
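The shape of that loop can be sketched independently of the SDK. In this sketch, call_model and execute_tool are injected stand-ins (hypothetical names, not the Anthropic client) so the control flow is visible and testable on its own; the complete, SDK-backed version appears later in this guide.

```python
def run_turn(call_model, execute_tool, messages, max_steps=8):
    """Minimal agentic-loop skeleton: call the model, execute any requested
    tools, feed results back, and repeat until the model ends its turn.

    call_model(messages) -> {"stop_reason": str, "content": [block dicts]}
    execute_tool(block)  -> a tool_result dict for the next user message
    """
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply["stop_reason"] == "end_turn":
            return {"resolved": True, "messages": messages}
        # Run every tool the model requested this step
        tool_results = [
            execute_tool(block)
            for block in reply["content"]
            if block.get("type") == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
    # Step budget exhausted without resolution: exit to the human queue
    return {"resolved": False, "messages": messages}
```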


Step 1: System Prompt Design

The system prompt for a CS agent does four things: sets the persona, defines scope, lists escalation triggers, and constrains output format. Keep it under 800 tokens — it goes into every request.

You are Aria, a customer support agent for Acme SaaS. You help customers with:
- Account and billing questions
- Product feature usage
- Bug reports and status checks

You are NOT authorized to:
- Approve refunds over $200
- Modify subscription tiers without manager approval
- Access data from other customer accounts

ESCALATION: Transfer to a human agent if any of these are true:
- Customer expresses frustration two or more times in the conversation
- The query requires a policy exception
- Your confidence in the correct answer is low (say "I'm not sure")
- Customer explicitly requests a human

TOOLS: Always search the knowledge base before answering factual product questions.
Create a ticket for every interaction that doesn't resolve in one turn.

TONE: Concise, professional, empathetic. No marketing language.

Why explicit scope limits matter: Claude will try to be helpful by default. Without explicit "you are NOT authorized to" statements, it may offer to do things your backend doesn't support — or worse, things that violate policy.
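To enforce the 800-token budget in CI, the Anthropic SDK's client.messages.count_tokens() gives an exact count; for a dependency-free check, a rough heuristic (~4 characters per token for English prose) is enough to catch drift. This sketch uses the heuristic, with the budget figure taken from the guideline above:

```python
def approx_tokens(text: str) -> int:
    """Rough offline estimate: English prose averages ~4 characters per token.
    For an exact count, use the SDK's client.messages.count_tokens()."""
    return max(1, len(text) // 4)

def check_prompt_budget(system_prompt: str, budget: int = 800) -> bool:
    """Warn when the system prompt exceeds the per-request token budget."""
    est = approx_tokens(system_prompt)
    if est > budget:
        print(f"System prompt ~{est} tokens, over the {budget}-token budget")
        return False
    return True
```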


Step 2: Tool Use Setup

A CS agent needs at minimum three tools: knowledge base lookup, ticket creation, and order/account status. Here is the complete tool schema for all three:

import anthropic
import json

client = anthropic.Anthropic()

CS_TOOLS = [
    {
        "name": "search_knowledge_base",
        "description": (
            "Search the product knowledge base for articles, FAQs, and "
            "troubleshooting guides. Use this before answering any product "
            "or policy question. Returns ranked articles with text excerpts."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural language search query"
                },
                "max_results": {
                    "type": "integer",
                    "description": "Number of results to return (default 3)",
                    "default": 3
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "create_support_ticket",
        "description": (
            "Create a support ticket in the helpdesk system. Call this when "
            "the issue cannot be resolved in one turn, or when the customer "
            "needs follow-up from the team."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "subject": {"type": "string"},
                "description": {"type": "string"},
                "priority": {
                    "type": "string",
                    "enum": ["low", "normal", "high", "urgent"]
                },
                "customer_email": {"type": "string"}
            },
            "required": ["subject", "description", "priority", "customer_email"]
        }
    },
    {
        "name": "get_order_status",
        "description": (
            "Look up the status of a customer order or subscription. "
            "Returns order details, current status, and last updated timestamp."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "customer_email": {"type": "string"}
            },
            "required": ["customer_email"]
        }
    }
]

Tool description quality is critical. Anthropic's documentation notes that Claude uses the description field as the primary signal for deciding when to call a tool. Vague descriptions ("search for things") produce unreliable behavior. Be explicit about what the tool returns, when to use it, and what inputs it expects.


Step 3: Escalation Logic

Escalation is not just a system prompt instruction — it needs to be enforced programmatically. Claude may ignore its own instructions under adversarial input or long context drift. The safest pattern: Claude signals escalation intent via a structured tool call, and your orchestration layer acts on it.

Add an escalation tool to your tool list:

ESCALATION_TOOL = {
    "name": "escalate_to_human",
    "description": (
        "Transfer this conversation to a human support agent. Use when: "
        "customer is frustrated (2+ expressions), policy exception needed, "
        "confidence is low, or customer requests human. "
        "This immediately ends the AI turn."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "reason": {
                "type": "string",
                "enum": [
                    "customer_request",
                    "policy_exception",
                    "low_confidence",
                    "repeated_frustration",
                    "complex_technical"
                ]
            },
            "summary": {
                "type": "string",
                "description": "Brief summary of the issue for the human agent"
            }
        },
        "required": ["reason", "summary"]
    }
}

In your orchestration loop, when you detect escalate_to_human in a tool call, stop the loop and push the conversation to your human queue with the summary attached. The human agent gets context without reading the full transcript.


Step 4: Cost Controls via Model Routing

The biggest CS agent cost mistake is routing every query to Sonnet. Haiku 4.5 handles ~70% of tier-1 support volume (password resets, billing lookups, feature how-tos) at roughly a third of Sonnet's per-token price. Route up:

def classify_query_complexity(query: str) -> str:
    """
    Fast classifier to route query to correct model tier.
    Returns 'simple', 'moderate', or 'complex'.
    """
    SIMPLE_PATTERNS = [
        "how do i", "where is", "what is my", "reset password",
        "billing date", "cancel subscription", "download invoice"
    ]
    COMPLEX_SIGNALS = [
        "not working", "error", "broken", "data loss", "security",
        "refund", "legal", "compliance", "outage"
    ]

    query_lower = query.lower()

    if any(p in query_lower for p in COMPLEX_SIGNALS):
        return "complex"
    if any(p in query_lower for p in SIMPLE_PATTERNS):
        return "simple"
    return "moderate"

MODEL_MAP = {
    "simple": "claude-haiku-4-5",
    "moderate": "claude-haiku-4-5",
    "complex": "claude-sonnet-4-5"
}

Real cost breakdown for 10,000 tickets/month (based on April 2026 pricing):

| Tier | Volume | Model | Cost/ticket | Monthly cost |
|---|---|---|---|---|
| Simple (70%) | 7,000 | Haiku 4.5 | ~$0.002 | ~$14 |
| Complex (30%) | 3,000 | Sonnet 4.5 | ~$0.018 | ~$54 |
| Total | 10,000 | — | ~$0.0068 avg | ~$68 |

Compare that to a human agent at $15–25/hour handling 8–12 tickets/hour: the same 10,000 tickets costs $12,500–$31,000. The agent pays for itself by deflecting ~85% of volume.
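The blended average in the table is simple volume-weighted arithmetic; this snippet reproduces it using the rough per-ticket figures above (illustrative estimates, not authoritative pricing):

```python
def blended_cost(simple_frac: float, simple_cost: float, complex_cost: float) -> float:
    """Volume-weighted average cost per ticket across the two routing tiers."""
    return simple_frac * simple_cost + (1 - simple_frac) * complex_cost

# 70% simple at ~$0.002, 30% complex at ~$0.018
avg = blended_cost(0.7, 0.002, 0.018)   # -> ~$0.0068 per ticket
monthly = avg * 10_000                  # -> ~$68 for 10,000 tickets/month
```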


Step 5: Testing and Evaluation

Don't ship a CS agent without a structured eval harness. CS agent quality has four measurable dimensions:

  1. Resolution rate — % of conversations resolved without escalation (target: 75–85% for tier-1)
  2. Escalation accuracy — % of escalations that were genuinely needed (false positives waste human time)
  3. Factual accuracy — % of KB-backed answers that are correct (test with golden question set)
  4. Tone compliance — automated check that responses match persona guidelines

Build a golden test set of 100–200 historical tickets with known resolutions. Run your agent against them weekly and track regression across model versions.

def evaluate_cs_agent(test_cases: list[dict]) -> dict:
    """
    Run CS agent against test cases and return metrics.
    Each test case: {query, expected_resolution, requires_escalation}
    """
    results = {"resolved": 0, "escalation_correct": 0, "total": len(test_cases)}
    resolvable = sum(1 for c in test_cases if not c["requires_escalation"])

    for case in test_cases:
        response = run_cs_agent(case["query"])
        # Escalation accuracy: the agent escalated exactly when it should have
        if response["escalated"] == case["requires_escalation"]:
            results["escalation_correct"] += 1
        # run_cs_agent returns no "resolved" key; treat a non-escalated
        # answer to a resolvable case as resolved
        if not case["requires_escalation"] and not response["escalated"]:
            results["resolved"] += 1

    results["escalation_accuracy"] = results["escalation_correct"] / results["total"]
    results["resolution_rate"] = results["resolved"] / max(resolvable, 1)
    return results

Complete Production Example

This is a full, runnable CS agent with tool use, model routing, escalation detection, and conversation state management.

import anthropic
import json
from typing import Any

client = anthropic.Anthropic()

# --- Mock tool implementations ---

def search_knowledge_base(query: str, max_results: int = 3) -> dict:
    """Replace with your actual vector DB / search API call."""
    return {
        "results": [
            {
                "title": "How to reset your password",
                "excerpt": "Go to Settings > Security > Reset Password. A link will be emailed to you.",
                "url": "https://help.acme.com/password-reset"
            }
        ],
        "query": query
    }

def create_support_ticket(
    subject: str, description: str, priority: str, customer_email: str
) -> dict:
    """Replace with your helpdesk API (Zendesk, Linear, etc.)."""
    ticket_id = f"TKT-{abs(hash(subject)) % 100000:05d}"
    return {"ticket_id": ticket_id, "status": "created", "priority": priority}

def get_order_status(customer_email: str, order_id: str | None = None) -> dict:
    """Replace with your order management system call."""
    return {
        "order_id": order_id or "ORD-99234",
        "status": "active",
        "plan": "Pro",
        "next_billing": "2026-05-15",
        "amount": "$49/mo"
    }

def escalate_to_human(reason: str, summary: str) -> dict:
    """Replace with your queue push (Intercom, Zendesk, Slack alert, etc.)."""
    print(f"[ESCALATION] Reason: {reason} | Summary: {summary}")
    return {"escalated": True, "queue_position": 3, "eta_minutes": 4}

# --- Tool dispatch ---

TOOL_IMPLEMENTATIONS = {
    "search_knowledge_base": search_knowledge_base,
    "create_support_ticket": create_support_ticket,
    "get_order_status": get_order_status,
    "escalate_to_human": escalate_to_human,
}

def execute_tool(tool_name: str, tool_input: dict) -> Any:
    fn = TOOL_IMPLEMENTATIONS.get(tool_name)
    if not fn:
        return {"error": f"Unknown tool: {tool_name}"}
    return fn(**tool_input)

# --- Model routing ---

def select_model(query: str) -> str:
    COMPLEX_SIGNALS = ["error", "broken", "not working", "refund", "data", "outage", "security"]
    if any(s in query.lower() for s in COMPLEX_SIGNALS):
        return "claude-sonnet-4-5"
    return "claude-haiku-4-5"

# --- Core agent loop ---

SYSTEM_PROMPT = """You are Aria, a customer support agent for Acme SaaS.

You help with: account questions, billing, product features, and bug reports.
You are NOT authorized to: approve refunds over $200, modify subscriptions unilaterally.

RULES:
- Always search the knowledge base before answering product questions.
- Create a ticket if the issue requires follow-up.
- Escalate to human if: customer is frustrated 2+ times, policy exception needed,
  confidence is low, or customer requests a human.

Respond concisely and professionally."""

def run_cs_agent(user_message: str, customer_email: str = "user@example.com") -> dict:
    """
    Run the CS agent loop for a single customer message.
    Returns: {response: str, escalated: bool, ticket_id: str | None, model_used: str}
    """
    model = select_model(user_message)
    messages = [{"role": "user", "content": user_message}]

    all_tools = CS_TOOLS + [ESCALATION_TOOL]
    escalated = False
    ticket_id = None
    final_response = ""

    for _ in range(8):  # max 8 model/tool iterations per customer message
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            system=SYSTEM_PROMPT,
            tools=all_tools,
            messages=messages
        )

        # Append assistant turn
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            for block in response.content:
                if block.type == "text":
                    final_response = block.text
            break

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)

                    # Track side effects
                    if block.name == "escalate_to_human":
                        escalated = True
                    if block.name == "create_support_ticket" and "ticket_id" in result:
                        ticket_id = result["ticket_id"]

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

            messages.append({"role": "user", "content": tool_results})

            # Stop loop immediately after escalation
            if escalated:
                break

    return {
        "response": final_response,
        "escalated": escalated,
        "ticket_id": ticket_id,
        "model_used": model
    }

# --- Example usage ---

if __name__ == "__main__":
    queries = [
        "How do I reset my password?",
        "I've been charged twice this month and I'm really frustrated!",
        "What's the status of my subscription?"
    ]
    for q in queries:
        print(f"\nQuery: {q}")
        result = run_cs_agent(q, customer_email="customer@example.com")
        print(f"Response: {result['response'][:200]}")
        print(f"Model: {result['model_used']} | Escalated: {result['escalated']} | Ticket: {result['ticket_id']}")



Key Architecture Decisions

Why use a tool for escalation instead of text detection? Text-based escalation detection ("if 'human' in response") breaks under paraphrasing. A structured tool call is unambiguous and auditable.

Should I use streaming for CS agents? Yes for live chat interfaces — streaming reduces perceived latency from 2–4 seconds to sub-second first token. Use client.messages.stream() and flush tokens as they arrive. To choose the right model tier for each query, see Haiku vs Sonnet vs Opus: Which Model?.

How many conversation turns should the agent handle? Set a hard cap (typically 6–10 turns). Long conversations increase cost and hallucination risk. If not resolved in 8 turns, auto-escalate.

Should the agent remember previous conversations? Yes. Inject the last 3–5 interactions as context. Full history is expensive; summarize older turns with a Haiku call and inject the summary as a system context block.
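The split between recent verbatim turns and older turns to summarize can be sketched as a pure function; the summarization itself would be a cheap Haiku call on the older slice (that call and the context-block injection are left as hypothetical hooks here):

```python
def compress_history(messages: list[dict], keep_recent: int = 5):
    """Split a transcript into (older turns to summarize, recent turns to keep).

    In production, send the older slice to a Haiku summarization call and
    inject the resulting summary as a system context block ahead of the
    recent turns, keeping per-request token cost bounded.
    """
    if len(messages) <= keep_recent:
        return [], messages
    return messages[:-keep_recent], messages[-keep_recent:]
```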


FAQ

How much does a Claude-based CS agent cost per month? At 10,000 tickets/month with 70/30 Haiku/Sonnet routing, expect roughly $60–80/month in API costs. The primary cost driver is whether queries are simple (Haiku 4.5 at $1.00/$5.00 per M tokens) or complex (Sonnet 4.5 at $3.00/$15.00 per M tokens). Always implement model routing.

Can Claude handle multilingual support? Yes. Claude Sonnet and Haiku perform well in Spanish, French, German, Japanese, Korean, and Portuguese. You do not need separate models per language. Add the customer's language detection as a tool call if you want the agent to explicitly mirror the customer's language.

How do I prevent the agent from hallucinating product details? Always require a search_knowledge_base tool call before answering factual product questions. Add a rule to your system prompt: "Never state a product fact without first searching the knowledge base." Evaluate KB retrieval accuracy separately from response quality.

What's the right escalation rate target? Industry benchmarks for AI-first CS suggest 15–25% escalation rate for tier-1 volume. Below 15% often means the agent is over-confident on complex issues. Above 30% means your KB coverage or system prompt scope limits need work.

How do I handle customers who keep asking the same question differently? Track unresolved intent across turns in conversation state. After two rephrases without resolution, auto-escalate with reason: "low_confidence". This is faster than letting the agent loop indefinitely.
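That threshold check is a small counter over normalized intents in conversation state (the intent labels here are illustrative; in practice they come from your classifier):

```python
from collections import Counter

def should_autoescalate(intent_history: list[str], max_attempts: int = 2) -> bool:
    """True once any unresolved intent has been raised more than max_attempts
    times — i.e. the original ask plus two rephrases without resolution."""
    return any(n > max_attempts for n in Counter(intent_history).values())
```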



Take It Further

Claude Agent SDK Cookbook: 40 Production Patterns — 40 battle-tested patterns for Claude agents in production. Retry logic, tool error recovery, parallel sub-agents, cost guardrails, deterministic testing.

167 pages. Complete, runnable Python and TypeScript code throughout.

→ Get Agent SDK Cookbook — $49

30-day money-back guarantee. Instant download.

AI Disclosure: Drafted with Claude Code; all pricing and feature details from official documentation as of April 2026.
