Case Study: Customer Support Chatbot


Building a RAG-powered support bot that deflects 40% of tickets while costing under $100/day

Key Takeaways

RAG-powered chatbot achieving 40% ticket deflection rate
Uses Claude Sonnet with a Pinecone vector database
Includes evaluation framework for measuring success

Company: Ecommerce platform with 100K+ active customers
Problem: 10K support tickets/day; response time 8+ hours; support costs $500K/year
Solution: RAG-powered chatbot with human handoff
Results: 40% ticket deflection; $50K/year savings; 15-minute average response time

The Challenge

Every day:

10,000 customer support tickets arrive
70% are routine questions (returns, shipping, billing)
20-person support team, working around the clock
Average resolution time: 8-12 hours
Customers complain about wait times in reviews

The business problem: Support costs are growing faster than revenue. The team needs to handle 3x the volume by next year without hiring 3x the staff.

Initial Approach (What Didn’t Work)

First attempt: Simple LLM prompt

System: "You are a helpful support agent. Answer customer questions."
User: "I ordered 3 weeks ago and haven't received my package"
Model: "I understand your frustration. Most packages arrive within 5-7 business days.
        I'd recommend waiting a bit longer, or you could contact customer service..."
Problem: The model invented a generic delivery window and ignored the stated facts. The customer ordered 3 weeks ago, so the package is clearly overdue.

Why it failed:

No access to customer data (order history, shipping status)
Model had no way to check order status
Generated generic, unhelpful responses
Customers still had to contact support anyway

Final Architecture (RAG + Handoff)

Customer Question
    ↓
1. Search Knowledge Base
   (FAQs, return policy, shipping info)
    ↓
2. Retrieve Top 3 Documents
    ↓
3. Search Customer Database
   (Order status, past tickets, account info)
    ↓
4. Combine Context + Question
    ↓
5. Generate Response with Claude
    ↓
6. Can Model Confidently Answer?
   ├─ Yes → Send response to customer
   └─ No → Hand off to human agent
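The six steps above can be sketched as one function. This is a minimal illustration, not the production code: `search_kb`, `fetch_customer`, `generate`, and `score` are hypothetical callables standing in for the vector search, the customer API, the Claude call, and the confidence heuristic, and the prompt layout is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BotResult:
    action: str  # "send" or "escalate"
    text: str    # the drafted answer (passed to the agent on escalation)


def handle_question(
    question: str,
    customer_id: str,
    search_kb: Callable[[str, int], list],
    fetch_customer: Callable[[str], dict],
    generate: Callable[[str], str],
    score: Callable[[str, list, dict], float],
    threshold: float = 0.7,
) -> BotResult:
    docs = search_kb(question, 3)            # steps 1-2: top 3 documents
    customer = fetch_customer(customer_id)   # step 3: order and account data
    prompt = (                               # step 4: combine context + question
        "Policy excerpts:\n" + "\n---\n".join(docs)
        + f"\n\nCustomer data: {customer}\n\nQuestion: {question}"
    )
    draft = generate(prompt)                 # step 5: LLM call
    if score(draft, docs, customer) >= threshold:  # step 6: confident?
        return BotResult("send", draft)
    return BotResult("escalate", draft)
```

Passing the dependencies in as callables keeps the flow testable without a live vector database or LLM.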

Key Components

1. Knowledge Base (RAG)

Indexed all customer-facing documentation:

Return policy (when/how returns work)
Shipping & delivery (how long, where to track)
Billing & payment (refunds, charges)
Account & login (password reset, 2FA)
Product FAQs (fits, materials, care)

Documents: 500 pages (2-3K tokens each)
Chunking: 512-token chunks with 50-token overlap
Embedding: OpenAI’s text-embedding-3-small
Vector DB: Pinecone (production-grade, fast)
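The chunking setting can be sketched as a sliding window. This is an illustration only: the article does not say which tokenizer was used, so the sketch takes an already-tokenized list, and `chunk_tokens` is a hypothetical helper name.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 50) -> list:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts 462 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

Under these settings a 2,500-token page produces 6 chunks, and the last 50 tokens of each chunk repeat as the first 50 of the next, so a sentence that straddles a boundary still appears whole in one chunk.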

2. Customer Data Integration

Connected to internal APIs:

Customer account (email, past orders, preferences)
Order status (when ordered, when shipped, tracking)
Return status (if applicable, when expected back)
Support history (past tickets, resolution)

This wasn’t indexed as embeddings - retrieved directly via API calls with the customer’s ID.

3. LLM with Tool Use

Claude with 3 tools:

tools = [
    Tool(
        name="search_knowledge_base",
        description="Search company FAQs and policies"
    ),
    Tool(
        name="get_order_status",
        description="Get a customer's order status and tracking"
    ),
    Tool(
        name="escalate_to_human",
        description="Escalate complex issues to human agent"
    )
]
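For reference, here is roughly what those three tools look like as plain dictionaries with a JSON-schema `input_schema`, which is the shape the Anthropic Messages API accepts for tools. The parameter names (`query`, `order_id`, `reason`) and the `dispatch` helper are assumptions for illustration; the article does not show them.

```python
# Tool definitions as name / description / input_schema dictionaries.
# Parameter names below are illustrative assumptions, not from the case study.
TOOLS = [
    {
        "name": "search_knowledge_base",
        "description": "Search company FAQs and policies",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "get_order_status",
        "description": "Get a customer's order status and tracking",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "name": "escalate_to_human",
        "description": "Escalate complex issues to human agent",
        "input_schema": {
            "type": "object",
            "properties": {"reason": {"type": "string"}},
            "required": ["reason"],
        },
    },
]


def dispatch(tool_name: str, tool_input: dict, handlers: dict) -> dict:
    """Route a tool call requested by the model to a local handler function."""
    if tool_name not in handlers:
        return {"error": f"unknown tool: {tool_name}"}
    return handlers[tool_name](**tool_input)
```

Returning an error dictionary for unknown tools, rather than raising, lets the model see the failure and recover or escalate.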

4. Confidence Threshold

Model decides: can I answer this confidently?

def route_response(confidence_score, threshold=0.7):
    if confidence_score < threshold:
        # Complex issue or weak evidence: escalate
        return escalate_to_human()
    # Confident answer
    return generate_response()

Implementation Details

How the key pieces work under the hood:

Confidence Scoring

Confidence is not a raw model probability. It is a sum of simple heuristics:

confidence = 0.0

# +0.3 if we found relevant docs
if retrieved_docs_score > 0.7:
    confidence += 0.3

# +0.3 if we have clear customer data
if order_status in ("shipped", "delivered"):
    confidence += 0.3

# +0.2 if question is factual (not emotional)
if question_type == "factual":
    confidence += 0.2

# -0.2 if the model's response signals uncertainty ("I don't know", "unclear")
if any(phrase in response.lower() for phrase in ("i don't know", "unclear")):
    confidence -= 0.2

# Only respond if >= 0.7
if confidence >= 0.7:
    send_to_customer()
else:
    escalate_to_human()
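The same heuristic, wrapped as a self-contained function so it can be unit-tested. The weights and the 0.7 cutoff come from the snippet above; the function name and the exact uncertainty phrases checked are assumptions.

```python
UNCERTAINTY_PHRASES = ("i don't know", "unclear")


def confidence_score(retrieved_docs_score: float, order_status: str,
                     question_type: str, response: str) -> float:
    """Sum the heuristic signals into a score between -0.2 and 0.8."""
    score = 0.0
    if retrieved_docs_score > 0.7:                 # relevant docs found
        score += 0.3
    if order_status in ("shipped", "delivered"):   # clear customer data
        score += 0.3
    if question_type == "factual":                 # factual, not emotional
        score += 0.2
    if any(p in response.lower() for p in UNCERTAINTY_PHRASES):
        score -= 0.2                               # model hedged its answer
    return round(score, 2)  # avoid float noise at the threshold boundary
```

With these weights the maximum score is 0.8, so a 0.7 threshold only passes answers that hit all three positive signals with no hedging. A 0.6 threshold also admits "good docs plus clear order data" on a non-factual question.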

Response Format

All responses follow a template:

Hi [Customer Name],

Thank you for reaching out. [Personalized answer with specific info from order/docs]

If this doesn't solve it, I'm escalating you to a specialist who'll reach out within 2 hours.

Best,
Support Bot

Why the template? Consistency, feels less like a bot, sets expectations.
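Filling the template can be as simple as the sketch below, which uses Python's standard `string.Template`. The placeholder names and `render_response` are assumptions; the article only shows the rendered wording.

```python
from string import Template

RESPONSE_TEMPLATE = Template(
    "Hi $customer_name,\n\n"
    "Thank you for reaching out. $answer\n\n"
    "If this doesn't solve it, I'm escalating you to a specialist "
    "who'll reach out within 2 hours.\n\n"
    "Best,\nSupport Bot"
)


def render_response(customer_name: str, answer: str) -> str:
    """Wrap the model's answer in the fixed greeting and sign-off."""
    return RESPONSE_TEMPLATE.substitute(
        customer_name=customer_name.strip(),
        answer=answer.strip(),
    )
```

Keeping the greeting and sign-off outside the model's output means the LLM only generates the personalized middle sentence, which is what keeps responses consistent.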

Conversation Memory

For multi-turn conversations:

Customer Q1: "Where's my package?"
Bot: "[response + status]"
Customer Q2: "When will it arrive?"
Bot: "Based on tracking, it should arrive tomorrow..."

Kept last 5 messages in context (100-token budget).
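A minimal sketch of that trimming rule, assuming messages are dictionaries with a `content` field and approximating tokens by whitespace-separated words (the article does not specify the tokenizer). `trim_history` is a hypothetical helper name.

```python
def trim_history(messages: list, max_messages: int = 5,
                 token_budget: int = 100) -> list:
    """Keep the most recent messages that fit both the count and token limits."""
    if max_messages <= 0:
        return []
    kept = []
    used = 0
    # Walk backwards from the newest message so older ones are dropped first.
    for msg in reversed(messages[-max_messages:]):
        cost = len(msg["content"].split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```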

Rollout Strategy

Phase 1 (Week 1): Pilot with 10% of tickets, monitor

Support team reviewed every bot response before it was sent
Measured accuracy: 87% (benchmark: 100% human accuracy)
Identified edge cases manually

Phase 2 (Week 2-3): Ramp to 50%, auto-send

Started auto-sending responses without review
Set escalation threshold high (70% confidence)
Monitored “follow-up” rate (customer asks again = failure)

Phase 3 (Week 4+): Full rollout at 100%

Confident enough to fully automate
Ramped down escalation threshold to 60%
Monitored customer satisfaction
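One common way to route a fixed share of tickets to the bot at each phase is deterministic hash bucketing. The article does not say how tickets were selected, so treat this as an assumed approach; `in_rollout` is a hypothetical helper.

```python
import hashlib


def in_rollout(ticket_id: str, percent: int) -> bool:
    """Return True if this ticket falls inside the current rollout percentage."""
    digest = hashlib.sha256(ticket_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket 0-99
    return bucket < percent
```

Because the bucket depends only on the ticket ID, a ticket included at 10% stays included at 50% and 100%, which keeps phase-over-phase comparisons clean.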

Results

The numbers after 8 weeks in production:

Metrics

Metric	Before	After	Change
Tickets/day	10,000	10,000	-
Bot deflection rate	0%	40%	+40%
Tickets handled by humans	10,000	6,000	-40%
Avg response time	8 hours	15 min (bot), 2 hours (human)	-95%
Customer satisfaction	3.2/5	4.1/5	+28%
Support costs/year	$500K	$450K	-$50K

What Worked

40% deflection is real
- Most support requests are transactional (returns, refunds) or informational (tracking)
- Automated responses handle 90% of these instantly
- Humans freed up for complex issues
Customers prefer quick bot to slow human
- Even if imperfect, a 15-minute response from the bot beats an 8-hour wait for a human
- Satisfaction: quick generic answer > slow perfect answer
Escalation threshold critical
- Too low (below 50%): bot sends wrong answers, harms trust
- Too high (above 80%): escalates too much, defeats purpose
- Sweet spot: 60-70%

What Surprised Everyone

Escalation rate lower than expected
- Feared 30-50% escalation rate
- Actually 5-10%, meaning the bot was confident in most cases
- Shows model works really well with good context
Follow-up rate nearly zero
- Expected 20% of customers to ask again (bad answer)
- Actually 2% (almost all from escalated issues = human handles it)
- Strong signal that bot responses were good
Cost of RAG less than expected
- Vector DB + embeddings + LLM calls: ~$0.003 per ticket
- At 4000 tickets/day deflected: ~$12/day
- Human cost savings: ~$140/day
- ROI: 11:1
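The arithmetic behind those figures, spelled out. All inputs are the article's numbers; `daily_roi` is just an illustrative helper.

```python
def daily_roi(cost_per_ticket: float, deflected_per_day: int,
              savings_per_day: float) -> dict:
    """Compute daily bot cost and the savings-to-cost ratio."""
    bot_cost = cost_per_ticket * deflected_per_day
    return {
        "bot_cost_per_day": round(bot_cost, 2),
        "roi": round(savings_per_day / bot_cost, 1),
    }


figures = daily_roi(cost_per_ticket=0.003, deflected_per_day=4000,
                    savings_per_day=140.0)
savings_check = round(50_000 / 365)  # annual savings spread over a year
```

Here `figures` comes out to $12.00/day and a ratio of 11.7, which the article rounds down to 11:1. The `savings_check` value of 137 shows the ~$140/day figure is consistent with $50K/year.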

Lessons Learned

Key takeaways from building and shipping this system:

What We’d Do Differently

Index documents earlier
- Spent weeks manually writing FAQs mid-project
- Should have had the core FAQs written before starting
- Implementation would have taken 2 weeks instead of 8
Test confidence thresholds with data
- Guessed at 70% initially
- Should have run A/B tests (60% vs 70% vs 80%) first
- Final threshold (60%) was very different from initial guess
Monitor escalation reasons
- Added analytics: why did we escalate?
- “Complex issue” (10%), “low confidence” (70%), “policy exception” (20%)
- Insights: could have tuned prompts for “low confidence” cases
Start with FAQ instead of full docs
- 500 pages was overkill
- First 50 FAQ items covered 80% of questions
- Should have started with FAQ, added more gradually
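The threshold test suggested above does not need live traffic to get started. A rough offline version replays human-reviewed tickets at each candidate threshold. This sketch assumes you have (confidence, was_correct) pairs from a review phase like Phase 1; `sweep_thresholds` is a hypothetical helper.

```python
def sweep_thresholds(samples: list, thresholds=(0.6, 0.7, 0.8)) -> dict:
    """samples: (confidence, bot_answer_was_correct) pairs from reviewed tickets."""
    report = {}
    for t in thresholds:
        # Answers the bot would have sent automatically at this threshold.
        sent = [ok for conf, ok in samples if conf >= t]
        report[t] = {
            "deflection": len(sent) / len(samples) if samples else 0.0,
            "accuracy": sum(sent) / len(sent) if sent else None,
        }
    return report
```

Each row shows the trade-off directly: lowering the threshold raises deflection and usually lowers accuracy on what gets sent.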

For Others Building Support Bots

Do this first:

Audit your actual support tickets (last 1000)
Categorize by type (tracking, returns, billing, etc.)
Write FAQs for top 80% (usually 40-50 questions)
Build bot with just those 50
Expand gradually based on real escalations
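The audit steps above boil down to a frequency count. A minimal sketch, assuming each audited ticket has already been given a category label by hand; `categories_for_coverage` is a hypothetical helper.

```python
from collections import Counter


def categories_for_coverage(labels: list, target: float = 0.8) -> list:
    """Return the most frequent categories that together reach the target share."""
    counts = Counter(labels)
    total = sum(counts.values())
    chosen, covered = [], 0
    for category, n in counts.most_common():
        if total and covered / total >= target:
            break  # already covering enough of the ticket volume
        chosen.append(category)
        covered += n
    return chosen
```

Usage: label the last 1,000 tickets, call `categories_for_coverage(labels)`, and write FAQs for the returned categories first.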

Don’t do this:

Don’t index 500 pages before testing
Don’t build perfect documentation first
Don’t aim for 95%+ confidence (overkill, less deflection)
Don’t skip the escalation/human-in-the-loop piece

Technical Stack

LLM: Claude 3 Sonnet (fast, accurate for retrieval)
Vector DB: Pinecone (production-ready, fast)
Embedding: OpenAI text-embedding-3-small
Customer DB: PostgreSQL (existing system)
API: Python FastAPI (handles requests from website)
Frontend: React chatbot widget on support.company.com

Cost breakdown (monthly):

Pinecone: ~$30 (low volume)
Embeddings: ~$50 (ingestion + searches)
LLM calls: ~$350 (4,000 deflected tickets/day × 1,000 tokens avg)
Infrastructure: ~$100
Total: ~$530/month (vs. ~$42K/month human support costs)
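A quick check of the totals, using only the figures listed above. The 30-day month is an assumption.

```python
monthly_costs = {
    "pinecone": 30,
    "embeddings": 50,
    "llm_calls": 350,
    "infrastructure": 100,
}

total_monthly = sum(monthly_costs.values())   # line items added up
per_day = total_monthly / 30                  # assumes a 30-day month
human_monthly = 500_000 / 12                  # $500K/year support budget
```

The line items sum to $530/month, or roughly $18/day, well under the $100/day target. The $500K annual support budget works out to about $41.7K/month, which is the ~$42K comparison figure.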