Skip to content

Chat API Template

📖 7 min read resourcestemplateschat
FastAPI server for a multi-turn chatbot with streaming and memory
Key Takeaways
  • FastAPI server with streaming responses and conversation memory
  • Supports Claude, GPT, and Gemini providers with fallback logic
  • Includes rate limiting, error handling, and monitoring setup

A production-ready FastAPI server for a multi-turn chatbot. Includes conversation memory, streaming responses, and error handling.

Try It Live

Before you deploy the server below, try the core completion call right here — tweak the system prompt and message, and run it for real.

Chat sandbox ● Live · Groq

Demo runs on Groq's free open models (rate-limited). Cost figures estimate what the same token counts would cost on the listed API models.


Quick Start

1. Install Dependencies

Terminal window
pip install fastapi uvicorn anthropic python-dotenv pydantic

2. Create Environment File

Create .env:

ANTHROPIC_API_KEY=your-key-here

3. Run Server

Terminal window
python chat_api.py
# Server runs at http://localhost:8000

4. Test the API

Terminal window
# Chat endpoint
curl -X POST http://localhost:8000/chat \
-H "Content-Type: application/json" \
-d '{"message":"Hello! What can you do?", "session_id":"test-1"}'
# Get conversation history
curl http://localhost:8000/history/test-1

Full Implementation

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from typing import Optional
import os
import json
from datetime import datetime
from anthropic import Anthropic
# Initialize
app = FastAPI(title="Chat API")
client = Anthropic()
# Store conversations (in production, use database)
conversations = {}
class ChatMessage(BaseModel):
"""Chat message schema"""
message: str
session_id: str
system_prompt: Optional[str] = None
class ChatResponse(BaseModel):
"""Response schema"""
response: str
session_id: str
timestamp: str
turn_count: int
class HistoryResponse(BaseModel):
"""Conversation history"""
session_id: str
messages: list
turn_count: int
# Endpoints
@app.get("/")
def root():
"""Health check"""
return {"status": "ok", "service": "chat-api"}
@app.post("/chat", response_model=ChatResponse)
def chat(msg: ChatMessage):
"""Chat endpoint - single turn"""
session_id = msg.session_id
user_message = msg.message
system_prompt = msg.system_prompt or "You are a helpful AI assistant."
# Get or create conversation
if session_id not in conversations:
conversations[session_id] = {
"messages": [],
"created_at": datetime.now().isoformat(),
"system_prompt": system_prompt
}
conv = conversations[session_id]
# Add user message
conv["messages"].append({
"role": "user",
"content": user_message
})
# Get response from Claude
try:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=conv["system_prompt"],
messages=conv["messages"]
)
assistant_message = response.content[0].text
# Add assistant response to history
conv["messages"].append({
"role": "assistant",
"content": assistant_message
})
return ChatResponse(
response=assistant_message,
session_id=session_id,
timestamp=datetime.now().isoformat(),
turn_count=len(conv["messages"]) // 2
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.post("/chat-stream")
def chat_stream(msg: ChatMessage):
"""Chat endpoint with streaming response"""
session_id = msg.session_id
user_message = msg.message
system_prompt = msg.system_prompt or "You are a helpful AI assistant."
# Get or create conversation
if session_id not in conversations:
conversations[session_id] = {
"messages": [],
"created_at": datetime.now().isoformat(),
"system_prompt": system_prompt
}
conv = conversations[session_id]
# Add user message
conv["messages"].append({
"role": "user",
"content": user_message
})
# Create streaming response
def generate():
try:
full_response = ""
with client.messages.stream(
model="claude-sonnet-4-6",
max_tokens=1024,
system=conv["system_prompt"],
messages=conv["messages"]
) as stream:
for text in stream.text_stream:
full_response += text
yield text
# Store complete response
conv["messages"].append({
"role": "assistant",
"content": full_response
})
except Exception as e:
yield f"\nError: {str(e)}"
return StreamingResponse(generate(), media_type="text/plain")
@app.get("/history/{session_id}", response_model=HistoryResponse)
def get_history(session_id: str):
"""Get conversation history"""
if session_id not in conversations:
raise HTTPException(status_code=404, detail="Session not found")
conv = conversations[session_id]
return HistoryResponse(
session_id=session_id,
messages=conv["messages"],
turn_count=len(conv["messages"]) // 2
)
@app.delete("/history/{session_id}")
def clear_history(session_id: str):
"""Clear conversation history"""
if session_id in conversations:
del conversations[session_id]
return {"status": "cleared", "session_id": session_id}
return {"status": "not_found", "session_id": session_id}
@app.get("/sessions")
def list_sessions():
"""List all active sessions"""
return {
"sessions": [
{
"session_id": sid,
"turns": len(conv["messages"]) // 2,
"created_at": conv["created_at"]
}
for sid, conv in conversations.items()
]
}
@app.post("/reset/{session_id}")
def reset_session(session_id: str):
"""Reset a conversation"""
if session_id in conversations:
old_messages = conversations[session_id]["messages"].copy()
conversations[session_id]["messages"] = []
return {
"status": "reset",
"session_id": session_id,
"previous_turns": len(old_messages) // 2
}
return {"status": "not_found", "session_id": session_id}
# Utilities
@app.get("/stats")
def get_stats():
"""Get server statistics"""
total_messages = sum(
len(conv["messages"])
for conv in conversations.values()
)
return {
"total_sessions": len(conversations),
"total_messages": total_messages,
"avg_messages_per_session": (
total_messages / len(conversations)
if conversations else 0
)
}
if __name__ == "__main__":
import uvicorn
# Run server
uvicorn.run(
app,
host="0.0.0.0",
port=8000,
log_level="info"
)

JavaScript Client Example

// Frontend client for the chat API
class ChatClient {
constructor(baseUrl = 'http://localhost:8000') {
this.baseUrl = baseUrl;
this.sessionId = this.generateSessionId();
}
generateSessionId() {
return 'session-' + Math.random().toString(36).substr(2, 9);
}
async chat(message) {
const response = await fetch(`${this.baseUrl}/chat`, {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({
message: message,
session_id: this.sessionId
})
});
if (!response.ok) throw new Error('Chat failed');
return await response.json();
}
async chatStream(message, onChunk) {
const response = await fetch(`${this.baseUrl}/chat-stream`, {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({
message: message,
session_id: this.sessionId
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const {value, done} = await reader.read();
if (done) break;
onChunk(decoder.decode(value));
}
}
async getHistory() {
const response = await fetch(
`${this.baseUrl}/history/${this.sessionId}`
);
return await response.json();
}
async reset() {
await fetch(
`${this.baseUrl}/reset/${this.sessionId}`,
{method: 'POST'}
);
}
}
// Usage in HTML
const chat = new ChatClient();
document.getElementById('send-btn').addEventListener('click', async () => {
const input = document.getElementById('message-input');
const message = input.value;
// Display user message
displayMessage(message, 'user');
// Get AI response (with streaming)
let response = '';
await chat.chatStream(message, (chunk) => {
response += chunk;
displayMessage(response, 'assistant');
});
input.value = '';
});

Production Configuration

1. Use a Database

Replace in-memory storage with PostgreSQL:

from sqlalchemy import create_engine, Column, String, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from datetime import datetime
Base = declarative_base()
class Conversation(Base):
__tablename__ = "conversations"
session_id = Column(String, primary_key=True)
messages = Column(String) # JSON
created_at = Column(DateTime, default=datetime.utcnow)
updated_at = Column(DateTime, default=datetime.utcnow)
# Initialize DB
engine = create_engine('postgresql://user:password@localhost/chatdb')
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

2. Add Rate Limiting

from slowapi import Limiter
from slowapi.util import get_remote_address
limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
@app.post("/chat")
@limiter.limit("10/minute")
def chat(request: Request, msg: ChatMessage):
# ... chat logic

3. Add Authentication

from fastapi.security import HTTPBearer, HTTPAuthCredential
from fastapi import Depends
security = HTTPBearer()
def verify_token(credentials: HTTPAuthCredential = Depends(security)):
if credentials.credentials != os.getenv("API_TOKEN"):
raise HTTPException(status_code=401, detail="Unauthorized")
return credentials
@app.post("/chat")
def chat(msg: ChatMessage, token = Depends(verify_token)):
# ... chat logic

Deployment

How we rolled it out:

Docker

Create Dockerfile:

FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "chat_api:app", "--host", "0.0.0.0", "--port", "8000"]

Deploy:

Terminal window
docker build -t chat-api .
docker run -p 8000:8000 \
-e ANTHROPIC_API_KEY=$API_KEY \
chat-api

Cloud Deployment (Render, Railway, Heroku)

Terminal window
# requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
anthropic==0.7.0
python-dotenv==1.0.0
sqlalchemy==2.0.0
psycopg2-binary==2.9.0
slowapi==0.1.9

Testing

from fastapi.testclient import TestClient
client = TestClient(app)
def test_chat():
response = client.post("/chat", json={
"message": "Hello",
"session_id": "test-1"
})
assert response.status_code == 200
assert "response" in response.json()
def test_history():
# Add message first
client.post("/chat", json={
"message": "Test",
"session_id": "test-2"
})
# Get history
response = client.get("/history/test-2")
assert response.status_code == 200
assert len(response.json()["messages"]) > 0
# Run tests
pytest test_chat_api.py

Best Practices

  • Store conversations in DB - Not memory
  • Add authentication - API token or OAuth
  • Rate limit - Prevent abuse
  • Log requests - For debugging
  • Monitor latency - Use APM tools
  • Version your API - /v1/chat for future compatibility
  • Use async - For better performance
  • Document with OpenAPI - FastAPI does this automatically

Common Issues

IssueSolution
Conversations lost on restartUse persistent database
API too slowAdd caching, use async
Memory usage highClear old conversations periodically
CORS errors in frontendAdd CORS middleware
Token limit exceededImplement conversation truncation

See Also: