Claude Workflows & Best Practices
Prompt Engineering for Claude
Claude responds best to structured, role-based prompts. Here are Claude-specific patterns:
System Prompts — Define the Role
system_prompt = """You are a senior software architect reviewing code for:- Security vulnerabilities (OWASP Top 10)- Performance bottlenecks (N+1 queries, memory leaks)- Architectural patterns (SOLID principles)
For each issue found, provide:1. Severity (Critical/High/Medium/Low)2. File and line reference3. Explanation in 1-2 sentences4. Remediation suggestion with code example"""Thinking Tags (Claude-Specific)
When Claude needs to reason through complex problems, structure its thinking explicitly:
messages = [{ "role": "user", "content": """<thinking>Let me break this down:1. First, I need to understand the user's intent2. Then check if I have sufficient context3. Finally, formulate a response that addresses the core need</thinking>
Analyze the following code for bugs: ..."""}]Cache-Aware Prompt Design
Structure prompts to maximize caching:
# ✅ Good: System prompt cached, only user message changesfor query in queries: client.messages.create( model="claude-sonnet-4-6", system=[{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}], messages=[{"role": "user", "content": query}] )# Input cost: 1 write (1.25x) + N reads (0.1x each)
# ❌ Bad: Entire prompt changes each time# Input cost: N full writes (N × base input price)Few-Shot Examples
messages = [{ "role": "user", "content": """Extract the meeting action items. Format as JSON.
Example 1:Input: "We agreed to update the docs by Friday and Sarah will deploy the fix"Output: {"action_items": [ {"task": "update docs", "assignee": "team", "deadline": "Friday"}, {"task": "deploy fix", "assignee": "Sarah", "deadline": "implicit"}]}
Example 2:Input: "John needs to review PR #142 by EOD and Priya will set up the staging env"Output: {"action_items": [ {"task": "review PR #142", "assignee": "John", "deadline": "EOD"}, {"task": "set up staging env", "assignee": "Priya", "deadline": "implicit"}]}
Now process: {actual_meeting_notes}"""}]Tool Use Patterns
Parallel vs Sequential
# ✅ Parallel: Independent tool calls# Claude can call get_weather for multiple cities simultaneouslymessages = [{ "role": "user", "content": "What's the weather in SF, NYC, and London?"}]# Claude returns multiple tool_use blocks — execute them in parallel
# ✅ Sequential: Dependent tool calls# First get user location, then get weather for that locationmessages = [{ "role": "user", "content": "What's the weather at my current location?"}]# Claude calls get_current_location → you respond → Claude calls get_weatherTool Design Principles
| Principle | Example |
|---|---|
| Clear descriptions | ”Get current weather for a US city” — not “weather function” |
| Typed schemas | Use JSON Schema with types, descriptions, enums |
| Error handling | Return structured errors: {"error": "City not found", "suggestions": ["San Francisco, CA"]} |
| Idempotent where possible | create_issue vs get_or_create_issue |
Cost Optimization Strategies
1. Model Routing
def route_query(query, complexity="auto"): if complexity == "auto": # Use Haiku to classify complexity (cheap) classification = haiku.messages.create( model="claude-haiku-4-5", messages=[{"role": "user", "content": f"Classify complexity (simple/medium/complex): {query}"}] ) complexity = classification.content[0].text.strip().lower()
if complexity == "simple": return haiku.messages.create(model="claude-haiku-4-5", ...) elif complexity == "medium": return sonnet.messages.create(model="claude-sonnet-4-6", ...) else: return opus.messages.create(model="claude-opus-4-8", ...)2. Prompt Caching
Cache long system prompts and repeated document context:
| Scenario | Without Cache | With Cache | Savings |
|---|---|---|---|
| 1,000 queries with same 5K token system prompt | ~$15 (Sonnet) | ~$2.25 | 85% |
| 10,000 document analyses (same doc, different Qs) | ~$150 | ~$22.50 | 85% |
3. Batch Processing for Volume
# For 10,000 nightly classifications — don't use sync API# Use Batch API for 50% discountbatch = client.messages.beta.batches.create( model="claude-haiku-4-5", messages=[{"role": "user", "content": f"Classify: {text}"}], # Results in ~15 min, billed at 50% discount)4. Token Budget Management
# Set reasonable max_tokens — unused token budget is still reservedmax_tokens = { "classification": 100, "short_answer": 500, "analysis": 2000, "long_form": 8000, "code_generation": 16000,}Error Handling
import timefrom anthropic import Anthropic, RateLimitError, APIError
def call_with_retry(client, **kwargs): max_retries = 3 base_delay = 1 # seconds
for attempt in range(max_retries): try: return client.messages.create(**kwargs) except RateLimitError: if attempt < max_retries - 1: delay = base_delay * (2 ** attempt) # Exponential backoff time.sleep(delay) else: raise except APIError as e: if e.status_code >= 500: # Server error, retry if attempt < max_retries - 1: time.sleep(base_delay * (2 ** attempt)) else: raise else: # Client error (4xx), don't retry raiseStreaming Best Practices
# ✅ Good: Process events incrementally for better UXwith client.messages.stream(model="claude-sonnet-4-6", ...) as stream: for event in stream: if event.type == "content_block_delta": yield event.delta.text elif event.type == "message_stop": break
# ✅ Good: Track usage for cost monitoringwith client.messages.stream(model="claude-sonnet-4-6", ...) as stream: for text in stream.text_stream: print(text, end="") final_message = stream.get_final_message() print(f"\n\nTokens: {final_message.usage.input_tokens} in, " f"{final_message.usage.output_tokens} out")Rate Limit Management
import asynciofrom collections import dequefrom datetime import datetime
class RateLimiter: def __init__(self, max_rpm): self.max_rpm = max_rpm self.requests = deque()
async def acquire(self): now = datetime.now() # Remove requests older than 1 minute while self.requests and (now - self.requests[0]).seconds > 60: self.requests.popleft()
if len(self.requests) >= self.max_rpm: wait_time = 60 - (now - self.requests[0]).seconds await asyncio.sleep(max(wait_time, 0))
self.requests.append(now)For General Prompt Engineering
For broader prompt engineering techniques beyond Claude, see the Prompt Engineering Deep Dive.