Skip to content

Claude Workflows & Best Practices

📖 5 min read claudeworkflowsbest-practicesprompt-engineeringcost-optimization
Claude-specific prompt engineering, tool use patterns, cost optimization strategies, error handling, streaming best practices, and production deployment patterns.
Key Takeaways
  • System prompts with cache_control: 90% input cost savings on repeated context
  • Model routing: Haiku for classification, Sonnet for most tasks, Opus for complex reasoning
  • Tool use patterns: parallel calls when independent, sequential when dependent
  • Error handling: implement retries with exponential backoff, handle rate limits gracefully

Prompt Engineering for Claude

Claude responds best to structured, role-based prompts. Here are Claude-specific patterns:

System Prompts — Define the Role

system_prompt = """You are a senior software architect reviewing code for:
- Security vulnerabilities (OWASP Top 10)
- Performance bottlenecks (N+1 queries, memory leaks)
- Architectural patterns (SOLID principles)
For each issue found, provide:
1. Severity (Critical/High/Medium/Low)
2. File and line reference
3. Explanation in 1-2 sentences
4. Remediation suggestion with code example"""

Thinking Tags (Claude-Specific)

When Claude needs to reason through complex problems, structure its thinking explicitly:

messages = [{
"role": "user",
"content": """<thinking>
Let me break this down:
1. First, I need to understand the user's intent
2. Then check if I have sufficient context
3. Finally, formulate a response that addresses the core need
</thinking>
Analyze the following code for bugs: ..."""
}]

Cache-Aware Prompt Design

Structure prompts to maximize caching:

# ✅ Good: System prompt cached, only user message changes
for query in queries:
client.messages.create(
model="claude-sonnet-4-6",
system=[{"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}],
messages=[{"role": "user", "content": query}]
)
# Input cost: 1 write (1.25x) + N reads (0.1x each)
# ❌ Bad: Entire prompt changes each time
# Input cost: N full writes (N × base input price)

Few-Shot Examples

messages = [{
"role": "user",
"content": """Extract the meeting action items. Format as JSON.
Example 1:
Input: "We agreed to update the docs by Friday and Sarah will deploy the fix"
Output: {"action_items": [
{"task": "update docs", "assignee": "team", "deadline": "Friday"},
{"task": "deploy fix", "assignee": "Sarah", "deadline": "implicit"}
]}
Example 2:
Input: "John needs to review PR #142 by EOD and Priya will set up the staging env"
Output: {"action_items": [
{"task": "review PR #142", "assignee": "John", "deadline": "EOD"},
{"task": "set up staging env", "assignee": "Priya", "deadline": "implicit"}
]}
Now process: {actual_meeting_notes}"""
}]

Tool Use Patterns

Parallel vs Sequential

# ✅ Parallel: Independent tool calls
# Claude can call get_weather for multiple cities simultaneously
messages = [{
"role": "user",
"content": "What's the weather in SF, NYC, and London?"
}]
# Claude returns multiple tool_use blocks — execute them in parallel
# ✅ Sequential: Dependent tool calls
# First get user location, then get weather for that location
messages = [{
"role": "user",
"content": "What's the weather at my current location?"
}]
# Claude calls get_current_location → you respond → Claude calls get_weather

Tool Design Principles

PrincipleExample
Clear descriptions”Get current weather for a US city” — not “weather function”
Typed schemasUse JSON Schema with types, descriptions, enums
Error handlingReturn structured errors: {"error": "City not found", "suggestions": ["San Francisco, CA"]}
Idempotent where possiblecreate_issue vs get_or_create_issue

Cost Optimization Strategies

1. Model Routing

def route_query(query, complexity="auto"):
if complexity == "auto":
# Use Haiku to classify complexity (cheap)
classification = haiku.messages.create(
model="claude-haiku-4-5",
messages=[{"role": "user", "content": f"Classify complexity (simple/medium/complex): {query}"}]
)
complexity = classification.content[0].text.strip().lower()
if complexity == "simple":
return haiku.messages.create(model="claude-haiku-4-5", ...)
elif complexity == "medium":
return sonnet.messages.create(model="claude-sonnet-4-6", ...)
else:
return opus.messages.create(model="claude-opus-4-8", ...)

2. Prompt Caching

Cache long system prompts and repeated document context:

ScenarioWithout CacheWith CacheSavings
1,000 queries with same 5K token system prompt~$15 (Sonnet)~$2.2585%
10,000 document analyses (same doc, different Qs)~$150~$22.5085%

3. Batch Processing for Volume

# For 10,000 nightly classifications — don't use sync API
# Use Batch API for 50% discount
batch = client.messages.beta.batches.create(
model="claude-haiku-4-5",
messages=[{"role": "user", "content": f"Classify: {text}"}],
# Results in ~15 min, billed at 50% discount
)

4. Token Budget Management

# Set reasonable max_tokens — unused token budget is still reserved
max_tokens = {
"classification": 100,
"short_answer": 500,
"analysis": 2000,
"long_form": 8000,
"code_generation": 16000,
}

Error Handling

import time
from anthropic import Anthropic, RateLimitError, APIError
def call_with_retry(client, **kwargs):
max_retries = 3
base_delay = 1 # seconds
for attempt in range(max_retries):
try:
return client.messages.create(**kwargs)
except RateLimitError:
if attempt < max_retries - 1:
delay = base_delay * (2 ** attempt) # Exponential backoff
time.sleep(delay)
else:
raise
except APIError as e:
if e.status_code >= 500: # Server error, retry
if attempt < max_retries - 1:
time.sleep(base_delay * (2 ** attempt))
else:
raise
else: # Client error (4xx), don't retry
raise

Streaming Best Practices

# ✅ Good: Process events incrementally for better UX
with client.messages.stream(model="claude-sonnet-4-6", ...) as stream:
for event in stream:
if event.type == "content_block_delta":
yield event.delta.text
elif event.type == "message_stop":
break
# ✅ Good: Track usage for cost monitoring
with client.messages.stream(model="claude-sonnet-4-6", ...) as stream:
for text in stream.text_stream:
print(text, end="")
final_message = stream.get_final_message()
print(f"\n\nTokens: {final_message.usage.input_tokens} in, "
f"{final_message.usage.output_tokens} out")

Rate Limit Management

import asyncio
from collections import deque
from datetime import datetime
class RateLimiter:
def __init__(self, max_rpm):
self.max_rpm = max_rpm
self.requests = deque()
async def acquire(self):
now = datetime.now()
# Remove requests older than 1 minute
while self.requests and (now - self.requests[0]).seconds > 60:
self.requests.popleft()
if len(self.requests) >= self.max_rpm:
wait_time = 60 - (now - self.requests[0]).seconds
await asyncio.sleep(max(wait_time, 0))
self.requests.append(now)

For General Prompt Engineering

For broader prompt engineering techniques beyond Claude, see the Prompt Engineering Deep Dive.