Skip to content

Tools & Frameworks Reference

📖 6 min read resourcestoolsframeworksreference
Comprehensive reference of AI development tools - frameworks, SDKs, deployment platforms, monitoring, and infrastructure.
Key Takeaways
  • Developer reference covering LLM frameworks, SDKs, inference servers, and vector databases
  • Each section explains what the tool category is and when you would use it
  • Includes deployment platforms for hosting AI applications

AI development tools organized by category - from app frameworks to production infrastructure.

If you’re new to this: think of building an AI app like building a house. Frameworks are your toolkit. SDKs talk to the model provider. Inference servers run the model yourself. Vector databases store your data’s “memory.” Monitoring tells you if it’s working. Deployment platforms host it all.


LLM Frameworks

What this is: Libraries that give you pre-built pieces for connecting your app to an LLM - like a starter kit for AI features. Instead of writing code to call an API from scratch, you use a framework that handles prompts, conversation history, and tool integration for you.

When you’d use it: You’re building a chatbot, a RAG system, or any app that talks to an LLM. Start here.

FrameworkLanguageBest ForKey Feature
LangChainPython, JSGeneral LLM appsChains, agents, tool integration, 100+ integrations
LlamaIndexPythonRAG and dataData ingestion, indexing, query engines
HaystackPythonSearch & QAPipeline architecture, document processing
Semantic KernelC#, Python, JavaEnterprise appsMicrosoft ecosystem, planner patterns
Vercel AI SDKJS/TSWeb appsStreaming, tool calling, edge-ready
LangGraphPythonAgent orchestrationGraph-based agent workflows, sub-agents

Quick Decision

Need a general-purpose LLM framework? → LangChain
Building RAG on your own data? → LlamaIndex
Building a web app with streaming? → Vercel AI SDK
Building multi-agent systems? → LangGraph

Official SDKs

What this is: The official “phone line” to a model provider (Anthropic, OpenAI, Google). An SDK is a small library that handles authentication, request formatting, and error handling so you can focus on your app logic.

When you’d use it: You’re building something and know which provider you want. Skip the framework if your use case is simple - just use the SDK directly.

ProviderSDKKey Features
Anthropicanthropic-python, anthropic-sdk-typescriptMessages API, streaming, tool use
OpenAIopenai-python, openai-nodeChat, embeddings, images, audio, assistants
Googlegoogle-generativeaiGemini models, vision, function calling
MistralmistralaiOpen-weight models, embeddings
Together AItogether-python100+ open models, fast inference

Model Serving & Inference

What this is: Software that lets you run an open-weight model (like Llama or DeepSeek) on your own hardware instead of calling a paid API. This gives you full control, privacy, and potentially lower cost at scale.

When you’d use it: You need privacy (data can’t leave your servers), you’re processing millions of requests (API costs add up), or you want to run models offline.

ToolDeploymentKey Feature
vLLMSelf-hostedPagedAttention, continuous batching, SOTA throughput
OllamaLocalOne-command local models, broad model library
TGISelf-hostedHugging Face integration, token streaming
TensorRT-LLMNVIDIA GPUMax performance on NVIDIA hardware
llama.cppCPU/EdgeRuns on laptops, phones, Raspberry Pi
RunPodCloud GPUOn-demand GPU rental, serverless inference
ModalCloud serverlessGPU serverless, great for async batch processing
ReplicateCloud APIRun open models via API, pay per second

Vector Databases

What this is: A special database that stores “embeddings” - mathematical representations of text meaning. When you search, it finds things by meaning rather than exact keywords. This is how RAG (Retrieval-Augmented Generation) works: you store your documents as vectors, then retrieve relevant ones when a user asks a question.

When you’d use it: You’re building RAG - an app that answers questions based on your own documents (knowledge base, support docs, research papers).

DatabaseDeploymentKey Feature
PineconeManaged cloudServerless, high-scale, low maintenance
WeaviateSelf-hosted/CloudHybrid search, GraphQL API, multi-modal
ChromaEmbeddedSimple, local-first, Python-native
QdrantSelf-hosted/CloudRust-native, filtering, high performance
pgvectorPostgreSQLExtends Postgres, no new infra needed
MilvusSelf-hosted/CloudBillion-scale, distributed, GPU acceleration

Evaluation & Monitoring

What this is: Tools that check whether your LLM app is working correctly. Evaluation tests individual responses (“did the answer contain the right information?”), while monitoring tracks production metrics (“are error rates going up?”).

When you’d use it: You have an LLM app in production and need to catch regressions before users notice. Or you’re testing prompts and need to compare which version performs better.

ToolBest ForKey Feature
DeepEvalLLM unit testingLLM-as-judge, pytest integration
LangSmithTracing + evaluationLangChain-native, debugging
Weights & BiasesExperiment trackingResearch, model comparison
ArizeProduction monitoringML observability, LLM-specific dashboards
LangfuseOpen-source observabilitySelf-hostable, traces + evals

Deployment Platforms

What this is: Services that host your AI app on the internet so anyone can use it. You write code on your laptop, push it to one of these platforms, and they give you a URL.

When you’d use it: You’ve built an AI app and want to share it. Pick Vercel for web apps, Railway for backend APIs, Modal for GPU-heavy batch jobs.

PlatformBest ForFree Tier
VercelWeb apps (Next.js)Generous free tier
RailwayBackend APIsLimited free tier
Cloudflare WorkersEdge apps100K req/day free
Fly.ioContainerized appsFree tier for small projects
ModalGPU serverless$30/mo free compute
RenderWeb servicesFree tier (sleeps after inactivity)

Quick Reference

I want to: → Use:
──────────────────────────────────────────────────────────
Build an LLM app → LangChain or Vercel AI SDK
Run a model locally → Ollama
Serve an open model in production → vLLM
Store embeddings for RAG → pgvector (already on Postgres) or Pinecone
Evaluate LLM outputs → DeepEval
Debug a LangChain app → LangSmith
Host a Next.js AI app → Vercel
Run GPU batch jobs → Modal

For conversational AI tools and coding assistants, see the Tools Guide.