Gemini Models

📖 5 min read · Tags: deepmind, google, gemini, models, reference, vendor-comparison
Deep comparison of Gemini 3.5, Gemini Omni, Nano Banana 2 (Gemini Image), and Gemini Audio — capabilities, multimodality, context windows, pricing, and model selection guide.
Key Takeaways
  • Two active Gemini 3 tracks: Gemini 3.1 Pro (preview, latest flagship) and Gemini 3.5 Flash (stable, frontier Flash tier)
  • Gemini 3 lineup also includes 3 Flash (preview), 3.1 Flash-Lite (stable), 3.1 Flash Live (voice), 3.1 Flash TTS (speech)
  • Gemini Omni Flash: separate native multimodal model — create video/image/audio/text from any input; available via Gemini app and Google Flow
  • Nano Banana 2 (= Gemini 3.1 Flash Image): pro-level image generation/editing; available via Gemini app and AI Studio
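  • Core Pro and Flash models are available via Gemini API, Google AI Studio (free tier), and Vertex AI (enterprise); specialized models have their own access surfaces, listed per model in the tables in this guide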

Current Gemini Lineup — May 2026

The Gemini 3 generation has two parallel tracks: Pro (preview, maximum quality) and Flash (stable, optimized for cost/latency). Specialized models — Omni, Nano Banana, Live, TTS — are released independently.

| Model | Status | Input ($/1M) | Output ($/1M) | Context | Best For |
| --- | --- | --- | --- | --- | --- |
| Gemini 3.1 Pro | Preview | $2 (≤200K) / $4 (>200K) | $12 (≤200K) / $18 (>200K) | Long-context tiered | Frontier reasoning, agentic coding |
| Gemini 3.5 Flash | Stable | $1.50 | $9 | 1M tokens | Frontier performance at Flash speed |
| Gemini 3 Flash | Preview | TBD | TBD | TBD | Frontier-class at fraction of Pro cost |
| Gemini 3.1 Flash-Lite | Stable | $0.25 (text/image/video) / $0.50 (audio) | $1.50 | TBD | Budget tier, high-volume tasks |
| Gemini 3.1 Flash Live | Preview | | | | Real-time voice/dialogue (Live API) |
| Gemini 3.1 Flash TTS | Preview | | | | Low-latency speech synthesis |
| Gemini 2.5 Pro | Stable | $1.25 (≤200K) / $2.50 (>200K) | $10 (≤200K) / $15 (>200K) | Long-context tiered | Previous-gen Pro (still widely deployed) |
| Gemini 2.5 Flash | Stable | $0.30 (text/image/video) / $1 (audio) | $2.50 | 1M tokens | Previous-gen Flash |
| Gemini 2.5 Flash-Lite | Stable | $0.10 (text/image/video) / $0.30 (audio) | $0.40 | TBD | Previous-gen budget tier |

Pricing note: Pro models use long-context tiered pricing — different rates above 200K tokens. Batch and Flex variants typically offer 50% reductions. See ai.google.dev/gemini-api/docs/pricing for current rates.
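The tiered arithmetic can be sketched in a few lines of Python. This is a rough estimator, not an official calculator: the rates are the text rates copied from the table above, and the names `Rates`, `RATES`, and `estimate_cost` are hypothetical. It assumes the tier is chosen by prompt (input) size and then applies to both input and output, and that batch pricing is a flat 50% discount; check the pricing page before relying on either assumption.

```python
from dataclasses import dataclass

TIER_THRESHOLD = 200_000  # tokens; the long-context boundary from the pricing note


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens as (<=200K, >200K) pairs; flat-rate models repeat the value."""
    input_usd: tuple
    output_usd: tuple


# Text rates copied from the table in this guide (illustrative, may be out of date).
RATES = {
    "Gemini 3.1 Pro": Rates((2.00, 4.00), (12.00, 18.00)),
    "Gemini 3.5 Flash": Rates((1.50, 1.50), (9.00, 9.00)),
    "Gemini 3.1 Flash-Lite": Rates((0.25, 0.25), (1.50, 1.50)),
    "Gemini 2.5 Pro": Rates((1.25, 2.50), (10.00, 15.00)),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Estimate the USD cost of one request under the assumptions stated above."""
    rates = RATES[model]
    # Assumption: the prompt size alone selects the tier for input and output.
    tier = 1 if input_tokens > TIER_THRESHOLD else 0
    cost = (input_tokens * rates.input_usd[tier]
            + output_tokens * rates.output_usd[tier]) / 1_000_000
    # Assumption: batch is modeled as a flat 50% reduction.
    return cost * 0.5 if batch else cost
```

Under these assumptions, a 100K-token prompt with 10K output tokens on Gemini 3.1 Pro comes to about $0.32, while a 300K-token prompt with the same output crosses the tier boundary and comes to about $1.38.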

Specialized Multimodal Models

Released separately from the main 3 series — each is a distinct model with its own surface.

| Model | What It Is | Access |
| --- | --- | --- |
| Gemini Omni Flash | Native multimodal “world model”: video in/out, image, audio, text. Unveiled at I/O 2026 | Gemini app, Google Flow, YouTube creation surfaces (eligible Google AI subscribers) |
| Nano Banana 2 (= Gemini 3.1 Flash Image) | Pro-level image generation and editing at Flash speed | Gemini app, AI Studio |
| Veo | Cinematic video generation | Gemini app, Flow, AI Studio (see Media & Creative) |
| Imagen | High-quality image generation | Gemini app, API |
| Lyria 3 | Music generation with vocals | Gemini app, AI Studio |

Gemini 3 Pro vs Flash — Which Track?

What matters most?
├─ Maximum reasoning / agentic depth → Gemini 3.1 Pro (preview)
│ Use when: complex multi-step coding, R&D, deep analysis
│ Cost: $2/$12 per 1M ≤200K; $4/$18 >200K
├─ Frontier performance at Flash speed → Gemini 3.5 Flash (stable)
│ Use when: production agents, sustained coding workflows
│ Cost: $1.50/$9 per 1M
├─ Budget tier, high-volume → Gemini 3.1 Flash-Lite
│ Use when: classification, routing, simple chat
│ Cost: $0.25/$1.50 per 1M (text)
├─ Real-time voice → Gemini 3.1 Flash Live + Live API
├─ Multimodal creation → Gemini Omni Flash (via Gemini app / Flow)
├─ Image generation → Nano Banana 2 / Imagen
└─ Open-weight, self-hosted → Gemma 4 (see [Gemma](/deepmind/gemma))
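
The decision tree above can also be expressed as a simple lookup, which is handy when a router or config layer needs to pick a model programmatically. This is a minimal sketch: the need keys and the `pick_model` helper are hypothetical names, and the values are the display names from this guide, not API model IDs.

```python
# Map each branch of the decision tree to the model it recommends.
# Keys are hypothetical labels; values are display names, not API model IDs.
RECOMMENDATIONS = {
    "max_reasoning": "Gemini 3.1 Pro",
    "fast_frontier": "Gemini 3.5 Flash",
    "budget_high_volume": "Gemini 3.1 Flash-Lite",
    "realtime_voice": "Gemini 3.1 Flash Live",
    "multimodal_creation": "Gemini Omni Flash",
    "image_generation": "Nano Banana 2 / Imagen",
    "open_weight": "Gemma 4",
}


def pick_model(need: str) -> str:
    """Return the recommended model for a need, or raise ValueError if unknown."""
    try:
        return RECOMMENDATIONS[need]
    except KeyError:
        valid = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown need {need!r}; expected one of: {valid}") from None
```

Keeping the mapping in one dictionary means a lineup change (for example, a preview model going stable) is a one-line edit rather than a change to branching logic.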

Gemini Omni — Native Multimodality

Gemini Omni is designed for multimodal creation — it natively processes and generates video, image, audio, and text in a single model. Unlike models that convert everything to text first, Omni works directly in each modality.

  • Video-in, video-out — describe a scene and get generated video
  • Image-in, audio-out — analyze a photo and narrate it
  • Text-in, everything-out — one prompt creates video + image + audio + text

Available via Gemini app and Google Flow.

Nano Banana 2 — Image Generation

Pro-level image generation and editing at Flash-level speed:

| Capability | Description |
| --- | --- |
| Text-to-image | Generate from text descriptions |
| Image editing | Modify, enhance, transform existing images |
| Style transfer | Apply artistic styles to images |
| Resolution | High resolution output, commercial quality |

Available in the Gemini app and Google AI Studio.

Audio — Voice, Speech & Music

Audio capabilities are split across several specialized models rather than a single “Gemini Audio” product:

| Model | Purpose | Access |
| --- | --- | --- |
| Gemini 3.1 Flash Live | Real-time voice / dialogue (low-latency, full-duplex) | Live API, AI Studio |
| Gemini 3.1 Flash TTS | Low-latency text-to-speech | API, AI Studio |
| Lyria 3 | Music generation with vocals | Gemini app, AI Studio (see Media & Creative) |
| Gemini multimodal input | Audio understanding (analyze, transcribe, describe), built into Pro / Flash | Gemini API, AI Studio, Vertex AI |

Comparing Across Models

For a broader comparison across Gemini, GPT, Claude, and DeepSeek, see the Models Decision Guide.