
Media & Creative

Google DeepMind's media generation tools — Veo (video), Imagen (image), Lyria 3 (music), Nano Banana 2 (image editing), and Gemini Audio (voice/sound). How they work, use cases, and access.
Key Takeaways
  • Veo: cinematic video generation with audio. Available in the Gemini app, Google Flow, and AI Studio
  • Nano Banana 2: pro-level image generation and editing at Flash speed. Available in the Gemini app
  • Lyria 3: music generation with vocals; compose songs and experiment with acoustic detail
  • Gemini Audio: real-time audio generation, voice synthesis, and audio understanding

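Google DeepMind’s media generation portfolio spans video, image, music, and audio. Nano Banana 2 and Gemini Audio are built on Gemini models. Veo, Imagen, and Lyria are dedicated generative models that are also surfaced through Gemini products.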

Veo — Video Generation

Veo is DeepMind’s leading video generation model, capable of cinematic-quality output:

| Feature | Detail |
|---|---|
| Input | Text, image, or video prompts |
| Output | Cinematic video with audio |
| Resolution | Up to 4K |
| Duration | Variable (seconds to minutes) |
| Available via | Gemini app, Google Flow, AI Studio, Vertex AI |

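# API access via the google-genai SDK with a Gemini API key
import time
from google import genai
client = genai.Client()  # reads the API key from the environment
operation = client.models.generate_videos(
    model="veo-3.0-generate-001",  # model IDs change; check the current model list
    prompt="A drone shot of a futuristic city at sunset, with flying cars",
)
while not operation.done:  # video generation runs as a long-running operation
    time.sleep(10)
    operation = client.operations.get(operation)
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("city.mp4")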
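
In the google-genai SDK, `generate_videos` returns a long-running operation rather than a finished video, so client code has to poll it. Below is a minimal sketch of a reusable polling helper with a timeout. It assumes you pass in the refresh call yourself, for example `client.operations.get`. The helper name `wait_for_operation` and its parameters are hypothetical and not part of the SDK.

```python
import time
from typing import Callable, TypeVar

Op = TypeVar("Op")


def wait_for_operation(
    op: Op,
    refresh: Callable[[Op], Op],
    is_done: Callable[[Op], bool],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Op:
    """Poll a long-running operation until is_done(op) is true.

    Hypothetical helper: `refresh` fetches the latest operation state, and
    `sleep` is injectable so the loop can be tested without real waiting.
    Raises TimeoutError once `timeout` seconds of polling have elapsed.
    """
    waited = 0.0
    while not is_done(op):
        if waited >= timeout:
            raise TimeoutError(f"operation still running after {waited:.0f}s")
        sleep(interval)
        waited += interval
        op = refresh(op)
    return op


# Usage with the google-genai SDK (needs an API key, so not run here):
#   operation = client.models.generate_videos(model=..., prompt=...)
#   operation = wait_for_operation(operation, client.operations.get, lambda op: op.done)
```

Injecting `sleep` keeps the loop testable. A fixed interval is the simplest policy, and switching to exponential backoff is a small change if polling quotas matter.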

Use Cases

  • Marketing and advertising videos
  • Product demonstrations
  • Educational content
  • Creative storytelling
  • Social media content

Nano Banana 2 (Gemini Image) — Image Generation

Pro-level image generation and editing with Flash-level speed:

| Feature | Detail |
|---|---|
| Capabilities | Text-to-image, image editing, style transfer, inpainting |
| Speed | Flash tier (fast generation) |
| Quality | Professional-grade, commercial-ready |
| Available via | Gemini app, AI Studio, Vertex AI |

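from google import genai  # google-genai SDK; Gemini API with image output
client = genai.Client()  # reads the API key from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash-image",  # assumption: substitute the current Nano Banana 2 model ID
    contents="Generate an image of a modern AI research lab with holographic displays",
)
for part in response.candidates[0].content.parts:
    if part.inline_data:  # generated images arrive as inline bytes
        with open("lab.png", "wb") as f:
            f.write(part.inline_data.data)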
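
Image-capable Gemini models return generated images as parts that carry inline bytes, mixed in with any text parts. Below is a minimal sketch of a helper that saves every image part. It assumes the google-genai `Part` shape, where `inline_data` holds `mime_type` and `data`. The name `save_inline_images` is hypothetical.

```python
import mimetypes
from pathlib import Path
from typing import Iterable, List


def save_inline_images(parts: Iterable, out_dir: str, stem: str = "image") -> List[Path]:
    """Write each part that carries inline image bytes to out_dir and return the paths.

    Assumes google-genai-style parts: part.inline_data is None for text parts,
    otherwise an object with .mime_type (e.g. "image/png") and .data (bytes).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, part in enumerate(parts):
        blob = getattr(part, "inline_data", None)
        if blob is None or not (blob.mime_type or "").startswith("image/"):
            continue  # skip text parts and non-image payloads
        ext = mimetypes.guess_extension(blob.mime_type) or ".bin"
        path = out / f"{stem}_{index}{ext}"
        path.write_bytes(blob.data)
        saved.append(path)
    return saved
```

With a real response, call it as `save_inline_images(response.candidates[0].content.parts, "out")`. Deriving the extension from the MIME type avoids hard-coding `.png` when a model returns JPEG.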

Lyria 3 — Music Generation

Lyria 3 is DeepMind’s most advanced music generation model:

| Feature | Detail |
|---|---|
| Capabilities | Compose with vocals, experiment with acoustic details |
| Genres | Wide range, from classical to electronic |
| Customization | Mood, tempo, instrumentation, vocals |
| Available via | Gemini app, AI Studio, Vertex AI |
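
This section doesn't describe Lyria 3's request format, so the sketch below stays at the prompt level. A hypothetical `MusicBrief` dataclass turns the customization axes from the table (mood, tempo, instrumentation, vocals) into one consistent text prompt. The field names and phrasing are assumptions, not a Lyria API schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MusicBrief:
    """Hypothetical structured brief for a music prompt; not a Lyria API type."""

    genre: str
    mood: str
    tempo_bpm: int
    instruments: List[str] = field(default_factory=list)
    vocals: Optional[str] = None  # e.g. "soft female vocals"; None means instrumental

    def to_prompt(self) -> str:
        """Render the brief as a single comma-separated text prompt."""
        if self.tempo_bpm <= 0:
            raise ValueError("tempo_bpm must be positive")
        pieces = [f"{self.mood} {self.genre} track at {self.tempo_bpm} BPM"]
        if self.instruments:
            pieces.append("featuring " + ", ".join(self.instruments))
        pieces.append(f"with {self.vocals}" if self.vocals else "instrumental, no vocals")
        return ", ".join(pieces)
```

For example, `MusicBrief("lo-fi hip hop", "calm", 80, ["piano", "vinyl crackle"]).to_prompt()` yields `"calm lo-fi hip hop track at 80 BPM, featuring piano, vinyl crackle, instrumental, no vocals"`. Keeping the brief structured makes variations, such as the same mood at a different tempo, easy to compare across generations.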

Gemini Audio — Voice & Sound

Real-time audio models built on Gemini:

| Feature | Detail |
|---|---|
| Voice synthesis | Natural-sounding voices, multiple languages |
| Music generation | Instrumental and vocal |
| Audio understanding | Transcribe, analyze, describe audio content |
| Real-time | Low-latency generation and streaming |
| Available via | Gemini Live API, AI Studio, Vertex AI |

# Gemini Live API: real-time, bidirectional audio streaming over a WebSocket session
# Try it interactively in AI Studio, or programmatically via client.aio.live.connect() in the google-genai SDK
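
The Live API streams generated speech as raw PCM chunks with no file header, which most players can't open directly. Google's Live API documentation describes output audio as 16-bit little-endian PCM at 24 kHz, mono; treat those defaults as assumptions to verify for your model. Below is a minimal sketch that wraps the bytes in a WAV container using only the standard-library `wave` module. The name `pcm_to_wav` is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container so ordinary players can open them.

    Defaults assume 24 kHz, mono, 16-bit (2-byte) samples; adjust them to match
    the audio format your model actually returns.
    """
    frame_size = channels * sample_width
    if len(pcm) % frame_size:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:  # closing the writer finalizes the header, not buf
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

In practice, you would concatenate the audio bytes received during a session and write `pcm_to_wav(audio)` to a `.wav` file. In the google-genai SDK, those bytes arrive on each message's `data` attribute.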

Tool Selection Matrix

| Task | Best tool |
|---|---|
| Cinematic video from text | Veo |
| Social media video (quick) | Google Flow + Omni |
| Professional image generation | Nano Banana 2 |
| Logo / icon generation | Imagen |
| Background music for videos | Lyria 3 |
| Song with vocals | Lyria 3 |
| Voice narration | Gemini Audio |
| Podcast production | Gemini Audio + NotebookLM |

Where Next