Skip to content

Gemma — Open Models

📖 3 min read deepmindgooglegemmaopen-sourcedeployment
Google DeepMind's Gemma 4 — the most intelligent open models. Sizes, deployment (Ollama, Hugging Face, Vertex AI), benchmarking, and comparison with Llama and other open models.
Key Takeaways
  • Gemma 4: DeepMind's latest open-weight family — text, audio, and image input, 140+ languages, context up to 256K tokens
  • Available from Ollama, Hugging Face, Kaggle, and Vertex AI Model Garden
  • Gemma 4 is licensed under Apache 2.0; earlier Gemma 1/2/3 use the custom Gemma Terms of Use
  • Optimized for on-device (phones, laptops) and self-hosted deployment

Gemma is Google DeepMind’s family of open-weight models. Unlike Gemini (API-only), Gemma models can be downloaded, self-hosted, and fine-tuned. Gemma 4 is released under Apache 2.0 — earlier generations (Gemma 1/2/3) use the custom Gemma Terms of Use, which adds a Prohibited Use Policy on top of broad permissive rights.

Gemma 4 — Capabilities

CapabilityDetail
ModalitiesText, audio, image input
Languages140+
Context windowUp to 256K tokens
LicenseApache 2.0
SizesMultiple parameter counts across the family — check the Gemma model cards for the current released sizes
Specialized variantsGemma 3n (lightweight), EmbeddingGemma, FunctionGemma, PaliGemma 1/2 (multimodal), ShieldGemma 1/2 (safety), RecurrentGemma (research)

Getting Started

Ollama — Local Deployment

Terminal window
# Install Ollama: ollama.com
ollama pull gemma4
ollama run gemma4

Hugging Face

from transformers import AutoModelForCausalLM, AutoTokenizer
# Check huggingface.co/google for current released sizes and -it (instruct) variants
model_id = "google/gemma-4-it"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Explain quantum computing", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0]))

Vertex AI

Gemma 4 is available in the Vertex AI Model Garden — deploy directly to a managed endpoint, or use the Garden’s one-click serving for benchmarking before commit.

Specialized Gemma Variants

The Gemma family extends beyond the base instruct models:

VariantPurpose
Gemma 3nLightweight footprint for low-resource devices
EmbeddingGemmaNumerical text representations for retrieval / RAG
FunctionGemmaSpecialised for function calling
PaliGemma 1 / 2Multimodal (vision + language)
ShieldGemma 1 / 2Safety / content evaluation
RecurrentGemmaResearch variant exploring recurrent architectures

Gemma vs Llama

FeatureGemma 4Llama (latest)
DeveloperGoogle DeepMindMeta
LicenseApache 2.0 (Gemma 4)Llama Community License (custom, with use restrictions)
Context windowUp to 256KVaries by variant
MultimodalText + audio + image inputVision variants available
Google integrationNative: Vertex AI Model Garden, AI StudioNone
EcosystemOllama, Hugging Face, Kaggle, Vertex AI, TFLite, CoreMLOllama, Hugging Face, Together AI, Replicate

License note: “Apache 2.0” (Gemma 4) is a true permissive license — no use-case restrictions. The older Llama Community License and the Gemma Terms of Use (Gemma 1/2/3) both add a Prohibited Use Policy. Match your compliance review to the specific model and version you deploy.

Use Cases

Use CaseRecommended PathDeployment
On-device AI (phone)Smallest available Gemma + INT4 quantizationCoreML, TFLite
Local chatbot (laptop)Mid-size GemmaOllama
Code assistantLarger Gemma variantOllama or Vertex AI
Fine-tuned agentLargest available sizeVertex AI
Production RAGEmbeddingGemma + larger generator GemmaVertex AI + Vector Search

Where to Find Gemma