Google·3 sizes·General LLM
Gemma models: sizes & hardware to run them
The Gemma family spans 3 sizes, from 2B to 27B parameters. Each size maps to a different hardware tier. The table below lists the approximate memory each size needs at 4-bit quantization and the device we’d start with for a private local deployment.
Multilingual
Sizes & hardware
| Model | Params | Context | ~VRAM @ 4-bit | Minimum device | Recommended |
|---|---|---|---|---|---|
| Gemma 2 2B | 2B | 8K | ~1.6GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| Gemma 2 9B | 9B | 8K | ~7GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| Gemma 2 27B | 27B | 8K | ~17GB | NVIDIA GeForce RTX 3090 | NVIDIA B200 (placeholder) |
Memory figures are approximate working-set estimates (weights + KV cache at modest context); treat them as rough guides, since actual usage can land somewhat above or below. Device picks come from our compatibility engine, with the best on-prem fit listed first.
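As a rough illustration of how a "weights + KV cache" working-set estimate can be put together, here is a minimal Python sketch. It is not the method behind the table. The function names are hypothetical, and the layer count, KV-head count, and head dimension in the usage line are assumed values that should be checked against the model's published configuration.

```python
BYTES_PER_GB = 1e9  # decimal gigabytes, to match how GPU memory is usually quoted


def weights_gb(params_billions, bits_per_weight=4):
    """Memory for the quantized weights alone, in GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / BYTES_PER_GB


def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV cache in GB: one key and one value vector per layer, KV head, and token.

    bytes_per_value=2 assumes a 16-bit cache; this is an assumption, not a
    property stated on this page.
    """
    values = 2 * layers * kv_heads * head_dim * context_tokens
    return values * bytes_per_value / BYTES_PER_GB


def working_set_gb(params_billions, layers, kv_heads, head_dim, context_tokens,
                   bits_per_weight=4, bytes_per_value=2):
    """Weights plus KV cache; ignores activations and runtime overhead."""
    return (weights_gb(params_billions, bits_per_weight)
            + kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_value))


if __name__ == "__main__":
    # Assumed architecture values for a 9B-class model at full 8K context.
    estimate = working_set_gb(9, layers=42, kv_heads=8, head_dim=256,
                              context_tokens=8192)
    print(f"Estimated working set: {estimate:.1f} GB")
```

The sketch leaves out activations, quantization metadata, and framework overhead, so real usage will differ; shrinking `context_tokens` shows how much of the total comes from the KV cache rather than the weights.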
Open each size
Gemma 2 2B
Runs on a CPU or any small GPU. Strong response quality for a 2B model, but the context window is short at 8K.
Gemma 2 9B
Fits 8-12GB GPUs at 4-bit. Strong quality for its size, though the native context window is limited to 8K.
Gemma 2 27B
Needs 24GB+ for comfortable 4-bit inference (RTX 3090/4090 class). The 8K context window limits work on very long documents.
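To make the size-to-hardware mapping concrete, the following Python sketch checks which sizes fit a GPU with a given amount of VRAM. The memory figures are copied from the table above. The function name and the 10% headroom default are illustrative assumptions, not part of our compatibility engine.

```python
# Approximate 4-bit working-set figures from the table above, in GB.
APPROX_VRAM_GB = {
    "Gemma 2 2B": 1.6,
    "Gemma 2 9B": 7.0,
    "Gemma 2 27B": 17.0,
}


def sizes_that_fit(device_vram_gb, headroom_fraction=0.1):
    """Return the model sizes whose estimate fits within the usable VRAM.

    headroom_fraction reserves part of the card for the driver, display,
    and other processes; 0.1 is an assumed default, not a measured value.
    """
    usable_gb = device_vram_gb * (1 - headroom_fraction)
    return [name for name, need_gb in APPROX_VRAM_GB.items() if need_gb <= usable_gb]


if __name__ == "__main__":
    for vram_gb in (8, 12, 24):
        print(f"{vram_gb} GB card:", ", ".join(sizes_that_fit(vram_gb)))
```

Because the table's figures assume a modest context, a larger headroom value is a safer choice when you plan to use the full 8K window.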
Run Gemma models inside a private AI Business OS
Run your own AI agents on hardware you control: private by design, with no data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.