Meta · 3 sizes · Coding LLM
CodeLlama models: sizes & hardware to run them
The CodeLlama family spans three sizes, from 7B to 34B parameters, and each size maps to a hardware tier. The table below lists the approximate memory each size needs at 4-bit quantization and the device we’d start with for a private local deployment.
Sizes & hardware
| Model | Params | Context | ~VRAM @ 4-bit | Minimum device | Recommended |
|---|---|---|---|---|---|
| CodeLlama 7B | 7B | 16K | ~5GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| CodeLlama 13B | 13B | 16K | ~8GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| CodeLlama 34B | 34B | 16K | ~21GB | NVIDIA GeForce RTX 3090 | NVIDIA B200 (placeholder) |
Memory figures are approximate working-set estimates (weights plus KV cache at a modest context length); treat them as ballpark values, not exact requirements. Device picks come from our compatibility engine, with the best on-prem fit listed first.
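As a rough illustration of how a working-set figure like those in the table can be approximated, here is a minimal Python sketch. It is not the method behind this page's numbers. The 4.5 effective bits per weight (4-bit values plus quantization scales) and the flat 1 GB allowance for KV cache and runtime overhead are assumptions chosen for illustration, and `estimate_vram_gb` is a hypothetical helper name.

```python
def estimate_vram_gb(params_billion, bits_per_weight=4.5, kv_cache_gb=1.0):
    """Rough working set in GB: quantized weights plus a flat KV-cache allowance.

    bits_per_weight and kv_cache_gb are illustrative assumptions, not measured values.
    """
    # 1e9 parameters * bits per weight / 8 bits per byte, expressed in decimal GB
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + kv_cache_gb


if __name__ == "__main__":
    for name, params in [("CodeLlama 7B", 7), ("CodeLlama 13B", 13), ("CodeLlama 34B", 34)]:
        print(f"{name}: ~{estimate_vram_gb(params):.1f} GB")
```

With these assumed values the sketch gives roughly 4.9, 8.3, and 20.1 GB, which lands in the same range as the table's ~5GB, ~8GB, and ~21GB. Longer contexts grow the KV cache, so the allowance should be raised accordingly.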
Open each size
CodeLlama 7B
8GB+ GPUs at 4-bit. A well-established small coder for responsive in-editor completion.
CodeLlama 13B
16GB+ GPUs at 4-bit. The mid-size CodeLlama for stronger completion and light refactoring.
CodeLlama 34B
A 24GB+ card (RTX 3090/4090) or 32GB+ Mac at 4-bit. The largest CodeLlama for a single box.
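To make the size-to-hardware mapping concrete, the following Python sketch picks the largest CodeLlama size that fits a given amount of GPU memory, using the approximate 4-bit figures from the table. The `largest_fit` helper and its 1 GB default headroom are hypothetical choices for illustration, not the logic of the compatibility engine mentioned on this page.

```python
# Approximate 4-bit working-set figures (GB), copied from the table on this page.
VRAM_4BIT_GB = {
    "CodeLlama 7B": 5,
    "CodeLlama 13B": 8,
    "CodeLlama 34B": 21,
}


def largest_fit(vram_gb, headroom_gb=1.0):
    """Return the largest listed model whose estimate plus headroom fits, else None.

    headroom_gb is an assumed safety margin for the runtime and display output.
    """
    fitting = [
        (need, name)
        for name, need in VRAM_4BIT_GB.items()
        if need + headroom_gb <= vram_gb
    ]
    # Tuples compare by memory need first, so max() picks the largest model that fits.
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for vram in (4, 8, 12, 24):
        print(f"{vram} GB -> {largest_fit(vram)}")
```

Under these assumptions a 12GB card resolves to CodeLlama 13B and a 24GB card to CodeLlama 34B. A more cautious margin can be passed as `headroom_gb` when longer contexts are expected.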
Run CodeLlama models inside a private AI Business OS
Run your own AI agents on hardware you control: private by design, with no data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.