Meta · 7 models · General LLM / Vision / Multimodal
Llama models: sizes & hardware to run them
The Llama family spans 7 models from 1B to 405B parameters. Each size maps to a different hardware tier. Below is the approximate memory each model needs at 4-bit and the device we'd start with for a private local deployment.
Tools · Reasoning · Vision · Multilingual · Long context
Sizes & hardware
| Model | Params | Context | ~VRAM @ 4-bit | Minimum device | Recommended |
|---|---|---|---|---|---|
| Llama 3.2 1B | 1B | 128K | ~1GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| Llama 3.2 3B | 3B | 128K | ~2.5GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| Llama 3.1 8B | 8B | 128K | ~6GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| Llama 3.2 Vision 11B | 11B | 128K | ~9GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| Llama 3.1 70B | 70B | 128K | ~42GB | NVIDIA RTX A6000 | NVIDIA B200 (placeholder) |
| Llama 3.3 70B | 70B | 128K | ~42GB | NVIDIA RTX A6000 | NVIDIA B200 (placeholder) |
| Llama 3.1 405B | 405B | 128K | ~230GB | Supermicro 8x H100 SuperServer | Supermicro 8x H100 SuperServer |
Memory figures are approximate working-set estimates (weights plus KV cache at a modest context length); treat them as rough guidance, not exact requirements. Device picks come from our compatibility engine and list the best on-prem fit first.
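As a rough cross-check of the table, the 4-bit working set can be approximated as quantized weights plus an fp16 KV cache plus some runtime overhead. The Python sketch below assumes an 8K-token "modest context", a flat 10% overhead, and the layer, KV-head and head-dimension values from the published Llama 3.1 configs. These are our assumptions, not necessarily how the compatibility engine computes its figures.

```python
# Rough 4-bit memory estimate: quantized weights + fp16 KV cache + flat overhead.
# ASSUMPTIONS: 8K-token context, 10% runtime overhead, and the layer/head counts
# below (from the published model configs; verify against each config.json).

GB = 1e9  # decimal gigabytes, matching the table's "GB"

# name -> (params, num_layers, num_kv_heads, head_dim)
CONFIGS = {
    "Llama 3.1 8B": (8e9, 32, 8, 128),
    "Llama 3.1 70B": (70e9, 80, 8, 128),
    "Llama 3.1 405B": (405e9, 126, 8, 128),
}


def weight_bytes(params: float, bits: float = 4.0) -> float:
    """Bytes needed to hold the weights at the given quantization width."""
    return params * bits / 8


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """One key and one value vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem


def estimate_gb(params: float, layers: int, kv_heads: int, head_dim: int,
                context_tokens: int = 8192, bits: float = 4.0,
                overhead: float = 0.10) -> float:
    """Approximate working set in GB; `overhead` stands in for activations and runtime buffers."""
    total = weight_bytes(params, bits) + kv_cache_bytes(layers, kv_heads, head_dim, context_tokens)
    return total * (1 + overhead) / GB


if __name__ == "__main__":
    for name, (params, layers, kv_heads, head_dim) in CONFIGS.items():
        print(f"{name}: ~{estimate_gb(params, layers, kv_heads, head_dim):.1f} GB")
```

Under these assumptions it prints roughly 5.6 GB for 8B, 41.5 GB for 70B and 227.4 GB for 405B, close to the table. Only the KV term depends on context length, so doubling `context_tokens` doubles that term and leaves the weights unchanged.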
Each size at a glance
General LLM
Llama 3.2 1B
Runs almost anywhere — even on a CPU or a low-power mini PC. The default for edge and ultra-cheap deployments.
General LLM
Llama 3.2 3B
Comfortable on any 8GB GPU, a Mac mini, or a small mini PC. A good entry assistant for a single office.
General LLM
Llama 3.1 8B
Runs comfortably at 4-bit on any 8GB+ GPU, a Mac mini, or a small mini PC. The classic entry point for local AI.
Vision / Multimodal
Llama 3.2 Vision 11B
The ~9GB 4-bit estimate fits a 12GB card, but plan for 16-24GB to run the vision encoder plus context comfortably.
General LLM
Llama 3.1 70B
Flagship tier — ~42GB at 4-bit needs a 48GB card, a 64GB+ unified-memory Mac, or multi-GPU.
General LLM
Llama 3.3 70B
Flagship tier — ~42GB at 4-bit means a 48GB card, a 64GB+ unified-memory Mac, or multi-GPU.
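To see why ~42GB points at those three options, here is a minimal fit check in Python. The usable-memory fractions are hypothetical: 90% of a discrete card (leaving room for the runtime) and about 75% of a Mac's unified memory available to the GPU. Check your own runtime and OS limits before relying on them.

```python
from dataclasses import dataclass

# ASSUMPTION: usable_fraction values are illustrative headroom, not vendor specs.


@dataclass
class Setup:
    name: str
    gpu_count: int
    memory_per_gpu_gb: float
    usable_fraction: float  # share of each device's memory the model may use


def fits(required_gb: float, setup: Setup) -> bool:
    """True if an even split of the working set fits within every device's budget."""
    per_device_need = required_gb / setup.gpu_count
    per_device_budget = setup.memory_per_gpu_gb * setup.usable_fraction
    return per_device_need <= per_device_budget


SETUPS = [
    Setup("24GB card", 1, 24, 0.90),
    Setup("48GB card", 1, 48, 0.90),
    Setup("64GB unified-memory Mac", 1, 64, 0.75),
    Setup("2x 24GB GPUs", 2, 24, 0.90),
]

if __name__ == "__main__":
    for s in SETUPS:
        # ~42GB is the table's 4-bit estimate for both 70B models
        print(f"{s.name}: {'fits' if fits(42, s) else 'does not fit'}")
```

With these assumptions a single 24GB card does not fit, while a 48GB card, a 64GB Mac and a pair of 24GB GPUs all do. The two-GPU case is tight: real multi-GPU runtimes place whole layers on each device, so the split is rarely perfectly even and extra headroom helps.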
General LLM
Llama 3.1 405B
Datacenter tier — realistically a multi-GPU / multi-node or cloud target even at 4-bit.
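At ~230GB, the 405B model has to be sharded across GPUs. A minimal Python sketch of the GPU-count arithmetic follows, assuming 90% usable memory per GPU and rounding up to a power of two. That rounding reflects a common, but not universal, tensor-parallel constraint; treat it as an assumption.

```python
import math


def min_gpus(required_gb: float, memory_per_gpu_gb: float,
             usable_fraction: float = 0.90, power_of_two: bool = True) -> int:
    """Smallest GPU count whose combined usable memory holds the working set.

    ASSUMPTIONS: 90% usable memory per GPU by default, and an optional
    power-of-two GPU count for tensor parallelism.
    """
    count = math.ceil(required_gb / (memory_per_gpu_gb * usable_fraction))
    if power_of_two:
        # Round up to the next power of two (1 -> 1, 3 -> 4, 6 -> 8)
        count = 1 << (count - 1).bit_length()
    return count


if __name__ == "__main__":
    # ~230GB is the table's 4-bit estimate for Llama 3.1 405B
    for label, mem in [("80GB GPUs", 80), ("48GB cards", 48)]:
        print(f"{label}: at least {min_gpus(230, mem)}")
```

On paper this gives four 80GB GPUs (for example H100 80GB) or eight 48GB cards. An 8-GPU server like the one in the table leaves headroom for longer contexts and concurrent requests.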
Run Llama models inside a private AI Business OS
Run your own AI agents on hardware you control — private by design, no per-seat data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.