Meta·7 sizes·General LLM / Vision / Multimodal

Llama models: sizes & hardware to run them

The Llama family spans 7 sizes from 1B to 405B. Each size maps to a different hardware tier — below is the approximate memory each needs at 4-bit and the device we’d start with for a private local deployment.

Tools·Reasoning·Vision·Multilingual·Long context

Sizes & hardware

Model	Params	Context	~VRAM @ 4-bit	Minimum device
Llama 3.2 1B	1B	128K	~1GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.2 3B	3B	128K	~2.5GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.1 8B	8B	128K	~6GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.2 Vision 11B	11B	128K	~9GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.1 70B	70B	128K	~42GB	NVIDIA RTX A6000
Llama 3.3 70B	70B	128K	~42GB	NVIDIA RTX A6000
Llama 3.1 405B	405B	128K	~230GB	Supermicro 8x H100 SuperServer

Memory figures are approximate working-set estimates (weights plus KV cache at a modest context length); treat them as rough figures, not exact requirements. Device picks come from our compatibility engine, which ranks the best on-prem fit first.
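As a rough illustration of where numbers like these come from, here is a minimal sketch that adds quantized weight size to an fp16 KV cache. It assumes about 4.5 effective bits per weight (typical of 4-bit group quantization once scales are counted), an 8K-token context, nominal parameter counts, and architecture values as we understand the published configs; all of these are assumptions, not outputs of our compatibility engine.

```python
# Rough working-set estimate for a Llama model at 4-bit: weights + fp16 KV cache.
# Assumptions (not from the table): ~4.5 effective bits per weight, an
# 8K-token context, and nominal parameter counts (8B, 70B, 405B).


def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Quantized weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """One key and one value vector per layer, per KV head, per token."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 1e9


def working_set_gb(params_billion: float, layers: int, kv_heads: int,
                   head_dim: int, context_tokens: int = 8192) -> float:
    """Weights plus KV cache; ignores activations and runtime overhead."""
    return weights_gb(params_billion) + kv_cache_gb(
        layers, kv_heads, head_dim, context_tokens)


# (params in billions, layers, KV heads, head dim): assumed values, verify
# against each model's config.json before relying on them.
MODELS = {
    "Llama 3.1 8B": (8, 32, 8, 128),
    "Llama 3.1 70B": (70, 80, 8, 128),
    "Llama 3.1 405B": (405, 126, 8, 128),
}

for name, (params, layers, kv_heads, head_dim) in MODELS.items():
    print(f"{name}: ~{working_set_gb(params, layers, kv_heads, head_dim):.1f} GB")
```

Under these assumptions the sketch prints about 5.6 GB, 42.1 GB and 232.0 GB, close to the ~6GB, ~42GB and ~230GB in the table. Longer contexts grow the KV cache term linearly, which is why the figures are quoted at a modest context.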

Open each size

General LLM

Llama 3.2 1B

Runs almost anywhere — even on a CPU or a low-power mini PC. The default for edge and ultra-cheap deployments.

General LLM

Llama 3.2 3B

Comfortable on any 8GB GPU, a Mac mini, or a small mini PC. A good entry assistant for a single office.

General LLM

Llama 3.1 8B

Runs comfortably at 4-bit on any 8GB+ GPU, a Mac mini, or a small mini PC. The classic entry point for local AI.

Vision / Multimodal

Llama 3.2 Vision 11B

The ~9GB 4-bit estimate fits a 12GB card, but plan for 16-24GB to handle the vision encoder plus context comfortably.

General LLM

Llama 3.1 70B

Flagship tier — ~42GB at 4-bit needs a 48GB card, a 64GB+ unified-memory Mac, or multi-GPU.
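To make the "48GB card or 64GB+ Mac" rule concrete, here is a hypothetical sketch of picking the smallest device that covers an estimate. It is not our compatibility engine: the device list mirrors the table's Minimum device column, and the 90% usable VRAM, the 80GB per H100, and the 75% default GPU share of a Mac's unified memory are illustrative assumptions.

```python
# Hypothetical sketch: pick the smallest device whose usable memory covers
# a model's estimated 4-bit working set. Usable fractions are assumptions.
from typing import Optional

# (device name, total GPU memory in GB), smallest first.
DEVICES = [
    ("NVIDIA GeForce RTX 3060 12GB", 12),
    ("NVIDIA RTX A6000", 48),
    ("Supermicro 8x H100 SuperServer", 8 * 80),  # assumes 80GB per H100
]


def pick_device(required_gb: float, usable_fraction: float = 0.9) -> Optional[str]:
    """Return the first (smallest) device with enough usable memory, else None."""
    for name, total_gb in DEVICES:
        if total_gb * usable_fraction >= required_gb:
            return name
    return None


def mac_fits(required_gb: float, unified_gb: float, gpu_share: float = 0.75) -> bool:
    """Assume the GPU may use only part of a Mac's unified memory by default."""
    return unified_gb * gpu_share >= required_gb


print(pick_device(42))   # -> NVIDIA RTX A6000 (48 * 0.9 = 43.2 GB usable)
print(mac_fits(42, 64))  # -> True  (64 * 0.75 = 48 GB)
print(mac_fits(42, 48))  # -> False (48 * 0.75 = 36 GB)
```

Fed the table's 4-bit estimates, this sketch lands on the same minimum devices. A real selection would likely also weigh memory bandwidth, price and availability, none of which are modeled here.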

General LLM

Llama 3.3 70B

Flagship tier — ~42GB at 4-bit means a 48GB card, a 64GB+ unified-memory Mac, or multi-GPU.

General LLM

Llama 3.1 405B

Datacenter tier — realistically a multi-GPU / multi-node or cloud target even at 4-bit.
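A minimal sketch of the multi-GPU arithmetic, assuming the ~230GB estimate is split evenly across identical cards and that about 90% of each card's memory is usable (both assumptions for illustration):

```python
import math


def gpus_needed(required_gb: float, per_gpu_gb: float,
                usable_fraction: float = 0.9) -> int:
    """Minimum card count if weights and KV cache are split evenly across GPUs."""
    return math.ceil(required_gb / (per_gpu_gb * usable_fraction))


print(gpus_needed(230, 80))  # -> 4 (80GB cards)
print(gpus_needed(230, 48))  # -> 6 (48GB cards)
```

This is a lower bound. Tensor-parallel runtimes commonly require the GPU count to divide the model's attention-head count, so in practice a count like 6 tends to round up to 8, and longer contexts add KV cache on top.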

Run Llama models inside a private AI Business OS

Run your own AI agents on hardware you control: private by design, with no data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.
