LLM Hardware Requirements

Approximate memory each open model needs at each quantization level, and the smallest catalog device that can run it. Figures are working-set estimates (weights plus KV cache at a modest context length), so treat them as approximate: actual usage varies with context length and runtime. As a rule of thumb, budget for the figure in the 4-bit (Q4) column.
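
The working-set figures can be approximated from a model's shape. The sketch below is a minimal illustration, not the method used to produce the tables: the effective bits per weight (4.5 for Q4, 8.5 for Q8, which include per-block scale overhead), the 8,192-token "modest context", the FP16 KV cache, and the `ModelShape` helper are all assumptions. The layer and head counts are those of Llama 3.1 8B, with the parameter count rounded to 8 billion.

```python
from dataclasses import dataclass

# Assumed effective bits per weight, including per-block scales.
# Real quantization formats vary; these values are illustrative.
BITS_PER_WEIGHT = {"Q4": 4.5, "Q8": 8.5, "FP16": 16.0}


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical helper holding the numbers the estimate needs."""
    params: float   # total parameter count
    layers: int     # number of transformer layers
    kv_heads: int   # number of key/value heads (grouped-query attention)
    head_dim: int   # dimension of each attention head


def weight_bytes(shape: ModelShape, quant: str) -> float:
    """Memory taken by the weights at the given quantization."""
    return shape.params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(shape: ModelShape, context_tokens: int,
                   bytes_per_value: int = 2) -> int:
    """KV cache size: one key and one value vector per layer per token."""
    return (2 * shape.layers * shape.kv_heads * shape.head_dim
            * context_tokens * bytes_per_value)


def working_set_gb(shape: ModelShape, quant: str,
                   context_tokens: int = 8192) -> float:
    """Weights plus KV cache, in decimal gigabytes."""
    total = weight_bytes(shape, quant) + kv_cache_bytes(shape, context_tokens)
    return total / 1e9


# Llama 3.1 8B: 32 layers, 8 KV heads, head dimension 128.
LLAMA_3_1_8B = ModelShape(params=8e9, layers=32, kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for quant in ("Q4", "Q8", "FP16"):
        print(f"{quant}: ~{working_set_gb(LLAMA_3_1_8B, quant):.1f} GB")
```

Under these assumptions the estimate lands close to the ~6GB / ~9GB / ~17GB row for Llama 3.1 8B. Raising `context_tokens` toward the model's full context window grows the KV cache term linearly, which is why long-context use needs more memory than the table suggests.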

General LLM

Model	Params	Context	Q4	Q8	FP16	Minimum device
Llama 3.2 1B	~1B	128K	~1GB	~1.5GB	~3GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.2 3B	~3B	128K	~2.5GB	~4GB	~7GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.1 8B	~8B	128K	~6GB	~9GB	~17GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.1 70B	~70B	128K	~42GB	~75GB	~140GB	NVIDIA RTX A6000
Llama 3.3 70B	~70B	128K	~42GB	~75GB	~140GB	NVIDIA RTX A6000
Llama 3.1 405B	~405B	128K	~230GB	~410GB	~810GB	Supermicro 8x H100 SuperServer
Qwen2.5 7B	~7B	128K	~5.5GB	~8GB	~15GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5 14B	~14B	128K	~10GB	~16GB	~30GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5 32B	~32B	128K	~20GB	~34GB	~64GB	NVIDIA GeForce RTX 3090
Qwen2.5 72B	~72B	128K	~44GB	~78GB	~145GB	Apple Mac mini (M4 Pro)
Qwen3 8B	~8B	128K	~6GB	~9GB	~17GB	NVIDIA GeForce RTX 3060 12GB
Qwen3 14B	~14B	128K	~10GB	~16GB	~30GB	NVIDIA GeForce RTX 3060 12GB
Qwen3 32B	~32B	128K	~20GB	~34GB	~64GB	NVIDIA GeForce RTX 3090
Qwen3 235B-A22B (MoE)	~235B	128K	~130GB	~235GB	~470GB	NVIDIA B200 (placeholder)
Mistral 7B	~7B	32K	~5GB	~8GB	~15GB	NVIDIA GeForce RTX 3060 12GB
Mistral Small 24B	~24B	32K	~14GB	~25GB	~48GB	Intel Arc A770 16GB
Mixtral 8x7B (MoE)	~47B	32K	~28GB	~50GB	~90GB	NVIDIA GeForce RTX 5090 (placeholder)
Gemma 2 9B	~9B	8K	~7GB	~10GB	~19GB	NVIDIA GeForce RTX 3060 12GB
Gemma 2 27B	~27B	8K	~17GB	~29GB	~54GB	NVIDIA GeForce RTX 3090
Phi-3 Medium (14B)	~14B	128K	~9GB	~15GB	~28GB	NVIDIA GeForce RTX 3060 12GB
Phi-4 (14B)	~14B	16K	~9GB	~15GB	~28GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5 0.5B	~0.5B	32K	~0.4GB	~0.6GB	~1GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5 1.5B	~1.5B	32K	~1GB	~1.7GB	~3GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5 3B	~3B	32K	~2.2GB	~3.4GB	~6GB	NVIDIA GeForce RTX 3060 12GB
Gemma 2 2B	~2B	8K	~1.6GB	~2.4GB	~4GB	NVIDIA GeForce RTX 3060 12GB
Gemma 3 4B	~4B	128K	~3GB	~4.5GB	~8GB	NVIDIA GeForce RTX 3060 12GB
Gemma 3 12B	~12B	128K	~8GB	~13GB	~24GB	NVIDIA GeForce RTX 3060 12GB
Gemma 3 27B	~27B	128K	~17GB	~29GB	~54GB	NVIDIA GeForce RTX 3090
Phi-3.5 Mini (3.8B)	~3.8B	128K	~2.5GB	~4GB	~8GB	NVIDIA GeForce RTX 3060 12GB
Mistral Nemo 12B	~12B	128K	~8GB	~13GB	~24GB	NVIDIA GeForce RTX 3060 12GB
Granite 3 2B	~2B	128K	~1.6GB	~2.4GB	~4GB	NVIDIA GeForce RTX 3060 12GB
Granite 3 8B	~8B	128K	~6GB	~9GB	~17GB	NVIDIA GeForce RTX 3060 12GB
SmolLM2 1.7B	~1.7B	8K	~1.1GB	~1.9GB	~3.4GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5 7B Instruct	~7.6B	32K	~4.9GB	~8.4GB	~15.2GB	NVIDIA GeForce RTX 3060 12GB

Reasoning

Model	Params	Context	Q4	Q8	FP16	Minimum device
DeepSeek-R1 Distill 1.5B	~1.5B	128K	~1.5GB	~2.5GB	~4GB	NVIDIA GeForce RTX 3060 12GB
DeepSeek-R1 Distill 8B	~8B	128K	~6GB	~9GB	~17GB	NVIDIA GeForce RTX 3060 12GB
DeepSeek-R1 Distill 14B	~14B	128K	~10GB	~16GB	~30GB	NVIDIA GeForce RTX 3060 12GB
DeepSeek-R1 Distill 32B	~32B	128K	~20GB	~34GB	~64GB	NVIDIA GeForce RTX 3090
DeepSeek-R1 671B (MoE)	~671B	128K	~400GB	~700GB	~1340GB	Supermicro 8x H100 SuperServer
DeepSeek-R1 Distill Llama 70B	~70B	128K	~42GB	~75GB	~140GB	NVIDIA RTX A6000

Coding LLM

Model	Params	Context	Q4	Q8	FP16	Minimum device
Qwen2.5-Coder 7B	~7B	128K	~5.5GB	~8GB	~15GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5-Coder 14B	~14B	128K	~10GB	~16GB	~30GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5-Coder 32B	~32B	128K	~20GB	~34GB	~64GB	NVIDIA GeForce RTX 3090
DeepSeek-Coder V2 (class)	~16B	128K	~11GB	~18GB	~33GB	Intel Arc A770 16GB
Qwen2.5-Coder 1.5B	~1.5B	32K	~1GB	~1.7GB	~3GB	NVIDIA GeForce RTX 3060 12GB
CodeLlama 7B	~7B	16K	~5GB	~8GB	~14GB	NVIDIA GeForce RTX 3060 12GB
CodeLlama 13B	~13B	16K	~8GB	~14GB	~26GB	NVIDIA GeForce RTX 3060 12GB
CodeLlama 34B	~34B	16K	~21GB	~37GB	~68GB	NVIDIA GeForce RTX 3090
StarCoder2 3B	~3B	16K	~2.2GB	~3.4GB	~6GB	NVIDIA GeForce RTX 3060 12GB
StarCoder2 7B	~7B	16K	~5GB	~8GB	~14GB	NVIDIA GeForce RTX 3060 12GB
StarCoder2 15B	~15B	16K	~10GB	~17GB	~30GB	NVIDIA GeForce RTX 3060 12GB
Qwen2.5-Coder 7B Instruct	~7.6B	128K	~4.9GB	~8.4GB	~15.2GB	NVIDIA GeForce RTX 3060 12GB

Embedding

Model	Params	Context	Q4	Q8	FP16	Minimum device
Nomic Embed Text (class)	small	8K	—	—	~1GB	NVIDIA GeForce RTX 3060 12GB
BGE-M3 Embeddings (class)	small	8K	—	—	~2GB	NVIDIA GeForce RTX 3060 12GB
mxbai-embed-large (class)	small	0.5K	—	—	~1GB	NVIDIA GeForce RTX 3060 12GB
all-MiniLM (class)	small	0.5K	—	—	~0.2GB	NVIDIA GeForce RTX 3060 12GB
Snowflake Arctic Embed (class)	small	0.5K	—	—	~1GB	NVIDIA GeForce RTX 3060 12GB

Vision / Multimodal

Model	Params	Context	Q4	Q8	FP16	Minimum device
Qwen2-VL 7B (vision)	~7B	32K	~7GB	~10GB	~17GB	NVIDIA GeForce RTX 3060 12GB
Llama 3.2 Vision 11B	~11B	128K	~9GB	~14GB	~24GB	NVIDIA GeForce RTX 3060 12GB
LLaVA 7B (vision)	~7B	4K	~6GB	~9GB	~16GB	NVIDIA GeForce RTX 3060 12GB
LLaVA 13B (vision)	~13B	4K	~9GB	~15GB	~26GB	NVIDIA GeForce RTX 3060 12GB
LLaVA-Llama3 8B (vision)	~8B	8K	~6.5GB	~9.5GB	~17GB	NVIDIA GeForce RTX 3060 12GB
Moondream 2 (vision)	~1.8B	2K	~1.5GB	~2.5GB	~4GB	NVIDIA GeForce RTX 3060 12GB
MiniCPM-V 8B (vision)	~8B	32K	~7GB	~10GB	~17GB	NVIDIA GeForce RTX 3060 12GB
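
The "Minimum device" column can be read as a lookup: take the Q4 figure, add some headroom, and pick the smallest device whose memory still fits. The sketch below illustrates that idea under stated assumptions. The `CATALOG` list is a hypothetical four-device subset (memory sizes are the cards' published capacities), and the 10% headroom factor is an illustrative choice, not a documented rule of this catalog.

```python
from typing import Optional

# Hypothetical subset of a device catalog: (name, memory in GB).
CATALOG = [
    ("NVIDIA GeForce RTX 3060 12GB", 12),
    ("Intel Arc A770 16GB", 16),
    ("NVIDIA GeForce RTX 3090", 24),
    ("NVIDIA RTX A6000", 48),
]


def minimum_device(required_gb: float, headroom: float = 1.10) -> Optional[str]:
    """Return the smallest catalog device that fits the requirement.

    `headroom` is an assumed safety factor covering runtime overhead;
    returns None when nothing in the catalog is large enough.
    """
    for name, memory_gb in sorted(CATALOG, key=lambda device: device[1]):
        if required_gb * headroom <= memory_gb:
            return name
    return None


if __name__ == "__main__":
    for q4_gb in (6, 14, 20, 42, 230):
        print(f"~{q4_gb}GB -> {minimum_device(q4_gb)}")
```

A requirement larger than every entry returns `None`, which signals that a multi-GPU server or a unified-memory machine outside this subset is needed.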

