LLM Hardware Requirements
Approximate memory each open model needs at each quantization level, and the smallest catalog device that can run it. Figures are working-set estimates (weights plus KV cache at a modest context length), so treat them as approximate rather than exact. As a rule of thumb, the 4-bit (Q4) column is the memory you need to budget.
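The working-set idea above (weights plus KV cache) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the method used to produce the tables: the bits-per-weight values (4.5 for Q4, 8.5 for Q8) assume block-quantized formats that store a scale alongside each block of weights, and the layer count, KV-head count, head dimension, and 8K context are illustrative values for an 8B-class model. The function names are hypothetical.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """KV cache size in GB: one key and one value vector
    per layer, per KV head, per token (FP16 values by default)."""
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 1e9


def working_set_gb(params_billion: float, bits_per_weight: float,
                   kv_gb: float) -> float:
    """Weights plus KV cache; runtime overhead is ignored here."""
    return weights_gb(params_billion, bits_per_weight) + kv_gb


# Assumed architecture for an 8B-class model at an 8K-token context.
kv = kv_cache_gb(layers=32, kv_heads=8, head_dim=128, context_tokens=8192)

# Assumed effective bits per weight for each column.
for label, bits in [("Q4", 4.5), ("Q8", 8.5), ("FP16", 16.0)]:
    print(f"{label}: ~{working_set_gb(8, bits, kv):.1f} GB")
```

With these assumptions the estimates land within about a gigabyte of the 8B row in the Reasoning table. Longer contexts grow the KV cache linearly, so running near a model's full context window can need considerably more memory than the tabulated figure.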
General LLM
Reasoning
| Model | Params | Context | Q4 | Q8 | FP16 | Minimum device |
|---|---|---|---|---|---|---|
| DeepSeek-R1 Distill 1.5B | ~1.5B | 128K | ~1.5GB | ~2.5GB | ~4GB | NVIDIA GeForce RTX 3060 12GB |
| DeepSeek-R1 Distill 8B | ~8B | 128K | ~6GB | ~9GB | ~17GB | NVIDIA GeForce RTX 3060 12GB |
| DeepSeek-R1 Distill 14B | ~14B | 128K | ~10GB | ~16GB | ~30GB | NVIDIA GeForce RTX 3060 12GB |
| DeepSeek-R1 Distill 32B | ~32B | 128K | ~20GB | ~34GB | ~64GB | NVIDIA GeForce RTX 3090 |
| DeepSeek-R1 Distill Llama 70B | ~70B | 128K | ~42GB | ~75GB | ~140GB | NVIDIA RTX A6000 |
| DeepSeek-R1 671B (MoE) | ~671B | 128K | ~400GB | ~700GB | ~1340GB | Supermicro 8x H100 SuperServer |
Coding LLM
| Model | Params | Context | Q4 | Q8 | FP16 | Minimum device |
|---|---|---|---|---|---|---|
| Qwen2.5-Coder 1.5B | ~1.5B | 32K | ~1GB | ~1.7GB | ~3GB | NVIDIA GeForce RTX 3060 12GB |
| Qwen2.5-Coder 7B | ~7B | 128K | ~5.5GB | ~8GB | ~15GB | NVIDIA GeForce RTX 3060 12GB |
| Qwen2.5-Coder 14B | ~14B | 128K | ~10GB | ~16GB | ~30GB | NVIDIA GeForce RTX 3060 12GB |
| Qwen2.5-Coder 32B | ~32B | 128K | ~20GB | ~34GB | ~64GB | NVIDIA GeForce RTX 3090 |
| DeepSeek-Coder V2 (class) | ~16B | 128K | ~11GB | ~18GB | ~33GB | Intel Arc A770 16GB |
| CodeLlama 7B | ~7B | 16K | ~5GB | ~8GB | ~14GB | NVIDIA GeForce RTX 3060 12GB |
| CodeLlama 13B | ~13B | 16K | ~8GB | ~14GB | ~26GB | NVIDIA GeForce RTX 3060 12GB |
| CodeLlama 34B | ~34B | 16K | ~21GB | ~37GB | ~68GB | NVIDIA GeForce RTX 3090 |
| StarCoder2 3B | ~3B | 16K | ~2.2GB | ~3.4GB | ~6GB | NVIDIA GeForce RTX 3060 12GB |
| StarCoder2 7B | ~7B | 16K | ~5GB | ~8GB | ~14GB | NVIDIA GeForce RTX 3060 12GB |
| StarCoder2 15B | ~15B | 16K | ~10GB | ~17GB | ~30GB | NVIDIA GeForce RTX 3060 12GB |
| Qwen2.5-Coder 7B Instruct | ~7.6B | 128K | ~4.9GB | ~8.4GB | ~15.2GB | NVIDIA GeForce RTX 3060 12GB |
Embedding
| Model | Params | Context | Q4 | Q8 | FP16 | Minimum device |
|---|---|---|---|---|---|---|
| Nomic Embed Text (class) | small | 8K | — | — | ~1GB | NVIDIA GeForce RTX 3060 12GB |
| BGE-M3 Embeddings (class) | small | 8K | — | — | ~2GB | NVIDIA GeForce RTX 3060 12GB |
| mxbai-embed-large (class) | small | 0.5K | — | — | ~1GB | NVIDIA GeForce RTX 3060 12GB |
| all-MiniLM (class) | small | 0.5K | — | — | ~0.2GB | NVIDIA GeForce RTX 3060 12GB |
| Snowflake Arctic Embed (class) | small | 0.5K | — | — | ~1GB | NVIDIA GeForce RTX 3060 12GB |
Vision / Multimodal
| Model | Params | Context | Q4 | Q8 | FP16 | Minimum device |
|---|---|---|---|---|---|---|
| Qwen2-VL 7B (vision) | ~7B | 32K | ~7GB | ~10GB | ~17GB | NVIDIA GeForce RTX 3060 12GB |
| Llama 3.2 Vision 11B | ~11B | 128K | ~9GB | ~14GB | ~24GB | NVIDIA GeForce RTX 3060 12GB |
| LLaVA 7B (vision) | ~7B | 4K | ~6GB | ~9GB | ~16GB | NVIDIA GeForce RTX 3060 12GB |
| LLaVA 13B (vision) | ~13B | 4K | ~9GB | ~15GB | ~26GB | NVIDIA GeForce RTX 3060 12GB |
| LLaVA-Llama3 8B (vision) | ~8B | 8K | ~6.5GB | ~9.5GB | ~17GB | NVIDIA GeForce RTX 3060 12GB |
| Moondream 2 (vision) | ~1.8B | 2K | ~1.5GB | ~2.5GB | ~4GB | NVIDIA GeForce RTX 3060 12GB |
| MiniCPM-V 8B (vision) | ~8B | 32K | ~7GB | ~10GB | ~17GB | NVIDIA GeForce RTX 3060 12GB |
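To show how a memory figure maps to a "Minimum device" entry, here is a small Python sketch that picks the smallest device with enough memory. It is an illustration under assumptions, not the catalog's actual selection logic: the device list contains only the devices named in the tables above, the capacities come from public spec sheets (with 80GB assumed per H100), and the 10% headroom is a hypothetical margin that happens to reproduce the picks made for the Q4 column.

```python
# Capacities in GB. These are assumptions from public spec sheets,
# not values stated in the tables.
CATALOG_GB = {
    "NVIDIA GeForce RTX 3060 12GB": 12,
    "Intel Arc A770 16GB": 16,
    "NVIDIA GeForce RTX 3090": 24,
    "NVIDIA RTX A6000": 48,
    "Supermicro 8x H100 SuperServer": 8 * 80,  # assumes 80GB per H100
}


def minimum_device(required_gb: float, headroom: float = 0.10):
    """Return the smallest catalog device whose memory covers the
    requirement plus a safety margin, or None if nothing fits."""
    needed = required_gb * (1 + headroom)
    for name, capacity in sorted(CATALOG_GB.items(), key=lambda item: item[1]):
        if capacity >= needed:
            return name
    return None


# Q4 figures taken from the tables above.
for model, q4_gb in [("Qwen2.5-Coder 14B", 10),
                     ("DeepSeek-Coder V2 (class)", 11),
                     ("DeepSeek-R1 Distill 32B", 20),
                     ("DeepSeek-R1 Distill Llama 70B", 42)]:
    print(f"{model}: {minimum_device(q4_gb)}")
```

The headroom explains why a ~10GB model is listed against a 12GB card while an ~11GB model moves up to a 16GB card: the margin leaves room for longer contexts and runtime overhead.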