Alibaba·16 sizes·General LLM / Coding LLM / Vision / Multimodal
Qwen models: sizes & hardware to run them
The Qwen family spans 16 sizes from 0.5B to 235B parameters. Each size maps to a different hardware tier. Below is the approximate memory each needs at 4-bit quantization and the device we’d start with for a private local deployment.
Tools·Reasoning·Vision·Code·Multilingual·Long context
Sizes & hardware
Memory figures are approximate working-set estimates (weights plus KV cache at modest context); treat them as rough, plus-or-minus figures rather than exact requirements. Device picks come from our compatibility engine, with the best on-prem fit listed first.
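As a rough illustration of how a working-set figure of this kind can be put together, here is a minimal sketch. It is not the compatibility engine's actual formula: the function name, the flat 1 GB KV-cache allowance, and the 10% runtime overhead are all assumptions chosen for the example, and real 4-bit formats usually store slightly more than 4 bits per weight once scaling data is included.

```python
def estimate_memory_gb(params_billion: float,
                       bits_per_weight: float = 4.0,
                       kv_cache_gb: float = 1.0,
                       overhead_fraction: float = 0.10) -> float:
    """Back-of-envelope working-set estimate in decimal GB.

    All defaults are illustrative assumptions, not measured values:
    - bits_per_weight: nominal quantization width (4-bit here)
    - kv_cache_gb: flat allowance for KV cache at modest context
    - overhead_fraction: extra headroom for runtime buffers
    """
    # 1e9 parameters * (bits / 8) bytes each = that many decimal GB
    weights_gb = params_billion * bits_per_weight / 8
    return (weights_gb + kv_cache_gb) * (1 + overhead_fraction)


if __name__ == "__main__":
    for size in (7, 14, 32, 72):
        print(f"{size}B at 4-bit: ~{estimate_memory_gb(size):.1f} GB")
```

Under these assumptions a 32B model needs about 16 GB for weights alone, which is why the 24GB tier is the usual starting point for it; longer contexts grow the KV-cache term and can push a model up a tier.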
Open each size
General LLM
Qwen2.5 0.5B
Runs on virtually any hardware, including CPUs and microcontroller-class devices. The smallest of the Qwen2.5 line.
General LLM
Qwen2.5 1.5B
Runs comfortably on a CPU or any small GPU. A practical edge size for light, high-volume work.
Coding LLM
Qwen2.5-Coder 1.5B
Runs on a CPU or any small GPU. The tiny coder for fast, private in-editor completion.
General LLM
Qwen2.5 3B
Comfortable on any 8GB GPU, a Mac mini, or a mini PC. A capable small assistant for one office.
General LLM
Qwen2.5 7B
8GB+ GPUs handle it at 4-bit; great for multilingual and tool-using agents on modest hardware.
Coding LLM
Qwen2.5-Coder 7B
8GB+ GPUs at 4-bit. Ideal for responsive in-editor completion on modest hardware.
Vision / Multimodal
Qwen2-VL 7B (vision)
Vision models carry extra encoder overhead; budget a 16GB+ GPU for comfortable use.
General LLM
Qwen3 8B
8GB+ GPUs at 4-bit. A strong, current small generalist with optional step-by-step reasoning.
General LLM
Qwen2.5 14B
Fits comfortably on 16GB+ cards at 4-bit; a capable everyday agent model for a small team.
General LLM
Qwen3 14B
16GB+ cards at 4-bit. A current mid-size pick when you want better reasoning than a 7-8B model.
Coding LLM
Qwen2.5-Coder 14B
16GB+ GPUs at 4-bit. A strong balance of coding quality and footprint for a developer workstation.
General LLM
Qwen2.5 32B
A 24GB card (RTX 3090/4090) or 32GB+ Mac runs it well at 4-bit. The sweet spot for capable single-box agents.
General LLM
Qwen3 32B
A 24GB card or 32GB+ Mac at 4-bit. A current high-quality single-box model with reasoning.
Coding LLM
Qwen2.5-Coder 32B
A 24GB card (RTX 3090/4090) or 32GB+ Mac at 4-bit. The strongest open coder you can run on one consumer card.
General LLM
Qwen2.5 72B
Flagship tier, with a footprint similar to Llama 70B: a single 48GB+ card, a high-memory Mac, or a multi-GPU setup.
General LLM
Qwen3 235B-A22B (MoE)
Datacenter, multi-GPU, or cloud. Mixture-of-experts: all 235B parameters must be held in memory, but only ~22B activate per token.
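The mixture-of-experts trade-off can be made concrete with a small sketch. It assumes a nominal 4 bits per weight and ignores KV cache and runtime overhead; the function name and dictionary keys are hypothetical, introduced only for this example.

```python
def moe_footprint(total_params_billion: float,
                  active_params_billion: float,
                  bits_per_weight: float = 4.0) -> dict:
    """Compare what an MoE model must store with what it touches per token.

    Weights only, in decimal GB; the 4-bit default is an assumption.
    """
    bytes_per_param = bits_per_weight / 8
    return {
        # Every expert has to be resident, so memory scales with total size.
        "resident_weights_gb": total_params_billion * bytes_per_param,
        # Only the routed experts are read per token, so per-token work
        # scales with the active parameter count.
        "active_weights_gb_per_token": active_params_billion * bytes_per_param,
    }


if __name__ == "__main__":
    print(moe_footprint(total_params_billion=235, active_params_billion=22))
```

With these numbers the model has to hold roughly 117.5 GB of weights but reads only about 11 GB of them per token, which is why it belongs in the datacenter tier for memory while generating at a pace closer to a much smaller dense model.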
Run Qwen models inside a private AI Business OS
Run your own AI agents on hardware you control — private by design, no per-seat data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.