Alibaba · 16 sizes · General LLM / Coding LLM / Vision / Multimodal

Qwen models: sizes & hardware to run them

The Qwen family spans 16 sizes from 0.5B to 235B parameters. Each size maps to a different hardware tier. Below is the approximate memory each needs at 4-bit quantization and the device we’d start with for a private local deployment.

Tools · Reasoning · Vision · Code · Multilingual · Long context

Sizes & hardware

Model	Params	Context	~VRAM @ 4-bit	Minimum device	Recommended
Qwen2.5 0.5B	0.5B	32K	~0.4GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5 1.5B	1.5B	32K	~1GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5-Coder 1.5B	1.5B	32K	~1GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5 3B	3B	32K	~2.2GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5 7B	7B	128K	~5.5GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5-Coder 7B	7B	128K	~5.5GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2-VL 7B (vision)	7B	32K	~7GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen3 8B	8B	128K	~6GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5 14B	14B	128K	~10GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen3 14B	14B	128K	~10GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5-Coder 14B	14B	128K	~10GB	NVIDIA GeForce RTX 3060 12GB	NVIDIA B200 (placeholder)
Qwen2.5 32B	32B	128K	~20GB	NVIDIA GeForce RTX 3090	NVIDIA B200 (placeholder)
Qwen3 32B	32B	128K	~20GB	NVIDIA GeForce RTX 3090	NVIDIA B200 (placeholder)
Qwen2.5-Coder 32B	32B	128K	~20GB	NVIDIA GeForce RTX 3090	NVIDIA B200 (placeholder)
Qwen2.5 72B	72B	128K	~44GB	Apple Mac mini (M4 Pro)	NVIDIA B200 (placeholder)
Qwen3 235B-A22B (MoE)	235B (≈22B active)	128K	~130GB	NVIDIA B200 (placeholder)	NVIDIA B200 (placeholder)

Memory figures are approximate working-set estimates (weights plus KV cache at modest context); treat them as rough, plus-or-minus values rather than exact requirements. Device picks come from our compatibility engine, with the best on-prem fit listed first.
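As a rough illustration of how a "weights + KV cache" working set can be estimated, here is a minimal Python sketch. The function names are hypothetical, and the layer count, KV-head count, and head dimension in the usage lines are illustrative assumptions for a 7B-class model; check the model's own config before relying on them. Real 4-bit formats also store scales and other metadata, so actual files run somewhat larger than the plain 4-bits-per-weight figure, and this sketch is not expected to reproduce the table exactly.

```python
def weights_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Memory for the quantized weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """Key/value cache size in GB for a single sequence."""
    # Two tensors (K and V) per layer, each holding
    # kv_heads * head_dim values for every token in the context.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token_bytes * context_tokens / 1e9


def working_set_gb(params_billion: float, layers: int, kv_heads: int,
                   head_dim: int, context_tokens: int = 8192,
                   bits_per_weight: float = 4.0) -> float:
    """Weights plus KV cache; ignores activations and runtime overhead."""
    return (weights_gb(params_billion, bits_per_weight)
            + kv_cache_gb(layers, kv_heads, head_dim, context_tokens))


if __name__ == "__main__":
    # Illustrative (assumed) shape for a 7B-class model at 8K context.
    estimate = working_set_gb(7, layers=28, kv_heads=4, head_dim=128)
    print(f"Estimated working set: {estimate:.1f} GB")
```

The KV cache grows linearly with context length, so running a 128K-context model near its full window can need far more memory than the modest-context figures in the table.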

Open each size

General LLM

Qwen2.5 0.5B

Runs on virtually any hardware, including CPU-only machines and small edge devices with a few hundred megabytes of free memory. The smallest of the Qwen2.5 line.

General LLM

Qwen2.5 1.5B

Runs comfortably on a CPU or any small GPU. A practical edge size for light high-volume work.

Coding LLM

Qwen2.5-Coder 1.5B

Runs on a CPU or any small GPU. The tiny coder for fast, private in-editor completion.

General LLM

Qwen2.5 3B

Comfortable on any 8GB GPU, a Mac mini, or a small mini PC. A capable small assistant for one office.

General LLM

Qwen2.5 7B

8GB+ GPUs handle it at 4-bit; great for multilingual and tool-using agents on modest hardware.

Coding LLM

Qwen2.5-Coder 7B

8GB+ GPUs at 4-bit. Ideal for responsive in-editor completion on modest hardware.

Vision / Multimodal

Qwen2-VL 7B (vision)

Vision models carry extra encoder overhead; budget a 16GB+ GPU for comfortable use.

General LLM

Qwen3 8B

8GB+ GPUs at 4-bit. A strong, current small generalist with optional step-by-step reasoning.

General LLM

Qwen2.5 14B

Fits comfortably on 16GB+ cards at 4-bit; a capable everyday agent model for a small team.

General LLM

Qwen3 14B

16GB+ cards at 4-bit. A current mid-size pick when you want better reasoning than a 7-8B model.

Coding LLM

Qwen2.5-Coder 14B

16GB+ GPUs at 4-bit. A strong balance of coding quality and footprint for a developer workstation.

General LLM

Qwen2.5 32B

A 24GB card (RTX 3090/4090) or 32GB+ Mac runs it well at 4-bit. The sweet spot for capable single-box agents.

General LLM

Qwen3 32B

A 24GB card or 32GB+ Mac at 4-bit. A current high-quality single-box model with reasoning.

Coding LLM

Qwen2.5-Coder 32B

A 24GB card (RTX 3090/4090) or 32GB+ Mac at 4-bit. The strongest open coder you can run on one consumer card.

General LLM

Qwen2.5 72B

Flagship tier, with a footprint similar to Llama 70B: a single 48GB+ card, a high-memory Mac (unified memory well above the ~44GB working set), or a multi-GPU setup.
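To make the single-card versus multi-GPU choice concrete, the following Python sketch counts how many identical cards would be needed for a given working set. The helper name and the 95% usable-memory fraction are assumptions for illustration; it also assumes the model splits evenly across cards, which real runtimes only approximate.

```python
import math


def gpus_needed(working_set_gb: float, card_gb: float,
                usable_fraction: float = 0.95) -> int:
    """Smallest number of identical cards whose usable memory covers the working set."""
    if working_set_gb <= 0 or card_gb <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("sizes must be positive and usable_fraction in (0, 1]")
    # Reserve a slice of each card for the driver and runtime (assumed fraction).
    usable_per_card = card_gb * usable_fraction
    return math.ceil(working_set_gb / usable_per_card)


if __name__ == "__main__":
    # ~44GB is the table's estimate for Qwen2.5 72B at 4-bit.
    for card in (24, 48):
        print(f"{card}GB cards needed: {gpus_needed(44, card)}")
```

With these assumptions a ~44GB working set needs two 24GB cards or one 48GB card, which matches the guidance above; a larger safety margin or a long context window would push the count up.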

General LLM

Qwen3 235B-A22B (MoE)

Datacenter / multi-GPU or cloud. Mixture-of-experts: large total memory, but only ~22B params activate per token.
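The mixture-of-experts trade-off can be sketched in a few lines of Python: every expert must be resident in memory, but only the active parameters are read for each token. The function name is hypothetical and the figures cover weights only at a flat 4 bits per weight, so they sit below the table's ~130GB working-set estimate, which also includes KV cache.

```python
def moe_profile(total_params_b: float, active_params_b: float,
                bits_per_weight: float = 4.0) -> dict:
    """Contrast resident weight memory with the weights touched per token."""
    return {
        # All experts must be loaded, so memory scales with total parameters.
        "resident_weights_gb": total_params_b * bits_per_weight / 8.0,
        # Per-token work scales with the active parameters only.
        "active_weights_gb": active_params_b * bits_per_weight / 8.0,
        "active_fraction": active_params_b / total_params_b,
    }


if __name__ == "__main__":
    profile = moe_profile(total_params_b=235, active_params_b=22)
    for key, value in profile.items():
        print(f"{key}: {value:.3f}")
```

In other words, the model needs memory like a 235B model but does per-token work closer to a 22B model, which is why it demands datacenter-class capacity while still generating at a reasonable speed.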

Run Qwen models inside a private AI Business OS

Run your own AI agents on hardware you control: private by design, with no data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.
