Compare Local AI Models

Compare open chat, reasoning and coding models side by side: parameter count, context window, memory at 4-bit quantization (Q4), deployment and license. Bigger isn’t always better: the right model is the largest one that runs comfortably on hardware you’re willing to own.

Cheap / single office

A 7–8B model (Llama 3.1 8B, Qwen3 8B, Mistral 7B) on a 12GB GPU or mini PC. Great first assistant.

Balanced / small team

A 14–32B model (Qwen2.5 14–32B, Mistral Small 24B) on a 16–24GB GPU. Real RAG, coding and automation.

Powerful / department

A 70B model (Llama 3.3 70B, Qwen2.5 72B) on a 48GB card, big Mac, or multi-GPU. Near-frontier quality, private.

Frontier / hybrid

Mixture-of-experts (MoE) giants or hosted APIs for the hardest jobs. In a hybrid setup, you run a local base model and burst to these only when needed.
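As a rough illustration, the tiers above can be read as a lookup from GPU memory to a model size class. The sketch below is only that: the function name `suggest_tier` is hypothetical, the thresholds are taken from the GPU sizes quoted in the tier descriptions, and the fallback message for cards under 12GB is an assumption rather than something stated above. The frontier tier is left out because it depends on hosted capacity, not on local memory.

```python
# Minimum GPU memory (GB) for each tier, largest first.
# Thresholds come from the tier descriptions; the tuple layout is illustrative.
TIERS = [
    (48, "Powerful / department", "70B"),
    (16, "Balanced / small team", "14-32B"),
    (12, "Cheap / single office", "7-8B"),
]


def suggest_tier(vram_gb: float) -> str:
    """Return the highest tier whose minimum memory fits in vram_gb."""
    for min_gb, tier, size in TIERS:
        if vram_gb >= min_gb:
            return f"{tier}: {size} model"
    # Assumption: below 12GB, fall back to the smaller rows of the table.
    return "Below the entry tier: consider a model under 7B"


if __name__ == "__main__":
    for gb in (8, 12, 24, 48):
        print(gb, "GB ->", suggest_tier(gb))
```

Treat the result as a starting point: context length and concurrent users also consume memory, so a card at the bottom of a tier may be tight.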

Model	Type	Params	Context	VRAM @ Q4	Deployment	License
Qwen2.5 0.5B	General LLM	~0.5B	32K	~0.4GB	local	Apache-2.0
Llama 3.2 1B	General LLM	~1B	128K	~1GB	local	Llama Community License
Qwen2.5 1.5B	General LLM	~1.5B	32K	~1GB	local	Apache-2.0
DeepSeek-R1 Distill 1.5B	Reasoning	~1.5B	128K	~1.5GB	local	MIT
Qwen2.5-Coder 1.5B	Coding LLM	~1.5B	32K	~1GB	local	Apache-2.0
SmolLM2 1.7B	General LLM	~1.7B	8K	~1.1GB	local	Apache-2.0
Gemma 2 2B	General LLM	~2B	8K	~1.6GB	local	Gemma Terms of Use
Granite 3 2B	General LLM	~2B	128K	~1.6GB	local	Apache-2.0
Llama 3.2 3B	General LLM	~3B	128K	~2.5GB	local	Llama Community License
Qwen2.5 3B	General LLM	~3B	32K	~2.2GB	local	Qwen Research License
StarCoder2 3B	Coding LLM	~3B	16K	~2.2GB	local	BigCode OpenRAIL-M
Phi-3.5 Mini (3.8B)	General LLM	~3.8B	128K	~2.5GB	local	MIT
Gemma 3 4B	General LLM	~4B	128K	~3GB	local	Gemma Terms of Use
Qwen2.5 7B	General LLM	~7B	128K	~5.5GB	local	Apache-2.0
Mistral 7B	General LLM	~7B	32K	~5GB	local	Apache-2.0
Qwen2.5-Coder 7B	Coding LLM	~7B	128K	~5.5GB	local	Apache-2.0
CodeLlama 7B	Coding LLM	~7B	16K	~5GB	local	Llama Community License
StarCoder2 7B	Coding LLM	~7B	16K	~5GB	local	BigCode OpenRAIL-M
Qwen2.5 7B Instruct	General LLM	~7.6B	32K	~4.9GB	local	Apache-2.0
Qwen2.5-Coder 7B Instruct	Coding LLM	~7.6B	128K	~4.9GB	local	Apache-2.0
Llama 3.1 8B	General LLM	~8B	128K	~6GB	local	Llama Community License
Qwen3 8B	General LLM	~8B	128K	~6GB	local	Apache-2.0
Granite 3 8B	General LLM	~8B	128K	~6GB	local	Apache-2.0
DeepSeek-R1 Distill 8B	Reasoning	~8B	128K	~6GB	local	MIT
Gemma 2 9B	General LLM	~9B	8K	~7GB	local	Gemma Terms of Use
Gemma 3 12B	General LLM	~12B	128K	~8GB	local	Gemma Terms of Use
Mistral Nemo 12B	General LLM	~12B	128K	~8GB	local	Apache-2.0
CodeLlama 13B	Coding LLM	~13B	16K	~8GB	local	Llama Community License
Qwen2.5 14B	General LLM	~14B	128K	~10GB	local	Apache-2.0
Qwen3 14B	General LLM	~14B	128K	~10GB	local	Apache-2.0
Phi-3 Medium (14B)	General LLM	~14B	128K	~9GB	local	MIT
Phi-4 (14B)	General LLM	~14B	16K	~9GB	local	MIT
DeepSeek-R1 Distill 14B	Reasoning	~14B	128K	~10GB	local	MIT
Qwen2.5-Coder 14B	Coding LLM	~14B	128K	~10GB	local	Apache-2.0
StarCoder2 15B	Coding LLM	~15B	16K	~10GB	local	BigCode OpenRAIL-M
DeepSeek-Coder V2 (class)	Coding LLM	~16B	128K	~11GB	local	DeepSeek License
Mistral Small 24B	General LLM	~24B	32K	~14GB	local	Apache-2.0
Gemma 2 27B	General LLM	~27B	8K	~17GB	local	Gemma Terms of Use
Gemma 3 27B	General LLM	~27B	128K	~17GB	hybrid	Gemma Terms of Use
Qwen2.5 32B	General LLM	~32B	128K	~20GB	hybrid	Apache-2.0
Qwen3 32B	General LLM	~32B	128K	~20GB	hybrid	Apache-2.0
DeepSeek-R1 Distill 32B	Reasoning	~32B	128K	~20GB	hybrid	MIT
Qwen2.5-Coder 32B	Coding LLM	~32B	128K	~20GB	hybrid	Apache-2.0
CodeLlama 34B	Coding LLM	~34B	16K	~21GB	hybrid	Llama Community License
Mixtral 8x7B (MoE)	General LLM	~47B	32K	~28GB	hybrid	Apache-2.0
Llama 3.1 70B	General LLM	~70B	128K	~42GB	hybrid	Llama Community License
Llama 3.3 70B	General LLM	~70B	128K	~42GB	hybrid	Llama Community License
DeepSeek-R1 Distill Llama 70B	Reasoning	~70B	128K	~42GB	hybrid	MIT
Qwen2.5 72B	General LLM	~72B	128K	~44GB	hybrid	Qwen License
Qwen3 235B-A22B (MoE)	General LLM	~235B	128K	~130GB	cloud	Apache-2.0
Llama 3.1 405B	General LLM	~405B	128K	~230GB	cloud	Llama Community License
DeepSeek-R1 671B (MoE)	Reasoning	~671B	128K	~400GB	cloud	MIT
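To make "the largest one that runs comfortably" concrete, here is a minimal sketch that filters a few rows copied from the table by a memory budget. The names `Model` and `largest_fit` are hypothetical, and the 4GB reserve for context (KV cache) and runtime overhead is an assumption chosen for illustration, not a measured figure; tune it for your own workload.

```python
from typing import NamedTuple, Optional, Sequence


class Model(NamedTuple):
    name: str
    params_b: float    # parameters, in billions
    vram_q4_gb: float  # approximate memory at Q4, from the table


# A handful of rows copied from the table above.
MODELS = [
    Model("Llama 3.2 3B", 3, 2.5),
    Model("Llama 3.1 8B", 8, 6),
    Model("Qwen2.5 14B", 14, 10),
    Model("Mistral Small 24B", 24, 14),
    Model("Qwen2.5 32B", 32, 20),
    Model("Llama 3.3 70B", 70, 42),
]


def largest_fit(
    models: Sequence[Model], vram_gb: float, reserve_gb: float = 4.0
) -> Optional[Model]:
    """Pick the largest model whose Q4 footprint fits after a reserve.

    reserve_gb is an assumed allowance for context and runtime overhead.
    Returns None when nothing fits.
    """
    budget = vram_gb - reserve_gb
    fitting = [m for m in models if m.vram_q4_gb <= budget]
    return max(fitting, key=lambda m: m.params_b, default=None)


if __name__ == "__main__":
    choice = largest_fit(MODELS, vram_gb=24)
    print(choice.name if choice else "nothing fits")  # prints "Qwen2.5 32B"
```

With these assumed numbers, a 12GB card lands on an 8B model and a 48GB card on a 70B model, which matches the tiers described earlier.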

Pick a model, then make it a business agent

Run your own AI agents on hardware you control: private by design, with no data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.
