Compatible models for BrainOutput Office Appliance (RTX 3060 12GB)

Open models graded for the BrainOutput Office Appliance (RTX 3060 12GB) in its top configuration (32GB, ~12GB AI memory), listed best fit first. Lower configurations run fewer of these.

CodeLlama 13B
CodeLlama · ~13B · 16K ctx · Llama Community License
Fits at Q4_K_M (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q4_K_M · ~8GB · Runs well
Gemma 3 12B
Gemma 3 · ~12B · 128K ctx · Gemma Terms of Use
Fits at Q4_K_M (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q4_K_M · ~8GB · Runs well
Mistral Nemo 12B
Mistral · ~12B · 128K ctx · Apache-2.0
Fits at Q4_K_M (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q4_K_M · ~8GB · Runs well
Gemma 2 9B
Gemma · ~9B · 8K ctx · Gemma Terms of Use
Fits at Q8_0 (~10GB) with ~0.6GB headroom — about 1 concurrent instance.
Q8_0 · ~10GB · Runs well
Llama 3.1 8B
Llama · ~8B · 128K ctx · Llama Community License
Fits at Q8_0 (~9GB) with ~1.6GB headroom — about 1 concurrent instance.
Q8_0 · ~9GB · Runs well
Qwen3 8B
Qwen · ~8B · 128K ctx · Apache-2.0
Fits at Q8_0 (~9GB) with ~1.6GB headroom — about 1 concurrent instance.
Q8_0 · ~9GB · Runs well
Granite 3 8B
Granite · ~8B · 128K ctx · Apache-2.0
Fits at Q8_0 (~9GB) with ~1.6GB headroom — about 1 concurrent instance.
Q8_0 · ~9GB · Runs well
DeepSeek-R1 Distill 8B
DeepSeek · ~8B · 128K ctx · MIT
Fits at Q8_0 (~9GB) with ~1.6GB headroom — about 1 concurrent instance.
Q8_0 · ~9GB · Runs well
Qwen2.5 7B Instruct
Qwen2.5 · ~7.6B · 32K ctx · Apache-2.0
Fits at Q8_0 (~8.4GB) with ~2.2GB headroom — about 1 concurrent instance.
Q8_0 · ~8.4GB · Runs well
Qwen2.5 Coder 7B Instruct
Qwen2.5 · ~7.6B · 128K ctx · Apache-2.0
Fits at Q8_0 (~8.4GB) with ~2.2GB headroom — about 1 concurrent instance.
Q8_0 · ~8.4GB · Runs well
Qwen2.5 7B
Qwen · ~7B · 128K ctx · Apache-2.0
Fits at Q8_0 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q8_0 · ~8GB · Runs well
Mistral 7B
Mistral · ~7B · 32K ctx · Apache-2.0
Fits at Q8_0 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q8_0 · ~8GB · Runs well
Qwen2.5-Coder 7B
Qwen · ~7B · 128K ctx · Apache-2.0
Fits at Q8_0 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q8_0 · ~8GB · Runs well
CodeLlama 7B
CodeLlama · ~7B · 16K ctx · Llama Community License
Fits at Q8_0 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q8_0 · ~8GB · Runs well
StarCoder2 7B
StarCoder · ~7B · 16K ctx · BigCode OpenRAIL-M
Fits at Q8_0 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
Q8_0 · ~8GB · Runs well
Gemma 3 4B
Gemma 3 · ~4B · 128K ctx · Gemma Terms of Use
Fits at FP16 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
FP16 · ~8GB · Runs well
Phi-3.5 Mini (3.8B)
Phi · ~3.8B · 128K ctx · MIT
Fits at FP16 (~8GB) with ~2.6GB headroom — about 1 concurrent instance.
FP16 · ~8GB · Runs well
Llama 3.2 3B
Llama · ~3B · 128K ctx · Llama Community License
Fits at FP16 (~7GB) with ~3.6GB headroom — about 1 concurrent instance.
FP16 · ~7GB · Runs well
Qwen2.5 3B
Qwen · ~3B · 32K ctx · Qwen Research License
Fits at FP16 (~6GB) with ~4.6GB headroom — about 1 concurrent instance.
FP16 · ~6GB · Runs well
StarCoder2 3B
StarCoder · ~3B · 16K ctx · BigCode OpenRAIL-M
Fits at FP16 (~6GB) with ~4.6GB headroom — about 1 concurrent instance.
FP16 · ~6GB · Runs well
Gemma 2 2B
Gemma · ~2B · 8K ctx · Gemma Terms of Use
Fits at FP16 (~4GB) with ~6.6GB headroom — about 2 concurrent instances.
FP16 · ~4GB · Runs well
Granite 3 2B
Granite · ~2B · 128K ctx · Apache-2.0
Fits at FP16 (~4GB) with ~6.6GB headroom — about 2 concurrent instances.
FP16 · ~4GB · Runs well
SmolLM2 1.7B
SmolLM · ~1.7B · 8K ctx · Apache-2.0
Fits at FP16 (~3.4GB) with ~7.2GB headroom — about 3 concurrent instances.
FP16 · ~3.4GB · Runs well
Qwen2.5 1.5B
Qwen · ~1.5B · 32K ctx · Apache-2.0
Fits at FP16 (~3GB) with ~7.6GB headroom — about 3 concurrent instances.
FP16 · ~3GB · Runs well
DeepSeek-R1 Distill 1.5B
DeepSeek · ~1.5B · 128K ctx · MIT
Fits at FP16 (~4GB) with ~6.6GB headroom — about 2 concurrent instances.
FP16 · ~4GB · Runs well
Qwen2.5-Coder 1.5B
Qwen · ~1.5B · 32K ctx · Apache-2.0
Fits at FP16 (~3GB) with ~7.6GB headroom — about 3 concurrent instances.
FP16 · ~3GB · Runs well
Llama 3.2 1B
Llama · ~1B · 128K ctx · Llama Community License
Fits at FP16 (~3GB) with ~7.6GB headroom — about 3 concurrent instances.
FP16 · ~3GB · Runs well
Qwen2.5 0.5B
Qwen · ~0.5B · 32K ctx · Apache-2.0
Fits at FP16 (~1GB) with ~9.6GB headroom — about 10 concurrent instances.
FP16 · ~1GB · Runs well
StarCoder2 15B
StarCoder · ~15B · 16K ctx · BigCode OpenRAIL-M
Fits at Q4_K_M (~10GB) but limited bandwidth makes token generation slow for a 15B model.
Q4_K_M · ~10GB · Runs slowly
Qwen2.5 14B
Qwen · ~14B · 128K ctx · Apache-2.0
Fits at Q4_K_M (~10GB) but limited bandwidth makes token generation slow for a 14B model.
Q4_K_M · ~10GB · Runs slowly
Qwen3 14B
Qwen · ~14B · 128K ctx · Apache-2.0
Fits at Q4_K_M (~10GB) but limited bandwidth makes token generation slow for a 14B model.
Q4_K_M · ~10GB · Runs slowly
Phi-3 Medium (14B)
Phi · ~14B · 128K ctx · MIT
Fits at Q4_K_M (~9GB) but limited bandwidth makes token generation slow for a 14B model.
Q4_K_M · ~9GB · Runs slowly
Phi-4 (14B)
Phi · ~14B · 16K ctx · MIT
Fits at Q4_K_M (~9GB) but limited bandwidth makes token generation slow for a 14B model.
Q4_K_M · ~9GB · Runs slowly
DeepSeek-R1 Distill 14B
DeepSeek · ~14B · 128K ctx · MIT
Fits at Q4_K_M (~10GB) but limited bandwidth makes token generation slow for a 14B model.
Q4_K_M · ~10GB · Runs slowly
Qwen2.5-Coder 14B
Qwen · ~14B · 128K ctx · Apache-2.0
Fits at Q4_K_M (~10GB) but limited bandwidth makes token generation slow for a 14B model.
Q4_K_M · ~10GB · Runs slowly
DeepSeek-R1 671B (MoE)
DeepSeek · ~671B · 128K ctx · MIT
Even the smallest quantization (~400GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Llama 3.1 405B
Llama · ~405B · 128K ctx · Llama Community License
Even the smallest quantization (~230GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Qwen3 235B-A22B (MoE)
Qwen · ~235B · 128K ctx · Apache-2.0
Even the smallest quantization (~130GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Qwen2.5 72B
Qwen · ~72B · 128K ctx · Qwen License
Even the smallest quantization (~44GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Llama 3.1 70B
Llama · ~70B · 128K ctx · Llama Community License
Even the smallest quantization (~42GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Llama 3.3 70B
Llama · ~70B · 128K ctx · Llama Community License
Even the smallest quantization (~42GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
DeepSeek-R1 Distill Llama 70B
DeepSeek · ~70B · 128K ctx · MIT
Even the smallest quantization (~42GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Mixtral 8x7B (MoE)
Mistral · ~47B · 32K ctx · Apache-2.0
Even the smallest quantization (~28GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
CodeLlama 34B
CodeLlama · ~34B · 16K ctx · Llama Community License
Even the smallest quantization (~21GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Qwen2.5 32B
Qwen · ~32B · 128K ctx · Apache-2.0
Even the smallest quantization (~20GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Qwen3 32B
Qwen · ~32B · 128K ctx · Apache-2.0
Even the smallest quantization (~20GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
DeepSeek-R1 Distill 32B
DeepSeek · ~32B · 128K ctx · MIT
Even the smallest quantization (~20GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Qwen2.5-Coder 32B
Qwen · ~32B · 128K ctx · Apache-2.0
Even the smallest quantization (~20GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Gemma 2 27B
Gemma · ~27B · 8K ctx · Gemma Terms of Use
Even the smallest quantization (~17GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Gemma 3 27B
Gemma 3 · ~27B · 128K ctx · Gemma Terms of Use
Even the smallest quantization (~17GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
Mistral Small 24B
Mistral · ~24B · 32K ctx · Apache-2.0
Even the smallest quantization (~14GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
DeepSeek-Coder V2 (class)
DeepSeek · ~16B · 128K ctx · DeepSeek License
Even the smallest quantization (~11GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
Not recommended
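
The headroom and concurrent-instance figures in the list above appear to follow simple arithmetic against the ~10.6GB usable-memory figure quoted in the "Not recommended" entries. The sketch below reproduces that arithmetic in Python. It is an illustration inferred from the listed numbers, not BrainOutput's actual grading method: the function name `fit_summary` and the floor-division rule for instances are assumptions, and it does not model the bandwidth limit behind the "Runs slowly" grade.

```python
import math

# Usable memory quoted in the "Not recommended" entries (~10.6GB).
USABLE_GB = 10.6


def fit_summary(model_gb: float, usable_gb: float = USABLE_GB) -> dict:
    """Estimate headroom and concurrent instances for one model size.

    Hypothetical helper: headroom is usable memory minus model size, and
    the instance count is assumed to be how many whole copies fit.
    """
    fits = model_gb <= usable_gb
    headroom_gb = round(usable_gb - model_gb, 1)
    instances = math.floor(usable_gb / model_gb) if fits else 0
    return {"fits": fits, "headroom_gb": headroom_gb, "instances": instances}


if __name__ == "__main__":
    # Sizes taken from the list: Q4_K_M 13B, FP16 1.7B, FP16 0.5B, 16B class.
    for size_gb in (8.0, 3.4, 1.0, 11.0):
        print(size_gb, fit_summary(size_gb))
```

For the ~8GB entries this gives ~2.6GB headroom and 1 instance, and for the ~1GB entry ~9.6GB headroom and 10 instances, matching the list. Real deployments also need memory for context (KV cache), which grows with context length, so treat the instance counts as upper bounds.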


Run these models on the BrainOutput Office Appliance (RTX 3060 12GB)

Run your own AI agents on hardware you control — private by design, no per-seat data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.
