Compatible models for Apple Mac mini (M4 Pro)
Open models graded for the Apple Mac mini (M4 Pro) in its top configuration (64GB of unified memory, of which ~44.8GB is treated as usable for AI), best fit first. Configurations with less memory can run fewer of these models.
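The headroom and instance counts in the list follow simple arithmetic against ~44.8GB of usable memory. Headroom is usable memory minus the model's footprint, and the instance count is usable memory divided by the footprint, rounded down. The minimal Python sketch below assumes the 70% usable fraction implied by 44.8GB out of 64GB; the function names are hypothetical, and the count ignores per-instance KV cache, so treat it as an upper bound.

```python
import math

# Assumption: ~70% of unified memory is usable for models, which is what
# ~44.8GB out of 64GB implies. The real tool may reserve memory differently.
USABLE_FRACTION = 0.7


def usable_memory_gb(total_gb: float, fraction: float = USABLE_FRACTION) -> float:
    """Memory assumed available for model weights and runtime."""
    return total_gb * fraction


def headroom_gb(model_gb: float, usable_gb: float) -> float:
    """Memory left after loading one copy of the model, rounded to 0.1GB."""
    return round(usable_gb - model_gb, 1)


def concurrent_instances(model_gb: float, usable_gb: float) -> int:
    """Full copies of the model that fit side by side, rounded down.

    Ignores per-instance KV cache and runtime overhead, so it is an upper bound.
    """
    return math.floor(usable_gb / model_gb)


usable = usable_memory_gb(64)  # ~44.8GB on the 64GB configuration
for name, size_gb in [("CodeLlama 13B", 26), ("Llama 3.1 8B", 17), ("Gemma 3 4B", 8)]:
    print(f"{name} @ FP16 (~{size_gb}GB): headroom ~{headroom_gb(size_gb, usable)}GB, "
          f"about {concurrent_instances(size_gb, usable)} instance(s)")
```

For those three models the sketch reproduces the list's figures: ~18.8GB headroom and 1 instance, ~27.8GB and 2, and ~36.8GB and 5.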
- CodeLlama 13B: CodeLlama · ~13B · 16K ctx · Llama Community License
  Fits at FP16 (~26GB) with ~18.8GB headroom; about 1 concurrent instance.
  FP16 · ~26GB · Runs well
- Gemma 3 12B: Gemma 3 · ~12B · 128K ctx · Gemma Terms of Use
  Fits at FP16 (~24GB) with ~20.8GB headroom; about 1 concurrent instance.
  FP16 · ~24GB · Runs well
- Mistral Nemo 12B: Mistral · ~12B · 128K ctx · Apache-2.0
  Fits at FP16 (~24GB) with ~20.8GB headroom; about 1 concurrent instance.
  FP16 · ~24GB · Runs well
- Gemma 2 9B: Gemma · ~9B · 8K ctx · Gemma Terms of Use
  Fits at FP16 (~19GB) with ~25.8GB headroom; about 2 concurrent instances.
  FP16 · ~19GB · Runs well
- Llama 3.1 8B: Llama · ~8B · 128K ctx · Llama Community License
  Fits at FP16 (~17GB) with ~27.8GB headroom; about 2 concurrent instances.
  FP16 · ~17GB · Runs well
- Qwen3 8B: Qwen · ~8B · 128K ctx · Apache-2.0
  Fits at FP16 (~17GB) with ~27.8GB headroom; about 2 concurrent instances.
  FP16 · ~17GB · Runs well
- Granite 3 8B: Granite · ~8B · 128K ctx · Apache-2.0
  Fits at FP16 (~17GB) with ~27.8GB headroom; about 2 concurrent instances.
  FP16 · ~17GB · Runs well
- DeepSeek-R1 Distill 8B: DeepSeek · ~8B · 128K ctx · MIT
  Fits at FP16 (~17GB) with ~27.8GB headroom; about 2 concurrent instances.
  FP16 · ~17GB · Runs well
- Qwen2.5 7B Instruct: Qwen2.5 · ~7.6B · 32K ctx · Apache-2.0
  Fits at FP16 (~15.2GB) with ~29.6GB headroom; about 2 concurrent instances.
  FP16 · ~15.2GB · Runs well
- Qwen2.5 Coder 7B Instruct: Qwen2.5 · ~7.6B · 128K ctx · Apache-2.0
  Fits at FP16 (~15.2GB) with ~29.6GB headroom; about 2 concurrent instances.
  FP16 · ~15.2GB · Runs well
- Qwen2.5 7B: Qwen · ~7B · 128K ctx · Apache-2.0
  Fits at FP16 (~15GB) with ~29.8GB headroom; about 2 concurrent instances.
  FP16 · ~15GB · Runs well
- Mistral 7B: Mistral · ~7B · 32K ctx · Apache-2.0
  Fits at FP16 (~15GB) with ~29.8GB headroom; about 2 concurrent instances.
  FP16 · ~15GB · Runs well
- Qwen2.5-Coder 7B: Qwen · ~7B · 128K ctx · Apache-2.0
  Fits at FP16 (~15GB) with ~29.8GB headroom; about 2 concurrent instances.
  FP16 · ~15GB · Runs well
- CodeLlama 7B: CodeLlama · ~7B · 16K ctx · Llama Community License
  Fits at FP16 (~14GB) with ~30.8GB headroom; about 3 concurrent instances.
  FP16 · ~14GB · Runs well
- StarCoder2 7B: StarCoder · ~7B · 16K ctx · BigCode OpenRAIL-M
  Fits at FP16 (~14GB) with ~30.8GB headroom; about 3 concurrent instances.
  FP16 · ~14GB · Runs well
- Gemma 3 4B: Gemma 3 · ~4B · 128K ctx · Gemma Terms of Use
  Fits at FP16 (~8GB) with ~36.8GB headroom; about 5 concurrent instances.
  FP16 · ~8GB · Runs well
- Phi-3.5 Mini (3.8B): Phi · ~3.8B · 128K ctx · MIT
  Fits at FP16 (~8GB) with ~36.8GB headroom; about 5 concurrent instances.
  FP16 · ~8GB · Runs well
- Llama 3.2 3B: Llama · ~3B · 128K ctx · Llama Community License
  Fits at FP16 (~7GB) with ~37.8GB headroom; about 6 concurrent instances.
  FP16 · ~7GB · Runs well
- Qwen2.5 3B: Qwen · ~3B · 32K ctx · Qwen Research License
  Fits at FP16 (~6GB) with ~38.8GB headroom; about 7 concurrent instances.
  FP16 · ~6GB · Runs well
- StarCoder2 3B: StarCoder · ~3B · 16K ctx · BigCode OpenRAIL-M
  Fits at FP16 (~6GB) with ~38.8GB headroom; about 7 concurrent instances.
  FP16 · ~6GB · Runs well
- Gemma 2 2B: Gemma · ~2B · 8K ctx · Gemma Terms of Use
  Fits at FP16 (~4GB) with ~40.8GB headroom; about 11 concurrent instances.
  FP16 · ~4GB · Runs well
- Granite 3 2B: Granite · ~2B · 128K ctx · Apache-2.0
  Fits at FP16 (~4GB) with ~40.8GB headroom; about 11 concurrent instances.
  FP16 · ~4GB · Runs well
- SmolLM2 1.7B: SmolLM · ~1.7B · 8K ctx · Apache-2.0
  Fits at FP16 (~3.4GB) with ~41.4GB headroom; about 13 concurrent instances.
  FP16 · ~3.4GB · Runs well
- Qwen2.5 1.5B: Qwen · ~1.5B · 32K ctx · Apache-2.0
  Fits at FP16 (~3GB) with ~41.8GB headroom; about 14 concurrent instances.
  FP16 · ~3GB · Runs well
- DeepSeek-R1 Distill 1.5B: DeepSeek · ~1.5B · 128K ctx · MIT
  Fits at FP16 (~4GB) with ~40.8GB headroom; about 11 concurrent instances.
  FP16 · ~4GB · Runs well
- Qwen2.5-Coder 1.5B: Qwen · ~1.5B · 32K ctx · Apache-2.0
  Fits at FP16 (~3GB) with ~41.8GB headroom; about 14 concurrent instances.
  FP16 · ~3GB · Runs well
- Llama 3.2 1B: Llama · ~1B · 128K ctx · Llama Community License
  Fits at FP16 (~3GB) with ~41.8GB headroom; about 14 concurrent instances.
  FP16 · ~3GB · Runs well
- Qwen2.5 0.5B: Qwen · ~0.5B · 32K ctx · Apache-2.0
  Fits at FP16 (~1GB) with ~43.8GB headroom; about 44 concurrent instances.
  FP16 · ~1GB · Runs well
- Qwen2.5 72B: Qwen · ~72B · 128K ctx · Qwen License
  Fits at Q4_K_M (~44GB), but limited memory bandwidth makes token generation slow for a 72B model.
  Q4_K_M · ~44GB · Runs slowly
- Llama 3.1 70B: Llama · ~70B · 128K ctx · Llama Community License
  Fits at Q4_K_M (~42GB), but limited memory bandwidth makes token generation slow for a 70B model.
  Q4_K_M · ~42GB · Runs slowly
- Llama 3.3 70B: Llama · ~70B · 128K ctx · Llama Community License
  Fits at Q4_K_M (~42GB), but limited memory bandwidth makes token generation slow for a 70B model.
  Q4_K_M · ~42GB · Runs slowly
- DeepSeek-R1 Distill Llama 70B: DeepSeek · ~70B · 128K ctx · MIT
  Fits at Q4_K_M (~42GB), but limited memory bandwidth makes token generation slow for a 70B model.
  Q4_K_M · ~42GB · Runs slowly
- Mixtral 8x7B (MoE): Mistral · ~47B · 32K ctx · Apache-2.0
  Fits at Q4_K_M (~28GB), but limited memory bandwidth makes token generation slow for a 47B model.
  Q4_K_M · ~28GB · Runs slowly
- CodeLlama 34B: CodeLlama · ~34B · 16K ctx · Llama Community License
  Fits at Q8_0 (~37GB), but limited memory bandwidth makes token generation slow for a 34B model.
  Q8_0 · ~37GB · Runs slowly
- Qwen2.5 32B: Qwen · ~32B · 128K ctx · Apache-2.0
  Fits at Q8_0 (~34GB), but limited memory bandwidth makes token generation slow for a 32B model.
  Q8_0 · ~34GB · Runs slowly
- Qwen3 32B: Qwen · ~32B · 128K ctx · Apache-2.0
  Fits at Q8_0 (~34GB), but limited memory bandwidth makes token generation slow for a 32B model.
  Q8_0 · ~34GB · Runs slowly
- DeepSeek-R1 Distill 32B: DeepSeek · ~32B · 128K ctx · MIT
  Fits at Q8_0 (~34GB), but limited memory bandwidth makes token generation slow for a 32B model.
  Q8_0 · ~34GB · Runs slowly
- Qwen2.5-Coder 32B: Qwen · ~32B · 128K ctx · Apache-2.0
  Fits at Q8_0 (~34GB), but limited memory bandwidth makes token generation slow for a 32B model.
  Q8_0 · ~34GB · Runs slowly
- Gemma 2 27B: Gemma · ~27B · 8K ctx · Gemma Terms of Use
  Fits at Q8_0 (~29GB), but limited memory bandwidth makes token generation slow for a 27B model.
  Q8_0 · ~29GB · Runs slowly
- Gemma 3 27B: Gemma 3 · ~27B · 128K ctx · Gemma Terms of Use
  Fits at Q8_0 (~29GB), but limited memory bandwidth makes token generation slow for a 27B model.
  Q8_0 · ~29GB · Runs slowly
- Mistral Small 24B: Mistral · ~24B · 32K ctx · Apache-2.0
  Fits at Q8_0 (~25GB), but limited memory bandwidth makes token generation slow for a 24B model.
  Q8_0 · ~25GB · Runs slowly
- DeepSeek-Coder V2 (class): DeepSeek · ~16B · 128K ctx · DeepSeek License
  Fits at FP16 (~33GB), but limited memory bandwidth makes token generation slow for a 16B model.
  FP16 · ~33GB · Runs slowly
- StarCoder2 15B: StarCoder · ~15B · 16K ctx · BigCode OpenRAIL-M
  Fits at FP16 (~30GB), but limited memory bandwidth makes token generation slow for a 15B model.
  FP16 · ~30GB · Runs slowly
- Qwen2.5 14B: Qwen · ~14B · 128K ctx · Apache-2.0
  Fits at FP16 (~30GB), but limited memory bandwidth makes token generation slow for a 14B model.
  FP16 · ~30GB · Runs slowly
- Qwen3 14B: Qwen · ~14B · 128K ctx · Apache-2.0
  Fits at FP16 (~30GB), but limited memory bandwidth makes token generation slow for a 14B model.
  FP16 · ~30GB · Runs slowly
- Phi-3 Medium (14B): Phi · ~14B · 128K ctx · MIT
  Fits at FP16 (~28GB), but limited memory bandwidth makes token generation slow for a 14B model.
  FP16 · ~28GB · Runs slowly
- Phi-4 (14B): Phi · ~14B · 16K ctx · MIT
  Fits at FP16 (~28GB), but limited memory bandwidth makes token generation slow for a 14B model.
  FP16 · ~28GB · Runs slowly
- DeepSeek-R1 Distill 14B: DeepSeek · ~14B · 128K ctx · MIT
  Fits at FP16 (~30GB), but limited memory bandwidth makes token generation slow for a 14B model.
  FP16 · ~30GB · Runs slowly
- Qwen2.5-Coder 14B: Qwen · ~14B · 128K ctx · Apache-2.0
  Fits at FP16 (~30GB), but limited memory bandwidth makes token generation slow for a 14B model.
  FP16 · ~30GB · Runs slowly
- DeepSeek-R1 671B (MoE): DeepSeek · ~671B · 128K ctx · MIT
  Even the smallest quantization (~400GB) exceeds usable memory (~44.8GB). Choose a smaller model or step up the hardware.
  Not recommended
- Llama 3.1 405B: Llama · ~405B · 128K ctx · Llama Community License
  Even the smallest quantization (~230GB) exceeds usable memory (~44.8GB). Choose a smaller model or step up the hardware.
  Not recommended
- Qwen3 235B-A22B (MoE): Qwen · ~235B · 128K ctx · Apache-2.0
  Even the smallest quantization (~130GB) exceeds usable memory (~44.8GB). Choose a smaller model or step up the hardware.
  Not recommended
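The grades above appear to follow two rules: use the most precise format that fits in usable memory (FP16, then Q8_0, then Q4_K_M), and expect slow generation for large footprints, because producing each token of a dense model reads roughly all of its weights, so memory bandwidth divided by model size gives a rough ceiling on tokens per second. The Python sketch below illustrates both under stated assumptions and is not the grading tool's actual code. The Q8_0 (~8.5 bits per weight) and Q4_K_M (~4.8 bits per weight, an assumed average) figures are approximations, the 273GB/s default is Apple's published M4 Pro memory bandwidth, and the function names are hypothetical. It ignores KV cache and runtime overhead, which is why its sizes come out slightly below the list's.

```python
from typing import Optional, Tuple

# Approximate bytes per parameter. FP16 is exact; the GGUF values are
# rough averages (assumption), not exact file sizes.
BYTES_PER_PARAM = {
    "FP16": 2.0,        # 16 bits per weight
    "Q8_0": 8.5 / 8,    # ~8.5 bits per weight (8-bit values plus block scales)
    "Q4_K_M": 4.8 / 8,  # ~4.8 bits per weight on average (assumption)
}

# Assumed order of preference: try the most precise format first.
PREFERENCE = ("FP16", "Q8_0", "Q4_K_M")


def weights_gb(params_b: float, quant: str) -> float:
    """Approximate weight footprint in GB for params_b billion parameters."""
    return params_b * BYTES_PER_PARAM[quant]


def pick_quant(params_b: float, usable_gb: float) -> Optional[Tuple[str, float]]:
    """Return (format, size_gb) for the most precise format that fits, else None."""
    for quant in PREFERENCE:
        size = weights_gb(params_b, quant)
        if size <= usable_gb:
            return quant, size
    return None  # even Q4_K_M is too large: "Not recommended"


def decode_ceiling_tok_s(size_gb: float, bandwidth_gb_s: float = 273.0) -> float:
    """Rough upper bound on tokens/s for a dense model that reads all weights per token."""
    return bandwidth_gb_s / size_gb


USABLE_GB = 44.8  # usable memory quoted for the 64GB configuration
for name, params_b in [("Llama 3.1 8B", 8), ("CodeLlama 34B", 34),
                       ("Llama 3.1 70B", 70), ("Llama 3.1 405B", 405)]:
    choice = pick_quant(params_b, USABLE_GB)
    if choice is None:
        print(f"{name}: does not fit even at Q4_K_M")
    else:
        quant, size = choice
        print(f"{name}: {quant} ~{size:.0f}GB, "
              f"at most ~{decode_ceiling_tok_s(size):.1f} tokens/s")
```

By this estimate a 70B model at ~42GB tops out around 6.5 tokens/s, against roughly 16 for Llama 3.1 8B at FP16 (~17GB); real throughput is lower. For MoE models such as Mixtral 8x7B, only the active experts' weights are read per token, so dividing by the full size understates the ceiling.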
Run these models on the Apple Mac mini (M4 Pro)
Run your own AI agents on hardware you control — private by design, no per-seat data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.