Compatible devices for Mixtral 8x7B (MoE)
Every hardware profile in our catalog graded for Mixtral 8x7B (MoE), best fit first. For sellable vendor configurations, see the device catalog.
- NVIDIA B200 (placeholder)
  NVIDIA · Datacenter GPUs
  Fits at FP16 (~90GB) with ~79GB headroom, about 1 concurrent instance.
  FP16 · ~90GB · Runs well
- Supermicro 8x H100 SuperServer
  Supermicro · AI Servers
  Fits at FP16 (~90GB) with ~473.2GB headroom, about 6 concurrent instances.
  FP16 · ~90GB · Runs well
- Dell PowerEdge XE9680
  Dell · AI Servers
  Fits at FP16 (~90GB) with ~473.2GB headroom, about 6 concurrent instances.
  FP16 · ~90GB · Runs well
- AMD Instinct MI300X
  AMD · Datacenter GPUs
  Fits at FP16 (~90GB) with ~79GB headroom, about 1 concurrent instance.
  FP16 · ~90GB · Runs well
- Cloud B200 (Blackwell profile, to verify)
  Cloud · Cloud GPU Profiles
  Fits at FP16 (~90GB) with ~68.4GB headroom, about 1 concurrent instance.
  FP16 · ~90GB · Runs well
- NVIDIA H200 (141GB)
  NVIDIA · Datacenter GPUs
  Fits at FP16 (~90GB) with ~34.1GB headroom, about 1 concurrent instance.
  FP16 · ~90GB · Runs well
- Cloud H200 141GB (profile)
  Cloud · Cloud GPU Profiles
  Fits at FP16 (~90GB) with ~34.1GB headroom, about 1 concurrent instance.
  FP16 · ~90GB · Runs well
- NVIDIA H100 (80GB)
  NVIDIA · Datacenter GPUs
  Fits at Q8_0 (~50GB) with ~20.4GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Cloud H100 80GB (profile)
  Cloud · Cloud GPU Profiles
  Fits at Q8_0 (~50GB) with ~20.4GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- NVIDIA RTX PRO 6000 Blackwell
  NVIDIA · Professional GPUs
  Fits at Q8_0 (~50GB) with ~34.5GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- HP Z8 Fury G5 Workstation
  HP · AI Workstations
  Fits at Q8_0 (~50GB) with ~34.5GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Lenovo ThinkStation PX Workstation
  Lenovo · AI Workstations
  Fits at Q8_0 (~50GB) with ~34.5GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Supermicro AI Workstation
  Supermicro · AI Workstations
  Fits at Q8_0 (~50GB) with ~34.5GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Quad RTX 4090 AI Workstation (reference profile)
  Reference · AI Workstations
  Fits at Q8_0 (~50GB) with ~34.5GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Dell Precision 7960 AI Workstation
  Dell · AI Workstations
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- NVIDIA A100 80GB
  NVIDIA · Datacenter GPUs
  Fits at Q8_0 (~50GB) with ~20.4GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Cloud A100 80GB (profile)
  Cloud · Cloud GPU Profiles
  Fits at Q8_0 (~50GB) with ~20.4GB headroom, about 1 concurrent instance.
  Q8_0 · ~50GB · Runs well
- Coding Agent Workstation (reference profile)
  Reference · AI Workstations
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- NVIDIA L40S
  NVIDIA · Datacenter GPUs
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- Cloud L40S 48GB (profile)
  Cloud · Cloud GPU Profiles
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- Law Firm Private AI Box (reference profile)
  Reference · AI Appliances
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- NVIDIA RTX 6000 Ada Generation
  NVIDIA · Professional GPUs
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- AMD Radeon PRO W7900
  AMD · Professional GPUs
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- NVIDIA RTX A6000
  NVIDIA · Professional GPUs
  Fits at Q4_K_M (~28GB) with ~14.2GB headroom, about 1 concurrent instance.
  Q4_K_M · ~28GB · Runs well
- Apple Mac Studio (M2 Ultra)
  Apple · Apple Silicon
  Fits at FP16 (~90GB) but limited bandwidth makes token generation slow for a 47B model.
  FP16 · ~90GB · Runs slowly
- Apple Mac Studio (M4 Ultra class, to verify)
  Apple · Apple Silicon
  Fits at FP16 (~90GB) but limited bandwidth makes token generation slow for a 47B model.
  FP16 · ~90GB · Runs slowly
- Apple Mac Studio (M4 Max)
  Apple · Apple Silicon
  Fits at Q8_0 (~50GB) but limited bandwidth makes token generation slow for a 47B model.
  Q8_0 · ~50GB · Runs slowly
- NVIDIA GeForce RTX 5090 (placeholder)
  NVIDIA · Consumer GPUs
  Fits only at Q4_K_M with little headroom (~0.2GB): usable but tight; consider more memory.
  Q4_K_M · ~28GB · Runs slowly
- NVIDIA DGX Spark (GB10 class)
  NVIDIA · AI Appliances
  Fits at Q8_0 (~50GB) but limited bandwidth makes token generation slow for a 47B model.
  Q8_0 · ~50GB · Runs slowly
- AMD Ryzen AI Max Mini PC (Strix Halo class)
  AMD · Mini PCs
  Fits at Q8_0 (~50GB) but limited bandwidth makes token generation slow for a 47B model.
  Q8_0 · ~50GB · Runs slowly
- Apple Mac mini (M4 Pro)
  Apple · Apple Silicon
  Fits at Q4_K_M (~28GB) but limited bandwidth makes token generation slow for a 47B model.
  Q4_K_M · ~28GB · Runs slowly
- Accounting / Odoo AI Box (reference profile)
  Reference · AI Appliances
  Even the smallest quantization (~28GB) exceeds usable memory (~21.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Small Business Mini PC (reference profile)
  Reference · Mini PCs
  Even the smallest quantization (~28GB) exceeds usable memory (~22.4GB). Choose a smaller model or step up the hardware.
  Not recommended
- NVIDIA GeForce RTX 4090
  NVIDIA · Consumer GPUs
  Even the smallest quantization (~28GB) exceeds usable memory (~21.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Apple Mac mini (M4)
  Apple · Apple Silicon
  Even the smallest quantization (~28GB) exceeds usable memory (~22.4GB). Choose a smaller model or step up the hardware.
  Not recommended
- AMD Radeon RX 7900 XTX
  AMD · Consumer GPUs
  Even the smallest quantization (~28GB) exceeds usable memory (~21.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- NVIDIA GeForce RTX 3090
  NVIDIA · Consumer GPUs
  Even the smallest quantization (~28GB) exceeds usable memory (~21.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Dual RTX 3060 Local Server (reference profile)
  Reference · AI Servers
  Even the smallest quantization (~28GB) exceeds usable memory (~21.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Local Office AI Appliance (reference profile)
  Reference · AI Appliances
  Even the smallest quantization (~28GB) exceeds usable memory (~14.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Hotel AI Automation Box (reference profile)
  Reference · AI Appliances
  Even the smallest quantization (~28GB) exceeds usable memory (~14.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Intel Arc A770 16GB
  Intel · Consumer GPUs
  Even the smallest quantization (~28GB) exceeds usable memory (~14.1GB). Choose a smaller model or step up the hardware.
  Not recommended
- Intel Arc B580 12GB
  Intel · Consumer GPUs
  Even the smallest quantization (~28GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
  Not recommended
- NVIDIA GeForce RTX 3060 12GB
  NVIDIA · Consumer GPUs
  Even the smallest quantization (~28GB) exceeds usable memory (~10.6GB). Choose a smaller model or step up the hardware.
  Not recommended
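
The fit figures in the list above appear to follow simple arithmetic: pick the highest-precision quantization whose footprint fits in usable memory, then report the leftover headroom and how many copies would fit. The sketch below reproduces that arithmetic under stated assumptions. The function name `grade_fit` is hypothetical, the footprints are the approximate sizes quoted in the list, usable memory is taken as footprint plus headroom (for example, 50GB + 20.4GB = 70.4GB for an 80GB H100), and the instance count is assumed to be the floor of usable memory divided by footprint. It does not model the "Runs well" versus "Runs slowly" grade, which depends on memory bandwidth.

```python
import math

# Approximate Mixtral 8x7B footprints quoted in the list, highest precision first.
QUANT_SIZES_GB = [("FP16", 90.0), ("Q8_0", 50.0), ("Q4_K_M", 28.0)]


def grade_fit(usable_gb):
    """Return (quant, headroom_gb, instances) for the highest-precision
    quantization that fits in usable_gb, or None if none fits.

    Assumption: instances = floor(usable / footprint); this matches the
    counts in the list but is not a documented formula.
    """
    for name, size_gb in QUANT_SIZES_GB:
        if size_gb <= usable_gb:
            headroom_gb = round(usable_gb - size_gb, 1)
            instances = math.floor(usable_gb / size_gb)
            return name, headroom_gb, instances
    return None


print(grade_fit(70.4))   # ('Q8_0', 20.4, 1), as for the 80GB H100 entry
print(grade_fit(563.2))  # ('FP16', 473.2, 6), as for the 8x H100 servers
print(grade_fit(21.1))   # None: even Q4_K_M (~28GB) does not fit
```

Treat the result as a first-pass memory check only; context length and runtime overhead will reduce the real headroom.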
Run Mixtral 8x7B (MoE) privately
Run your own AI agents on hardware you control — private by design, no per-seat data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.