LLaVA·3 sizes·Vision / Multimodal
LLaVA models: sizes & hardware to run them
The LLaVA family spans three sizes, from 7B to 13B parameters. Each size maps to a hardware tier: the table below gives the approximate memory each needs at 4-bit quantization and the device we’d start with for a private local deployment.
Sizes & hardware
| Model | Params | Context | ~VRAM @ 4-bit | Minimum device | Recommended |
|---|---|---|---|---|---|
| LLaVA 7B (vision) | 7B | 4K | ~6GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| LLaVA-Llama3 8B (vision) | 8B | 8K | ~6.5GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
| LLaVA 13B (vision) | 13B | 4K | ~9GB | NVIDIA GeForce RTX 3060 12GB | NVIDIA B200 (placeholder) |
Memory figures are approximate working-set estimates (weights plus KV cache at modest context); treat them as rough, give-or-take values rather than exact requirements. Device picks come from our compatibility engine, ordered with the best on-prem fit first.
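As a rough illustration of how a "weights + KV cache" estimate can be put together, here is a minimal Python sketch. It is not the method behind the table. The function name `estimate_vram_gb` is hypothetical, and the defaults are assumptions chosen to look like a typical 7B-class model: 32 layers, 4096-wide keys and values, 16-bit cache entries, exactly 4 bits per weight. Real 4-bit formats and runtimes add overhead that this ignores.

```python
def estimate_vram_gb(
    params_billion: float,
    bits_per_weight: float = 4.0,    # assumption: exactly 4 bits per weight
    n_layers: int = 32,              # assumption: 7B-class layer count
    kv_dim: int = 4096,              # assumption: key/value width per layer
    context_tokens: int = 4096,
    kv_bytes_per_value: int = 2,     # assumption: 16-bit KV cache entries
    vision_encoder_gb: float = 0.0,  # extra allowance for the vision encoder
) -> float:
    """Rough working-set estimate in GB (1 GB = 1e9 bytes)."""
    # Quantized weights: parameter count times bits per weight, in bytes.
    weights_bytes = params_billion * 1e9 * bits_per_weight / 8
    # KV cache: one key vector and one value vector per layer per token.
    kv_cache_bytes = 2 * n_layers * kv_dim * context_tokens * kv_bytes_per_value
    return (weights_bytes + kv_cache_bytes) / 1e9 + vision_encoder_gb


if __name__ == "__main__":
    # 7B parameters at 4-bit with a 4K context, no vision-encoder allowance.
    print(f"{estimate_vram_gb(7):.1f} GB")
```

With these assumed defaults the 7B case comes out near 5.6 GB before any vision-encoder allowance, in the same range as the table's ~6GB. Longer contexts grow the KV cache term linearly, which is why the figures above are quoted at modest context.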
Open each size
LLaVA 7B (vision)
8GB+ GPUs at 4-bit, plus extra headroom for the vision encoder. The classic open VLM.
LLaVA-Llama3 8B (vision)
8GB+ GPUs at 4-bit, plus vision-encoder headroom. A LLaVA build on a Llama 3 backbone.
LLaVA 13B (vision)
16GB+ GPUs at 4-bit, plus headroom for the vision encoder. The larger LLaVA for better quality.
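To check quickly which sizes suit a given GPU, the per-size guidance above can be written as a small lookup. This is a sketch only: `SIZES` and `sizes_that_fit` are hypothetical names, and the thresholds are the "8GB+" and "16GB+" figures from the cards, which are more conservative for 13B than the table's minimum device.

```python
# (model name, ~VRAM at 4-bit in GB from the table, suggested GPU size in GB from the cards)
SIZES = [
    ("LLaVA 7B", 6.0, 8),
    ("LLaVA-Llama3 8B", 6.5, 8),
    ("LLaVA 13B", 9.0, 16),
]


def sizes_that_fit(gpu_vram_gb: float) -> list[str]:
    """Return the model sizes whose suggested GPU size is met by this card."""
    return [name for name, _vram, min_gpu in SIZES if gpu_vram_gb >= min_gpu]


if __name__ == "__main__":
    for vram in (8, 12, 16, 24):
        print(vram, "GB:", sizes_that_fit(vram))
```

Comparing against the suggested GPU size rather than the raw ~VRAM figure leaves room for the vision encoder and runtime overhead that the cards mention.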
Run LLaVA models inside a private AI Business OS
Run your own AI agents on hardware you control: private by design, with no data leaving your premises. BrainOutput helps you pick the right machine and turn it into a working AI Business OS.
Explore the AI Business OS