Private Document & RAG Agent
The document agent reads contracts, reports, policies and wikis and answers questions with citations, using retrieval-augmented generation over a private knowledge base rather than its training data.
Because retrieval and the model both run on hardware you control, source material never leaves your premises — and a capable mid-size model plus an embedding model is usually all it takes.
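The retrieve-then-cite flow described above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the `DOCS` passages and source ids are invented, and `embed` is a toy bag-of-words stand-in for a real embedding model such as the ones listed further down.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base: (source id, passage text).
DOCS = [
    ("contract-001#s4", "Either party may terminate this agreement with 30 days written notice."),
    ("policy-hr#s2", "Employees accrue vacation days monthly and may carry over five days."),
    ("wiki-onboarding#s1", "New hires receive a laptop and account credentials on their first day."),
]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=2):
    """Return up to k passages that share vocabulary with the question, best first."""
    q = embed(question)
    scored = sorted(((cosine(q, embed(text)), src, text) for src, text in DOCS), reverse=True)
    return [(src, text) for score, src, text in scored[:k] if score > 0]

def build_prompt(question, passages):
    """Assemble a prompt that asks the model to answer only from cited sources."""
    context = "\n".join(f"[{src}] {text}" for src, text in passages)
    return (
        "Answer using only the sources below and cite the source id in brackets.\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The prompt returned by `build_prompt` would then be passed to whichever self-hosted model you run; the bracketed source ids are what allow the answer to carry citations.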
What it does
- Answers from your documents, contracts and wikis with citations
- Retrieval over a private knowledge base (RAG)
- Summarization and cross-document Q&A
- Keeps source material on infrastructure you control
How it works
The agent wraps an open model with retrieval over your data, scoped permissions, typed tools, confirmations and an audit log — the AI Business OS layer that makes it safe to deploy.
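One way to picture how scoped permissions, typed tools, confirmations and an audit log fit together is the sketch below. Every name in it (`Tool`, `Session`, `call_tool`, the scope strings) is hypothetical and chosen for illustration; it is not the AI Business OS API.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    required_scope: str        # permission the caller must hold
    needs_confirmation: bool   # True for actions that change data
    arg_types: dict            # expected argument names and types
    run: Callable[[dict], str]

@dataclass
class Session:
    user: str
    scopes: set
    audit_log: list = field(default_factory=list)

def args_match(tool, args):
    """Check that args has exactly the declared names with the declared types."""
    return set(args) == set(tool.arg_types) and all(
        isinstance(args[name], typ) for name, typ in tool.arg_types.items()
    )

def call_tool(session, tool, args, confirm=lambda tool, args: False):
    """Run a tool only if scope, type and confirmation checks pass; log every attempt."""
    if tool.required_scope not in session.scopes:
        outcome, result = "denied", None
    elif not args_match(tool, args):
        outcome, result = "bad_args", None
    elif tool.needs_confirmation and not confirm(tool, args):
        outcome, result = "not_confirmed", None
    else:
        outcome, result = "ok", tool.run(args)
    # Denied and rejected calls are logged too, so the log records attempts, not just successes.
    session.audit_log.append(json.dumps({
        "ts": time.time(), "user": session.user,
        "tool": tool.name, "args": args, "outcome": outcome,
    }))
    return outcome, result
```

In this sketch the confirmation callback defaults to refusing, so a tool marked `needs_confirmation` never runs unless the caller supplies an explicit approval step.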
Fit is driven by each machine’s RAG capability score.
Models that power it
Open models in the library that suit this role: 40. A few, smallest first:
all-MiniLM (class)
tiny · very fast
Nomic Embed Text (class)
fast retrieval · lightweight
Snowflake Arctic Embed (class)
quality retrieval · RAG
mxbai-embed-large (class)
quality retrieval · RAG
BGE-M3 Embeddings (class)
multilingual retrieval · long documents
DeepSeek-R1 Distill 1.5B
tiny reasoning · edge
Hardware it runs on
Machines that can host this agent today, scored for real local-AI workloads, cheapest strong fit first.
Apple Mac mini (M4 Pro)
More memory bandwidth than the base M4 and up to 64 GB of unified memory make this a surprisingly capable small-form-factor local-AI box.
- Memory: 64 GB unified
- Architecture: Apple M4 Pro
NVIDIA L40S
A versatile 48GB datacenter card for inference and graphics — a popular, cost-effective cloud and on-prem serving option.
- Memory: 48 GB
- Architecture: Ada Lovelace
Coding Agent Workstation (reference profile)
A workstation tuned for local coding agents: ~48GB across two 24GB cards runs strong 32B coder models and serves a small engineering team privately.
- Memory: 48 GB
- Architecture: Ada Lovelace
Run it private, in your cloud, or hybrid
Keep this agent on hardware you own for privacy and predictable cost, run it on cloud GPUs in your own account for bursts and the largest models, or do both.
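The "local by default, cloud for the largest models" choice can be expressed as a small routing rule. The sketch below is an assumption about how such a rule might look: `Backend`, `pick_backend` and the 80% headroom factor are hypothetical, not part of the product.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    memory_gb: float  # memory available for model weights

def pick_backend(model_memory_gb, local, cloud=None, headroom=0.8):
    """Prefer the local machine; use the cloud backend only when the model will not fit locally.

    headroom reserves part of memory for the KV cache and runtime overhead
    (0.8 is an illustrative value, not a measured one).
    """
    if model_memory_gb <= local.memory_gb * headroom:
        return local
    if cloud is not None and model_memory_gb <= cloud.memory_gb * headroom:
        return cloud
    raise ValueError(f"no configured backend can hold a {model_memory_gb} GB model")
```

Leaving `cloud` unset gives a strictly private deployment: a model that does not fit locally is rejected rather than sent elsewhere.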
Frequently asked questions
What is the Document / RAG agent?
The document agent reads contracts, reports, policies and wikis and answers questions with citations, using retrieval-augmented generation over a private knowledge base rather than its training data.
Can the Document / RAG agent run privately on my own hardware?
Yes. It runs on open-weight models you self-host on a private box, on-prem server or your own cloud account, so data stays on infrastructure you control. You can also run hybrid — local by default, bursting to the cloud for the largest models.
Which models power the Document / RAG agent?
It works with open models such as all-MiniLM (class), Nomic Embed Text (class) and Snowflake Arctic Embed (class). The right size depends on quality needs and the hardware you run it on; see the model library for VRAM by quantization.
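As a rough guide to how quantization changes memory needs, the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The helper below is a back-of-the-envelope sketch; `weight_memory_gb` is a hypothetical name, and the figure excludes the KV cache, activations and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

# Illustrative sizes only: a 32B model at common precisions.
for bits in (16, 8, 4):
    print(f"32B at {bits}-bit: ~{weight_memory_gb(32, bits):.0f} GB for weights")
```

By this estimate a 32B model needs roughly 16 GB for weights at 4-bit and 64 GB at 16-bit, which is why the same model can fit one machine at one quantization and not at another.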
What hardware does the Document / RAG agent need?
It typically maps to the — tier. A machine like the Apple Mac mini (M4 Pro) strongly fits this role; lighter or heavier hardware shifts how many concurrent requests and how large a model you can run.
Put the Document / RAG agent to work with BrainOutput
Deploy the Document / RAG agent privately, connect your tools, and grow into a full AI team on infrastructure you control.