Private Document & RAG Agent

The document agent reads contracts, reports, policies and wikis and answers questions with citations, using retrieval-augmented generation over a private knowledge base rather than its training data.

Because retrieval and the model both run on hardware you control, source material never leaves your premises — and a capable mid-size model plus an embedding model is usually all it takes.


What it does

▸Answers from your documents, contracts and wikis with citations
▸Retrieval over a private knowledge base (RAG)
▸Summarization and cross-document Q&A
▸Keeps source material on infrastructure you control
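
As a rough illustration of answering from retrieved passages with citations, the sketch below ranks chunks against a question and builds a prompt with numbered sources. It is a toy under stated assumptions, not this product's implementation: `embed` is a bag-of-words stand-in for a real embedding model, and the file names, chunk texts and function names are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)[:k]

def build_prompt(query: str, hits: list) -> str:
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    sources = "\n".join(f"[{i}] ({h['source']}) {h['text']}" for i, h in enumerate(hits, 1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )

# Hypothetical knowledge-base chunks, each tagged with where it came from.
CHUNKS = [
    {"source": "msa.pdf#p4", "text": "Either party may terminate with 30 days written notice."},
    {"source": "handbook.md#leave", "text": "Employees accrue 1.5 vacation days per month."},
    {"source": "msa.pdf#p7", "text": "Invoices are payable within 45 days of receipt."},
]

question = "How many days notice to terminate?"
hits = retrieve(question, CHUNKS, k=2)
prompt = build_prompt(question, hits)  # this string would be sent to the local model
```

Keeping the source label attached to every chunk is what lets the final answer point back to a specific document and page.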

How it works

The agent wraps an open model with retrieval over your data, scoped permissions, typed tools, confirmations and an audit log — the AI Business OS layer that makes it safe to deploy.
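
One way to picture that layer is a retrieval tool that takes a typed request, filters documents by the caller's permissions before searching, and records every call. The sketch below is illustrative only: `SearchRequest`, `DocumentTool`, the scope names and the sample documents are hypothetical, substring matching stands in for real retrieval, and confirmations are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class SearchRequest:
    """Typed tool input: who is asking, and what for."""
    user: str
    query: str

@dataclass
class DocumentTool:
    documents: list   # each item: {"id": str, "scope": str, "text": str}
    grants: dict      # user name -> set of scopes that user may read
    audit_log: list = field(default_factory=list)

    def search(self, req: SearchRequest) -> list:
        allowed = self.grants.get(req.user, set())
        # Filter by permission BEFORE matching, so out-of-scope text
        # never reaches the model or the caller.
        visible = [d for d in self.documents if d["scope"] in allowed]
        hits = [d for d in visible if req.query.lower() in d["text"].lower()]
        # Record who asked what and which documents were returned.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": req.user,
            "query": req.query,
            "returned": [d["id"] for d in hits],
        })
        return hits

tool = DocumentTool(
    documents=[
        {"id": "c-1", "scope": "legal", "text": "Termination requires 30 days notice."},
        {"id": "h-1", "scope": "hr", "text": "Termination of employment follows the HR policy."},
    ],
    grants={"alice": {"legal", "hr"}, "bob": {"hr"}},
)

alice_hits = tool.search(SearchRequest(user="alice", query="termination"))
bob_hits = tool.search(SearchRequest(user="bob", query="termination"))
```

Here the same query returns both documents for `alice` but only the HR document for `bob`, and both calls appear in `tool.audit_log`.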

Fit is driven by each machine’s RAG capability score.

Models that power it


Open models in the library that suit this role: 40. A few, smallest first:

Sentence-Transformers

all-MiniLM (class)

tiny · very fast

0.023B params · 0.5K context

Nomic

Nomic Embed Text (class)

fast retrieval · lightweight

0.14B params · 8K context

Snowflake

Snowflake Arctic Embed (class)

quality retrieval · RAG

0.33B params · 0.5K context

Mixedbread

mxbai-embed-large (class)

quality retrieval · RAG

0.34B params · 0.5K context

BAAI

BGE-M3 Embeddings (class)

multilingual retrieval · long documents

0.6B params · 8K context

DeepSeek

DeepSeek-R1 Distill 1.5B

tiny reasoning · edge

1.5B params · 128K context
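
Several of the embedding models above accept only about 0.5K tokens, so long documents are usually split into overlapping chunks before they are embedded. The sketch below shows one simple way to do that. It counts words rather than tokens, which is an assumption made for simplicity (a real pipeline would measure length with the model's own tokenizer), and `chunk_words` with its default sizes is hypothetical.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list:
    """Split text into windows of at most max_words words.

    Consecutive windows share `overlap` words so a sentence that straddles
    a boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

# A synthetic 500-word document: "w0 w1 ... w499".
document = " ".join(f"w{i}" for i in range(500))
chunks = chunk_words(document, max_words=200, overlap=40)
```

With these settings the 500-word document becomes three chunks starting at words 0, 160 and 320.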

Hardware it runs on


Machines that can host this agent today, scored for real local-AI workloads — cheapest strong fit first.

Apple · Apple Silicon

Apple Mac mini (M4 Pro)

57/100 · Capable

More memory bandwidth and up to 64GB unified memory make this a surprisingly capable small-form-factor local-AI box.

Memory: 64 GB unified
Architecture: Apple M4 Pro

NVIDIA · Datacenter GPUs

NVIDIA L40S

59/100 · Capable

A versatile 48GB datacenter card for inference and graphics — a popular, cost-effective cloud and on-prem serving option.

Memory: 48 GB
Architecture: Ada Lovelace

Reference · AI Workstations

Coding Agent Workstation (reference profile)

65/100 · Strong

A workstation tuned for local coding agents: ~48GB across two 24GB cards runs strong 32B coder models and serves a small engineering team privately.

Memory: 48 GB
Architecture: Ada Lovelace

Run it private, in your cloud, or hybrid

Keep this agent on hardware you own for privacy and predictable cost, run it on cloud GPUs in your own account for bursts and the largest models, or do both.


Frequently asked questions

What is the Document / RAG agent?

The document agent reads contracts, reports, policies and wikis and answers questions with citations, using retrieval-augmented generation over a private knowledge base rather than its training data.

Can the Document / RAG agent run privately on my own hardware?

Yes. It runs on open-weight models you self-host on a private box, on-prem server or your own cloud account, so data stays on infrastructure you control. You can also run hybrid — local by default, bursting to the cloud for the largest models.

Which models power the Document / RAG agent?

It works with open models such as all-MiniLM (class), Nomic Embed Text (class) and Snowflake Arctic Embed (class). The right size depends on your quality needs and the hardware you run it on; see the model library for VRAM by quantization.
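
For a back-of-the-envelope sense of how quantization changes memory needs, the weights alone take roughly parameters × bits per weight ÷ 8 bytes. The sketch below applies that arithmetic. It is a lower bound, not the model library's figures: it ignores the KV cache, activations and runtime overhead, real quantized formats often use slightly more bits per weight than their nominal width, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 weights * (bits_per_weight / 8) bytes each,
    divided by 1e9 bytes per GB, simplifies to the expression below.
    """
    return params_billions * bits_per_weight / 8

fp16_small = weight_memory_gb(1.5, 16)   # 3.0  (a 1.5B model at 16-bit)
int4_small = weight_memory_gb(1.5, 4)    # 0.75 (the same model at 4-bit)
int4_large = weight_memory_gb(32, 4)     # 16.0 (a 32B model at 4-bit)
```

By this estimate, the weights of a 32B model at 4-bit fit comfortably within a 48 GB machine, leaving the remainder for context and concurrent requests.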

What hardware does the Document / RAG agent need?

It typically maps to the — tier. A machine like the Apple Mac mini (M4 Pro) strongly fits this role; lighter or heavier hardware shifts how many concurrent requests and how large a model you can run.