⚠️ Model quality matters enormously for legal work.
Small models that fit on a standard MacBook Air (≤16 GB RAM, 3B–7B parameters) perform very poorly at legal citation analysis — expect frequent hallucinations and missed errors. For reliable results you need a model of ≥32B parameters, which requires high-end workstation hardware. If you don't have that hardware, use a cloud provider instead.
Option A — LM Studio (recommended for Mac)
- Download LM Studio and open it.
- Click the search icon and download your model:
• 8 GB RAM: qwen2.5-3b-instruct-mlx or llama-3.2-3b-instruct-mlx (~2 GB)
• 16 GB RAM: gemma-3n-e4b-it-mlx or qwen3-14b-mlx
Download the MLX version (Apple Silicon only; typically 2–3× faster than Ollama)
- In the left sidebar, click Developer → toggle Enable local server on.
- Note the server address (usually http://127.0.0.1:1234/v1).
- In cite.review Settings: set Base URL to http://127.0.0.1:1234/v1 and leave API Key blank.
- Set Model name to the exact identifier shown in LM Studio's server tab (e.g. qwen2.5-3b-instruct-mlx).
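If you're unsure of the exact identifier, the LM Studio local server exposes an OpenAI-compatible /v1/models endpoint; a quick curl lists every loaded model's id. This sketch assumes the server is enabled on the default port shown above:

```shell
# List the models the LM Studio server exposes; each "id" field in the
# JSON response is an exact identifier for cite.review's Model name setting.
# Prints a fallback note if the local server is not running.
curl -s --max-time 5 http://127.0.0.1:1234/v1/models \
  || echo "LM Studio local server is not reachable"
```

If this prints the fallback note, re-check that Enable local server is toggled on in the Developer tab.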
Option B — Ollama (Windows / Linux / Mac)
- Download Ollama and install it.
- Open Terminal and pull your model:
• 8 GB RAM: ollama pull gemma3n:e4b
• 16 GB RAM: ollama pull qwen3:14b
- Ollama starts automatically — no API key needed.
- In cite.review Settings: set Base URL to http://localhost:11434/v1 and leave API Key blank.
- Set Model name to the name you pulled (e.g. qwen3:14b).
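Before pointing cite.review at Ollama, you can smoke-test its OpenAI-compatible chat endpoint from the terminal. This sketch assumes Ollama is running on its default port and that qwen3:14b is the model you pulled; a JSON response confirms the Base URL and model name are both correct:

```shell
# Send a one-line prompt to Ollama's OpenAI-compatible chat endpoint.
# Prints a fallback note if Ollama is not running on the default port.
curl -s --max-time 10 http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3:14b", "messages": [{"role": "user", "content": "Say ready"}]}' \
  || echo "Ollama is not reachable"
```

An error object mentioning the model name usually means the pull didn't finish; re-run the ollama pull command above.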
Even with a large model, local inference is slower than cloud providers and results may vary. For production legal work, a cloud provider is more reliable.