
LLM Setup

arktrace supports three LLM providers configured via .env. Set LLM_PROVIDER to one of:

| LLM_PROVIDER | Where it runs | Requires |
|---|---|---|
| `llamacpp` (default) | Local — no server, no internet | GGUF model file |
| `anthropic` | Remote — Anthropic API | ANTHROPIC_API_KEY |
| `openai` | Remote — any OpenAI-compatible API | LLM_BASE_URL + LLM_API_KEY |
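
Provider selection can be sketched as a small validation step at startup. This is illustrative only — the function name and exact checks are assumptions, not arktrace's actual factory code:

```python
VALID_PROVIDERS = {"llamacpp", "anthropic", "openai"}

def resolve_provider(env: dict) -> str:
    """Return the configured provider name, defaulting to llamacpp."""
    provider = env.get("LLM_PROVIDER", "llamacpp").lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    # Remote providers need their credentials before any request is made.
    if provider == "anthropic" and not env.get("ANTHROPIC_API_KEY"):
        raise ValueError("anthropic provider requires ANTHROPIC_API_KEY")
    if provider == "openai" and not env.get("LLM_API_KEY"):
        raise ValueError("openai provider requires LLM_API_KEY")
    return provider
```

Passing the parsed `.env` mapping rather than reading `os.environ` directly keeps the check easy to test.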

What the LLM does in arktrace

| Feature | Prompt shape | Typical output |
|---|---|---|
| Analyst brief | Vessel profile + top SHAP signals + 3 GDELT events | One paragraph citing a specific event and how it connects to the vessel's risk score |
| Analyst chat | Fleet overview + optional vessel detail + analyst question | Direct factual answer grounded in the provided data |

Prompts are short (500–1,200 tokens in, 150–300 out). Instruction following matters more than reasoning ability — a 4B model is sufficient.
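
The brief prompt shape above can be sketched as plain string assembly — names and section labels here are hypothetical, not arktrace's actual template:

```python
def build_brief_prompt(profile: str, shap_signals: list, events: list) -> str:
    """Assemble an analyst-brief prompt: profile + top SHAP signals + GDELT events."""
    lines = ["Vessel profile:", profile, "", "Top risk signals (SHAP):"]
    lines += [f"- {s}" for s in shap_signals[:5]]
    lines += ["", "Recent GDELT events:"]
    lines += [f"- {e}" for e in events[:3]]
    lines += ["", "Write one paragraph citing a specific event and explaining "
                  "how it connects to the vessel's risk score."]
    return "\n".join(lines)
```

Capping the signal and event lists keeps the prompt inside the 500–1,200-token input budget.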


Native macOS dev mode

Docker on macOS runs inside a Colima Linux VM, which has no access to Apple Metal (GPU/ANE). Running the dashboard natively on the host bypasses the VM and enables Metal-accelerated inference — typically 5–10× faster on Apple Silicon.

Infra (MinIO) still runs in Docker. Only the FastAPI process runs on the host.

One-time setup

# 1. Install llama-cpp-python with Metal support
CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall

# 2. Download the model (saves to ~/models/ by default)
uv run python scripts/download_model.py mistral-7b-it

Start infra + dashboard

# Recommended — one command starts everything:
bash scripts/run_dev.sh

# Or step-by-step:
docker compose -f docker-compose.infra.yml up -d   # MinIO only, no dashboard container

S3_ENDPOINT=http://localhost:9000 \
LLAMACPP_MODEL_PATH=~/models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf \
  uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

scripts/run_dev.sh accepts a few flags:

| Flag | Default | Description |
|---|---|---|
| `--model PATH` | auto-detect | Path to GGUF model file |
| `--provider NAME` | `llamacpp` | Override LLM_PROVIDER |
| `--port PORT` | 8000 | uvicorn port |
| `--no-infra` | — | Skip Docker infra (MinIO already running) |
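
The `--model` auto-detect behaviour can be approximated like this — the function name and the largest-file heuristic are assumptions, not necessarily what run_dev.sh does:

```python
from pathlib import Path

def autodetect_model(model_dir: Path):
    """Return the largest .gguf file in model_dir, or None if there is none.

    Largest-file-wins is a heuristic: when several quantisations are present,
    the bigger file is usually the higher-quality variant.
    """
    candidates = sorted(model_dir.glob("*.gguf"),
                        key=lambda p: p.stat().st_size, reverse=True)
    return candidates[0] if candidates else None
```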

Provider: llamacpp (local, no server)

The simplest setup — no separate server, no internet, runs on any laptop with 8 GB RAM.

1. Install:

uv pip install llama-cpp-python
# Apple Silicon — Metal acceleration:
CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall

2. Download a GGUF model:

# Mistral 7B Instruct v0.3 (~4.4 GB) — Apache 2.0, no restrictions on government or defence use:
uv run python scripts/download_model.py mistral-7b-it

Models are saved to ~/models/ by default. Override with --dir /path/to/dir.

3. Configure .env:

LLM_PROVIDER=llamacpp
LLAMACPP_MODEL_PATH=/Users/yourname/models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf

Alternatively, skip the download step and let the dashboard pull the model from HuggingFace on first request:

LLM_PROVIDER=llamacpp
LLAMACPP_MODEL_REPO=bartowski/Mistral-7B-Instruct-v0.3-GGUF
LLAMACPP_MODEL_FILE=*Q4_K_M*
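
LLAMACPP_MODEL_FILE is a shell-style glob matched against filenames in the HuggingFace repo. The matching can be illustrated with the stdlib's fnmatch (the non-Q4_K_M filenames below are illustrative, not a full repo listing):

```python
from fnmatch import fnmatch

# Abridged, illustrative file list for a GGUF repo:
repo_files = [
    "Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
    "Mistral-7B-Instruct-v0.3-Q8_0.gguf",
    "Mistral-7B-Instruct-v0.3-IQ2_M.gguf",
]

# *Q4_K_M* selects exactly the 4-bit medium quantisation.
matches = [f for f in repo_files if fnmatch(f, "*Q4_K_M*")]
```

If the pattern matches several files, tighten it (e.g. include the full quantisation suffix) so the download is unambiguous.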

4. Start the dashboard — no other process needed:

uv run uvicorn src.api.main:app --reload

Docker (full stack, no Metal): docker compose up handles everything — model_init downloads the model into a named volume on first run, then the dashboard starts automatically:

docker compose up

Docker infra only (for native macOS dev): Start only MinIO without the dashboard container:

docker compose -f docker-compose.infra.yml up -d

See the native macOS dev mode section above for the complete workflow.

The model loads once on first request. If LLAMACPP_MODEL_PATH is unset or the file is missing, the dashboard loads normally and brief generation returns an "LLM not configured" placeholder.
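
That graceful-degradation behaviour amounts to a guard before model loading — a minimal sketch, with hypothetical names (only the "LLM not configured" placeholder text comes from the documented behaviour):

```python
from pathlib import Path

def generate_brief(prompt: str, model_path) -> str:
    """Return a brief, or a placeholder when no local model is configured."""
    if not model_path or not Path(model_path).expanduser().is_file():
        # Dashboard keeps working; only brief generation is degraded.
        return "LLM not configured"
    raise NotImplementedError("real code would lazily load the GGUF model here")
```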

Model guide:

| Short name | HuggingFace repo | Licence | Q4_K_M size | Min RAM |
|---|---|---|---|---|
| `phi-4-mini-it` | bartowski/microsoft_Phi-4-mini-instruct-GGUF | MIT — no restrictions on government or defence use | ~2.4 GB | 8 GB |
| `qwen2.5-3b-it` | bartowski/Qwen2.5-3B-Instruct-GGUF | Apache 2.0 — no restrictions on government or defence use | ~2.0 GB | 8 GB |
| `smollm2-1.7b-it` | bartowski/SmolLM2-1.7B-Instruct-GGUF | Apache 2.0 — smallest option; good for low-RAM environments | ~1.1 GB | 6 GB |
| `mistral-7b-it` | bartowski/Mistral-7B-Instruct-v0.3-GGUF | Apache 2.0 — highest quality local option | ~4.4 GB | 10 GB |

Provider: anthropic (remote)

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
LLM_MODEL=claude-haiku-4-5-20251001   # default — fast and cheap for briefs

Provider: openai (remote, any OpenAI-compatible API)

Works with OpenAI, Ollama, MLX LM, LM Studio, or any other OpenAI-compatible endpoint.

OpenAI:

LLM_PROVIDER=openai
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
# LLM_BASE_URL defaults to https://api.openai.com/v1 if not set

Self-hosted (Ollama, LM Studio, etc.):

LLM_PROVIDER=openai
LLM_BASE_URL=http://localhost:11434/v1   # Ollama
LLM_API_KEY=local
LLM_MODEL=qwen2.5:7b
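
With either configuration, requests go to the standard `/chat/completions` route. A stdlib-only sketch of the request shape (the helper name is hypothetical; arktrace presumably uses a client library instead):

```python
import json
from urllib.request import Request

def chat_request(base_url: str, api_key: str, model: str, question: str) -> Request:
    """Build a Chat Completions POST for any OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Self-hosted servers like Ollama accept any non-empty API key, which is why `LLM_API_KEY=local` works.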