
LLM Setup

arktrace supports three LLM providers configured via .env. Set LLM_PROVIDER to one of:

| LLM_PROVIDER | Where it runs | Requires |
|---|---|---|
| `llamacpp` (default) | Local — no server, no internet | GGUF model file |
| `anthropic` | Remote — Anthropic API | ANTHROPIC_API_KEY |
| `openai` | Remote — any OpenAI-compatible API | LLM_BASE_URL + LLM_API_KEY |
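
Provider selection can be sketched as a small validation step at startup. This is illustrative only — the function name and exact checks are assumptions, not arktrace's actual factory code:

```python
VALID_PROVIDERS = {"llamacpp", "anthropic", "openai"}

def resolve_provider(env: dict) -> str:
    """Return the configured provider name, defaulting to llamacpp."""
    provider = env.get("LLM_PROVIDER", "llamacpp").lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    # Remote providers need their credentials before any request is made.
    if provider == "anthropic" and not env.get("ANTHROPIC_API_KEY"):
        raise ValueError("anthropic provider requires ANTHROPIC_API_KEY")
    if provider == "openai" and not env.get("LLM_API_KEY"):
        raise ValueError("openai provider requires LLM_API_KEY")
    return provider
```

Passing the parsed `.env` mapping rather than reading `os.environ` directly keeps the check easy to test.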

What the LLM does in arktrace

| Feature | Prompt shape | Typical output |
|---|---|---|
| Analyst brief | Vessel profile + top SHAP signals + 3 GDELT events | One paragraph citing a specific event and how it connects to the vessel's risk score |
| Analyst chat | Fleet overview + optional vessel detail + analyst question | Direct factual answer grounded in the provided data |

Prompts are short (500–1,200 tokens in, 150–300 out). Instruction following matters more than reasoning ability — a 4B model is sufficient.
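
The brief prompt shape above can be sketched as plain string assembly — names and section labels here are hypothetical, not arktrace's actual template:

```python
def build_brief_prompt(profile: str, shap_signals: list, events: list) -> str:
    """Assemble an analyst-brief prompt: profile + top SHAP signals + GDELT events."""
    lines = ["Vessel profile:", profile, "", "Top risk signals (SHAP):"]
    lines += [f"- {s}" for s in shap_signals[:5]]
    lines += ["", "Recent GDELT events:"]
    lines += [f"- {e}" for e in events[:3]]
    lines += ["", "Write one paragraph citing a specific event and explaining "
                  "how it connects to the vessel's risk score."]
    return "\n".join(lines)
```

Capping the signal and event lists keeps the prompt inside the 500–1,200-token input budget.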


Native macOS dev mode

Docker on macOS runs inside a Colima Linux VM, which has no access to Apple Metal (GPU/ANE). Running the dashboard natively on the host bypasses the VM and enables Metal-accelerated inference — typically 5–10× faster on Apple Silicon.

Infra (MinIO) still runs in Docker. Only the FastAPI process runs on the host.

One-time setup

# 1. Install llama-cpp-python with Metal support
CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall

# 2. Download the model (saves to ~/models/ by default)
uv run python scripts/download_model.py mistral-7b-it

Start infra + dashboard

# Recommended — one command starts everything:
bash scripts/run_dev.sh

# Or step-by-step:
docker compose -f docker-compose.infra.yml up -d   # MinIO only, no dashboard container

S3_ENDPOINT=http://localhost:9000 \
LLAMACPP_MODEL_PATH=~/models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf \
  uv run uvicorn src.api.main:app --host 0.0.0.0 --port 8000 --reload

scripts/run_dev.sh accepts a few flags:

| Flag | Default | Description |
|---|---|---|
| `--model PATH` | auto-detect | Path to GGUF model file |
| `--provider NAME` | `llamacpp` | Override LLM_PROVIDER |
| `--port PORT` | 8000 | uvicorn port |
| `--no-infra` | — | Skip Docker infra (MinIO already running) |
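
The `--model` auto-detect behaviour can be approximated like this — the function name and the largest-file heuristic are assumptions, not necessarily what run_dev.sh does:

```python
from pathlib import Path

def autodetect_model(model_dir: Path):
    """Return the largest .gguf file in model_dir, or None if there is none.

    Largest-file-wins is a heuristic: when several quantisations are present,
    the bigger file is usually the higher-quality variant.
    """
    candidates = sorted(model_dir.glob("*.gguf"),
                        key=lambda p: p.stat().st_size, reverse=True)
    return candidates[0] if candidates else None
```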

Provider: llamacpp (local, no server)

The simplest setup — no separate server, no internet, runs on any laptop with 8 GB RAM.

1. Install:

uv pip install llama-cpp-python
# Apple Silicon — Metal acceleration:
CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --force-reinstall

2. Download a GGUF model:

# Mistral 7B Instruct v0.3 (~4.4 GB) — Apache 2.0, no restrictions on government or defence use:
uv run python scripts/download_model.py mistral-7b-it

Models are saved to ~/models/ by default. Override with --dir /path/to/dir.

3. Configure .env:

LLM_PROVIDER=llamacpp
LLAMACPP_MODEL_PATH=/Users/yourname/models/Mistral-7B-Instruct-v0.3-Q4_K_M.gguf

Alternatively, skip the download step and let the dashboard pull the model from HuggingFace on first request:

LLM_PROVIDER=llamacpp
LLAMACPP_MODEL_REPO=bartowski/Mistral-7B-Instruct-v0.3-GGUF
LLAMACPP_MODEL_FILE=*Q4_K_M*
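
LLAMACPP_MODEL_FILE is a shell-style glob matched against filenames in the HuggingFace repo. The matching can be illustrated with the stdlib's fnmatch (the non-Q4_K_M filenames below are illustrative, not a full repo listing):

```python
from fnmatch import fnmatch

# Abridged, illustrative file list for a GGUF repo:
repo_files = [
    "Mistral-7B-Instruct-v0.3-Q4_K_M.gguf",
    "Mistral-7B-Instruct-v0.3-Q8_0.gguf",
    "Mistral-7B-Instruct-v0.3-IQ2_M.gguf",
]

# *Q4_K_M* selects exactly the 4-bit medium quantisation.
matches = [f for f in repo_files if fnmatch(f, "*Q4_K_M*")]
```

If the pattern matches several files, tighten it (e.g. include the full quantisation suffix) so the download is unambiguous.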

4. Start the dashboard — no other process needed:

uv run uvicorn src.api.main:app --reload

Docker (full stack, no Metal): docker compose up handles everything — model_init downloads the model into a named volume on first run, then the dashboard starts automatically:

docker compose up

Docker infra only (for native macOS dev): Start only MinIO without the dashboard container:

docker compose -f docker-compose.infra.yml up -d

See the native macOS dev mode section above for the complete workflow.

The model loads once on first request. If LLAMACPP_MODEL_PATH is unset or the file is missing, the dashboard loads normally and brief generation returns an "LLM not configured" placeholder.
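
That graceful-degradation behaviour amounts to a guard before model loading — a minimal sketch, with hypothetical names (only the "LLM not configured" placeholder text comes from the documented behaviour):

```python
from pathlib import Path

def generate_brief(prompt: str, model_path) -> str:
    """Return a brief, or a placeholder when no local model is configured."""
    if not model_path or not Path(model_path).expanduser().is_file():
        # Dashboard keeps working; only brief generation is degraded.
        return "LLM not configured"
    raise NotImplementedError("real code would lazily load the GGUF model here")
```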

Model guide:

| Short name | HuggingFace repo | Licence | Q4_K_M size | Min RAM |
|---|---|---|---|---|
| `phi-4-mini-it` | bartowski/microsoft_Phi-4-mini-instruct-GGUF | MIT — no restrictions on government or defence use | ~2.4 GB | 8 GB |
| `qwen2.5-3b-it` | bartowski/Qwen2.5-3B-Instruct-GGUF | Apache 2.0 — no restrictions on government or defence use | ~2.0 GB | 8 GB |
| `smollm2-1.7b-it` | bartowski/SmolLM2-1.7B-Instruct-GGUF | Apache 2.0 — smallest option; good for low-RAM environments | ~1.1 GB | 6 GB |
| `mistral-7b-it` | bartowski/Mistral-7B-Instruct-v0.3-GGUF | Apache 2.0 — highest quality local option | ~4.4 GB | 10 GB |

Provider: anthropic (remote)

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
LLM_MODEL=claude-haiku-4-5-20251001   # default — fast and cheap for briefs

Provider: openai (remote, any OpenAI-compatible API)

Works with OpenAI, Ollama, MLX LM, LM Studio, or any other OpenAI-compatible endpoint.

OpenAI:

LLM_PROVIDER=openai
LLM_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini
# LLM_BASE_URL defaults to https://api.openai.com/v1 if not set

Self-hosted (Ollama, LM Studio, etc.):

LLM_PROVIDER=openai
LLM_BASE_URL=http://localhost:11434/v1   # Ollama
LLM_API_KEY=local
LLM_MODEL=qwen2.5:7b
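
With either configuration, requests go to the standard `/chat/completions` route. A stdlib-only sketch of the request shape (the helper name is hypothetical; arktrace presumably uses a client library instead):

```python
import json
from urllib.request import Request

def chat_request(base_url: str, api_key: str, model: str, question: str) -> Request:
    """Build a Chat Completions POST for any OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Self-hosted servers like Ollama accept any non-empty API key, which is why `LLM_API_KEY=local` works.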