arktrace
arktrace is a Causal Inference Engine for Shadow Fleet Prediction. It uses Difference-in-Differences (DiD) to identify vessels that causally respond to sanction announcements with evasion behaviour — detecting unknown-unknown threats 60–90 days before they appear on public sanctions lists. AIS behaviour, ownership graph proximity, and trade flow data form the evidentiary substrate; causal inference and network-based backtracking propagation are the novel methodology.
The Problem
A shadow fleet vessel moves sanctioned oil or cargo while deliberately hiding what it is doing. It combines several techniques at once, which is why individual tracking tools miss it:
| Technique | How it works |
|---|---|
| Going dark (AIS off) | Switches off its tracking transponder during cargo transfers so no position record exists |
| GPS spoofing | Broadcasts a false position while the actual transfer happens elsewhere |
| Flag hopping | Changes its country of registration frequently to reset port inspection history |
| Name and identity change | Renames itself and re-registers under new shell companies to break continuity in watchlists |
| Ship-to-ship transfer at sea | Moves cargo vessel-to-vessel far from any port, leaving no port record |
| Shell company ownership | Buries beneficial ownership behind 4–6 layers of holding companies across multiple jurisdictions |
Conventional tools detect individual techniques in isolation. arktrace goes further: it uses causal inference to identify vessels whose evasion behaviour was triggered by specific sanction events, separating genuine evasion from ordinary commercial route changes, and propagates signals through the ownership network to surface connected threats not yet on any list.
The default area of interest is the Strait of Malacca and the Singapore Strait, together one of the world's busiest shipping lanes. Five regions are supported: Singapore/Malacca Strait, Japan Sea, Middle East, Europe, and US Gulf.
How It Works — Four Steps
1. Ingest public data
The pipeline pulls vessel tracking data (AIS), international sanctions lists (OFAC, EU, UN), vessel ownership records, and bilateral trade statistics from public sources. No proprietary data feeds or costly subscriptions are required.
2. Apply causal inference and compute 19 signals per vessel
The core model (src/score/causal_sanction.py) runs a Difference-in-Differences regression for each vessel around every major sanction announcement, testing whether a behavioural change was causally driven by the event rather than coincidental. This is the primary innovation. Four signal families (movement anomaly, identity churn, ownership network distance, and trade flow mismatch) serve as the evidentiary substrate. They feed both the causal model and an unknown-unknown detector (src/analysis/causal.py), which surfaces vessels with no current sanctions link but evasion-consistent causal signatures.
3. Rank candidates on the watchlist
Causal scores and network propagation results are combined into a single confidence score. The dashboard shows a map and a ranked table, with a plain-English explanation of the top signals behind each vessel's ranking, so an analyst can immediately follow the causal chain. Example: "causal DiD response to 2024-10 OFAC announcement (p < 0.01), one ownership hop from a sanctioned entity, changed flag 3 times in 2 years".
4. Hand off to a patrol officer
High-scoring vessels are exported as a task file for the patrol vessel, which is dispatched for close-range inspection. Inspection results (confirmed, cleared, or inconclusive) feed back into the causal model as hard labels, tightening future DiD estimates and triggering graph-wide backtracking to surface connected threats.
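The Difference-in-Differences idea in step 2 can be illustrated with a minimal sketch. This is not the code in src/score/causal_sanction.py: the function name `did_estimate`, the choice of daily AIS-gap hours as the behavioural metric, and the toy numbers are all assumptions made for illustration. The sketch uses the two-group, two-period form of DiD, which equals the interaction coefficient of the corresponding saturated regression.

```python
from statistics import mean


def did_estimate(observations):
    """Two-group, two-period Difference-in-Differences estimate.

    observations: iterable of (value, treated, post) tuples, where
    treated and post are 0 or 1.
    """
    # Bucket each observation into one of the four group/period cells.
    cells = {(t, p): [] for t in (0, 1) for p in (0, 1)}
    for value, treated, post in observations:
        cells[(treated, post)].append(value)
    if any(not values for values in cells.values()):
        raise ValueError("each group/period cell needs at least one observation")

    # Change in the candidate group minus change in the control group.
    treated_change = mean(cells[(1, 1)]) - mean(cells[(1, 0)])
    control_change = mean(cells[(0, 1)]) - mean(cells[(0, 0)])
    return treated_change - control_change


# Hypothetical toy data: daily AIS-gap hours before and after an announcement.
observations = [
    (2.0, 0, 0), (2.0, 0, 0),  # control vessels, before
    (3.0, 0, 1), (3.0, 0, 1),  # control vessels, after
    (2.0, 1, 0), (2.0, 1, 0),  # candidate vessel, before
    (8.0, 1, 1), (8.0, 1, 1),  # candidate vessel, after
]
effect = did_estimate(observations)
```

In this toy data the candidate's gap hours rise by 6 while the control group's rise by 1, so the estimate is 5. A real test would also need a standard error and a check of the parallel-trends assumption, both of which this sketch omits.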
How Effective Is It?
| Capability | What to expect |
|---|---|
| Pre-designation lead time | 60–90 days before OFAC listing (backtested via DiD on historical sanction announcements). See docs/scoring-model.md. |
| Unknown-unknown detection | Causal signatures surface vessels with no sanctions link whose behaviour pattern matches confirmed evaders — catching threats before they appear on any list. |
| Detection rate | Precision@50 target ≥ 0.60: at least 30 of the top-50 ranked candidates are confirmed OFAC-listed vessels. AUROC and Recall@200 are also tracked. |
| False positive reduction | Geopolitical rerouting filter down-weights DiD scores for vessels on declared diversion routes (e.g. Cape of Good Hope since 2023), reducing noise from legitimate commercial rerouting. |
| Network propagation | Confirmed vessels trigger graph-wide backtracking (scripts/run_backtracking.py) to surface ownership-connected threats not yet on any watchlist. |
| Explainability | Every score has a per-feature SHAP breakdown. Analysts see the exact causal and network signals that drove each result — no black-box verdicts. |
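The Precision@50 and Recall@200 metrics in the table can be computed as in the sketch below. The function names and the vessel identifiers are hypothetical; the repository's own evaluation code may differ.

```python
def precision_at_k(ranked_ids, confirmed_ids, k=50):
    """Fraction of the top-k ranked vessels that are confirmed positives."""
    hits = sum(1 for vessel_id in ranked_ids[:k] if vessel_id in confirmed_ids)
    return hits / k


def recall_at_k(ranked_ids, confirmed_ids, k=200):
    """Fraction of all confirmed positives that appear in the top k."""
    if not confirmed_ids:
        return 0.0
    hits = sum(1 for vessel_id in ranked_ids[:k] if vessel_id in confirmed_ids)
    return hits / len(confirmed_ids)


# Hypothetical ranking of 300 vessels, best candidate first.
ranked = [f"vessel-{i:03d}" for i in range(300)]
# Hypothetical ground truth: 30 confirmed vessels inside the top 50,
# plus 10 more that the model ranked below position 200.
confirmed = {f"vessel-{i:03d}" for i in range(30)} | {
    f"vessel-{i:03d}" for i in range(250, 260)
}

p50 = precision_at_k(ranked, confirmed, k=50)   # 30 hits / 50
r200 = recall_at_k(ranked, confirmed, k=200)    # 30 hits / 40 confirmed
```

With 30 confirmed vessels in the top 50, `p50` sits exactly at the 0.60 target.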
How Efficient Is It?
| Resource | Requirement |
|---|---|
| Hardware | Standard laptop (4 vCPU / 8 GB RAM). No GPU, no cloud, no external server. |
| Full pipeline run | ~45 minutes from raw data to ranked watchlist |
| Incremental re-score | Under 60 seconds per batch during live monitoring |
| Live alerting | Real-time SSE alerts when a vessel crosses a configurable confidence threshold |
| Software cost | Fully open-source. No licensing fees. |
| Regions | Switch between Singapore, Japan Sea, Middle East, Europe, and US Gulf with a single CLI flag — no code changes |
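The live alerting row can be pictured with a small sketch of threshold-crossing detection plus Server-Sent Events framing. The event name `vessel_alert`, the payload fields, the 0.8 default threshold, and the choice to alert only on upward crossings are assumptions for illustration, not the project's actual interface. The `event:` and `data:` lines terminated by a blank line follow the standard SSE wire format.

```python
import json


def crossed_threshold(previous, current, threshold):
    """True only on an upward crossing, so a vessel that stays above
    the threshold does not raise a new alert on every re-score."""
    return previous < threshold <= current


def format_sse(event, payload):
    """Frame one message in the Server-Sent Events wire format."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def alerts_for_batch(previous_scores, new_scores, threshold=0.8):
    """Return SSE messages for vessels that crossed the threshold in this batch."""
    messages = []
    for vessel_id, score in new_scores.items():
        before = previous_scores.get(vessel_id, 0.0)  # unseen vessels start at 0
        if crossed_threshold(before, score, threshold):
            payload = {"vessel_id": vessel_id, "confidence": score}
            messages.append(format_sse("vessel_alert", payload))
    return messages


# Hypothetical scores before and after an incremental re-score.
previous_scores = {"vessel-a": 0.5, "vessel-b": 0.9}
new_scores = {"vessel-a": 0.85, "vessel-b": 0.95, "vessel-c": 0.3}
messages = alerts_for_batch(previous_scores, new_scores)
```

Only `vessel-a` produces a message here: `vessel-b` was already above the threshold and `vessel-c` is still below it.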
Screening and Physical Investigation
arktrace covers Phase A, Screening (this repository). Phase B, Physical Investigation, is the patrol vessel software suite planned for edgesentry-rs and edgesentry-app.
| Phase | What it does | Status |
|---|---|---|
| A — Screening | Ingest public data → compute risk signals → rank candidates → analyst dashboard | Working |
| B — Physical Investigation | Patrol vessel dispatch → OCR identity check → LiDAR hull scan → cryptographically signed evidence → VDES secure transmission | Design specification complete; implementation begins after trial contract award |
Phase A produces the watchlist. Phase B acts on it. Patrol outcomes flow back into Phase A to improve future rankings.
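One way a confirmed patrol outcome could flow back through the ownership network is a breadth-first walk that assigns each connected entity a score that decays with hop distance. This is a sketch only and not the logic in scripts/run_backtracking.py: the function name, the adjacency-dict graph format, the 0.5 decay, and the three-hop limit are assumptions.

```python
from collections import deque


def propagate_from_confirmed(graph, confirmed, decay=0.5, max_hops=3):
    """Score entities by hop distance from any confirmed vessel.

    graph: dict mapping a node to its neighbours (treated as undirected,
    so each edge should be listed in both directions).
    Returns {node: decay ** hops} for nodes within max_hops.
    """
    scores = {}
    seen = set(confirmed)
    queue = deque((node, 0) for node in confirmed)
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour in seen:
                continue  # breadth-first order: first visit is the shortest path
            seen.add(neighbour)
            scores[neighbour] = decay ** (hops + 1)
            queue.append((neighbour, hops + 1))
    return scores


# Hypothetical ownership graph: vessels linked through owners and holdings.
graph = {
    "vessel_A": ["owner_X"],
    "owner_X": ["vessel_A", "vessel_B", "holding_Y"],
    "vessel_B": ["owner_X"],
    "holding_Y": ["owner_X", "vessel_C"],
    "vessel_C": ["holding_Y"],
}
scores = propagate_from_confirmed(graph, ["vessel_A"])
```

In this example `vessel_B` shares a direct owner with the confirmed vessel and scores 0.25, while `vessel_C` sits one holding company further away and scores 0.125.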
Human Oversight
The model ranks candidates — humans decide what to do. No automated decision triggers legal or operational action.
- Every candidate is reviewed by an analyst before escalation. Review tiers: Confirmed, Probable, Suspect, Cleared, Inconclusive.
- "Confirmed" requires at least two independent high-credibility sources, or one official designation with a verified vessel identifier (MMSI/IMO match).
- Every review decision is recorded with a rationale, evidence references, and reviewer identity.
Full policy: docs/triage-governance.md.
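The "Confirmed" rule above can be expressed as a small predicate. The `Evidence` record and its field names are hypothetical, and treating distinct source names as a proxy for independence is a simplifying assumption; docs/triage-governance.md remains the authority on the actual policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str                         # who reported it, e.g. a registry or agency
    high_credibility: bool = False
    official_designation: bool = False
    identifier_verified: bool = False   # MMSI/IMO match checked


def meets_confirmed_bar(evidence):
    """True if the evidence satisfies either branch of the Confirmed rule."""
    # Branch 1: at least two independent high-credibility sources.
    # Independence is approximated here by distinct source names.
    credible_sources = {e.source for e in evidence if e.high_credibility}
    if len(credible_sources) >= 2:
        return True
    # Branch 2: one official designation with a verified vessel identifier.
    return any(e.official_designation and e.identifier_verified for e in evidence)


# Hypothetical evidence sets.
two_sources = [
    Evidence("port-authority", high_credibility=True),
    Evidence("coast-guard", high_credibility=True),
]
same_source_twice = [
    Evidence("port-authority", high_credibility=True),
    Evidence("port-authority", high_credibility=True),
]
designated = [Evidence("OFAC", official_designation=True, identifier_verified=True)]
unverified = [Evidence("OFAC", official_designation=True)]
```

The model can suggest a tier, but under the policy above the analyst still records the decision, rationale, and evidence references.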
Document Index
| To understand… | Read |
|---|---|
| Detection signals and scoring formula | docs/scoring-model.md |
| All 19 features and what each detects | docs/feature-engineering.md |
| Causal reasoning and unknown-unknown detection | docs/causal-analysis.md |
| Physical vessel investigation (Phase B) | docs/field-investigation.md |
| Human oversight, evidence policy, tier taxonomy | docs/triage-governance.md |
| Validation metrics and backtesting methodology | docs/backtesting-validation.md |
| Three operational scenarios (duty officer, analyst, patrol) | docs/scenarios.md |
| Full roadmap (Phase A and Phase B) | docs/roadmap.md |
| Deployment (local, Docker, cloud VM) | docs/deployment.md |
| Tech stack and algorithm details | docs/technical-solution.md |
| Pipeline operations reference | docs/pipeline-operations.md |
| Regional configuration (Singapore, Japan Sea, etc.) | docs/regional-playbooks.md |