Regional Analysis Playbooks

Configuration guidance for running the MPOL screening pipeline across five maritime areas of interest. Each section is written for a specific analyst persona and covers bounding box settings, signal priorities, dashboard filters, and workarounds for scenarios the current scripts do not yet support natively.

Personas (implemented): Singapore/Malacca · Japan Sea/DPRK · US Gulf · Europe/Baltic · Middle East/Indian Ocean

Regional importance ranking

Understanding which regions carry the most weight in global maritime security helps prioritise where to deploy the pipeline first. Ranks 1–5 have full playbooks below. Ranks 6–8 are emerging priorities without dedicated playbooks yet — use the bbox values and signal notes as a starting point with any existing persona as a template.

Rank	Region	Persona	Key threat	Approx. bbox
1	Middle East / Indian Ocean	Persona 5	Iranian crude (~1.5–2 mbpd illicit); Houthi Red Sea attacks	`−10 32 30 80`
2	Singapore / Malacca Strait	Persona 1	Iranian + Russian crude STS blending; 40% of global seaborne trade	`−5 92 22 122` (default)
3	Europe (Baltic / Black Sea)	Persona 4	Russian shadow fleet (~400–600 vessels); G7 price cap evasion	`30 −22 72 42`
4	Japan Sea / East China Sea	Persona 2	DPRK coal/fuel UN sanctions violations	`25 115 48 145`
5	US Coastal / Gulf of Mexico	Persona 3	Venezuelan crude; Caribbean smuggling; OFAC enforcement	`8 −98 32 −60`
6	West Africa / Gulf of Guinea	—	Nigerian crude diversion, illicit bunkering off Bonny/Escravos; significant piracy threat (IMB top-ranked region); Angolan and Gabonese crude mislabelling	`−10 −5 15 15`
7	Cape of Good Hope / South Atlantic	—	Massively increased traffic since 2024 Red Sea rerouting; new STS hub emerging off South Africa (Algoa Bay); limited surveillance coverage creates blind spots	`−40 10 −25 40`
8	Arctic / Northern Sea Route	—	Russia routing LNG and crude via the Northern Sea Route to bypass EU/G7 controls; AIS coverage is sparse above 70°N; growing volume from Yamal and Arctic LNG 2	`65 20 85 180`

Persona 1 — Singapore / Malacca Strait Analyst

Who: Maritime security analyst at a port authority or regional coast guard (e.g., MPA Singapore, ReCAAP). Monitoring the world's busiest chokepoint for shadow tankers evading sanctions on Iranian, Russian, and Venezuelan oil.

Primary signals: AIS gaps during transit, STS transfers at known anchorages (West of Batam, Karimata Strait), high-risk flag states, vessels with direct ownership links to OFAC/EU-listed entities.

Step-by-step configuration

A1 — AIS ingestion

This is the default configuration. No changes needed.

uv run python src/ingest/ais_stream.py
# default bbox: [[-5.0, 92.0], [22.0, 122.0]] (Malacca + South China Sea approaches)

A3 — Feature engineering

Use the default 30-day rolling window. The Malacca Strait is a high-frequency transit corridor, so 30 days captures enough passages.

uv run python src/features/ais_behavior.py --window 30

A4 — Composite scoring

Default weights are tuned for this region (0.4 × anomaly + 0.4 × graph + 0.2 × identity). No changes needed.

Note: when running via scripts/run_pipeline.py, the C3 causal model automatically calibrates w_graph before Step 8. The value above is the fallback if insufficient AIS data is available.
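
As a rough illustration of the weighting described above, the following sketch computes a weighted sum of the three component scores. The function name, the assumption that each component lies in [0, 1], and the sum-to-one check are illustrative only and are not taken from composite.py.

```python
def composite_score(anomaly: float, graph: float, identity: float,
                    w_anomaly: float = 0.4, w_graph: float = 0.4,
                    w_identity: float = 0.2) -> float:
    """Weighted sum of three component scores (each assumed to be in [0, 1])."""
    total = w_anomaly + w_graph + w_identity
    # Assumption: weights are expected to sum to 1 so the result stays in [0, 1].
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return w_anomaly * anomaly + w_graph * graph + w_identity * identity


# Default weighting: strong anomaly signal, moderate graph risk, low identity churn
print(round(composite_score(anomaly=0.9, graph=0.5, identity=0.2), 3))
```

Keeping the weights as keyword arguments mirrors the way other regions override them, so the same function can be reused with a different split.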

A4 — Bunker barge exclusion (Singapore-specific)

Singapore waters contain a large population of legitimate service craft: bunker barges (AIS type 51–54), pilot tenders (type 51), and harbour tugs (types 31–32). These vessels loiter at low SOG near anchorages and refuelling points, which is the same behavioural signature as shadow-fleet STS transfers. If they are left in the HDBSCAN training baseline, they compress anomaly scores for genuine dark-vessel events.

The exclusion is on by default. No configuration needed for Singapore. To verify:

# Service vessel types excluded from HDBSCAN training (still scored by Isolation Forest):
# 31, 32 (tug/supply), 51-59 (pilot, SAR, fire-fighting, law enforcement, medical)
uv run python src/score/mpol_baseline.py --db data/processed/singapore.duckdb

To revert to legacy behaviour (not recommended for Singapore):

uv run python src/score/mpol_baseline.py --no-exclude-service-vessels
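
To make the exclusion concrete, here is a minimal sketch of how vessels might be partitioned before baseline training, using the type codes listed in the comment above. The function name and the (mmsi, ais_type) input shape are hypothetical; the real logic lives in mpol_baseline.py and may differ.

```python
# AIS ship-type codes named above as excluded from the HDBSCAN baseline
SERVICE_VESSEL_TYPES = frozenset({31, 32, *range(51, 60)})


def split_baseline(vessels):
    """Split (mmsi, ais_type) pairs into (training, excluded) MMSI lists."""
    training, excluded = [], []
    for mmsi, ais_type in vessels:
        if ais_type in SERVICE_VESSEL_TYPES:
            excluded.append(mmsi)   # kept out of the training baseline only
        else:
            training.append(mmsi)
    return training, excluded


# Hypothetical MMSIs: a tanker (80), a type-52 craft, a tug (31), a cargo ship (70)
print(split_baseline([(1, 80), (2, 52), (3, 31), (4, 70)]))
```

Excluded vessels are only removed from the training baseline in this sketch; they would still be scored afterwards, as the comment above describes for Isolation Forest.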

A5 / Dashboard — Filters to apply

Filter	Value	Reason
Vessel type	Tanker	Iranian/Russian crude moves in tankers
Minimum confidence	0.55	Strait traffic is dense; lower threshold catches more candidates
Top N	100	High vessel density warrants a wider review list

Key columns to sort by: sts_candidate_count, sanctions_distance, ais_gap_count_30d.

Workarounds

Historical AIS replay (pre-ingestion): ais_stream.py is live-only. To analyse a past incident (e.g., a reported STS event from last month), run a DuckDB query against accumulated ais_positions data filtered by timestamp and bbox directly. No script change needed if you have the historical data in the DB.

Expanding the bbox to cover the Indian Ocean approaches: Pass --bbox to override the default:

uv run python src/ingest/ais_stream.py --bbox -10 60 25 122
# covers Arabian Sea + Bay of Bengal approaches to Malacca

Note: a larger bbox increases WebSocket message volume significantly. Reduce --flush-interval to 30s to avoid memory pressure.
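
Because --bbox takes four bare numbers, it is easy to swap latitude and longitude. The sketch below is a hypothetical helper (not part of ais_stream.py) that parses the lat_min lon_min lat_max lon_max ordering used throughout this document and checks whether a point falls inside. It assumes the box does not cross the antimeridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    lat_min: float
    lon_min: float
    lat_max: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


def parse_bbox(args):
    """Parse four values in lat_min lon_min lat_max lon_max order."""
    values = [float(v) for v in args]
    if len(values) != 4:
        raise ValueError("bbox needs exactly four values")
    box = BBox(*values)
    if not (-90 <= box.lat_min < box.lat_max <= 90):
        raise ValueError("latitudes out of range or in the wrong order")
    if not (-180 <= box.lon_min < box.lon_max <= 180):
        raise ValueError("longitudes out of range or in the wrong order")
    return box


box = parse_bbox("-10 60 25 122".split())
print(box.contains(1.3, 103.8))   # a position in the Singapore Strait
```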

Persona 2 — Japan Sea / East China Sea Analyst

Who: Analyst at the Japan Coast Guard, the UN Panel of Experts on the DPRK, or a sanctions intelligence firm monitoring North Korean coal exports and fuel imports in violation of UN Security Council resolutions (UNSCR 2371, 2375, 2397).

Primary signals: Vessels going dark near DPRK waters, position jumps (GPS spoofing is endemic near the Korean Peninsula), STS transfers in the East China Sea, ownership links to DPRK-adjacent shell companies, high cluster_sanctions_ratio in the Lance Graph (DPRK-connected networks are tightly clustered).

Step-by-step configuration

A1 — AIS ingestion

Override the bbox to cover the Japan Sea, Yellow Sea, and East China Sea:

uv run python src/ingest/ais_stream.py \
  --bbox 25 115 48 145
# lat 25–48°N, lon 115–145°E
# covers Yellow Sea, Bohai, Japan Sea, East China Sea

Use a separate DuckDB file to keep Japan Sea data isolated from the default Singapore DB:

DB_PATH=data/processed/japansea.duckdb uv run python src/ingest/schema.py
DB_PATH=data/processed/japansea.duckdb uv run python src/ingest/ais_stream.py \
  --bbox 25 115 48 145 --db data/processed/japansea.duckdb

A2 — Sanctions loading

No changes needed. OpenSanctions already merges OFAC, EU, and UN lists. The UN consolidated list (which targets DPRK entities) is included automatically.

A3 — Feature engineering

Increase the AIS behavioral window to 60 days. DPRK-linked vessels make infrequent, long voyages — a 30-day window misses the full gap pattern.

DB_PATH=data/processed/japansea.duckdb \
  uv run python src/features/ais_behavior.py \
  --db data/processed/japansea.duckdb \
  --window 60

Run all other feature scripts with --db data/processed/japansea.duckdb.

A4 — Composite scoring

The default weight puts graph_risk_score at 40%. For DPRK analysis this is correct — ownership graph proximity to UN-listed entities is the strongest signal. No weight change needed.

Note: when running via scripts/run_pipeline.py, the C3 causal model automatically calibrates w_graph before Step 8. The value above is the fallback if insufficient AIS data is available.

Dashboard — Filters to apply

Filter	Value	Reason
Vessel type	Tanker, Cargo	Coal and petroleum product carriers
Minimum confidence	0.60	Tighter — DPRK evasion is highly deliberate
Top N	50	Focus on highest-certainty candidates

Key columns: position_jump_count (GPS spoofing near DPRK), ais_gap_count_30d (dark periods in transit), sanctions_distance (UN list proximity).

Launch the dashboard pointed at the Japan Sea DB:

WATCHLIST_OUTPUT_PATH=data/processed/japansea_watchlist.parquet \
  uv run uvicorn src.api.main:app --reload
  # open http://localhost:8000

Workarounds

No historical AIS for Japan Sea: aisstream.io is live-only and Marine Cadastre does not cover this region. To backfill:
- Run ais_stream.py continuously for several days before scoring.
- Alternatively, AISHub (www.aishub.net) offers a free data-sharing programme where members can download historical NMEA data for non-commercial use. Export to CSV and load via load_csv_to_duckdb() directly (the function accepts any CSV with the Marine Cadastre column schema).

Narrowing to DPRK-adjacent waters only: The full bbox (25–48°N, 115–145°E) is large. To focus specifically on the waters around DPRK (Yellow Sea west coast and Japan Sea east coast), run a second ingestion pass:

uv run python src/ingest/ais_stream.py \
  --bbox 36 124 42 132 --db data/processed/japansea.duckdb
# tight DPRK coastal corridor

AIS gap threshold: The default gap threshold is 6 hours. In Japan Sea analysis, vessels may go dark for 12–48 hours near DPRK. Pass the flag directly:

DB_PATH=data/processed/japansea.duckdb \
  uv run python src/features/ais_behavior.py \
  --db data/processed/japansea.duckdb \
  --window 60 \
  --gap-threshold-hours 12
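
To show what the threshold changes in practice, here is a minimal sketch of gap counting over one vessel's position timestamps. It is an illustration only: the function name is hypothetical, and treating a gap as "strictly longer than the threshold" is an assumption about how ais_behavior.py behaves.

```python
from datetime import datetime, timedelta


def count_gaps(timestamps, threshold_hours=6.0):
    """Return (gap_count, longest_gap_hours) for one vessel's AIS timestamps."""
    ordered = sorted(timestamps)
    threshold = timedelta(hours=threshold_hours)
    gaps = [
        later - earlier
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold   # assumption: strictly greater than
    ]
    longest = max(gaps, default=timedelta(0))
    return len(gaps), longest.total_seconds() / 3600


t0 = datetime(2024, 1, 1)
track = [t0 + timedelta(hours=h) for h in (0, 1, 9, 10, 30)]
print(count_gaps(track, threshold_hours=6))    # the 8 h and 20 h silences both count
print(count_gaps(track, threshold_hours=12))   # only the 20 h silence counts
```

Raising the threshold trades recall for precision: short coverage dropouts stop registering, while the long dark periods typical of this region remain.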

Persona 3 — US Coastal / Gulf of Mexico Analyst

Who: USCG maritime intelligence officer or OFAC compliance analyst monitoring Venezuelan crude smuggling through the Caribbean, Gulf of Mexico STS operations, and Cuban embargo violations.

Primary signals: STS transfers in the Gulf of Mexico (particularly the Yucatan Channel and offshore platforms), vessels transiting between Venezuela and US-adjacent waters, flag-of-convenience vessels with weak port state control history.

Step-by-step configuration

A1 — AIS ingestion

Override the bbox to the Gulf of Mexico and Caribbean:

uv run python src/ingest/ais_stream.py \
  --bbox 8 -98 32 -60 --db data/processed/gulf.duckdb
# Gulf of Mexico + Caribbean + Venezuelan approaches

For US West Coast (e.g., Pacific sanctions enforcement):

uv run python src/ingest/ais_stream.py \
  --bbox 28 -135 50 -115 --db data/processed/uswest.duckdb

Marine Cadastre for historical backfill (US region only)

This is the one region where Marine Cadastre is directly useful. Pass --marine-cadastre-year to the pipeline, and it runs automatically using the Gulf bounding box:

PIPELINE_REGION=gulf docker compose run --rm pipeline \
  uv run python scripts/run_pipeline.py \
  --region gulf --non-interactive \
  --marine-cadastre-year 2023

# Multiple years
PIPELINE_REGION=gulf docker compose run --rm pipeline \
  uv run python scripts/run_pipeline.py \
  --region gulf --non-interactive \
  --marine-cadastre-year 2022 --marine-cadastre-year 2023

A3 — Feature engineering

Use a shorter window for US coastal traffic — vessels transit faster and more frequently:

DB_PATH=data/processed/gulf.duckdb \
  uv run python src/features/ais_behavior.py \
  --db data/processed/gulf.duckdb \
  --window 14

A4 — Composite scoring

For US/OFAC analysis, ownership graph proximity is less discriminating (more vessels have OFAC exposure in this region) and behavioral anomaly is more important. Pass weights via CLI flags:

uv run python src/score/composite.py \
  --db data/processed/gulf.duckdb \
  --w-anomaly 0.50 --w-graph 0.30 --w-identity 0.20

Note: when running via scripts/run_pipeline.py, the C3 causal model automatically calibrates w_graph before Step 8. The value above is the fallback if insufficient AIS data is available.

Dashboard — Filters to apply

Filter	Value	Reason
Vessel type	Tanker, Cargo	Venezuelan crude and refined products
Minimum confidence	0.50	Broader net for initial screening
Top N	75	Gulf traffic is dense

Launch pointed at the Gulf DB:

WATCHLIST_OUTPUT_PATH=data/processed/gulf_watchlist.parquet \
  uv run uvicorn src.api.main:app --reload
  # open http://localhost:8000

Workarounds

Marine Cadastre bbox: The CLI defaults to Singapore. Pass --bbox lat_min lon_min lat_max lon_max to override, e.g. --bbox 8 -98 32 -60 for the Gulf.

Composite weights: Pass flags directly to composite.py:

uv run python src/score/composite.py \
  --db data/processed/gulf.duckdb \
  --w-anomaly 0.50 --w-graph 0.30 --w-identity 0.20

Multiple regions simultaneously: The pipeline uses a single DB_PATH. To run all three regions in parallel, set DB_PATH separately in each terminal (or keep one .env file per region):

# Terminal 1 — Singapore
DB_PATH=data/processed/sg.duckdb uv run python src/ingest/ais_stream.py

# Terminal 2 — Japan Sea
DB_PATH=data/processed/japansea.duckdb uv run python src/ingest/ais_stream.py \
  --bbox 25 115 48 145 --db data/processed/japansea.duckdb

# Terminal 3 — Gulf
DB_PATH=data/processed/gulf.duckdb uv run python src/ingest/ais_stream.py \
  --bbox 8 -98 32 -60 --db data/processed/gulf.duckdb

Each region gets its own DuckDB file. Run the full feature + scoring pipeline separately for each by passing --db to every script.
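
If the per-region commands are easier to keep in one place, a small table-driven sketch like the following can generate them. It only assembles the --bbox and --db flags already shown above; the REGIONS dict and the function name are hypothetical conveniences, not part of the repository.

```python
import shlex

# Region -> DuckDB file and bbox (lat_min, lon_min, lat_max, lon_max).
# None means "use the script's default bbox" (Singapore).
REGIONS = {
    "singapore": {"db": "data/processed/sg.duckdb", "bbox": None},
    "japansea": {"db": "data/processed/japansea.duckdb", "bbox": (25, 115, 48, 145)},
    "gulf": {"db": "data/processed/gulf.duckdb", "bbox": (8, -98, 32, -60)},
}


def ingest_command(region):
    """Build the argv list for one region's ais_stream.py invocation."""
    cfg = REGIONS[region]
    argv = ["uv", "run", "python", "src/ingest/ais_stream.py"]
    if cfg["bbox"] is not None:
        argv += ["--bbox", *map(str, cfg["bbox"])]
    argv += ["--db", cfg["db"]]
    return argv


for name in REGIONS:
    print(shlex.join(ingest_command(name)))
```

Each printed line can be pasted into its own terminal, or the argv lists can be handed to a process supervisor of your choice.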

Persona 4 — European Waters Analyst

Who: Analyst at EMSA (European Maritime Safety Agency), a national coast guard (e.g., UK HMCG, Danish Maritime Authority), or an EU sanctions compliance team monitoring Russian crude exports following the February 2022 invasion of Ukraine and the G7 price cap regime.

Primary signals: AIS dark periods near Russian Baltic export terminals (Primorsk, Ust-Luga), STS transfers in international waters off the Greek coast or the Strait of Gibraltar, vessels transiting the Bosphorus with suspiciously low declared cargo values, rapid flag changes away from EU/G7 registries, ownership chains routed through UAE or Turkey to obscure Russian beneficial ownership.

Key sub-regions:

Sub-region	Coverage	Primary concern
Baltic Sea	54–66°N, 10–30°E	Russian crude exports from Primorsk and Ust-Luga
North Sea	51–62°N, 5°W–10°E	Re-export via Rotterdam/ARA hub
Mediterranean	30–46°N, 6°W–36°E	STS off Greece/Malta, Libyan crude
Black Sea / Bosphorus	40–47°N, 26–42°E	Russian Novorossiysk crude transiting Turkish Straits
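
The sub-region boxes in the table can also be used to tag individual positions after ingestion. The sketch below is a hypothetical helper that returns every sub-region containing a point. The boxes overlap (for example around the Turkish Straits), so a position may carry more than one tag.

```python
# (lat_min, lon_min, lat_max, lon_max), taken from the sub-region table
SUB_REGIONS = {
    "Baltic Sea": (54, 10, 66, 30),
    "North Sea": (51, -5, 62, 10),
    "Mediterranean": (30, -6, 46, 36),
    "Black Sea / Bosphorus": (40, 26, 47, 42),
}


def sub_regions_for(lat, lon):
    """Return the names of all sub-regions whose box contains (lat, lon)."""
    return [
        name
        for name, (lat_min, lon_min, lat_max, lon_max) in SUB_REGIONS.items()
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


print(sub_regions_for(60.3, 28.6))   # near Primorsk
print(sub_regions_for(41.0, 29.0))   # Bosphorus: falls in two overlapping boxes
```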

Step-by-step configuration

A1 — AIS ingestion

Use a broad European bbox that covers all four sub-regions:

DB_PATH=data/processed/europe.duckdb uv run python src/ingest/schema.py

uv run python src/ingest/ais_stream.py \
  --bbox 30 -22 72 42 --db data/processed/europe.duckdb
# lat 30–72°N, lon 22°W–42°E
# covers Atlantic approaches, North Sea, Baltic, Mediterranean, Black Sea

To focus on the Baltic alone (lower volume, faster iteration):

uv run python src/ingest/ais_stream.py \
  --bbox 54 10 66 30 --db data/processed/baltic.duckdb

A2 — Sanctions loading

No changes needed. The EU consolidated sanctions list is already included in OpenSanctions CC0 data loaded by sanctions.py. Russian-linked entities sanctioned under EU Regulation 833/2014 will be present.

A3 — Feature engineering

Russian Baltic tankers make slow, deliberate voyages with predictable patterns — deviations are highly meaningful. Use a 45-day window to capture the full loading-transit-delivery cycle:

DB_PATH=data/processed/europe.duckdb \
  uv run python src/features/ais_behavior.py \
  --db data/processed/europe.duckdb \
  --window 45

Run all other feature scripts with --db data/processed/europe.duckdb.

A4 — Composite scoring

For European/Russian sanctions analysis, identity volatility is a very strong signal — the Russian shadow fleet aggressively re-flags and renames vessels. Shift weight toward identity via CLI flags:

uv run python src/score/composite.py \
  --db data/processed/europe.duckdb \
  --w-anomaly 0.35 --w-graph 0.35 --w-identity 0.30

Note: when running via scripts/run_pipeline.py, the C3 causal model automatically calibrates w_graph before Step 8. The value above is the fallback if insufficient AIS data is available.

Dashboard — Filters to apply

Filter	Value	Reason
Vessel type	Tanker	Russian crude and oil products dominate the shadow fleet here
Minimum confidence	0.55	European AIS coverage is dense; false positives are lower
Top N	75	Russian shadow fleet is estimated at 400–600 vessels globally

Key columns to review: flag_changes_2y (vessels cycling through Palau, Gabon, Cameroon flags), owner_changes_2y (rapid ownership restructuring to obscure Russian links), sts_candidate_count (Greek anchorage STS operations).

Launch the dashboard pointed at the Europe DB:

WATCHLIST_OUTPUT_PATH=data/processed/europe_watchlist.parquet \
  uv run uvicorn src.api.main:app --reload
  # open http://localhost:8000

Workarounds

No historical AIS for European waters: Marine Cadastre is US-only. For historical backfill:
- MarineTraffic / VesselFinder: Both offer historical AIS data exports (paid). Export to CSV with columns matching the Marine Cadastre schema (MMSI, BaseDateTime, LAT, LON, SOG, COG, VesselType) and load via load_csv_to_duckdb() with a custom bbox:

from src.ingest.marine_cadastre import load_csv_to_duckdb
from pathlib import Path

BALTIC_BBOX = {"lat_min": 54.0, "lat_max": 66.0, "lon_min": 10.0, "lon_max": 30.0}
load_csv_to_duckdb(Path("data/raw/baltic_historical.csv"),
                   db_path="data/processed/europe.duckdb",
                   bbox=BALTIC_BBOX)

- AISHub: Free data-sharing programme. Members can download historical NMEA logs, convert them to CSV, and load them the same way.

Bosphorus / Turkish Straits chokepoint monitoring: The Bosphorus (41°N, 29°E) is a critical transit point for Black Sea crude. To monitor it specifically, run a focused ingestion stream in parallel:

uv run python src/ingest/ais_stream.py \
  --bbox 40 26 42 30 --db data/processed/bosphorus.duckdb
# tight bbox around the Turkish Straits

Vessels appearing in both the Black Sea and Mediterranean DBs within a plausible transit window (12–24 h) are confirmed Bosphorus transits.
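
The cross-matching described above could be sketched as follows. It assumes you have already extracted, per MMSI, the timestamps at which each vessel was seen on the Black Sea side and on the Mediterranean side. The input shape and function name are hypothetical, and the 12–24 h window is applied in either direction of travel.

```python
from datetime import datetime, timedelta


def likely_transits(black_sea, mediterranean, min_hours=12, max_hours=24):
    """Return MMSIs seen on both sides within the transit window.

    Both arguments map mmsi -> list of datetimes (hypothetical input shape).
    """
    lo, hi = timedelta(hours=min_hours), timedelta(hours=max_hours)
    matches = set()
    for mmsi, bs_times in black_sea.items():
        for bs_time in bs_times:
            for med_time in mediterranean.get(mmsi, []):
                # abs() lets the match work for northbound and southbound legs
                if lo <= abs(med_time - bs_time) <= hi:
                    matches.add(mmsi)
    return sorted(matches)


seen_black_sea = {111: [datetime(2024, 1, 1, 0)], 222: [datetime(2024, 1, 1, 0)]}
seen_med = {111: [datetime(2024, 1, 1, 18)], 222: [datetime(2024, 1, 3, 0)]}
print(likely_transits(seen_black_sea, seen_med))
```

The nested loops are fine for a daily shortlist; for full position histories a sorted merge or a database join would scale better.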

EU sanctions list filtering: sanctions.py loads all OpenSanctions datasets merged. To understand which entities are EU-specific, query the DB after loading:

uv run python - <<'EOF'
import duckdb
con = duckdb.connect("data/processed/europe.duckdb", read_only=True)
print(con.execute("""
    SELECT source, COUNT(*) AS n
    FROM sanctions_entities
    GROUP BY source ORDER BY n DESC
""").df())
con.close()
EOF

high_risk_flag_ratio for European context: The identity feature uses a global list of weak port-state-control flags. For Russian shadow fleet analysis, add Gabon (GA), Palau (PW), and Cameroon (CM) to the high-risk flag list if they are not already included — these are the primary re-flagging destinations observed since 2022. Check src/features/identity.py for the flag list definition.

Persona 5 — Middle East / Indian Ocean Analyst

Who: Analyst at the US Fifth Fleet (Bahrain), the Combined Maritime Forces (CMF), the IMO, or a commercial maritime risk firm (e.g., Ambrey, Dryad Global) monitoring Iranian crude exports, Houthi-threatened Red Sea corridors, and Gulf STS operations.

Why this region ranks #1: Iran exports an estimated 1.5–2.0 million barrels per day in violation of OFAC sanctions, almost entirely via shadow tankers. The Strait of Hormuz (roughly 26.5°N, 56.3°E) is the single most critical maritime chokepoint: about 20% of global oil flows through it. Since late 2023 the Red Sea has become a kinetic threat zone, forcing rerouting around the Cape of Good Hope and creating new shadow fleet opportunities in the Indian Ocean.

Primary signals: Repeated AIS gaps in the Arabian Gulf (loading at Kharg Island or Bandar Abbas without declaring), STS transfers off Fujairah (UAE) and in the Gulf of Oman, position jumps near Hormuz (Iranian GPS jamming is documented), vessels rebranding between Iranian and Malaysian/Chinese flags, ownership chains to IRGC-linked holding companies.

Key sub-regions:

Sub-region	Coverage	Primary concern
Arabian Gulf	22–30°N, 48–57°E	Iranian crude loading at Kharg Island, Bandar Abbas
Strait of Hormuz	24–27°N, 55–60°E	Chokepoint transit; spoofing and dark periods
Gulf of Oman / Fujairah	22–27°N, 56–62°E	STS hub for Iranian crude ship-to-ship transfers
Red Sea	12–30°N, 32–44°E	Houthi threat zone; vessels rerouting or declaring false destinations
Indian Ocean approaches	10°S–22°N, 55–80°E	Iranian crude en route to India, China, and blending hubs

Step-by-step configuration

A1 — AIS ingestion

Use a broad bbox covering the Arabian Gulf through the Indian Ocean:

DB_PATH=data/processed/middleeast.duckdb uv run python src/ingest/schema.py

uv run python src/ingest/ais_stream.py \
  --bbox -10 32 30 80 --db data/processed/middleeast.duckdb
# lat 10°S–30°N, lon 32°E–80°E
# covers Red Sea, Arabian Gulf, Gulf of Oman, western Indian Ocean

To focus on the Strait of Hormuz and Fujairah STS zone only:

uv run python src/ingest/ais_stream.py \
  --bbox 22 55 27 62 --db data/processed/hormuz.duckdb

For the Red Sea corridor (Houthi threat zone):

uv run python src/ingest/ais_stream.py \
  --bbox 11 32 30 44 --db data/processed/redsea.duckdb

A2 — Sanctions loading

No changes needed. OFAC SDN already contains several hundred Iranian entities including IRGC-linked shipping companies, vessels, and owners. OpenSanctions merges these with EU and UN Iran-specific designations.

A3 — Feature engineering

Use a 60-day window — Iranian shadow tankers make long round trips (Arabian Gulf → China) of 30–50 days. A 30-day window will cut off the gap evidence mid-voyage.

DB_PATH=data/processed/middleeast.duckdb \
  uv run python src/features/ais_behavior.py \
  --db data/processed/middleeast.duckdb \
  --window 60

Increase the AIS gap threshold to detect Iranian-style dark periods (typically 12–72 h near Kharg Island):

DB_PATH=data/processed/middleeast.duckdb \
  uv run python src/features/ais_behavior.py \
  --db data/processed/middleeast.duckdb \
  --window 60 \
  --gap-threshold-hours 12

Run all other feature scripts with --db data/processed/middleeast.duckdb.

A4 — Composite scoring

For Iranian crude, all three signal categories are strong. The default weights are appropriate. However, sanctions_distance carries outsized predictive power here because Iran operates a tightly-connected network where most vessels are within 2–3 hops of an OFAC-listed entity. No weight change required.

Note: when running via scripts/run_pipeline.py, the C3 causal model automatically calibrates w_graph before Step 8. The value above is the fallback if insufficient AIS data is available.

Dashboard — Filters to apply

Filter	Value	Reason
Vessel type	Tanker	Iranian crude is the dominant cargo
Minimum confidence	0.60	High-quality signal environment; set threshold higher to reduce noise
Top N	100	Iranian shadow fleet is large (~300–400 active vessels)

Key columns to review: ais_gap_count_30d and ais_gap_max_hours (dark loading periods), position_jump_count (GPS spoofing near Hormuz), sts_candidate_count (Fujairah STS), sanctions_distance (IRGC network proximity).

Launch the dashboard:

WATCHLIST_OUTPUT_PATH=data/processed/middleeast_watchlist.parquet \
  uv run uvicorn src.api.main:app --reload
  # open http://localhost:8000

Workarounds

Iranian GPS jamming creates false position jumps: The position_jump_count feature flags consecutive positions implying speed > 50 knots. Near Hormuz this is often GPS jamming rather than actual spoofing by the vessel. To separate the two, filter by geographic proximity to known jamming zones before acting on this signal — it remains a valid flag but should be weighted lower for Hormuz transits specifically. No automated workaround exists yet; manual review of flagged vessels' last known positions is recommended.
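
For manual review it can help to reproduce the implied-speed check on a single track. The sketch below is not the pipeline's implementation; it assumes great-circle (haversine) distance and a track given as time-sorted (datetime, lat, lon) tuples, and it counts consecutive pairs whose implied speed exceeds 50 knots.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def count_position_jumps(track, max_knots=50.0):
    """Count consecutive fixes whose implied speed exceeds max_knots."""
    jumps = 0
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(track, track[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        if hours <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if haversine_nm(lat1, lon1, lat2, lon2) / hours > max_knots:
            jumps += 1
    return jumps


t0 = datetime(2024, 1, 1)
track = [
    (t0, 26.5, 56.5),
    (t0 + timedelta(hours=1), 26.6, 56.5),   # about 6 nm in 1 h: plausible
    (t0 + timedelta(hours=2), 28.6, 56.5),   # about 120 nm in 1 h: jump
    (t0 + timedelta(hours=3), 26.7, 56.5),   # about 114 nm back in 1 h: jump
]
print(count_position_jumps(track))
```

A jammed receiver typically produces jumps in pairs like this (out and back), which is one pattern to look for when separating jamming from deliberate spoofing.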

Red Sea rerouting creates anomalous patterns for legitimate vessels: Since 2024, many non-shadow-fleet vessels have stopped transiting the Red Sea, creating unusual behavioral patterns (e.g., long Cape of Good Hope detours) that may elevate anomaly_score for innocent vessels. To reduce false positives, filter out vessels whose last position is in the Cape of Good Hope corridor (lat 35°S–25°S, lon 15°E–35°E) from the final watchlist:

import polars as pl
df = pl.read_parquet("data/processed/middleeast_watchlist.parquet")
# Exclude vessels last seen near Cape of Good Hope
cape_bbox = (df["last_lat"].is_between(-35, -25)) & (df["last_lon"].is_between(15, 35))
df = df.filter(~cape_bbox)
df.write_parquet("data/processed/middleeast_watchlist_filtered.parquet")

No historical AIS for the region: Marine Cadastre is US-only. For backfill:
- The AISHub free sharing programme covers the Indian Ocean reasonably well.
- UN Panel of Experts reports (published annually) name specific vessels. Cross-reference their MMSIs against the watchlist as a validation set instead of relying on OFAC alone.

Running Hormuz and Red Sea as separate focused streams: Both sub-regions can run as separate ingestion processes with their own DBs while the broad Middle East stream runs in the background:

# Terminal 1 — broad Middle East
uv run python src/ingest/ais_stream.py --bbox -10 32 30 80 --db data/processed/middleeast.duckdb

# Terminal 2 — Hormuz chokepoint (high resolution)
uv run python src/ingest/ais_stream.py --bbox 22 55 27 62 --db data/processed/hormuz.duckdb

# Terminal 3 — Red Sea threat zone
uv run python src/ingest/ais_stream.py --bbox 11 32 30 44 --db data/processed/redsea.duckdb

Score each DB independently and merge the top candidates manually for the daily brief.
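
The manual merge can be partly scripted. The sketch below assumes each watchlist has already been read into a list of dicts with "mmsi" and "confidence" keys; both key names are assumptions about the watchlist schema. It keeps the highest-confidence row per vessel across regions.

```python
def merge_top_candidates(watchlists, top_n=20):
    """Merge per-region watchlists, keeping each vessel's best-scoring row.

    watchlists maps region name -> list of row dicts (hypothetical shape).
    """
    best = {}
    for region, rows in watchlists.items():
        for row in rows:
            current = best.get(row["mmsi"])
            if current is None or row["confidence"] > current["confidence"]:
                best[row["mmsi"]] = {**row, "region": region}
    ranked = sorted(best.values(), key=lambda r: r["confidence"], reverse=True)
    return ranked[:top_n]


watchlists = {
    "hormuz": [{"mmsi": 1, "confidence": 0.7}, {"mmsi": 2, "confidence": 0.9}],
    "redsea": [{"mmsi": 1, "confidence": 0.8}, {"mmsi": 3, "confidence": 0.6}],
}
for row in merge_top_candidates(watchlists, top_n=2):
    print(row["mmsi"], row["region"], row["confidence"])
```

Recording the region alongside each row keeps the provenance visible in the brief when a vessel appears in more than one stream.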

Running the pipeline

The recommended entry point is scripts/run_pipeline.py, which handles region selection, passes all flags automatically, and walks through each step interactively:

uv run python scripts/run_pipeline.py                          # interactive
uv run python scripts/run_pipeline.py --region japan --non-interactive

Feature gaps and planned improvements

Gap	Affected personas	Workaround
~~No CLI flag for `GAP_THRESHOLD_H`~~	~~Japan Sea, Middle East~~	Resolved — use `--gap-threshold-hours` on `ais_behavior.py`
~~No CLI flags for composite weights~~	~~US Gulf, Europe~~	Resolved — use `--w-anomaly`, `--w-graph`, `--w-identity` on `composite.py`
Marine Cadastre bbox defaults to Singapore	US Gulf, Europe	Pass `--bbox`, or call `load_csv_to_duckdb()` with a custom `bbox` dict
No historical AIS for non-US regions	Japan Sea, Europe, Middle East	Run `ais_stream.py` to accumulate data; or import AISHub/MarineTraffic CSV exports
Single `DB_PATH` per pipeline run	All multi-region	Use separate DuckDB files and pass `--db` to every script
No sub-region filtering within a bbox	Europe (Bosphorus), Middle East (Hormuz/Red Sea)	Run a second `ais_stream.py` instance with a tighter bbox and separate DB
Iranian GPS jamming inflates `position_jump_count`	Middle East	Manual review of Hormuz-area vessels; no automated filter yet
Red Sea rerouting creates false anomaly elevation	Middle East	Post-score filter excluding Cape of Good Hope positions (see Persona 5 workaround)