Pipeline Catalog (Operations View)

This document is an operations-first map of the pipelines in arktrace. It focuses on four questions:

What pipeline type exists?
Why does it exist?
When should it run?
What result should operators expect?

Scope Clarification: Backend Jobs vs Web UI Operations

The pipeline types in this document refer to backend execution jobs (batch, scheduled, or event-triggered runs), not manual button-by-button work in the web dashboard.

Backend jobs:
- Run by CI/CD, schedulers (cron/orchestrator), long-running services, or operator-triggered CLI/API calls.
- Produce and refresh data artifacts such as DuckDB tables, watchlists, scores, and evaluation reports.

Web UI operations:
- Human review and investigation workflow in the dashboard (filter, inspect vessel detail, submit tier/handoff decisions).
- Consume pipeline outputs and write review feedback, but do not replace core ingestion/scoring/evaluation pipeline execution.

For implementation-level commands and flags, see Pipeline Operations.

Pipeline Types At A Glance

Pipeline Type	Primary Purpose	Typical Run Timing	Expected Result
Full Screening (9-step)	Generate ranked shadow-fleet candidates from raw/public data	Initial setup, regional refresh, periodic baseline run	Updated watchlist and supporting artifacts for analyst triage
Continuous Monitoring (Streaming)	Keep vessel risk ranking fresh with live AIS and alerting	Persistent operations mode (minutes-level cadence)	Near-real-time watchlist updates and threshold-based alerts
Historical Backtesting Validation	Validate ranking quality on historical evidence-backed windows	Before model/weight changes, pre-release, governance reviews	Reproducible metrics + threshold recommendations
Review-Feedback Evaluation	Learn from human review outcomes and detect quality drift	Weekly/monthly quality cycle, after enough reviews accumulate	Tier-aware and ops-aware quality report with regression checks
Public Data Integration Batch	End-to-end known-case coverage check across regions	Post-merge main branch validation and major-change dry runs	Public-overlap integration report and known-case floor check
Delayed-Label Intelligence (Backtracking)	Turn newly confirmed labels into precursor signals and graph-based risk uplift without a full retrain	After each confirmed label, weekly sweep, incremental run after each review session	Backtracking report with precursor signals, propagated entities, and a regression check
Demo / Smoke Pipeline	Fast environment and UI verification without full ingestion	Demos, incident triage, local sanity checks	Deterministic watchlist for quick dashboard and flow validation

1) Full Screening Pipeline (9-step)

Purpose

Run the complete MPOL screening flow from ingestion to dashboard-ready watchlist.

When To Run

New environment bring-up
Region switch (Singapore/Japan/Middle East/Europe/Gulf)
Scheduled refresh run (for non-streaming operations)
Before analyst shift when starting from stale data

Main Inputs

Region preset
Optional live AIS streaming duration
Optional historical backfill (Marine Cadastre where applicable)
Optional geopolitical corridor filter

Expected Outputs

Region DuckDB (data/processed/<region>.duckdb)
Ranked watchlist (data/processed/candidate_watchlist.parquet)
Scoring artifacts (composite_scores.parquet, <region>_causal_effects.parquet)
Validation metrics (validation_metrics.json)

Operational Success Criteria

Non-empty watchlist ordered by confidence
Dashboard loads candidates and filters correctly
No failed step in the 9-step run log
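
The first criterion above can be checked mechanically once the watchlist rows are loaded. The following is a minimal sketch, not the project's own check: the `confidence` and `mmsi` field names and the sample rows are assumptions for illustration, since the actual schema of candidate_watchlist.parquet is not described here.

```python
from typing import Any, Mapping, Sequence


def watchlist_is_valid(rows: Sequence[Mapping[str, Any]],
                       score_key: str = "confidence") -> bool:
    """Return True if the watchlist is non-empty and sorted by descending score.

    `score_key` is a hypothetical column name, not the documented schema.
    """
    if not rows:
        return False
    scores = [float(row[score_key]) for row in rows]
    # Every score must be greater than or equal to the one that follows it.
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Illustrative rows only; MMSIs and scores are made up.
sample = [
    {"mmsi": "111000111", "confidence": 0.91},
    {"mmsi": "222000222", "confidence": 0.74},
    {"mmsi": "333000333", "confidence": 0.40},
]
print(watchlist_is_valid(sample))
```

In practice the rows could come from something like `pandas.read_parquet(path).to_dict("records")`; the function itself only needs a sequence of mappings.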

2) Continuous Monitoring Pipeline (Streaming)

Purpose

Maintain continuously updated risk ranking and operational alerts from live AIS.

When To Run

During active monitoring windows
For ports/chokepoints requiring minutes-level visibility
In always-on operations with periodic re-scoring

Main Inputs

Active AIS stream
Region/bbox configuration
Alert threshold policy (for confidence crossing)

Expected Outputs

AIS micro-batches appended to DuckDB
Re-scored candidates at operational cadence
Alert events (SSE/webhook depending on deployment)

Operational Success Criteria

Data freshness within target polling interval
Alerts fire on threshold crossings without excessive lag
Re-scoring job remains stable over long runtime
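
The "confidence crossing" alert policy can be illustrated by comparing two consecutive scoring passes. This is a sketch under stated assumptions: the dictionary layout, the threshold value, and the rule that an unseen vessel counts as previously scored 0.0 are all hypothetical, and the real alerting path (SSE/webhook) is not modeled.

```python
from typing import Dict, List


def crossing_alerts(previous: Dict[str, float],
                    current: Dict[str, float],
                    threshold: float) -> List[str]:
    """Return MMSIs whose score moved from below the threshold to at or above it."""
    alerts = []
    for mmsi, score in current.items():
        # Assumption: a vessel with no earlier score is treated as 0.0.
        before = previous.get(mmsi, 0.0)
        if before < threshold <= score:
            alerts.append(mmsi)
    return sorted(alerts)


# Illustrative scores only.
prev_scores = {"111000111": 0.62, "222000222": 0.81}
new_scores = {"111000111": 0.78, "222000222": 0.85, "333000333": 0.30}
print(crossing_alerts(prev_scores, new_scores, threshold=0.75))
```

Alerting only on the crossing, rather than on every score above the threshold, keeps a vessel that stays high from re-alerting on each re-scoring pass.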

3) Historical Backtesting Validation Pipeline

Purpose

Measure ranking quality using historical windows and evidence-backed labels.

When To Run

Before promoting scoring/model changes
Before publishing performance claims
At recurring quality review checkpoints

Main Inputs

Evaluation manifest (versioned windows)
Window watchlist snapshots
Labels CSV with source traceability and confidence

Expected Outputs

Backtest report (data/processed/backtest_report.json)
Window metrics and cross-window summary
Capacity-aware threshold hints (ops_thresholds)

Operational Success Criteria

Metrics generated reproducibly for the same manifest
Threshold recommendations are backed by labeled support
No regression beyond agreed governance tolerance
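
The last criterion amounts to comparing a candidate run against a baseline with an agreed tolerance. The sketch below assumes that every metric is "higher is better" and uses hypothetical metric names and values; the real layout of backtest_report.json is not specified in this document.

```python
from typing import Dict


def regressions(baseline: Dict[str, float],
                candidate: Dict[str, float],
                tolerance: float) -> Dict[str, float]:
    """Return metrics that dropped by more than `tolerance` versus the baseline.

    Assumes higher is better for every metric; a metric missing from the
    candidate is treated as 0.0.
    """
    failed = {}
    for name, base_value in baseline.items():
        drop = base_value - candidate.get(name, 0.0)
        if drop > tolerance:
            failed[name] = round(drop, 6)
    return failed


# Hypothetical metric names and values, for illustration only.
baseline_metrics = {"precision_at_50": 0.40, "recall_at_200": 0.70}
candidate_metrics = {"precision_at_50": 0.41, "recall_at_200": 0.62}
print(regressions(baseline_metrics, candidate_metrics, tolerance=0.05))
```

An empty result means no metric regressed beyond the tolerance, which is the condition a promotion gate would look for.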

4) Review-Feedback Evaluation Pipeline

Purpose

Close the human-in-the-loop learning loop using the latest vessel_reviews outcomes.

When To Run

Weekly threshold refresh
Monthly governance review
After major review volume increase or process change

Main Inputs

Latest review snapshot (optionally frozen by as_of_utc)
Region watchlists
Optional prior report baseline for drift comparison

Expected Outputs

Review feedback report (data/processed/review_feedback_evaluation.json)
Tier-aware mix and operations-aware hit-rate summaries
Region/capacity threshold recommendations
Drift/regression pass/fail checks

Operational Success Criteria

Snapshot reproducibility (same as_of_utc yields same result)
Regression checks clearly indicate pass/fail by region
Report is actionable for threshold update decisions
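
Snapshot reproducibility follows from freezing the review set at a cutoff before any metric is computed. The following sketch shows the idea only: the `reviewed_at`, `tier`, and `mmsi` field names, the tier values, and the sample rows are assumptions, not the actual vessel_reviews schema.

```python
from datetime import datetime
from typing import Dict, List


def freeze_reviews(reviews: List[Dict[str, str]], as_of_utc: str) -> List[Dict[str, str]]:
    """Keep reviews made at or before `as_of_utc`, in a deterministic order."""
    cutoff = datetime.fromisoformat(as_of_utc)
    kept = [r for r in reviews
            if datetime.fromisoformat(r["reviewed_at"]) <= cutoff]
    # Sorting makes the result independent of the input row order.
    return sorted(kept, key=lambda r: (datetime.fromisoformat(r["reviewed_at"]), r["mmsi"]))


# Illustrative rows only; field names and values are hypothetical.
reviews = [
    {"mmsi": "111000111", "tier": "confirmed", "reviewed_at": "2024-03-01T08:00:00+00:00"},
    {"mmsi": "222000222", "tier": "cleared", "reviewed_at": "2024-03-09T12:30:00+00:00"},
    {"mmsi": "333000333", "tier": "confirmed", "reviewed_at": "2024-03-05T23:59:59+00:00"},
]
snapshot = freeze_reviews(reviews, "2024-03-06T00:00:00+00:00")
print([r["mmsi"] for r in snapshot])
```

Reviews written after the cutoff never enter the snapshot, so rerunning the evaluation later with the same `as_of_utc` sees the same input.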

5) Public Data Integration Batch Pipeline

Purpose

Run a medium-scale end-to-end verification using practical public positive-label sources.

When To Run

Automatically after merge to main
Manually before high-risk release or major data-path change

Main Inputs

Public sanctions snapshot DB
Multi-region pipeline outputs
Known-case thresholds (min_known_cases, max_known_cases)

Expected Outputs

Integration manifest/report/summary artifacts
Region-specific evaluation label files
Known-case floor pass/fail result

Operational Success Criteria

Public data refresh and ingestion complete
Known-case floor is met (when strict mode is enabled)
Found-vs-missed coverage remains within acceptable bounds
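
The known-case floor and the found-vs-missed split can be expressed as a set comparison between the watchlist and the public positive labels. This is a sketch, not the pipeline's implementation: the function name and return shape are hypothetical, `max_known_cases` is left out because its semantics are not described here, and strict mode is assumed to mean "fail the run when the floor is not met".

```python
from typing import Dict, Iterable, List, Union

Result = Dict[str, Union[List[str], bool]]


def known_case_floor(watchlist_mmsis: Iterable[str],
                     known_positive_mmsis: Iterable[str],
                     min_known_cases: int,
                     strict: bool) -> Result:
    """Split known positives into found/missed and evaluate the floor."""
    known = set(known_positive_mmsis)
    found = known & set(watchlist_mmsis)
    missed = known - found
    floor_met = len(found) >= min_known_cases
    return {
        "found": sorted(found),
        "missed": sorted(missed),
        "floor_met": floor_met,
        # Assumption: outside strict mode a missed floor is reported but does not fail.
        "passed": floor_met or not strict,
    }


# Illustrative identifiers only.
result = known_case_floor(
    watchlist_mmsis=["111000111", "222000222", "333000333"],
    known_positive_mmsis=["222000222", "444000444"],
    min_known_cases=2,
    strict=True,
)
print(result)
```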

6) Delayed-Label Intelligence (Backtracking) Pipeline

Purpose

Convert newly confirmed vessel labels into forward-looking detection power without requiring a full model retrain:

Causal rewind: scans the trailing 12 months of AIS data for each confirmed vessel and surfaces precursor signals (AIS gap uplift, STS proxy, low-SOG fraction) that appeared before confirmation.
Label propagation: traverses the ownership/STS graph from confirmed MMSIs to identify related entities (shared owner, shared manager, STS contact) and raise their risk.
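
The causal rewind step can be pictured as a windowed scan over a vessel's positions before its confirmation time. The sketch below is illustrative only: it approximates 12 months as 365 days, uses a hypothetical 1.0-knot low-SOG cutoff, reports the largest reporting gap rather than a true "gap uplift", and omits the STS proxy entirely.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def rewind_signals(positions: List[Tuple[datetime, float]],
                   confirmed_at: datetime,
                   window_days: int = 365,
                   low_sog_knots: float = 1.0) -> Dict[str, float]:
    """Summarize (timestamp, SOG) points in the trailing window before confirmation."""
    start = confirmed_at - timedelta(days=window_days)
    window = sorted((ts, sog) for ts, sog in positions if start <= ts < confirmed_at)
    if not window:
        return {"n_positions": 0, "low_sog_fraction": 0.0, "max_gap_hours": 0.0}
    low = sum(1 for _, sog in window if sog < low_sog_knots)
    # Hours between consecutive reports; long gaps may indicate AIS silence.
    gaps = [(b[0] - a[0]).total_seconds() / 3600 for a, b in zip(window, window[1:])]
    return {
        "n_positions": len(window),
        "low_sog_fraction": low / len(window),
        "max_gap_hours": max(gaps, default=0.0),
    }


# Illustrative track only; timestamps are naive and assumed to be UTC.
track = [
    (datetime(2022, 1, 1, 0, 0), 0.2),    # outside the trailing window
    (datetime(2024, 1, 10, 0, 0), 12.0),
    (datetime(2024, 1, 10, 6, 0), 0.3),
    (datetime(2024, 1, 12, 6, 0), 0.5),
    (datetime(2024, 1, 12, 12, 0), 11.0),
]
signals = rewind_signals(track, confirmed_at=datetime(2024, 6, 1))
print(signals)
```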

When To Run

After any new confirmed label is ingested via the review panel
Weekly sweep to catch batch-confirmed outcomes
Incremental mode (--since) after each shift's review session

Main Inputs

vessel_reviews table (confirmed-tier entries)
Lance Graph ownership/STS datasets
ais_positions table (trailing 12-month window per vessel)

Expected Outputs

data/processed/backtracking_report.json: full structured report
data/processed/backtracking_report.md: human-readable summary with precursor signal table and propagated entity list
regression_checks.pass field (true = all confirmed vessels successfully rewound)

Operational Success Criteria

regression_checks.pass is true in every run
At least one precursor signal detected for vessels with sufficient AIS history
Propagated entities are traceable to a specific confirmed seed via source_mmsi + evidence_type
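
The traceability criterion suggests that each propagated entity carries the seed it came from and the kind of link that connected them. A minimal sketch, assuming a plain edge list in place of the Lance Graph datasets and only one hop from each confirmed seed; the edge tuple layout and the evidence type strings are hypothetical.

```python
from typing import Dict, List, Tuple

# (mmsi_a, mmsi_b, evidence_type); an undirected link between two vessels.
Edge = Tuple[str, str, str]


def propagate_labels(confirmed: List[str], edges: List[Edge]) -> List[Dict[str, str]]:
    """Return one record per (related vessel, confirmed seed, evidence type)."""
    confirmed_set = set(confirmed)
    records = []
    for a, b, evidence in edges:
        # Check both directions because the link is undirected.
        for seed, other in ((a, b), (b, a)):
            if seed in confirmed_set and other not in confirmed_set:
                records.append(
                    {"mmsi": other, "source_mmsi": seed, "evidence_type": evidence}
                )
    return sorted(records, key=lambda r: (r["mmsi"], r["source_mmsi"], r["evidence_type"]))


# Illustrative graph only.
edges = [
    ("111000111", "444000444", "shared_owner"),
    ("555000555", "111000111", "sts_contact"),
    ("666000666", "777000777", "shared_manager"),
]
propagated = propagate_labels(["111000111"], edges)
for record in propagated:
    print(record)
```

Because every record names its `source_mmsi` and `evidence_type`, an operator can trace any uplifted vessel back to the confirmed seed that caused it.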

See Backtracking Runbook for full CLI reference and demo scenario.

7) Demo / Smoke Pipeline

Purpose

Provide fast deterministic validation of dashboard and operator flow without full data processing.

When To Run

Demo preparation
Rapid post-deploy sanity checks
Environment troubleshooting

Main Inputs

Bundled demo watchlist fixture

Expected Outputs

Processed watchlist replaced with deterministic demo data
Dashboard map/table interaction becomes immediately testable

Operational Success Criteria

UI loads non-empty candidate list quickly
Core interaction paths work (filter, detail, review actions)

Suggested Operations Cadence

Cadence	Pipeline	Goal	Trigger Type	Triggered By (if Event)
Continuous	Continuous Monitoring	Live situational awareness	Scheduled (always-on service loop)	N/A
Daily or per watch	Full Screening (if non-streaming mode)	Fresh candidate ranking	Scheduled or Event-triggered	Duty operations officer / shift lead
Weekly	Review-Feedback Evaluation	Threshold tuning and drift control	Scheduled	N/A
After each confirmed label	Delayed-Label Intelligence (Backtracking)	Precursor discovery + graph uplift	Event-triggered, plus scheduled weekly sweep	Analyst submitting confirmed review
Pre-release	Historical Backtesting + Public Integration Batch	Quality gate before change promotion	Event-triggered	Release owner / CI pipeline on release candidate
On-demand	Demo/Smoke	Fast verification and incident checks	Event-triggered	Analyst / operator / incident commander

Decision Guide

Need fresh candidate generation from raw data: run Full Screening.
Need live alerting and near-real-time updates: run Continuous Monitoring.
Need evidence-backed quality measurement on historical windows: run Historical Backtesting.
Need to tune thresholds from analyst outcomes and check drift: run Review-Feedback Evaluation.
Need to convert a new confirmed label into precursor insights and graph uplift: run Backtracking.
Need broad post-merge safety check on practical known positives: run Public Data Integration Batch.
Need a quick UI/environment confidence check: run Demo/Smoke.