Pipeline Catalog (Operations View)
This document is an operations-first map of the pipelines in arktrace. It focuses on four questions:
- What pipeline type exists?
- Why does it exist?
- When should it run?
- What result should operators expect?
Scope Clarification: Backend Jobs vs Web UI Operations
The pipeline types in this document refer to backend execution jobs (batch, scheduled, or event-triggered runs), not manual button-by-button work in the web dashboard.
- Backend jobs:
  - Run by CI/CD, schedulers (cron/orchestrator), long-running services, or operator-triggered CLI/API calls.
  - Produce and refresh data artifacts such as DuckDB tables, watchlists, scores, and evaluation reports.
- Web UI operations:
  - Human review and investigation workflow in the dashboard (filter, inspect vessel detail, submit tier/handoff decisions).
  - Consume pipeline outputs and write review feedback, but do not replace core ingestion/scoring/evaluation pipeline execution.
For implementation-level commands and flags, see Pipeline Operations.
Pipeline Types At A Glance
| Pipeline Type | Primary Purpose | Typical Run Timing | Expected Result |
|---|---|---|---|
| Full Screening (9-step) | Generate ranked shadow-fleet candidates from raw/public data | Initial setup, regional refresh, periodic baseline run | Updated watchlist and supporting artifacts for analyst triage |
| Continuous Monitoring (Streaming) | Keep vessel risk ranking fresh with live AIS and alerting | Persistent operations mode (minutes-level cadence) | Near-real-time watchlist updates and threshold-based alerts |
| Historical Backtesting Validation | Validate ranking quality on historical evidence-backed windows | Before model/weight changes, pre-release, governance reviews | Reproducible metrics + threshold recommendations |
| Review-Feedback Evaluation | Learn from human review outcomes and detect quality drift | Weekly/monthly quality cycle, after enough reviews accumulate | Tier-aware and ops-aware quality report with regression checks |
| Public Data Integration Batch | End-to-end known-case coverage check across regions | Post-merge main branch validation and major-change dry runs | Public-overlap integration report and known-case floor check |
| Delayed-Label Intelligence (Backtracking) | Turn newly confirmed labels into precursor signals and graph-based risk uplift without a full model retrain | After each confirmed label, plus a weekly sweep | Backtracking report with precursor signals and propagated entities |
| Demo / Smoke Pipeline | Fast environment and UI verification without full ingestion | Demos, incident triage, local sanity checks | Deterministic watchlist for quick dashboard and flow validation |
1) Full Screening Pipeline (9-step)
Purpose
Run the complete MPOL screening flow from ingestion to dashboard-ready watchlist.
When To Run
- New environment bring-up
- Region switch (Singapore/Japan/Middle East/Europe/Gulf)
- Scheduled refresh run (for non-streaming operations)
- Before analyst shift when starting from stale data
Main Inputs
- Region preset
- Optional live AIS streaming duration
- Optional historical backfill (Marine Cadastre where applicable)
- Optional geopolitical corridor filter
Expected Outputs
- Region DuckDB (`data/processed/<region>.duckdb`)
- Ranked watchlist (`data/processed/candidate_watchlist.parquet`)
- Scoring artifacts (`composite_scores.parquet`, `<region>_causal_effects.parquet`)
- Validation metrics (`validation_metrics.json`)
Operational Success Criteria
- Non-empty watchlist with confidence ordering
- Dashboard loads candidates and filters correctly
- No failed step in the 9-step run log
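The first and third criteria above can be expressed as a small post-run check. This is a minimal sketch under stated assumptions, not the project's own validation code: the `confidence` column name, the `check_watchlist` and `failed_steps` helpers, and the `"ok"` status value are all hypothetical, and the rows are plain dictionaries rather than the real parquet watchlist.

```python
from typing import Iterable, Mapping


def check_watchlist(rows: Iterable[Mapping], score_key: str = "confidence") -> list:
    """Return a list of problems; an empty list means the checks passed.

    `score_key` is an assumed column name, not taken from the real schema.
    """
    rows = list(rows)
    if not rows:
        return ["watchlist is empty"]
    problems = []
    scores = [float(row[score_key]) for row in rows]
    # Confidence ordering: each score must be >= the one that follows it.
    if any(a < b for a, b in zip(scores, scores[1:])):
        problems.append("rows are not sorted by descending confidence")
    return problems


def failed_steps(step_status: Mapping[str, str]) -> list:
    """Names of steps whose status is anything other than "ok".

    The step-name -> status mapping is a hypothetical stand-in for the run log.
    """
    return [name for name, status in step_status.items() if status != "ok"]


if __name__ == "__main__":
    sample = [{"mmsi": "111", "confidence": 0.9}, {"mmsi": "222", "confidence": 0.4}]
    print(check_watchlist(sample))
    print(failed_steps({"ingest": "ok", "score": "failed"}))
```

A check like this could run as the last action of a scheduled refresh so that an empty or unsorted watchlist is caught before an analyst shift begins.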
2) Continuous Monitoring Pipeline (Streaming)
Purpose
Maintain continuously updated risk ranking and operational alerts from live AIS.
When To Run
- During active monitoring windows
- For ports/chokepoints requiring minutes-level visibility
- In always-on operations with periodic re-scoring
Main Inputs
- Active AIS stream
- Region/bbox configuration
- Alert threshold policy (for confidence crossing)
Expected Outputs
- AIS micro-batches appended to DuckDB
- Re-scored candidates at operational cadence
- Alert events (SSE/webhook depending on deployment)
Operational Success Criteria
- Data freshness within target polling interval
- Alerts fire on threshold crossings without excessive lag
- Re-scoring job remains stable over long runtime
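The freshness and threshold-crossing criteria above can be illustrated with a short sketch. It assumes scores are held as MMSI-to-confidence mappings between re-scoring cycles; the function names, the treatment of unseen vessels as 0.0, and the "upward crossing only" rule are assumptions for illustration, not the deployed alert policy.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_batch_at: datetime, now: datetime, polling_interval: timedelta) -> bool:
    """True when the newest micro-batch is no older than one polling interval."""
    return now - last_batch_at <= polling_interval


def threshold_crossings(previous: dict, current: dict, threshold: float) -> list:
    """MMSIs whose score moved from below the threshold to at or above it."""
    crossed = []
    for mmsi, score in current.items():
        before = previous.get(mmsi, 0.0)  # assumption: unseen vessels start at 0.0
        if before < threshold <= score:
            crossed.append(mmsi)
    return sorted(crossed)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(is_fresh(now - timedelta(minutes=3), now, timedelta(minutes=5)))
    print(threshold_crossings({"A": 0.5, "B": 0.8}, {"A": 0.75, "B": 0.9, "C": 0.7}, 0.7))
```

Alerting only on the transition, rather than on every cycle in which a vessel sits above the threshold, is one way to keep alert volume proportional to new information.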
3) Historical Backtesting Validation Pipeline
Purpose
Measure ranking quality using historical windows and evidence-backed labels.
When To Run
- Before promoting scoring/model changes
- Before publishing performance claims
- At recurring quality review checkpoints
Main Inputs
- Evaluation manifest (versioned windows)
- Window watchlist snapshots
- Labels CSV with source traceability and confidence
Expected Outputs
- Backtest report (`data/processed/backtest_report.json`)
- Window metrics and cross-window summary
- Capacity-aware threshold hints (`ops_thresholds`)
Operational Success Criteria
- Metrics generated reproducibly for the same manifest
- Threshold recommendations are backed by labeled support
- No regression beyond agreed governance tolerance
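Two of the criteria above, reproducibility and regression tolerance, lend themselves to a mechanical check. The sketch below is illustrative only: the metric names, the "higher is better" convention, and the helper names are assumptions, and the real report schema may differ.

```python
import hashlib
import json


def report_fingerprint(report: dict) -> str:
    """Stable hash of a report; equal content gives an equal fingerprint.

    Keys are sorted before hashing so dictionary ordering does not matter.
    """
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_regressions(baseline: dict, candidate: dict, tolerance: float) -> dict:
    """Metrics (assumed higher-is-better) that dropped by more than `tolerance`."""
    regressions = {}
    for name, base_value in baseline.items():
        if name not in candidate:
            continue  # metrics missing from the candidate are not judged here
        drop = base_value - candidate[name]
        if drop > tolerance:
            regressions[name] = round(drop, 6)
    return regressions


if __name__ == "__main__":
    # Metric names below are hypothetical examples.
    baseline = {"precision_at_50": 0.60, "recall": 0.40}
    candidate = {"precision_at_50": 0.52, "recall": 0.39}
    print(find_regressions(baseline, candidate, tolerance=0.05))
    print(report_fingerprint(baseline) == report_fingerprint(dict(reversed(list(baseline.items())))))
```

Running the backtest twice against the same manifest and comparing fingerprints is a cheap way to confirm reproducibility before the regression comparison is trusted.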
4) Review-Feedback Evaluation Pipeline
Purpose
Close the human-in-the-loop learning loop using latest vessel_reviews outcomes.
When To Run
- Weekly threshold refresh
- Monthly governance review
- After major review volume increase or process change
Main Inputs
- Latest review snapshot (optionally frozen by `as_of_utc`)
- Region watchlists
- Optional prior report baseline for drift comparison
Expected Outputs
- Review feedback report (`data/processed/review_feedback_evaluation.json`)
- Tier-aware mix and operations-aware hit-rate summaries
- Region/capacity threshold recommendations
- Drift/regression pass/fail checks
Operational Success Criteria
- Snapshot reproducibility (same `as_of_utc` yields same result)
- Regression checks clearly indicate pass/fail by region
- Report is actionable for threshold update decisions
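Snapshot reproducibility follows from evaluating only reviews recorded at or before `as_of_utc`. The sketch below shows the idea with in-memory rows; the `reviewed_at` and `region` field names, and the definition of hit rate as the share of reviews in the `confirmed` tier, are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def snapshot(reviews: list, as_of_utc: datetime) -> list:
    """Reviews recorded at or before the cutoff; later reviews are ignored."""
    return [r for r in reviews if r["reviewed_at"] <= as_of_utc]


def hit_rate_by_region(reviews: list) -> dict:
    """Share of reviews per region whose tier is "confirmed" (assumed definition)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for review in reviews:
        totals[review["region"]] += 1
        if review["tier"] == "confirmed":
            hits[review["region"]] += 1
    return {region: hits[region] / totals[region] for region in sorted(totals)}


if __name__ == "__main__":
    utc = timezone.utc
    reviews = [
        {"mmsi": "1", "region": "singapore", "tier": "confirmed", "reviewed_at": datetime(2024, 1, 1, tzinfo=utc)},
        {"mmsi": "2", "region": "singapore", "tier": "cleared", "reviewed_at": datetime(2024, 1, 2, tzinfo=utc)},
        {"mmsi": "3", "region": "japan", "tier": "confirmed", "reviewed_at": datetime(2024, 2, 1, tzinfo=utc)},
    ]
    frozen = snapshot(reviews, datetime(2024, 1, 15, tzinfo=utc))
    print(hit_rate_by_region(frozen))
```

Because the cutoff is applied before any aggregation, reviews submitted after the evaluation started cannot change the result for a given `as_of_utc`.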
5) Public Data Integration Batch Pipeline
Purpose
Run a medium-scale end-to-end verification using practical public positive-label sources.
When To Run
- Automatically after merge to `main`
- Manually before high-risk release or major data-path change
Main Inputs
- Public sanctions snapshot DB
- Multi-region pipeline outputs
- Known-case thresholds (`min_known_cases`, `max_known_cases`)
Expected Outputs
- Integration manifest/report/summary artifacts
- Region-specific evaluation label files
- Known-case floor pass/fail result
Operational Success Criteria
- Public data refresh and ingestion complete
- Known-case floor is met (when strict mode enabled)
- Found-vs-missed coverage remains within acceptable bounds
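The floor check and the found-vs-missed summary can be sketched as simple set arithmetic. This example makes several assumptions that the text above does not confirm: that the floor applies to the number of known cases available for evaluation, that non-strict mode reports a shortfall without failing, and that vessels are keyed by MMSI. `max_known_cases` is not modelled here.

```python
def known_case_coverage(known_mmsis, watchlist_mmsis, min_known_cases, strict=True):
    """Summarise how many known positives the watchlist found or missed.

    Assumption: the floor is on the count of known cases available, and a
    shortfall only fails the run when `strict` is enabled.
    """
    known = set(known_mmsis)
    found = known & set(watchlist_mmsis)
    missed = known - found
    floor_met = len(known) >= min_known_cases
    return {
        "known": len(known),
        "found": len(found),
        "missed": sorted(missed),
        "floor_met": floor_met,
        "passed": floor_met or not strict,
    }


if __name__ == "__main__":
    print(known_case_coverage(["1", "2", "3"], ["2", "3", "9"], min_known_cases=2))
```

Keeping the missed list in the output, rather than only a count, makes it easier to trace which known positives fell outside the watchlist.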
6) Delayed-Label Intelligence (Backtracking) Pipeline
Purpose
Convert newly confirmed vessel labels into forward-looking detection power without requiring a full model retrain:
- Causal rewind: retroactively scans the trailing 12 months of AIS data for each confirmed vessel and surfaces precursor signals (AIS gap uplift, STS proxy, low-SOG fraction) that appeared before confirmation.
- Label propagation: traverses the ownership/STS graph from confirmed MMSIs to identify and risk-uplift related entities (shared owner, shared manager, STS contact).
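The causal rewind step can be pictured as a windowed scan over a vessel's position history. The sketch below is a simplification under stated assumptions: it counts raw AIS gaps rather than computing an uplift against a baseline, omits the STS proxy, approximates 12 months as 365 days, and uses hypothetical field names (`ts`, `sog`) and thresholds (6-hour gap, 1 knot).

```python
from datetime import datetime, timedelta


def precursor_signals(positions, confirmed_at,
                      window=timedelta(days=365),   # assumed 12-month window
                      gap=timedelta(hours=6),       # assumed gap threshold
                      low_sog_knots=1.0):           # assumed low-speed cutoff
    """Count AIS gaps and the low-SOG fraction in the window before confirmation."""
    start = confirmed_at - window
    in_window = sorted(
        (p for p in positions if start <= p["ts"] < confirmed_at),
        key=lambda p: p["ts"],
    )
    if not in_window:
        return {"positions": 0, "ais_gaps": 0, "low_sog_fraction": None}
    # A gap is any pair of consecutive reports further apart than `gap`.
    gaps = sum(1 for a, b in zip(in_window, in_window[1:]) if b["ts"] - a["ts"] > gap)
    low = sum(1 for p in in_window if p["sog"] < low_sog_knots)
    return {
        "positions": len(in_window),
        "ais_gaps": gaps,
        "low_sog_fraction": low / len(in_window),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    track = [
        {"ts": t0, "sog": 0.5},
        {"ts": t0 + timedelta(hours=1), "sog": 8.0},
        {"ts": t0 + timedelta(hours=10), "sog": 0.2},
        {"ts": t0 + timedelta(hours=11), "sog": 9.0},
    ]
    print(precursor_signals(track, confirmed_at=t0 + timedelta(days=1)))
```

Positions at or after the confirmation time are excluded on purpose, so that only signals observable before confirmation are counted as precursors.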
When To Run
- After any new `confirmed` label is ingested via the review panel
- Weekly sweep to catch batch-confirmed outcomes
- Incremental mode (`--since`) after each shift's review session
Main Inputs
- `vessel_reviews` table (confirmed-tier entries)
- Lance Graph ownership/STS datasets
- `ais_positions` table (trailing 12-month window per vessel)
Expected Outputs
- `data/processed/backtracking_report.json`: full structured report
- `data/processed/backtracking_report.md`: human-readable summary with precursor signal table and propagated entity list
- `regression_checks.pass` field (True = all confirmed vessels successfully rewound)
Operational Success Criteria
- `regression_checks.pass` is `true` in every run
- At least one precursor signal detected for vessels with sufficient AIS history
- Propagated entities are traceable to a specific confirmed seed via `source_mmsi` + `evidence_type`
See Backtracking Runbook for full CLI reference and demo scenario.
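The traceability criterion for propagated entities can be illustrated with a breadth-first traversal that records which seed and which edge type led to each related vessel. This is a sketch, not the runbook's implementation: it flattens the ownership/STS graph into vessel-to-vessel edges, the `evidence_type` strings and the `max_hops` parameter are hypothetical, and no risk score is computed.

```python
from collections import deque


def propagate_labels(edges, confirmed_mmsis, max_hops=1):
    """Walk outward from confirmed MMSIs, tagging each reached vessel with its seed.

    `edges` is an iterable of (mmsi_a, mmsi_b, evidence_type) tuples, treated
    as undirected. Each vessel is reported once, via the first path that reaches it.
    """
    adjacency = {}
    for a, b, evidence_type in edges:
        adjacency.setdefault(a, []).append((b, evidence_type))
        adjacency.setdefault(b, []).append((a, evidence_type))

    seeds = sorted(set(confirmed_mmsis))
    seen = set(seeds)
    queue = deque((seed, seed, 0) for seed in seeds)  # (node, source seed, hops)
    related = []
    while queue:
        node, source, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour, evidence_type in adjacency.get(node, []):
            if neighbour in seen:
                continue
            seen.add(neighbour)
            related.append({
                "mmsi": neighbour,
                "source_mmsi": source,
                "evidence_type": evidence_type,
                "hops": hops + 1,
            })
            queue.append((neighbour, source, hops + 1))
    return related


if __name__ == "__main__":
    edges = [
        ("111", "222", "shared_owner"),
        ("222", "333", "sts_contact"),
        ("444", "555", "shared_manager"),
    ]
    for entity in propagate_labels(edges, ["111"], max_hops=2):
        print(entity)
```

Carrying the seed through the queue means every propagated entity keeps its `source_mmsi` and `evidence_type`, which is what makes each uplift auditable back to a confirmed vessel.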
7) Demo / Smoke Pipeline
Purpose
Provide fast deterministic validation of dashboard and operator flow without full data processing.
When To Run
- Demo preparation
- Rapid post-deploy sanity checks
- Environment troubleshooting
Main Inputs
- Bundled demo watchlist fixture
Expected Outputs
- Processed watchlist replaced with deterministic demo data
- Dashboard map/table interaction becomes immediately testable
Operational Success Criteria
- UI loads non-empty candidate list quickly
- Core interaction paths work (filter, detail, review actions)
Suggested Operations Cadence
| Cadence | Pipeline | Goal | Trigger Type | Triggered By (if Event) |
|---|---|---|---|---|
| Continuous | Continuous Monitoring | Live situational awareness | Scheduled (always-on service loop) | N/A |
| Daily or per watch | Full Screening (if non-streaming mode) | Fresh candidate ranking | Scheduled or Event-triggered | Duty operations officer / shift lead |
| Weekly | Review-Feedback Evaluation | Threshold tuning and drift control | Scheduled | N/A |
| After each confirmed label | Delayed-Label Intelligence (Backtracking) | Precursor discovery + graph uplift | Event-triggered | Analyst submitting confirmed review / weekly sweep |
| Pre-release | Historical Backtesting + Public Integration Batch | Quality gate before change promotion | Event-triggered | Release owner / CI pipeline on release candidate |
| On-demand | Demo/Smoke | Fast verification and incident checks | Event-triggered | Analyst / operator / incident commander |
Decision Guide
- Need fresh candidate generation from raw data: run Full Screening.
- Need live alerting and near-real-time updates: run Continuous Monitoring.
- Need evidence-backed quality measurement on historical windows: run Historical Backtesting.
- Need to tune thresholds from analyst outcomes and check drift: run Review-Feedback Evaluation.
- Need to convert a new confirmed label into precursor insights and graph uplift: run Backtracking.
- Need broad post-merge safety check on practical known positives: run Public Data Integration Batch.
- Need a quick UI/environment confidence check: run Demo/Smoke.