Human-in-the-Loop Triage Governance
This document defines the operating model for ranking-to-investigation handoff in Phase A screening.
Scope covered by this document:
- Tier taxonomy and minimum evidence requirements
- Evidence policy and over-claim safeguards
- Analyst/officer decision states and handoff lifecycle
- KPI specification for model quality and operations utility
Decision Responsibility Model
The model ranks candidates. Humans decide actions.
- Model responsibility:
- Produce ranked candidates with explainable signals.
-
Surface uncertainty and confidence boundaries.
-
Human responsibility:
- Assign a review tier based on available evidence.
- Decide escalation to physical investigation.
-
Record rationale and evidence references.
-
Governance rule:
- Never treat model score alone as legal or operational proof.
- Every non-cleared decision requires explicit evidence references.
Tier Taxonomy
Allowed review tiers:
- Confirmed
- Probable
- Suspect
- Cleared
- Inconclusive
Tier definitions and minimum evidence
- Confirmed
- Meaning: high-confidence determination with corroborated evidence.
-
Minimum evidence: at least 2 independent high-credibility sources OR 1 official designation source with vessel identifier match (MMSI/IMO).
-
Probable
- Meaning: strong signals and corroboration, but not enough for confirmed status.
-
Minimum evidence: at least 1 high-credibility source plus consistent model/graph behavior signals.
-
Suspect
- Meaning: early warning; potentially risky pattern with incomplete corroboration.
-
Minimum evidence: model-based anomaly pattern plus at least 1 weak/medium external indication.
-
Cleared
- Meaning: reviewed and not recommended for current escalation.
-
Minimum evidence: analyst rationale documenting why evidence is insufficient or contradictory.
-
Inconclusive
- Meaning: insufficient data quality or unresolved conflict among signals.
- Minimum evidence: explicit data gap reason (missing identifiers, stale data, conflicting sources).
Evidence Policy
Source credibility classes
- High: official sanctions/designation lists, government enforcement notices.
- Medium: reputable investigative reports with dated vessel identifiers.
- Weak: indirect references, partial identifiers, unverified reports.
Evidence recording requirements
For every reviewed vessel decision, record:
- decision_tier
- decision_timestamp_utc
- reviewer_id
- rationale text (why this tier)
- evidence references (source, URL, publication date)
- identifier basis (MMSI/IMO/name match confidence)
Over-claim safeguards
- Do not assign Confirmed from a single weak/medium source.
- Do not infer negative truth from absence of public evidence.
- Keep label confidence explicit (high/medium/weak/unknown).
- Re-review stale evidence on a fixed cadence.
Escalation and Handoff Lifecycle
Decision states:
- queued_review
- in_review
- handoff_recommended
- handoff_accepted
- handoff_completed
- closed
Recommended transition path:
- queued_review -> in_review
- in_review -> (confirmed|probable|suspect|cleared|inconclusive)
- if confirmed/probable and operationally relevant: -> handoff_recommended
- handoff_recommended -> handoff_accepted (officer)
- handoff_accepted -> handoff_completed (field outcome attached)
- -> closed
Transition constraints:
- handoff_recommended requires non-empty rationale + at least one evidence reference.
- handoff_completed requires outcome note and officer attribution.
KPI Specification
KPI objectives:
- Maintain high triage utility at fixed analyst review capacity.
- Reduce wasted investigations while preserving recall on credible positives.
Model quality KPIs
- Precision@K: K in {25, 50, 100}
- Recall@K: K in {100, 200}
- AUROC
- PR-AUC
- Calibration error (ECE)
Operations KPIs
- Hit-rate at top-N review queue
- Escalation acceptance rate (handoff_recommended -> handoff_accepted)
- Wasted investigation rate (handoff_completed with cleared outcome)
- Median review turnaround time
Tier-aware KPIs
- Tier distribution by region and vessel type
- Tier-specific FP/FN pattern summary from backtesting + reviewed outcomes
- Drift alerts when tier mix or hit-rate deviates from agreed bounds
Reporting Cadence
- Daily: operational dashboard snapshot (top-N, hit-rate, queue health)
- Weekly: threshold recommendation refresh by region/review capacity
- Monthly: tier-quality and drift review with mitigation actions
Traceability
Related tracking issues:
-
54 Parent workflow issue
-
55 Taxonomy/evidence/KPI definition
-
56 Schema and API persistence for review decisions
-
57 Dashboard workflow and handoff actions
-
58 Periodic evaluation and threshold tuning loop