
HUD + Autonomy Improvement Plan

Objective

Strengthen autonomous mapping and digital twin operations by upgrading HUD decision support, map intelligence, and closed-loop course-correction visibility using the existing frontend, C2 HUD, and backend telemetry stack.

Current Strengths

Multi-view frontend already supports primary HUD and C2 HUD modes.
SSE ops-event stream already exists for live updates.
C2 map already renders nodes, links, and risk zones.
Backend exposes health, metrics, trust, training, and swarm endpoints.
New autonomy core package provides contracts, map/twin stores, planner logic, and readiness checks.

Priority Improvements

P0: Mission-Critical HUD Upgrades (Immediate)

Add an Autonomy Control Strip in HUD:

Live twin lag and freshness badge.
Coverage confidence badge.
Course-correction status badge (stable, warning, rerouting).
Planner safety state (policy pass/fail).

Add C2 map intelligence overlays:

Coverage heatmap layer.
Confidence decay layer.
Predicted trajectory polylines for top N nodes.
Replan trigger markers when risk thresholds are crossed.

Add recommendation panel:

Top 3 safe corrective actions with rationale.
Action confidence and expected mission gain.
Guardrail reason when actions are blocked.

Add event correlation timeline:

Correlate ops events with map shifts, policy changes, and training rounds.
Show cause/effect chain for each correction.

P1: Operational Accuracy and Explainability

Add explainability cards:

Why an action was chosen.
Which signals influenced decision score.
Which policy constraints filtered alternatives.

Add sensor quality matrix widget:

Per-source confidence, packet freshness, and anomaly score.
Auto-failover indicators.

Add digital twin scenario runner:

Preview expected state at 30 s, 60 s, and 120 s horizons.
Compare candidate action outcomes side-by-side.

Add compute allocation HUD panel:

Edge/node/backend execution placement.
Accelerator use (GPU/NPU) and fallback mode status.

P2: Hardening and Decision Reliability

Add anomaly detection indicators:

Drift spikes.
Confidence cliffs.
Sensor disagreement alarms.

Add simulation confidence envelopes:

Render uncertainty bounds around predicted paths.

Add mission SLO scoreboard:

Twin lag SLO.
Correction success SLO.
Coverage SLO.
API/control latency SLO.

Add incident and rollback assist:

Recommended rollback actions when safety gates fail.
Last-known-good control profile restore button.

Proposed HUD Function Additions

In frontend/src/HUD.jsx

buildAutonomyKPIModel(input):

Compute and return a normalized KPI model for twin lag, coverage confidence, risk, correction rate, and safety state.
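
A minimal sketch of how buildAutonomyKPIModel could normalize its inputs, assuming a flat input object. The field names (twinLagMs, maxTwinLagMs, correctionsAttempted, and so on) and the 2000 ms lag ceiling are hypothetical and not taken from the existing payloads. Correction rate is assumed here to mean succeeded corrections over attempted corrections.

```javascript
// Clamp any number into the 0..1 range used by the HUD badges.
const clamp01 = (x) => Math.min(1, Math.max(0, x));

// Sketch only: input field names and defaults are assumptions.
function buildAutonomyKPIModel(input) {
  const {
    twinLagMs = null,
    maxTwinLagMs = 2000, // assumed lag at which the lag score reaches 0
    coverageConfidence = null, // assumed to arrive roughly in 0..1
    risk = null, // assumed to arrive roughly in 0..1
    correctionsAttempted = 0,
    correctionsSucceeded = 0,
    policyPass = null,
  } = input ?? {};

  return {
    // 1 means no lag, 0 means lag at or beyond the ceiling.
    twinLag: twinLagMs == null ? null : clamp01(1 - twinLagMs / maxTwinLagMs),
    coverageConfidence:
      coverageConfidence == null ? null : clamp01(coverageConfidence),
    risk: risk == null ? null : clamp01(risk),
    // null when nothing was attempted, so the HUD can show "no data".
    correctionRate:
      correctionsAttempted > 0
        ? clamp01(correctionsSucceeded / correctionsAttempted)
        : null,
    safetyState: policyPass == null ? "unknown" : policyPass ? "pass" : "fail",
  };
}
```

Missing inputs map to null or "unknown" rather than to a default score, so a badge can distinguish "no data" from "healthy".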

derivePlannerSafetyState(trustStatus, opsHealth, policyDraft):

Return policy pass/fail state with reason labels.
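
One way this could work is a fail-closed check that collects a reason label for every input that is missing or unhealthy. The property names (trusted, status, valid) and the reason labels below are assumptions for illustration; the real trust, health, and policy payloads may differ.

```javascript
// Sketch only: payload shapes and reason labels are hypothetical.
function derivePlannerSafetyState(trustStatus, opsHealth, policyDraft) {
  const reasons = [];

  // Fail closed: a missing payload counts as a failure, not a pass.
  if (!trustStatus || trustStatus.trusted !== true) {
    reasons.push("trust_unverified");
  }
  if (!opsHealth || opsHealth.status !== "ok") {
    reasons.push("ops_degraded");
  }
  if (!policyDraft || policyDraft.valid !== true) {
    reasons.push("policy_invalid");
  }

  return { state: reasons.length === 0 ? "pass" : "fail", reasons };
}
```

Returning all reasons, rather than the first one found, lets the HUD show every gate that is currently blocking the planner.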

deriveRecommendedCorrections(opsEvents, trainingStatus, mapSnapshot):

Return ranked corrections for display with confidence and mission gain estimates.
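
The ranking step inside deriveRecommendedCorrections might look like the sketch below. It covers only the ranking, not how candidates are derived from ops events, training status, or the map snapshot. The helper name rankCorrections, the candidate fields (id, confidence, missionGain, blockedBy), and the score formula (confidence times mission gain) are all assumptions.

```javascript
// Sketch only: candidate shape and scoring are hypothetical.
function rankCorrections(candidates, limit = 3) {
  const allowed = [];
  const blocked = [];

  for (const c of candidates) {
    if (c.blockedBy) {
      // Keep blocked actions so the panel can show the guardrail reason.
      blocked.push({ id: c.id, guardrailReason: c.blockedBy });
    } else {
      allowed.push({ ...c, score: c.confidence * c.missionGain });
    }
  }

  // Highest score first; slice returns at most `limit` entries.
  allowed.sort((a, b) => b.score - a.score);
  return { recommended: allowed.slice(0, limit), blocked };
}
```

Keeping blocked candidates in a separate list matches the recommendation panel requirement to show a guardrail reason when an action is not offered.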

buildEventCorrelationTimeline(opsEvents, policyHistory, trends):

Return timeline objects linking event causes to observed outcomes.

computeSLOStatusBadges(metrics):

Return per-SLO badge color, label, and breach duration.
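
A possible shape for the badge computation, assuming each metric carries its own target, an optional warning threshold, and a direction. The metric fields (name, current, target, warn, direction, breachedSince) and the color names are hypothetical. The extra now parameter is added here only to keep the function deterministic in tests.

```javascript
// direction "max": value must stay at or below the threshold (e.g. latency).
// direction "min": value must stay at or above the threshold (e.g. success rate).
function crossed(value, threshold, direction) {
  return direction === "max" ? value > threshold : value < threshold;
}

// Sketch only: metric shape and colors are assumptions.
function computeSLOStatusBadges(metrics, now = Date.now()) {
  return metrics.map((m) => {
    const breached = crossed(m.current, m.target, m.direction);
    const warning =
      !breached && m.warn != null && crossed(m.current, m.warn, m.direction);
    return {
      label: m.name,
      color: breached ? "red" : warning ? "amber" : "green",
      // Duration is only meaningful while the SLO is in breach.
      breachDurationMs:
        breached && m.breachedSince != null ? now - m.breachedSince : 0,
    };
  });
}
```

For example, a twin lag SLO with target 1500 ms, direction "max", and a current value of 1800 ms would render red, while a success-rate SLO sitting between its warn and target thresholds would render amber.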

computeSensorQualityMatrix(opsHealth, mapTelemetry):

Return source-level confidence, freshness, and anomaly scores.

In frontend/src/C2SwarmHUD.jsx

buildCoverageHeatmap(nodes, coverageCells):

Convert map data into normalized overlay cells for rendering.

buildConfidenceDecayOverlay(mapState, now):

Generate visual fade state from map confidence decay.
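
The plan does not specify the decay model, so the sketch below assumes exponential decay with a configurable half-life. The cell fields (id, confidence, updatedAt), the halfLifeMs and staleBelow parameters, and their defaults are hypothetical.

```javascript
// Sketch only: exponential half-life decay is an assumed model.
function buildConfidenceDecayOverlay(
  mapState,
  now,
  halfLifeMs = 60000, // assumed: confidence halves every 60 s without updates
  staleBelow = 0.2 // assumed threshold for marking a cell stale
) {
  return mapState.cells.map((cell) => {
    // Guard against clock skew producing negative ages.
    const ageMs = Math.max(0, now - cell.updatedAt);
    const decayed = cell.confidence * Math.pow(0.5, ageMs / halfLifeMs);
    return { id: cell.id, opacity: decayed, stale: decayed < staleBelow };
  });
}
```

With these defaults, a cell last updated one half-life ago at confidence 0.8 renders at opacity 0.4, and after three half-lives it drops below the stale threshold.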

buildPredictedTrajectories(nodes, predictionPayload):

Return polyline points and uncertainty widths per node.

detectReplanTriggers(nodes, riskZones, policyState):

Return trigger markers and reason labels for map annotations.
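
As an illustration, trigger detection could be a containment test of each node against each risk zone, filtered by a policy limit. The sketch assumes circular zones in the same planar coordinates as the nodes, and a single policyState.maxRisk threshold; none of these shapes are confirmed by the existing map payload.

```javascript
// Sketch only: circular zones and policyState.maxRisk are assumptions.
function detectReplanTriggers(nodes, riskZones, policyState) {
  const triggers = [];
  for (const node of nodes) {
    for (const zone of riskZones) {
      const distance = Math.hypot(node.x - zone.x, node.y - zone.y);
      const inside = distance <= zone.radius;
      // Only zones above the policy limit produce a marker.
      if (inside && zone.risk > policyState.maxRisk) {
        triggers.push({
          nodeId: node.id,
          zoneId: zone.id,
          x: node.x,
          y: node.y,
          reason: `risk ${zone.risk} exceeds limit ${policyState.maxRisk}`,
        });
      }
    }
  }
  return triggers;
}
```

The nested loop is fine for small swarms; a spatial index would be worth considering only if node and zone counts grow large.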

summarizeCommandImpact(commandLog, mapState):

Compute a quick post-command impact summary.

deriveOperatorAssistCards(status, mapState, auditLog):

Return top assist recommendations and warnings.

Backend/API Enhancements Needed for HUD

Extend map payload endpoint with:

map_version
mean_confidence
stale_cell_count
replan_trigger_count
predicted_paths[]
confidence_envelopes[]

Add twin summary endpoint:

/autonomy/twin/summary
Returns per-entity lag, confidence, risk, and prediction error.

Add planner insight endpoint:

/autonomy/planner/insights
Returns candidate actions, rejection reasons, and the selected action score.

Add sensor quality endpoint:

/autonomy/sensors/quality
Returns health, freshness, anomaly score, and failover state.

Add SLO endpoint:

/autonomy/slo/status
Returns target, current value, breach state, and burn rate.
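
Burn rate is commonly defined as the observed error rate divided by the error budget (1 minus the target). The sketch below shows that calculation and one possible response entry for /autonomy/slo/status. It is written in JavaScript for consistency with the frontend, since the plan does not state the backend language. The function names and the response field names are assumptions.

```javascript
// Burn rate = observed error rate / error budget, where budget = 1 - target.
// A value of 1 means the budget is being spent exactly at the sustainable pace.
function computeBurnRate(badEvents, totalEvents, target) {
  if (totalEvents === 0) return 0;
  const errorBudget = 1 - target;
  // A 100% target leaves no budget: any failure burns it infinitely fast.
  if (errorBudget <= 0) return badEvents > 0 ? Infinity : 0;
  return badEvents / totalEvents / errorBudget;
}

// Sketch only: field names in the returned object are hypothetical.
function buildSLOStatus(name, target, badEvents, totalEvents) {
  const current = totalEvents === 0 ? 1 : 1 - badEvents / totalEvents;
  return {
    name,
    target,
    current,
    breached: current < target,
    burn_rate: computeBurnRate(badEvents, totalEvents, target),
  };
}
```

For a 0.99 success target, 20 failed corrections out of 1000 gives a current value of 0.98, a breach, and a burn rate of roughly 2, meaning the budget is being consumed about twice as fast as the target allows.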

Execution Plan (6 Weeks)

Week 1: HUD KPI Foundation

Implement buildAutonomyKPIModel and computeSLOStatusBadges.
Add Autonomy Control Strip to primary HUD.
Wire data from existing health/trend/trust payloads.
Add unit tests for KPI normalization and badge thresholds.

Week 2: C2 Map Overlay Upgrade

Implement coverage and confidence overlays.
Add predicted trajectory rendering primitives.
Add replan trigger annotations and tooltip reasons.
Add visual regression snapshots for map layer rendering.

Week 3: Recommendation and Explainability Layer

Implement deriveRecommendedCorrections.
Implement planner safety and explainability cards.
Add event correlation timeline model and UI.
Add contract tests for recommendation ranking logic.

Week 4: Backend Autonomy Endpoints

Add twin summary endpoint.
Add planner insight endpoint.
Add sensor quality endpoint.
Add SLO status endpoint.
Add API auth and role checks consistent with existing policy enforcement.

Week 5: Reliability and Performance

Add anomaly indicators and uncertainty envelopes.
Add client-side caching and stale-data fallbacks.
Add rate-safe polling fallback when SSE degrades.
Add performance budgets for HUD render and fetch cycles.
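
The rate-safe polling fallback could be built from two small pure functions: one that decides when the SSE stream should be treated as degraded, and one that spaces out polls with capped exponential backoff. The function names, the 15 s staleness window, and the 2 s to 30 s backoff range are assumptions chosen for illustration.

```javascript
// Treat SSE as degraded when no event has arrived within the staleness window.
function chooseTransport(lastEventAt, now, staleAfterMs = 15000) {
  return now - lastEventAt > staleAfterMs ? "poll" : "sse";
}

// Capped exponential backoff: 2 s, 4 s, 8 s, ... up to maxMs.
// consecutiveFailures is the number of failed polls in a row (0 on success).
function nextPollDelayMs(consecutiveFailures, baseMs = 2000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** consecutiveFailures);
}
```

Keeping these as pure functions makes the thresholds easy to unit test; the component that owns the EventSource connection and timers would call them on each tick.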

Week 6: Hardening and Rollout

Add end-to-end scenario tests for correction loops.
Run chaos tests for endpoint degradation and sensor dropouts.
Add operator runbook and rollback controls.
Roll out behind feature flag with staged enablement.

Metrics to Track

twin_lag_ms_p95
correction_success_rate
correction_reject_rate_policy
map_mean_confidence
stale_cell_ratio
hud_data_freshness_seconds
recommendation_acceptance_rate
operator_override_rate

Testing Strategy

Unit tests:

KPI builders, ranking logic, trigger detection, confidence overlays.

Integration tests:

Endpoint contracts for twin summary and planner insights.

E2E tests:

C2 command submit -> planner decision -> map update -> HUD recommendation verification.

Failure tests:

SSE disconnect, stale telemetry, missing risk data, policy denial states.

Rollout Strategy

Feature flags:

HUD_AUTONOMY_STRIP
HUD_C2_ADVANCED_OVERLAYS
HUD_RECOMMENDATION_ASSIST

Canary stages:

Stage 1: internal operators only.
Stage 2: selected mission profiles.
Stage 3: full rollout with override fallback.

Exit criteria:

No critical UI regressions.
SLOs stable for 7 consecutive days.
Operator override rate trending down.

Suggested First Implementation Tickets

HUD-101: Add buildAutonomyKPIModel + control strip.
HUD-102: Add SLO badges and breach timers.
C2-201: Coverage heatmap + confidence decay overlay.
C2-202: Predicted path rendering + uncertainty envelope.
API-301: /autonomy/twin/summary endpoint.
API-302: /autonomy/planner/insights endpoint.
API-303: /autonomy/sensors/quality endpoint.
QA-401: E2E correction loop scenario suite.