Skip to content

Commit 99cd14e

Browse files
author
Paul C
committed
v22.7.1: Predictive Inbox — multi-cluster fixes + Run buttons + clearer cards
Three fixes/improvements on the v22.7.0 Predictive Inbox surface, all driven by feedback from operators running multi-cluster setups. 1. Multi-cluster filter no longer empties the list The cluster aggregator was indexing nodes by `peer.id` (the locally-assigned cluster key) but each peer's predictive orchestrator stamps proposals with `state.node_id` (the peer's self_id from /etc/wolfstack/node_id). The frontend's `clusterByNode[scope.node_id]` lookup therefore failed for every peer's proposal, defaulted to "WolfStack", and the cluster filter excluded everything when the operator narrowed to anything else. Fix: build_cluster_response now uses `peer.self_id` (with `peer.id` as fallback when self_id hasn't been reported yet) as the canonical NodeAggregateStatus.node_id, matching the same id the proposals already carry. 2. Card clarity — every finding shows runtime + server + resource prominently. New `predictiveRuntimeBadge()` maps each finding_type to a coloured runtime badge (DOCKER 🐳, LXC 📦, VM 💻, CERT 🔒, BACKUP 💾, SSH 🔑, NETWORK 🌐, SERVICE ⚙️, HOST 🖥️). Card header now reads cluster · 🖥️ hostname · `resource_id`, so a Docker restart-loop finding shows up as DOCKER badge + the host's hostname + the container name without having to read the title. 3. Run buttons that drop the operator straight into a terminal - New endpoint: `GET /api/proposals/{id}/command/{idx}` returns the analyzer's recommended command + resolved console target (type+name) + remote node_id when the finding is on a peer. Auth-gated; the command comes from the proposal store on disk, never from URL parameters, so there's no shell-injection surface from a crafted link. - New ▶ Run button per command, alongside the existing 📋 Copy. 
Clicking opens `console.html` connected to the right place: * `docker:<name>` → docker exec into the container * `lxc:<name>` → lxc-attach * `vm:<name>:...` → VM serial console (qm/virsh/socat) * everything else (host mounts, certs, backups, sshd) → host shell on the node the finding lives on. Cross-node findings open via the existing remote-console proxy (`?node_id=...`), so operators on cluster Node A can still run commands surfaced from Node B without first navigating there. - Comment lines (commands starting with `#`) only get Copy, not Run — running a comment would be a no-op. - The console reuses the existing AI-action-id stuffing pattern: fetch the command after the WebSocket connects, write it to the PTY's stdin, and leave the operator at the interactive prompt for follow-up. All 157 tests still pass. Code-reviewer reviews from the v22.7.0 deltas remain in effect.
1 parent fc119f5 commit 99cd14e

4 files changed

Lines changed: 228 additions & 24 deletions

File tree

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
[package]
22
name = "wolfstack"
3-
version = "22.7.0"
3+
version = "22.7.1"
44
edition = "2024"
55
authors = ["Wolf Software Systems Ltd"]
66
description = "Server management platform for the Wolf software suite"

src/api/mod.rs

Lines changed: 111 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -17323,23 +17323,32 @@ async fn build_cluster_response(
1732317323
// collect the JoinHandles then await them with `tokio::join!` /
1732417324
// `futures::join_all`-style sequence. Using FuturesUnordered
1732517325
// would be tidier; for a small N this is fine.
17326-
// Keep `(id, hostname, cluster)` alongside each JoinHandle so
17327-
// that even if a task panics we still know which peer it was
17328-
// and which cluster it belongs to. Without these, a panicked
17329-
// task's `JoinError` carries no peer-level attribution and
17330-
// operators get a useless "<panicked>" entry in the warning
17331-
// banner with no cluster context.
17326+
// Each handle carries (node_id, hostname, cluster) so panics
17327+
// still identify the peer. The `node_id` here MUST match what
17328+
// the peer's own predictive orchestrator stamps onto
17329+
// `Proposal.scope.node_id` — otherwise the cluster filter in
17330+
// the Inbox UI maps the proposal to the wrong cluster and the
17331+
// list goes empty when the operator picks a non-default
17332+
// cluster.
17333+
//
17334+
// Each peer's orchestrator uses the local `state.node_id`,
17335+
// which is the value of `/etc/wolfstack/node_id` (the peer's
17336+
// *self_id*). Locally we know that as `peer.self_id` (Option),
17337+
// populated when the peer first reports. Falls back to
17338+
// `peer.id` (the locally-assigned cluster key) for peers that
17339+
// haven't reported a self_id yet.
1733217340
let secret = state.cluster_secret.clone();
1733317341
let mut handles: Vec<(String, String, String, tokio::task::JoinHandle<Result<Vec<crate::predictive::Proposal>, String>>)> =
1733417342
Vec::with_capacity(peers.len());
1733517343
for peer in &peers {
1733617344
let urls = build_node_urls(&peer.address, peer.port, "/api/proposals");
1733717345
let secret = secret.clone();
1733817346
let cluster = resolve_cluster_label(peer);
17347+
let canonical_id = peer.self_id.clone().unwrap_or_else(|| peer.id.clone());
1733917348
let handle = tokio::spawn(async move {
1734017349
fetch_peer_proposals(&urls, &secret).await
1734117350
});
17342-
handles.push((peer.id.clone(), peer.hostname.clone(), cluster, handle));
17351+
handles.push((canonical_id, peer.hostname.clone(), cluster, handle));
1734317352
}
1734417353

1734517354
let mut all_proposals = self_proposals;
@@ -17589,6 +17598,100 @@ pub async fn predictive_proposal_approve(
1758917598
}
1759017599
}
1759117600

17601+
/// GET /api/proposals/{id}/command/{idx} — return one of a
17602+
/// proposal's `Manual` remediation commands plus the resolved
17603+
/// console target (type + name, plus remote `node_id` when the
17604+
/// finding lives on a peer). Used by the Inbox UI's Run button:
17605+
/// the dashboard opens `console.html` with these values, the page
17606+
/// fetches the command via this endpoint, and writes it to the
17607+
/// PTY's stdin so the operator drops into an interactive session
17608+
/// with the suggested command pre-run.
17609+
///
17610+
/// Auth-gated. The command itself comes from the analyzer's
17611+
/// proposal store on disk (never user-supplied), so URL injection
17612+
/// can't smuggle in arbitrary shell — the command index is bounded
17613+
/// by the proposal's recorded list.
17614+
pub async fn predictive_proposal_command(
17615+
req: HttpRequest,
17616+
state: web::Data<AppState>,
17617+
path: web::Path<(String, usize)>,
17618+
) -> HttpResponse {
17619+
if let Err(resp) = require_auth(&req, &state) { return resp; }
17620+
let (id, idx) = path.into_inner();
17621+
17622+
let store = match state.predictive_proposals.read() {
17623+
Ok(g) => g,
17624+
Err(e) => e.into_inner(),
17625+
};
17626+
let Some(p) = store.get(&id) else {
17627+
return HttpResponse::NotFound().json(serde_json::json!({
17628+
"error": "proposal not found",
17629+
}));
17630+
};
17631+
let cmds = match &p.remediation {
17632+
crate::predictive::RemediationPlan::Manual { commands, .. } => commands,
17633+
_ => return HttpResponse::BadRequest().json(serde_json::json!({
17634+
"error": "proposal has no Manual remediation commands",
17635+
})),
17636+
};
17637+
let Some(command) = cmds.get(idx) else {
17638+
return HttpResponse::NotFound().json(serde_json::json!({
17639+
"error": "command index out of range",
17640+
}));
17641+
};
17642+
17643+
// Resolve the console target from scope.resource_id. The format
17644+
// mirrors the resource-id conventions each analyzer uses, so a
17645+
// finding on `docker:postgres` opens a docker exec session into
17646+
// postgres; `lxc:web` → lxc-attach; `vm:opnsense:...` → VM
17647+
// serial console; everything else → host shell on the node the
17648+
// finding lives on.
17649+
let (console_type, console_name) = resolve_console_target(p);
17650+
17651+
// Cross-node: if the finding is on a peer (scope.node_id !=
17652+
// this server's node_id), surface the peer's locally-assigned
17653+
// cluster-key id so console.html can use the existing
17654+
// remote-console proxy. Console.html accepts both peer.id and
17655+
// peer.self_id via `cluster.get_node`'s fallback scan.
17656+
let remote_node_id: Option<String> = if p.scope.node_id != state.node_id {
17657+
Some(p.scope.node_id.clone())
17658+
} else {
17659+
None
17660+
};
17661+
17662+
HttpResponse::Ok().json(serde_json::json!({
17663+
"command": command,
17664+
"console_type": console_type,
17665+
"console_name": console_name,
17666+
"remote_node_id": remote_node_id,
17667+
"title": p.title,
17668+
}))
17669+
}
17670+
17671+
/// Map a proposal's `scope.resource_id` to a `(console_type,
17672+
/// console_name)` pair compatible with the existing `/ws/console/
17673+
/// {type}/{name}` route + `console.html?type=...&name=...` URL.
17674+
/// Conventions match what each analyzer writes into resource_id.
17675+
fn resolve_console_target(p: &crate::predictive::Proposal) -> (String, String) {
17676+
let rid = p.scope.resource_id.as_deref().unwrap_or("");
17677+
if let Some(name) = rid.strip_prefix("docker:") {
17678+
return ("docker".into(), name.to_string());
17679+
}
17680+
if let Some(name) = rid.strip_prefix("lxc:") {
17681+
return ("lxc".into(), name.to_string());
17682+
}
17683+
if let Some(rest) = rid.strip_prefix("vm:") {
17684+
// vm scope is `vm:<name>:<disk_path>` — take the name only.
17685+
let name = rest.split(':').next().unwrap_or(rest);
17686+
return ("vm".into(), name.to_string());
17687+
}
17688+
// host shell for everything else: filesystem mounts (`/var/log`),
17689+
// certs (`letsencrypt:...`, `wolfstack-tls:...`), backup
17690+
// schedules (`backup:<id>`), node-level findings (`host`,
17691+
// `sshd`), and per-unit findings (`my-unit.service`).
17692+
("host".into(), "host".into())
17693+
}
17694+
1759217695
/// POST /api/proposals/run-now — synchronously run one orchestrator
1759317696
/// tick instead of waiting for the next 5-min cadence. Useful right
1759417697
/// after the operator clears a finding to refresh the inbox.
@@ -22053,6 +22156,7 @@ pub fn configure(cfg: &mut web::ServiceConfig) {
2205322156
.route("/api/proposals/history", web::get().to(predictive_proposals_history))
2205422157
.route("/api/proposals/run-now", web::post().to(predictive_proposals_run_now))
2205522158
.route("/api/proposals/{id}", web::get().to(predictive_proposal_get))
22159+
.route("/api/proposals/{id}/command/{idx}", web::get().to(predictive_proposal_command))
2205622160
.route("/api/proposals/{id}/snooze", web::post().to(predictive_proposal_snooze))
2205722161
.route("/api/proposals/{id}/dismiss", web::post().to(predictive_proposal_dismiss))
2205822162
.route("/api/proposals/{id}/approve", web::post().to(predictive_proposal_approve))

web/console.html

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -193,6 +193,32 @@ <h1 id="title">💻 WolfStack Console</h1>
193193
term.writeln('\x1b[31m⚠ Failed to load AI action: ' + e.message + '\x1b[0m\r\n');
194194
});
195195
}
196+
197+
// Same pattern for predictive Inbox "Run" button:
198+
// ?proposal_id=<id>&cmd_idx=<n> fetches the proposal's
199+
// remediation command from the server (auth-gated;
200+
// command comes from the proposal store on disk, not
201+
// from URL params, so there's no injection surface)
202+
// and writes it to the PTY's stdin.
203+
const proposalId = params.get('proposal_id');
204+
const cmdIdx = params.get('cmd_idx');
205+
if (proposalId && cmdIdx && /^[A-Za-z0-9\-]+$/.test(proposalId) && /^\d+$/.test(cmdIdx)) {
206+
fetch('/api/proposals/' + encodeURIComponent(proposalId) + '/command/' + encodeURIComponent(cmdIdx))
207+
.then(r => {
208+
if (!r.ok) throw new Error('Proposal command not found');
209+
return r.json();
210+
})
211+
.then(data => {
212+
if (data.command) {
213+
term.writeln('\x1b[35m🔮 Predictive Inbox: ' + (data.title || 'remediation') + '\x1b[0m');
214+
term.writeln('\x1b[36m$ ' + data.command + '\x1b[0m\r\n');
215+
setTimeout(() => { ws.send(data.command + '\n'); }, 800);
216+
}
217+
})
218+
.catch(e => {
219+
term.writeln('\x1b[31m⚠ Failed to load predictive command: ' + e.message + '\x1b[0m\r\n');
220+
});
221+
}
196222
};
197223

198224
ws.onmessage = (event) => {

0 commit comments

Comments
 (0)