Commit 60b357e

fix(dashboards): populate empty dashboard panels + tpm-metrics service + dep fixes
- Build out Network Performance Health dashboard (8 panels: stat + timeseries) covering active nodes, FL rounds, accuracy, loss, round duration, round rate
- Build out Consensus Trust Monitoring dashboard (9 panels: stat + timeseries) covering tpm trust chain, cert counts, node trust scores, message signing/verification rates, failure rates, latency, and cert expiry
- Copy both dashboards into grafana/provisioning/dashboards/ so they are auto-provisioned in all docker-compose environments
- docker-compose.full.yml: add tpm-metrics exporter service (port 9091) with health check, tpm-certs named volume, and dependency on backend
- requirements.txt + requirements-backend.txt: pin flwr==1.7.0 (stable release), numpy==1.26.4 (compatibility with scipy/sklearn)
- sovereignmap_production_backend_v2.py: run Flask metrics in daemon thread so Flower stays on main thread (required for signal handler registration)
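
The daemon-thread change in the last item of the commit message can be illustrated with a minimal sketch. This is not the code from sovereignmap_production_backend_v2.py: the standard library's `http.server` stands in for Flask so the example has no dependencies, and `MetricsHandler`, `start_metrics_server`, `main`, and port 8000 are hypothetical names and values. The constraint it demonstrates is that Python's `signal.signal()` raises `ValueError` when called outside the main thread, so the component that installs signal handlers has to own the main thread.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class MetricsHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Flask metrics app."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_metrics_server(host="127.0.0.1", port=0):
    """Serve metrics from a daemon thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), MetricsHandler)
    # daemon=True: the thread never keeps the process alive on its own.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, thread


def main():
    server, _ = start_metrics_server(port=8000)  # assumed port
    stop = threading.Event()
    # Handlers can only be registered from the main thread.
    signal.signal(signal.SIGTERM, lambda signum, frame: stop.set())
    stop.wait()  # placeholder for the blocking main-thread work
    server.shutdown()
```

Calling `main()` blocks until the process receives SIGTERM. In the commit, the blocking main-thread work is the Flower server rather than the `Event` used here.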
1 parent 9e279c9 commit 60b357e

8 files changed

Lines changed: 2244 additions & 25 deletions

docker-compose.full.yml

Lines changed: 30 additions & 0 deletions
@@ -156,6 +156,34 @@ services:
         max-size: "10m"
         max-file: "3"
 
+  tpm-metrics:
+    image: python:3.11-slim
+    container_name: sovereign-tpm-metrics
+    volumes:
+      - ./tpm_metrics_exporter.py:/app/tpm_metrics_exporter.py:ro
+      - ./tpm_cert_manager.py:/app/tpm_cert_manager.py:ro
+      - tpm-certs:/etc/sovereign/certs
+    working_dir: /app
+    command: ["sh", "-c", "python -m pip install --no-cache-dir prometheus-client flask cryptography && python tpm_metrics_exporter.py"]
+    ports:
+      - "9091:9091"
+    networks:
+      - sovereign-network
+    depends_on:
+      - backend
+    healthcheck:
+      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9091/health', timeout=3)"]
+      interval: 10s
+      timeout: 5s
+      retries: 3
+      start_period: 20s
+    restart: unless-stopped
+    logging:
+      driver: "json-file"
+      options:
+        max-size: "10m"
+        max-file: "3"
+
   grafana:
     image: grafana/grafana:10.2.3
     container_name: sovereign-grafana
@@ -217,6 +245,8 @@ volumes:
     driver: local
   alertmanager-data:
     driver: local
+  tpm-certs:
+    driver: local
 
 # ========================================================================
 # NETWORKS
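
The `healthcheck` on the `tpm-metrics` service relies on `urllib.request.urlopen` raising an exception, and therefore `python -c` exiting non-zero, when `/health` is unreachable or answers with an HTTP error status. The sketch below spells out the same probe as a function for readability. The name `is_healthy` is hypothetical and the function is not part of the commit; the URL and timeout defaults are copied from the compose file.

```python
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:9091/health", timeout=3.0):
    """Return True if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # URLError covers refused connections and HTTP 4xx/5xx (HTTPError);
        # OSError covers socket-level timeouts.
        return False
```

With the settings in the diff, Docker allows a 20s `start_period`, then runs the probe every 10s and marks the container unhealthy after 3 consecutive failures.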
