| 1 | +# DBLab Prometheus Metrics |
| 2 | + |
| 3 | +DBLab Engine exports metrics in Prometheus format via the `/metrics` endpoint. This endpoint does not require authentication. |
| 4 | + |
| 5 | +## Endpoint |
| 6 | + |
| 7 | +``` |
| 8 | +GET http://<dblab-host>:<port>/metrics |
| 9 | +``` |
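As a quick check outside Prometheus, the endpoint can be fetched with any HTTP client. The following is a minimal Python sketch, assuming the engine listens on port 2345 (the port used in the scrape configuration in this document). `metrics_url` and `fetch_metrics` are hypothetical helper names for illustration, not part of DBLab.

```python
from urllib.request import urlopen


def metrics_url(host: str, port: int = 2345) -> str:
    """Build the metrics URL; the default port 2345 is an assumption."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host: str, port: int = 2345, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text served by the endpoint."""
    # No auth headers are sent: the endpoint does not require authentication.
    with urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_metrics("<dblab-host>")` against a running engine should return text in the Prometheus exposition format.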
| 10 | + |
| 11 | +## Prometheus Configuration |
| 12 | + |
| 13 | +Add the following to your `prometheus.yml`: |
| 14 | + |
| 15 | +```yaml |
| 16 | +scrape_configs: |
| 17 | +  - job_name: 'dblab' |
| 18 | +    static_configs: |
| 19 | +      - targets: ['<dblab-host>:2345'] |
| 20 | +    scrape_interval: 30s |
| 21 | +``` |
| 22 | + |
| 23 | +## Available Metrics |
| 24 | + |
| 25 | +### Engine Metrics |
| 26 | + |
| 27 | +| Metric | Type | Description | |
| 28 | +|--------|------|-------------| |
| 29 | +| `dblab_engine_info` | Gauge | Engine information with labels: version, edition, instance_id | |
| 30 | +| `dblab_engine_uptime_seconds` | Gauge | Time since Database Lab Engine started in seconds | |
| 31 | + |
| 32 | +### Retrieval Metrics |
| 33 | + |
| 34 | +| Metric | Type | Description | |
| 35 | +|--------|------|-------------| |
| 36 | +| `dblab_retrieval_mode` | Gauge | Current retrieval mode (1=physical, 2=logical, 0=unknown) | |
| 37 | +| `dblab_retrieval_status` | Gauge | Current retrieval status with label: status | |
| 38 | +| `dblab_retrieval_last_refresh_timestamp_seconds` | Gauge | Unix timestamp of last data refresh | |
| 39 | +| `dblab_retrieval_next_refresh_timestamp_seconds` | Gauge | Unix timestamp of next scheduled data refresh | |
| 40 | +| `dblab_retrieval_data_freshness_seconds` | Gauge | Time since last data refresh in seconds | |
| 41 | +| `dblab_retrieval_alerts_total` | Gauge | Number of retrieval alerts with labels: type, level | |
| 42 | + |
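The numeric encoding of `dblab_retrieval_mode` can be turned back into a readable name on the consumer side. This is a small sketch of that mapping, using only the values listed in the table above; `retrieval_mode_name` is a hypothetical helper, and treating any unlisted value as "unknown" is an assumption.

```python
# Values taken from the dblab_retrieval_mode description.
RETRIEVAL_MODES = {0: "unknown", 1: "physical", 2: "logical"}


def retrieval_mode_name(value: float) -> str:
    """Map the gauge value to a mode name; unlisted values fall back to 'unknown'."""
    return RETRIEVAL_MODES.get(int(value), "unknown")
```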
| 43 | +### Synchronization Metrics (Physical Mode) |
| 44 | + |
| 45 | +| Metric | Type | Description | |
| 46 | +|--------|------|-------------| |
| 47 | +| `dblab_sync_replication_lag_seconds` | Gauge | Replication lag in seconds | |
| 48 | +| `dblab_sync_replication_uptime_seconds` | Gauge | Replication uptime in seconds | |
| 49 | + |
| 50 | +### Pool Metrics |
| 51 | + |
| 52 | +| Metric | Type | Labels | Description | |
| 53 | +|--------|------|--------|-------------| |
| 54 | +| `dblab_pool_status` | Gauge | pool, mode | Pool status (1=active, 2=refreshing, 3=empty) | |
| 55 | +| `dblab_pool_data_state_at_timestamp_seconds` | Gauge | pool | Unix timestamp of the pool data state | |
| 56 | +| `dblab_pool_size_bytes` | Gauge | pool | Total pool size in bytes | |
| 57 | +| `dblab_pool_free_bytes` | Gauge | pool | Free space in pool in bytes | |
| 58 | +| `dblab_pool_used_bytes` | Gauge | pool | Used space in pool in bytes | |
| 59 | +| `dblab_pool_data_size_bytes` | Gauge | pool | Logical data size in bytes | |
| 60 | +| `dblab_pool_used_by_snapshots_bytes` | Gauge | pool | Space used by snapshots in bytes | |
| 61 | +| `dblab_pool_used_by_clones_bytes` | Gauge | pool | Space used by clones in bytes | |
| 62 | +| `dblab_pool_compress_ratio` | Gauge | pool | Compression ratio of the pool | |
| 63 | +| `dblab_pool_clones_total` | Gauge | pool | Number of clones in the pool | |
| 64 | + |
| 65 | +### Clone Metrics |
| 66 | + |
| 67 | +| Metric | Type | Labels | Description | |
| 68 | +|--------|------|--------|-------------| |
| 69 | +| `dblab_clones_total` | Gauge | - | Total number of clones | |
| 70 | +| `dblab_clones_by_status` | Gauge | status | Number of clones by status | |
| 71 | +| `dblab_clones_expected_cloning_time_seconds` | Gauge | - | Expected time to create a clone in seconds | |
| 72 | +| `dblab_clones_protected_total` | Gauge | - | Number of protected clones | |
| 73 | +| `dblab_clone_diff_size_bytes` | Gauge | clone_id, branch | Clone diff size in bytes | |
| 74 | +| `dblab_clone_logical_size_bytes` | Gauge | clone_id, branch | Clone logical size in bytes | |
| 75 | +| `dblab_clone_cloning_time_seconds` | Gauge | clone_id, branch | Time taken to create clone in seconds | |
| 76 | + |
| 77 | +### Snapshot Metrics |
| 78 | + |
| 79 | +| Metric | Type | Labels | Description | |
| 80 | +|--------|------|--------|-------------| |
| 81 | +| `dblab_snapshots_total` | Gauge | pool, branch, type | Total number of snapshots (type: auto/user) | |
| 82 | +| `dblab_snapshot_physical_size_bytes` | Gauge | snapshot_id, pool, branch | Snapshot physical size in bytes | |
| 83 | +| `dblab_snapshot_logical_size_bytes` | Gauge | snapshot_id, pool, branch | Snapshot logical size in bytes | |
| 84 | +| `dblab_snapshot_clone_count` | Gauge | snapshot_id, pool | Number of clones using this snapshot | |
| 85 | + |
| 86 | +### Branch Metrics |
| 87 | + |
| 88 | +| Metric | Type | Description | |
| 89 | +|--------|------|-------------| |
| 90 | +| `dblab_branches_total` | Gauge | Total number of branches in use | |
| 91 | + |
| 92 | +### Resource/Slot Metrics |
| 93 | + |
| 94 | +| Metric | Type | Description | |
| 95 | +|--------|------|-------------| |
| 96 | +| `dblab_slots_busy_total` | Gauge | Number of busy slots preventing full refresh in logical mode | |
| 97 | + |
| 98 | +## Example Grafana Queries |
| 99 | + |
| 100 | +### Monitor disk usage (percent) |
| 101 | + |
| 102 | +```promql |
| 103 | +100 - (dblab_pool_free_bytes / dblab_pool_size_bytes * 100) |
| 104 | +``` |
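To make the arithmetic of this query concrete, here is the same expression as a short Python sketch. The input values are the pool size and free space shown in the Sample Output section; `disk_usage_percent` is a hypothetical name used only for illustration.

```python
def disk_usage_percent(free_bytes: float, size_bytes: float) -> float:
    """Mirror the PromQL expression: 100 - (free / size * 100)."""
    # size_bytes must be non-zero; Python raises ZeroDivisionError otherwise.
    return 100 - (free_bytes / size_bytes * 100)


# 50 GiB free out of a 100 GiB pool
usage = disk_usage_percent(53687091200, 107374182400)
print(usage)  # 50.0
```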
| 105 | + |
| 106 | +### Monitor replication lag (physical mode) |
| 107 | + |
| 108 | +```promql |
| 109 | +dblab_sync_replication_lag_seconds |
| 110 | +``` |
| 111 | + |
| 112 | +### Data freshness in hours (logical mode) |
| 113 | + |
| 114 | +```promql |
| 115 | +dblab_retrieval_data_freshness_seconds / 3600 |
| 116 | +``` |
| 117 | + |
| 118 | +### Clone count over time |
| 119 | + |
| 120 | +```promql |
| 121 | +dblab_clones_total |
| 122 | +``` |
| 123 | + |
| 124 | +### Alert on high disk usage |
| 125 | + |
| 126 | +```promql |
| 127 | +(1 - dblab_pool_free_bytes / dblab_pool_size_bytes) > 0.85 |
| 128 | +``` |
| 129 | + |
| 130 | +### Alert on replication lag |
| 131 | + |
| 132 | +```promql |
| 133 | +dblab_sync_replication_lag_seconds > 300 |
| 134 | +``` |
| 135 | + |
| 136 | +## Sample Output |
| 137 | + |
| 138 | +``` |
| 139 | +# HELP dblab_engine_info Database Lab Engine information |
| 140 | +# TYPE dblab_engine_info gauge |
| 141 | +dblab_engine_info{edition="standard",instance_id="my-instance",version="3.5.0"} 1 |
| 142 | + |
| 143 | +# HELP dblab_engine_uptime_seconds Time since Database Lab Engine started in seconds |
| 144 | +# TYPE dblab_engine_uptime_seconds gauge |
| 145 | +dblab_engine_uptime_seconds 86400 |
| 146 | + |
| 147 | +# HELP dblab_retrieval_mode Current retrieval mode (1 for physical, 2 for logical, 0 for unknown) |
| 148 | +# TYPE dblab_retrieval_mode gauge |
| 149 | +dblab_retrieval_mode 1 |
| 150 | + |
| 151 | +# HELP dblab_sync_replication_lag_seconds Replication lag in seconds (physical mode) |
| 152 | +# TYPE dblab_sync_replication_lag_seconds gauge |
| 153 | +dblab_sync_replication_lag_seconds 5 |
| 154 | + |
| 155 | +# HELP dblab_pool_size_bytes Total pool size in bytes |
| 156 | +# TYPE dblab_pool_size_bytes gauge |
| 157 | +dblab_pool_size_bytes{pool="dblab_pool"} 107374182400 |
| 158 | + |
| 159 | +# HELP dblab_pool_free_bytes Free space in pool in bytes |
| 160 | +# TYPE dblab_pool_free_bytes gauge |
| 161 | +dblab_pool_free_bytes{pool="dblab_pool"} 53687091200 |
| 162 | + |
| 163 | +# HELP dblab_clones_total Total number of clones |
| 164 | +# TYPE dblab_clones_total gauge |
| 165 | +dblab_clones_total 3 |
| 166 | +``` |
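Output like the above can be read without Prometheus for ad hoc checks. The following is a minimal Python sketch of a parser, assuming each sample line has the form `name{labels} value` with no trailing timestamp and no spaces after the value. It is not a full implementation of the exposition format, and `parse_metrics` is a hypothetical helper name.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Map each series (metric name plus label set) to its sample value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field (assumes no timestamp).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


sample = """\
# HELP dblab_pool_size_bytes Total pool size in bytes
# TYPE dblab_pool_size_bytes gauge
dblab_pool_size_bytes{pool="dblab_pool"} 107374182400
dblab_pool_free_bytes{pool="dblab_pool"} 53687091200
dblab_clones_total 3
"""
parsed = parse_metrics(sample)
```

For anything beyond a quick check, a maintained parser such as the one in the `prometheus_client` package is likely a better choice than this sketch.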