Commit b74bf55
TheOneFromNorwaybogsi17 authored and committed
Create ADR detailing our choice of Grafana for monitoring
- Focuses on Acceptance criteria and overall ability of tool
1 parent f69f475 commit b74bf55

1 file changed

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# 13. Amazon Managed Grafana for Monitoring and Alerting

Date: 2025-07-01

## Status

Accepted

## Context

Monitoring is currently done via CloudWatch, and alerting is based on Sentry. We want to improve the monitoring and
alerting capabilities of the Mavis application by integrating a more robust, unified solution.

### Acceptance Criteria

1. Fully cloud-native solution that integrates with AWS services.
2. Monitoring does not require access to the AWS console.
3. Can be fully managed and automated using Terraform.
4. Aligns with TechRadar's accepted technologies.
5. Authentication aligns with the existing Identity and Access Management (IAM) setup.
6. Allows alerts to be configured and managed within the same platform.

## Considered Options

### Option 1: AWS CloudWatch Dashboards and Alarms

This option involves expanding our use of the native AWS CloudWatch service for all monitoring and alerting, creating
more sophisticated dashboards and migrating all alerts to CloudWatch Alarms.

- **Pros**:
  - Deeply integrated with all AWS services.
  - Fully manageable via Terraform.
  - Provides both dashboarding and alerting in a single service.
- **Cons**:
  - Dashboarding is less flexible and less powerful than in dedicated tools.
  - Viewing dashboards requires returning to the AWS console.

### Option 2: Splunk

This option would involve leveraging our existing Splunk integration to handle not just log aggregation but also
metric-based monitoring and alerting.

- **Pros**:
  - Powerful alerting features based on complex log queries.
  - Can view dashboards without accessing the AWS console.
- **Cons**:
  - Primarily a log analysis tool, not ideal for metric-based monitoring.
  - Integrating it with AWS is more difficult as it is an external service.
  - Restricting NHS-wide access to dashboards and alerts would require additional configuration.

### Option 3: Amazon OpenSearch Service

This involves using the managed OpenSearch service, which includes OpenSearch Dashboards and an integrated alerting
plugin.

- **Pros**:
  - Fully managed AWS service.
  - Provides powerful log analytics, visualization, and alerting.
  - Can view dashboards without accessing the AWS console.
- **Cons**:
  - Core strength is in log data, not metrics.
  - Metric-based alerting setup is more complex than in specialized tools.
  - Potentially overkill for our primary requirements.

### Option 4: Amazon Managed Grafana

A fully managed service for the open-source Grafana platform, which is a popular tool for analytics, interactive
visualization, and alerting.

- **Pros**:
  - Purpose-built for unified dashboards and alerting.
  - Best-in-class visualization capabilities.
  - Integrates seamlessly with CloudWatch and AWS IAM Identity Center.
  - Fully manageable via Terraform.
  - Can view dashboards without accessing the AWS console.
- **Cons**:
  - Introduces a new service to the architecture.

## Decision

We will adopt **Amazon Managed Grafana** as our primary monitoring and alerting solution.

It is the only option that excels at meeting all our acceptance criteria, especially the need for a unified platform for
both visualization and alerting. It provides best-in-class dashboard features alongside an integrated alerting system.
The service also integrates well with multiple types of data (logs, streams, metrics, etc.).
This allows us to consolidate our tooling and deprecate the use of Sentry for alerts, creating a more streamlined
operational workflow. Its native integration with AWS for data sources (CloudWatch) and authentication (IAM Identity
Center) makes it a natural fit, and as an AWS service it is accepted on the TechRadar.

## Consequences

- We will provision a new Amazon Managed Grafana workspace using Terraform.
- User access will be managed via AWS IAM Identity Center, granting authorized personnel access to dashboards and alert
  configurations without needing to log into the AWS console.
- CloudWatch will be configured as the primary data source within Grafana.
- An initial set of dashboards for key application and infrastructure metrics (e.g., CPU/Memory utilization, database
  connections, latency) will be created.
- All future alerting will be configured and managed within Grafana, deprecating our reliance on Sentry for this
  purpose.
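
As a rough illustration of the first three consequences, the sketch below uses Python to emit a Terraform JSON (`.tf.json`) document describing a single workspace. It assumes the AWS provider's `aws_grafana_workspace` resource with IAM Identity Center (`AWS_SSO`) authentication and CloudWatch enabled as a data source. The workspace name `mavis-monitoring`, the resource label `monitoring`, and the referenced `aws_iam_role.grafana` are hypothetical placeholders, not names taken from our codebase, and the real configuration may well be written in plain HCL instead.

```python
import json


def grafana_workspace_tf_json(name: str, role_arn: str) -> dict:
    """Build a Terraform JSON document for one Amazon Managed Grafana workspace.

    The resource label "monitoring" and all argument values are illustrative
    assumptions; adjust them to match the real infrastructure code.
    """
    return {
        "resource": {
            "aws_grafana_workspace": {
                "monitoring": {
                    "name": name,
                    # Workspace can read data from the current AWS account only.
                    "account_access_type": "CURRENT_ACCOUNT",
                    # AWS_SSO selects IAM Identity Center for user sign-in.
                    "authentication_providers": ["AWS_SSO"],
                    "permission_type": "SERVICE_MANAGED",
                    "role_arn": role_arn,
                    # Enables CloudWatch as a data source for the workspace.
                    "data_sources": ["CLOUDWATCH"],
                }
            }
        }
    }


if __name__ == "__main__":
    # "${aws_iam_role.grafana.arn}" refers to a hypothetical role defined elsewhere.
    doc = grafana_workspace_tf_json("mavis-monitoring", "${aws_iam_role.grafana.arn}")
    print(json.dumps(doc, indent=2))
```

Writing the printed output to a file such as `grafana.tf.json` would let `terraform plan` pick it up alongside existing `.tf` files, provided the referenced IAM role is defined elsewhere in the configuration.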