Creating MSK CloudWatch alarms for consumer lag using a CloudFormation template
Introduction :-
In this document, we will explore how to create #Amazon Managed Streaming for #Apache Kafka (MSK) CloudWatch alarms that monitor consumer lag, using a #CloudFormation template. #Amazon MSK is a fully managed service that makes it easy to deploy, manage, and scale #ApacheKafka clusters for streaming data applications. Consumer lag is the number of messages that have been written to a topic but not yet processed by a consumer group. #Monitoring it is essential for keeping #Kafka consumers healthy, because growing lag points to bottlenecks or failures that delay #realtime data processing.
By leveraging #CloudFormation, Amazon's #Infrastructure-as-Code (IaC) service, we can define and provision the necessary resources and configurations in a structured, repeatable way. We will set up #CloudWatch alarms that track consumer lag for topics in an #MSK cluster and send notifications when the lag reaches a configured threshold.
Prerequisites:
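An AWS account with permissions to create resources such as #CloudWatch alarms and CloudFormation stacks.
Basic knowledge of #AWS CloudFormation and template design.
An existing MSK cluster, plus the consumer group IDs of the Kafka consumers you want to monitor.
An existing CloudFormation stack (its name is passed in the SlackBotSNSStack parameter) that exports the ARN of an SNS topic under the export name <stack-name>-SlackAlertTopic. The alarm sends its notifications to this topic.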
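The alarm template in Step 2 does not create its own notification target. It imports an SNS topic ARN from another stack with Fn::ImportValue, using the export name ${SlackBotSNSStack}-SlackAlertTopic. That stack is not part of this walkthrough, so the following is only a minimal, hypothetical sketch of what it needs to export. The resource name SlackAlertTopic, the topic name, and the output name are assumptions, and subscribing the topic to Slack is left out.

```yaml
# Hypothetical Slack bot SNS stack (deployed, for example, as "mahira-slackbot").
# Only the parts the alarm template depends on are shown.
AWSTemplateFormatVersion: "2010-09-09"
Description: SNS topic that receives CloudWatch alarm notifications (sketch)
Resources:
  SlackAlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      # Assumed topic name; any valid SNS topic name works
      TopicName: !Sub "${AWS::StackName}-slack-alerts"
Outputs:
  SlackAlertTopicArn:
    Description: ARN of the SNS topic used as the alarm action
    # Ref on an AWS::SNS::Topic returns the topic ARN
    Value: !Ref SlackAlertTopic
    Export:
      # Must match the import in the alarm template: ${SlackBotSNSStack}-SlackAlertTopic
      Name: !Sub "${AWS::StackName}-SlackAlertTopic"
```

Because the export name is built from the stack's own name, deploying this sketch as mahira-slackbot would satisfy the default value of the SlackBotSNSStack parameter.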
Step 1 :- Create a folder named msk-cloudwatch in your home directory, and within that folder create a file named template.yaml.
Step 2 :- In template.yaml, define the required parameters and the CloudWatch alarm resource as shown below.
AWSTemplateFormatVersion: "2010-09-09"
Description: This stack deploys CloudWatch alarms for MSK consumer lag
Parameters:
  EnvironmentName:
    Description: Environment name for the application (dev/production).
    Type: String
    AllowedValues:
      - dev
      - production
    ConstraintDescription: Specify either dev or production.
  SlackBotSNSStack:
    Description: Name of an active CloudFormation stack of Slack bot SNS resources
    Type: String
    Default: mahira-slackbot
  MSKClusterName:
    Description: Name of an active MSK Cluster
    Type: String
  ProcuraVisitsEventServiceConsumerGroupID:
    Description: Consumer Group ID of Event-service kafka consumer for procura.visit topic
    Type: String
  ProcuraClientsEventServiceConsumerGroupID:
    Description: Consumer Group ID of Event-service kafka consumer for procura.client topic
    Type: String
  AlarmThreshold:
    Type: Number
    Description: Consumer lag (in messages) at or above which the alarm triggers
Resources:
  ProcuraVisitTopicEventServiceConsumerAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: !Sub "${EnvironmentName}.procura.visit event service kafka consumer lag is ${AlarmThreshold} messages or more"
      AlarmName: !Sub "${EnvironmentName}.procura.visit event service kafka consumer lag"
      ComparisonOperator: GreaterThanOrEqualToThreshold
      # Alarm when any 1 of the last 5 evaluation periods breaches the threshold
      DatapointsToAlarm: 1
      EvaluationPeriods: 5
      Threshold: !Ref AlarmThreshold
      # SumOffsetLag: aggregated offset lag across all partitions of the topic
      MetricName: "SumOffsetLag"
      Namespace: "AWS/Kafka"
      Dimensions:
        - Name: "Consumer Group"
          Value: !Ref ProcuraVisitsEventServiceConsumerGroupID
        - Name: "Cluster Name"
          Value: !Ref MSKClusterName
        - Name: "Topic"
          Value: !Sub "${EnvironmentName}.procura.visit"
      # Each evaluation period is 300 seconds (5 minutes)
      Period: 300
      AlarmActions:
        # SNS topic ARN exported by the Slack bot stack
        - Fn::ImportValue: !Sub "${SlackBotSNSStack}-SlackAlertTopic"
      Statistic: Average
      # No data (e.g. consumer group idle) does not trigger the alarm
      TreatMissingData: notBreaching
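The template declares ProcuraClientsEventServiceConsumerGroupID, but no resource references it, so only the procura.visit topic gets an alarm. If the procura.client consumer should be monitored too, a second alarm along these lines could be added under Resources. This is a sketch: the logical ID and the topic name ${EnvironmentName}.procura.client are assumptions that follow the naming pattern of the visit alarm.

```yaml
  # Hypothetical second alarm; indent it under the existing Resources: key.
  # Assumes the topic is named ${EnvironmentName}.procura.client.
  ProcuraClientTopicEventServiceConsumerAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: !Sub "${EnvironmentName}.procura.client event service kafka consumer lag is ${AlarmThreshold} messages or more"
      AlarmName: !Sub "${EnvironmentName}.procura.client event service kafka consumer lag"
      ComparisonOperator: GreaterThanOrEqualToThreshold
      DatapointsToAlarm: 1
      EvaluationPeriods: 5
      Threshold: !Ref AlarmThreshold
      MetricName: "SumOffsetLag"
      Namespace: "AWS/Kafka"
      Dimensions:
        - Name: "Consumer Group"
          Value: !Ref ProcuraClientsEventServiceConsumerGroupID
        - Name: "Cluster Name"
          Value: !Ref MSKClusterName
        - Name: "Topic"
          Value: !Sub "${EnvironmentName}.procura.client"
      Period: 300
      AlarmActions:
        - Fn::ImportValue: !Sub "${SlackBotSNSStack}-SlackAlertTopic"
      Statistic: Average
      TreatMissingData: notBreaching
```

Both alarms share the same threshold parameter here; a separate threshold parameter per topic would be a reasonable variation if the two consumers have different normal lag levels.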
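Step 3 :- Next, create a params.json file in the msk-cloudwatch folder to hold the parameter values, as shown below. SlackBotSNSStack is not listed because it has a default value (mahira-slackbot); add an entry for it if your Slack bot stack has a different name.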
[
  {
    "ParameterKey": "EnvironmentName",
    "ParameterValue": "dev"
  },
  {
    "ParameterKey": "MSKClusterName",
    "ParameterValue": "kafka-msk"
  },
  {
    "ParameterKey": "ProcuraVisitsEventServiceConsumerGroupID",
    "ParameterValue": "**********************"
  },
  {
    "ParameterKey": "ProcuraClientsEventServiceConsumerGroupID",
    "ParameterValue": "*************************"
  },
  {
    "ParameterKey": "AlarmThreshold",
    "ParameterValue": "10000"
  }
]
Step 4 :- Log in to the AWS Management Console and open the CloudFormation service using the search bar.
Step 5 :- Once the CloudFormation dashboard opens, click Create stack, upload the template.yaml file, and give the stack a name.
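Step 6 :- Fill in the parameter values (the console form does not read params.json, so enter the same values by hand), click Next through the remaining pages, and click Submit to create the stack. Alternatively, from the msk-cloudwatch folder, the AWS CLI can use params.json directly: aws cloudformation create-stack --stack-name <stack-name> --template-body file://template.yaml --parameters file://params.json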
Step 7 :- Once the stack status reaches CREATE_COMPLETE, open Alarms in the CloudWatch console to see the consumer lag alarm created for the MSK cluster topic.
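To confirm what the stack created without browsing the CloudWatch console, one option is to add an Outputs section to template.yaml and update the stack; the values then appear on the stack's Outputs tab. The output names below are hypothetical. Ref on an AWS::CloudWatch::Alarm resource returns the alarm name, and its Arn attribute returns the alarm ARN.

```yaml
# Hypothetical Outputs section; add it at the top level of template.yaml,
# at the same indentation as Parameters and Resources.
Outputs:
  ProcuraVisitConsumerLagAlarmName:
    Description: Name of the procura.visit consumer lag alarm
    # Ref on an AWS::CloudWatch::Alarm returns the alarm name
    Value: !Ref ProcuraVisitTopicEventServiceConsumerAlarm
  ProcuraVisitConsumerLagAlarmArn:
    Description: ARN of the procura.visit consumer lag alarm
    Value: !GetAtt ProcuraVisitTopicEventServiceConsumerAlarm.Arn
```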
Conclusion :-
Creating #CloudWatch alarms for consumer lag in #AmazonMSK with AWS CloudFormation gives you a repeatable, standardized way to monitor the health of your #Kafka consumer applications. With the alarms defined as code, you gain visibility into #consumer group #performance, learn early when lag starts to build, and can act before it turns into a #processing bottleneck.