Creating MSK CloudWatch Alarms for Consumer Lag Using a CloudFormation Template

Introduction :-

In this document, we will walk through creating Amazon CloudWatch alarms that monitor consumer lag on Amazon Managed Streaming for Apache Kafka (MSK), using a CloudFormation template. Amazon MSK is a fully managed service that simplifies deploying, managing, and scaling Apache Kafka clusters for streaming data applications. Monitoring consumer lag is essential for keeping Kafka consumers healthy and efficient, because growing lag points to bottlenecks or failures that could delay real-time data processing.

By leveraging CloudFormation, Amazon’s Infrastructure-as-Code (IaC) service, we can define and provision the necessary resources and configuration in a structured, automated way. We will set up a CloudWatch alarm that tracks consumer lag in an MSK cluster and sends a notification when a configured threshold is reached.

Prerequisites:

  • An AWS account with permissions to create resources such as CloudWatch alarms and CloudFormation stacks.

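  • Basic knowledge of AWS CloudFormation and template design.

  • An existing MSK cluster, plus a deployed CloudFormation stack that exports an SNS topic as <stack name>-SlackAlertTopic; the template in Step 2 imports that topic for alarm notifications.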

Step 1 :- Create a folder named msk-cloudwatch in your home directory, and within that folder create a file named template.yaml.

Step 2 :- Define the required parameters in the template.yaml file and create the CloudWatch alarm as shown below.

AWSTemplateFormatVersion: "2010-09-09"
Description: This stack creates CloudWatch alarms for MSK consumer lag
Parameters:
  EnvironmentName:
    Description: Environment name for the application dev/production.
    Type: String
    AllowedValues:
      - dev
      - production
    ConstraintDescription: Specify either dev or production.
  SlackBotSNSStack:
    Description: Name of an active CloudFormation stack of Slack bot SNS resources
    Type: String
    Default: mahira-slackbot
  MSKClusterName:
    Description: Name of an active MSK Cluster
    Type: String
  ProcuraVisitsEventServiceConsumerGroupID:
    Description: Consumer Group ID of Event-service kafka consumer for procura.visit topic
    Type: String
  ProcuraClientsEventServiceConsumerGroupID:
    Description: Consumer Group ID of Event-service kafka consumer for procura.client topic
    Type: String
  AlarmThreshold:
    Type: Number
    Description: Threshold value (number of messages) at which the alarm triggers

Resources:
  ProcuraVisitTopicEventServiceConsumerAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: !Sub ${EnvironmentName}.procura.visit event service kafka consumer lag is at or above ${AlarmThreshold} messages
      AlarmName: !Sub ${EnvironmentName}.procura.visit event service kafka consumer lag
      ComparisonOperator: GreaterThanOrEqualToThreshold
      DatapointsToAlarm: 1
      EvaluationPeriods: 5
      Threshold: !Ref AlarmThreshold
      MetricName: "SumOffsetLag"
      Namespace: "AWS/Kafka"
      Dimensions:
        - Value: !Ref ProcuraVisitsEventServiceConsumerGroupID
          Name: "Consumer Group"
        - Value: !Ref MSKClusterName
          Name: "Cluster Name"
        - Value: !Sub ${EnvironmentName}.procura.visit
          Name: "Topic"
      Period: 300
      AlarmActions:
        - { "Fn::ImportValue": !Sub "${SlackBotSNSStack}-SlackAlertTopic" }
      Statistic: Average
      TreatMissingData: notBreaching
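
The template declares ProcuraClientsEventServiceConsumerGroupID but the alarm above only covers the procura.visit topic. A second alarm for the client topic could be sketched as follows. The resource name ProcuraClientTopicEventServiceConsumerAlarm is hypothetical, and the topic name ${EnvironmentName}.procura.client is an assumption made by analogy with the visit topic; adjust both to match your cluster.

```yaml
Resources:   # existing section: merge this entry into it rather than repeating the key
  # Hypothetical resource name; mirrors the procura.visit alarm.
  ProcuraClientTopicEventServiceConsumerAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: !Sub ${EnvironmentName}.procura.client event service kafka consumer lag is at or above ${AlarmThreshold} messages
      AlarmName: !Sub ${EnvironmentName}.procura.client event service kafka consumer lag
      ComparisonOperator: GreaterThanOrEqualToThreshold
      DatapointsToAlarm: 1
      EvaluationPeriods: 5
      Threshold: !Ref AlarmThreshold
      MetricName: "SumOffsetLag"
      Namespace: "AWS/Kafka"
      Dimensions:
        - Name: "Consumer Group"
          Value: !Ref ProcuraClientsEventServiceConsumerGroupID
        - Name: "Cluster Name"
          Value: !Ref MSKClusterName
        - Name: "Topic"
          # Assumed topic name, by analogy with the visit topic.
          Value: !Sub ${EnvironmentName}.procura.client
      Period: 300
      AlarmActions:
        # Same SNS topic export as the visit alarm.
        - Fn::ImportValue: !Sub "${SlackBotSNSStack}-SlackAlertTopic"
      Statistic: Average
      TreatMissingData: notBreaching
```

With DatapointsToAlarm set to 1 and EvaluationPeriods set to 5, the alarm fires as soon as any one of the last five 5-minute periods reaches the threshold, which favors early notification over noise suppression.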

Step 3 :- Next, create a params.json file in the msk-cloudwatch folder to hold the parameter values, as shown below. SlackBotSNSStack is not listed, so its default value (mahira-slackbot) applies.

[
  {
    "ParameterKey": "EnvironmentName",
    "ParameterValue": "dev"
  },
  {
    "ParameterKey": "MSKClusterName",
    "ParameterValue": "kafka-msk"
  },
  {
    "ParameterKey":"ProcuraVisitsEventServiceConsumerGroupID",
    "ParameterValue":"**********************"
  },
  {
    "ParameterKey":"ProcuraClientsEventServiceConsumerGroupID",
    "ParameterValue":"*************************"
  },
  {
    "ParameterKey": "AlarmThreshold",
    "ParameterValue": "10000"
  }
]

Step 4 :- Log in to the AWS Management Console and navigate to the CloudFormation service using the search bar.

Step 5 :- Once the CloudFormation dashboard opens, click Create stack, upload the template.yaml file, and give the stack a name.

Step 6 :- Click Next, fill in the stack parameters (the values recorded in params.json can be entered here), and click Submit to create the stack.

Step 7 :- Once the stack has been created, each alarm defined in the template appears under Alarms in the CloudWatch console.
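
To make the created alarm easier to find from the stack itself, an Outputs section could be added to template.yaml. This is a minimal sketch and not part of the original template; the output names VisitConsumerLagAlarmName and VisitConsumerLagAlarmArn are hypothetical.

```yaml
Outputs:
  # Hypothetical output names. Ref on an AWS::CloudWatch::Alarm returns the alarm name.
  VisitConsumerLagAlarmName:
    Description: Name of the procura.visit consumer lag alarm
    Value: !Ref ProcuraVisitTopicEventServiceConsumerAlarm
  VisitConsumerLagAlarmArn:
    Description: ARN of the procura.visit consumer lag alarm
    Value: !GetAtt ProcuraVisitTopicEventServiceConsumerAlarm.Arn
```

After the stack is updated, both values are listed on the stack's Outputs tab in the CloudFormation console.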

Conclusion :-

Creating CloudWatch alarms for consumer lag in Amazon MSK with AWS CloudFormation offers an efficient, standardized way to monitor the health of your Kafka consumer applications. With this automated setup, you gain visibility into consumer group performance, detect anomalies early, and can act promptly to keep data processing smooth and avoid bottlenecks.