
Consumer-Lag Monitoring

Monitoring consumer lag allows you to identify slow or stuck consumers that aren't keeping up with the latest data available in a topic. When necessary, you can then take remedial actions, such as scaling or rebooting those consumers. To monitor consumer lag, you can use Amazon CloudWatch, open monitoring with Prometheus, or Burrow.

Consumer-Lag Metrics for CloudWatch and for Open Monitoring with Prometheus

Consumer lag metrics quantify the difference between the latest data written to your topics and the data read by your applications. Amazon MSK provides the following consumer-lag metrics, which you can get through Amazon CloudWatch or through open monitoring with Prometheus: EstimatedMaxTimeLag, EstimatedTimeLag, MaxOffsetLag, OffsetLag, and SumOffsetLag. For information about these metrics, see Amazon MSK Metrics for Monitoring with CloudWatch.
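
For example, you can retrieve one of these metrics with the AWS CLI. The following sketch queries MaxOffsetLag; the cluster, consumer group, and topic names are placeholders, and it assumes that Amazon MSK is publishing this metric for your cluster under the AWS/Kafka namespace with the Cluster Name, Consumer Group, and Topic dimensions.

    # Maximum offset lag over the past hour, in one-minute data points.
    # The date commands assume GNU date (as on Amazon Linux).
    aws cloudwatch get-metric-statistics \
        --namespace AWS/Kafka \
        --metric-name MaxOffsetLag \
        --dimensions Name="Cluster Name",Value=MSK-cluster-name Name="Consumer Group",Value=my-consumer-group Name=Topic,Value=my-topic \
        --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
        --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
        --period 60 \
        --statistics Maximum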

Consumer-Lag Monitoring with Burrow

Burrow is a monitoring companion for Apache Kafka that provides consumer-lag checking. Burrow has a modular design that includes the following subsystems:

  • The clusters subsystem runs an Apache Kafka client that periodically updates topic lists and the current HEAD offset (the most recent offset) for every partition.

  • The consumers subsystem fetches information about consumer groups from a repository. This repository can be an Apache Kafka cluster (consuming the __consumer_offsets topic), ZooKeeper, or some other repository.

  • The storage subsystem stores all of this information in Burrow.

  • The evaluator subsystem retrieves information from the storage subsystem for a specific consumer group and calculates the status of that group, following the consumer lag evaluation rules.

  • The notifier subsystem requests the status of consumer groups at a configured interval and sends notifications (email, HTTP, or some other method) for groups that meet the configured criteria. For a sketch of a notifier configuration, see the example after this list.

  • The HTTP Server subsystem provides an API interface to Burrow for fetching information about clusters and consumers.
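
For example, a minimal HTTP notifier section might look like the following sketch. The endpoint URL is a placeholder, and the option names and default values shown here should be checked against the Burrow configuration documentation before you rely on them.

    [notifier.default]
    class-name="http"
    # Hypothetical endpoint that receives the notification payload.
    url-open="http://alerts.example.com/v1/event"
    # How often, in seconds, to evaluate groups and send notifications.
    interval=60
    timeout=5
    keepalive=30
    # Minimum status that triggers a notification (assumed: 2 corresponds to WARN).
    threshold=2
    # Notification template; the Burrow repository ships sample templates.
    template-open="config/default-http-post.tmpl"
    send-close=false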

For more information about Burrow, see Burrow - Kafka Consumer Lag Checking.

Important

Make sure that Burrow is compatible with the version of Apache Kafka that you are using for your MSK cluster.

To set up and use Burrow with Amazon MSK

Follow this procedure if you use plaintext communication. If you use TLS communication, also see the next procedure.

  1. Create an MSK cluster and launch a client machine in the same VPC as the cluster. For example, you can follow the instructions at Getting Started Using Amazon MSK.

  2. Run the following command on the EC2 instance that serves as your client machine.

    sudo yum install go
  3. Run the following command on the client machine to get the Burrow project.

    go get github.com/linkedin/Burrow
  4. Run the following command to install dep. This installs the dep binary at /home/ec2-user/go/bin/dep.

    curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
  5. Go to the /home/ec2-user/go/src/github.com/linkedin/Burrow folder and run the following command.

    /home/ec2-user/go/bin/dep ensure
  6. Run the following command in the same folder.

    go install
  7. Open the /home/ec2-user/go/src/github.com/linkedin/Burrow/config/burrow.toml configuration file for editing. In the following sections of the configuration file, replace the placeholders with the name of your MSK cluster, the host:port pairs for your ZooKeeper servers, and your bootstrap brokers.

    To get your ZooKeeper host:port pairs, describe your MSK cluster and look for the value of ZookeeperConnectString. See Getting the Apache ZooKeeper connection string for an Amazon MSK Cluster.

    To get your bootstrap brokers, see Getting the bootstrap brokers for an Amazon MSK Cluster.
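
    For example, if the Amazon Resource Name (ARN) of your cluster is stored in the CLUSTER_ARN shell variable (a placeholder for your own cluster ARN), you can retrieve both values with the AWS CLI.

    # The output includes the ZookeeperConnectString value.
    aws kafka describe-cluster --cluster-arn $CLUSTER_ARN

    # The output includes the bootstrap brokers for your cluster.
    aws kafka get-bootstrap-brokers --cluster-arn $CLUSTER_ARN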

    Follow the formatting shown below when you edit the configuration file.

    [zookeeper]
    servers=[ "ZooKeeper-host-port-pair-1", "ZooKeeper-host-port-pair-2", "ZooKeeper-host-port-pair-3" ]
    timeout=6
    root-path="/burrow"

    [client-profile.test]
    client-id="burrow-test"
    kafka-version="0.10.0"

    [cluster.MSK-cluster-name]
    class-name="kafka"
    servers=[ "bootstrap-broker-host-port-pair-1", "bootstrap-broker-host-port-pair-2", "bootstrap-broker-host-port-pair-3" ]
    client-profile="test"
    topic-refresh=120
    offset-refresh=30

    [consumer.MSK-cluster-name]
    class-name="kafka"
    cluster="MSK-cluster-name"
    servers=[ "bootstrap-broker-host-port-pair-1", "bootstrap-broker-host-port-pair-2", "bootstrap-broker-host-port-pair-3" ]
    client-profile="test"
    group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
    group-whitelist=""
  8. In the go/bin folder, run the following command.

    ./Burrow --config-dir /home/ec2-user/go/src/github.com/linkedin/Burrow/config
  9. Check for errors in the bin/log/burrow.log file.

  10. You can use the following command to test your setup.

    curl -XGET 'http://your-localhost-ip:8000/v3/kafka'
  11. For all of the supported HTTP requests and links, see Burrow HTTP Endpoint.
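
    For example, once Burrow has collected offsets, the following request returns the evaluated status of a single consumer group. Here my-consumer-group is a placeholder, and the cluster name matches the [cluster.MSK-cluster-name] section of the configuration file.

    curl -XGET 'http://your-localhost-ip:8000/v3/kafka/MSK-cluster-name/consumer/my-consumer-group/status'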

To use Burrow with TLS

If you use TLS communication, perform the following steps in addition to the previous procedure.

  1. Run the following command.

    sudo yum install java-1.8.0-openjdk-devel -y
  2. Run the following command after you adjust the paths as necessary.

    find /usr/lib/jvm/ -name "cacerts" -exec cp {} /tmp/kafka.client.truststore.jks \;
  3. In the next step, you use the keytool command, which asks for a password. The default password is changeit. We recommend that you run the following command to change the password before you proceed to the next step.

    keytool -keystore /tmp/kafka.client.truststore.jks -storepass changeit -storepasswd -new Password
  4. Run the following command.

    keytool -list -rfc -keystore /tmp/kafka.client.truststore.jks > /tmp/truststore.pem

    You need truststore.pem for the burrow.toml file that's described later in this procedure.

  5. To generate the certfile and the keyfile, use the code at Managing Client Certificates for Mutual Authentication with Amazon MSK. Burrow expects these files in PEM format, so be sure to use the pem flag.
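
    If your client certificate and private key end up in a Java keystore rather than in PEM files, one way to extract them is to convert the keystore to PKCS12 and then export with OpenSSL. The keystore file name below is a placeholder; both tools prompt for the relevant passwords.

    # Convert the Java keystore to PKCS12 format.
    keytool -importkeystore -srckeystore kafka.client.keystore.jks \
        -destkeystore /tmp/keystore.p12 -deststoretype PKCS12

    # Export the private key (keyfile) and the client certificate (certfile) as PEM.
    openssl pkcs12 -in /tmp/keystore.p12 -nocerts -nodes -out /tmp/private_key.pem
    openssl pkcs12 -in /tmp/keystore.p12 -nokeys -out /tmp/client_cert.pem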

  6. Set up your burrow.toml file like the following example. You can include multiple cluster and consumer sections to monitor multiple MSK clusters with a single Burrow installation. You can also adjust the Apache Kafka version under client-profile; it represents the client version of Apache Kafka that Burrow uses. For more information, see Client Profile on the Burrow GitHub.

    [general]
    pidfile="burrow.pid"
    stdout-logfile="burrow.out"

    [logging]
    filename="/tmp/burrow.log"
    level="info"
    maxsize=100
    maxbackups=30
    maxage=10
    use-localtime=false
    use-compression=true

    [zookeeper]
    servers=[ "ZooKeeperConnectionString" ]
    timeout=6
    root-path="/burrow"

    [client-profile.msk1-client]
    client-id="burrow-test"
    tls="msk-mTLS"
    kafka-version="2.0.0"

    [cluster.msk1]
    class-name="kafka"
    servers=[ "BootstrapBrokerString" ]
    client-profile="msk1-client"
    topic-refresh=120
    offset-refresh=30

    [consumer.msk1-cons]
    class-name="kafka"
    cluster="msk1"
    servers=[ "BootstrapBrokerString" ]
    client-profile="msk1-client"
    group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
    group-whitelist=""

    [httpserver.default]
    address=":8000"

    [storage.default]
    class-name="inmemory"
    workers=20
    intervals=15
    expire-group=604800
    min-distance=1

    [tls.msk-mTLS]
    certfile="/tmp/client_cert.pem"
    keyfile="/tmp/private_key.pem"
    cafile="/tmp/truststore.pem"
    noverify=false
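
    With this configuration in place, start Burrow as in the previous procedure, and then confirm that it can reach the cluster over TLS. The msk1 cluster name in the request below comes from the example configuration above.

    ./Burrow --config-dir /home/ec2-user/go/src/github.com/linkedin/Burrow/config
    curl -XGET 'http://localhost:8000/v3/kafka/msk1'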