Consumer-Lag Monitoring
Monitoring consumer lag allows you to identify slow or stuck consumers that aren't keeping up with the latest data available in a topic. When necessary, you can then take remedial actions, such as scaling or rebooting those consumers. To monitor consumer lag, you can use Amazon CloudWatch, open monitoring with Prometheus, or Burrow.
Consumer-Lag Metrics for CloudWatch and for Open Monitoring with Prometheus
Consumer-lag metrics quantify the difference between the latest data written to your topics and the data read by your applications. Amazon MSK provides the following consumer-lag metrics, which you can get through Amazon CloudWatch or through open monitoring with Prometheus: EstimatedMaxTimeLag, EstimatedTimeLag, MaxOffsetLag, OffsetLag, and SumOffsetLag. For information about these metrics, see Amazon MSK Metrics for Monitoring with CloudWatch.
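To illustrate what the offset-based metrics measure, the following sketch computes per-partition lag as the difference between each partition's latest (HEAD) offset and the consumer's committed offset, then aggregates across partitions. The offset numbers are made up for illustration; the aggregation (maximum and sum across partitions) reflects the intent of MaxOffsetLag and SumOffsetLag, while the time-based metrics are the corresponding estimates in milliseconds.

```python
# Illustrative only: offset-based consumer lag, computed from made-up numbers.
# Per-partition lag = latest (HEAD) offset minus the consumer's committed offset.

head_offsets = {0: 1500, 1: 980, 2: 2210}       # latest offset per partition
committed_offsets = {0: 1400, 1: 980, 2: 1960}  # consumer's committed offsets

offset_lag = {p: head_offsets[p] - committed_offsets[p] for p in head_offsets}

max_offset_lag = max(offset_lag.values())  # worst single partition
sum_offset_lag = sum(offset_lag.values())  # total backlog across partitions

print(offset_lag)      # {0: 100, 1: 0, 2: 250}
print(max_offset_lag)  # 250
print(sum_offset_lag)  # 350
```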
Consumer-Lag Monitoring with Burrow
Burrow is a monitoring companion for Apache Kafka that provides consumer-lag checking. Burrow has a modular design that includes the following subsystems:
- Clusters run an Apache Kafka client that periodically updates topic lists and the current HEAD offset (the most recent offset) for every partition.
- Consumers fetch information about consumer groups from a repository. This repository can be an Apache Kafka cluster (consuming the __consumer_offsets topic), ZooKeeper, or some other repository.
- The storage subsystem stores all of this information in Burrow.
- The evaluator subsystem retrieves information from the storage subsystem for a specific consumer group and calculates the status of that group. This follows the consumer lag evaluation rules.
- The notifier subsystem requests status on consumer groups according to a configured interval and sends out notifications (email, HTTP, or some other method) for groups that meet the configured criteria.
- The HTTP server subsystem provides an API interface to Burrow for fetching information about clusters and consumers.
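Burrow's actual evaluation rules are more involved, but the core idea is that the evaluator looks at a sliding window of (committed offset, lag) observations per partition and classifies the group. The sketch below is a much-simplified illustration of that idea, not Burrow's real algorithm; the window format and status names are assumptions for the example.

```python
# Simplified illustration of Burrow-style status evaluation (NOT Burrow's
# actual code). `window` is a list of (committed_offset, lag) observations
# for one partition, oldest first.

def evaluate(window):
    offsets = [o for o, _ in window]
    lags = [l for _, l in window]
    if all(l == 0 for l in lags):
        return "OK"      # consumer is fully caught up
    if offsets[-1] == offsets[0]:
        return "STALL"   # commits are not moving, but lag exists
    if all(b >= a for a, b in zip(lags, lags[1:])) and lags[-1] > lags[0]:
        return "WARN"    # still committing, but lag keeps growing
    return "OK"

print(evaluate([(100, 0), (120, 0), (140, 0)]))    # OK
print(evaluate([(100, 50), (100, 80), (100, 90)])) # STALL
print(evaluate([(100, 10), (120, 20), (140, 30)])) # WARN
```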
For more information about Burrow, see Burrow - Kafka Consumer Lag Checking.
Make sure that Burrow is compatible with the version of Apache Kafka that you are using for your MSK cluster.
To set up and use Burrow with Amazon MSK
Follow this procedure if you use plaintext communication. If you use TLS, also see the procedure that follows this one.
- Create an MSK cluster and launch a client machine in the same VPC as the cluster. For example, you can follow the instructions at Getting Started Using Amazon MSK.
- Run the following command on the EC2 instance that serves as your client machine.
sudo yum install go
- Run the following command on the client machine to get the Burrow project.
go get github.com/linkedin/Burrow
- Run the following command to install dep. It installs the dep binary at /home/ec2-user/go/bin/dep.
curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
- Go to the /home/ec2-user/go/src/github.com/linkedin/Burrow folder and run the following command.
/home/ec2-user/go/bin/dep ensure
- Run the following command in the same folder.
go install
- Open the /home/ec2-user/go/src/github.com/linkedin/Burrow/config/burrow.toml configuration file for editing. In the following sections of the configuration file, replace the placeholders with the name of your MSK cluster, the host:port pairs for your ZooKeeper servers, and your bootstrap brokers.
To get your ZooKeeper host:port pairs, describe your MSK cluster and look for the value of ZookeeperConnectString. See Getting the Apache ZooKeeper connection string for an Amazon MSK Cluster.
To get your bootstrap brokers, see Getting the bootstrap brokers for an Amazon MSK Cluster.
Follow the formatting shown below when you edit the configuration file.
[zookeeper]
servers=[ "ZooKeeper-host-port-pair-1", "ZooKeeper-host-port-pair-2", "ZooKeeper-host-port-pair-3" ]
timeout=6
root-path="/burrow"

[client-profile.test]
client-id="burrow-test"
kafka-version="0.10.0"

[cluster.MSK-cluster-name]
class-name="kafka"
servers=[ "bootstrap-broker-host-port-pair-1", "bootstrap-broker-host-port-pair-2", "bootstrap-broker-host-port-pair-3" ]
client-profile="test"
topic-refresh=120
offset-refresh=30

[consumer.MSK-cluster-name]
class-name="kafka"
cluster="MSK-cluster-name"
servers=[ "bootstrap-broker-host-port-pair-1", "bootstrap-broker-host-port-pair-2", "bootstrap-broker-host-port-pair-3" ]
client-profile="test"
group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$"
group-whitelist=""

- In the go/bin folder, run the following command.
./Burrow --config-dir /home/ec2-user/go/src/github.com/linkedin/Burrow/config
- Check for errors in the bin/log/burrow.log file.
- You can use the following command to test your setup.
curl -XGET 'http://your-localhost-ip:8000/v3/kafka'
- For all of the supported HTTP requests and links, see Burrow HTTP Endpoint.
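If you want to consume the HTTP endpoint from a script rather than with curl, a small parser like the one below can extract the cluster list. In practice you would fetch the payload with urllib.request.urlopen("http://your-localhost-ip:8000/v3/kafka"); here a canned sample string stands in for the response. The sample's shape (an "error" flag, a "message", and a "clusters" array) is an assumption based on Burrow's v3 API; verify the exact fields against the Burrow HTTP Endpoint documentation.

```python
import json

# Sketch: parse the JSON returned by Burrow's /v3/kafka endpoint.
# The response shape below is an assumption for illustration.

def list_clusters(payload: str) -> list:
    body = json.loads(payload)
    if body.get("error"):
        raise RuntimeError(body.get("message", "Burrow request failed"))
    return body["clusters"]

sample = '{"error": false, "message": "cluster list returned", "clusters": ["msk1"]}'
print(list_clusters(sample))  # ['msk1']
```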
To use Burrow with TLS
In addition to the previous procedure, see the following steps if you are using TLS communication.
- Run the following command.
sudo yum install java-1.8.0-openjdk-devel -y
- Run the following command after you adjust the paths as necessary.
find /usr/lib/jvm/ -name "cacerts" -exec cp {} /tmp/kafka.client.truststore.jks \;
- In the next step you use the keytool command, which asks for a password. The default password is changeit. We recommend that you run the following command to change the password before you proceed to the next step.
keytool -keystore /tmp/kafka.client.truststore.jks -storepass changeit -storepasswd -new Password
- Run the following command.
keytool --list -rfc -keystore /tmp/kafka.client.truststore.jks >/tmp/truststore.pem
You need truststore.pem for the burrow.toml file that's described later in this procedure.
- To generate the certfile and the keyfile, use the code at Managing Client Certificates for Mutual Authentication with Amazon MSK. You need the pem flag.
Set up your
burrow.toml
file like the following example. You can have multiple cluster and consumer sections to monitor multiple MSK clusters using one burrow cluster. You can also adjust the Apache Kafka version underclient-profile
. It represents the client version of Apache Kafka to support. For more information, see Client Profileon the Burrow GitHub. [general] pidfile="burrow.pid" stdout-logfile="burrow.out" [logging] filename="/tmp/burrow.log" level="info" maxsize=100 maxbackups=30 maxage=10 use-localtime=false use-compression=true [zookeeper] servers=[ "
ZooKeeperConnectionString
" ] timeout=6 root-path="/burrow" [client-profile.msk1-client] client-id="burrow-test" tls="msk-mTLS" kafka-version="2.0.0" [cluster.msk1] class-name="kafka" servers=[ "BootstrapBrokerString
" ] client-profile="msk1-client" topic-refresh=120 offset-refresh=30 [consumer.msk1-cons] class-name="kafka" cluster="msk1" servers=[ "BootstrapBrokerString
" ] client-profile="msk1-client" group-blacklist="^(console-consumer-|python-kafka-consumer-|quick-).*$" group-whitelist="" [httpserver.default] address=":8000" [storage.default] class-name="inmemory" workers=20 intervals=15 expire-group=604800 min-distance=1 [tls.msk-mTLS] certfile="/tmp/client_cert.pem" keyfile="/tmp/private_key.pem" cafile="/tmp/truststore.pem" noverify=false