Best Practices
This topic outlines some best practices to follow when using Amazon MSK.
Right-size your cluster
When you create an MSK cluster, you specify the type and number of brokers.
Number of partitions per broker
The following table shows the recommended maximum number of partitions (including leader and follower replicas) per broker. However, the number of partitions per broker that your cluster can sustain also depends on your use case and configuration. We recommend that you perform your own testing to determine the right type for your brokers. For more information about the different broker types, see Broker types.
| Broker type | Maximum number of partitions (including leader and follower replicas) per broker |
|---|---|
| kafka.t3.small | 300 |
| kafka.m5.large or kafka.m5.xlarge | 1000 |
| kafka.m5.2xlarge | 2000 |
| kafka.m5.4xlarge, kafka.m5.8xlarge, kafka.m5.12xlarge, kafka.m5.16xlarge, or kafka.m5.24xlarge | 4000 |
For guidance on choosing the number of partitions, see Apache Kafka Supports 200K Partitions Per Cluster.
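To illustrate how the table translates into cluster capacity, here is a minimal back-of-the-envelope sketch. The broker count, broker type, and replication factor are example assumptions, not recommendations:

```shell
# Hypothetical cluster: 3 brokers of kafka.m5.large, which the table
# above caps at 1000 partition replicas per broker.
BROKERS=3
MAX_PARTITIONS_PER_BROKER=1000
REPLICATION_FACTOR=3

# Total partition replicas the cluster can hold:
TOTAL_REPLICAS=$(( BROKERS * MAX_PARTITIONS_PER_BROKER ))

# With RF=3, each topic partition consumes RF replicas, so the
# recommended ceiling on leader partitions across all topics is:
MAX_LEADER_PARTITIONS=$(( TOTAL_REPLICAS / REPLICATION_FACTOR ))

echo "$TOTAL_REPLICAS"        # total replica capacity
echo "$MAX_LEADER_PARTITIONS" # leader partitions at RF=3
```

Note that increasing the replication factor improves availability but reduces the number of distinct partitions the same brokers can carry.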
Number of brokers per cluster
To determine the right number of brokers for your MSK cluster and understand costs, see MSK Sizing and Pricing.
Build highly available clusters
Use the following recommendations so that your MSK cluster can be highly available during an update or when Amazon MSK is replacing a broker.
- Ensure that the replication factor (RF) is at least 2 for two-AZ clusters and at least 3 for three-AZ clusters. An RF of 1 can lead to offline partitions during a rolling update.
- Set minimum in-sync replicas (minISR) to at most RF - 1. A minISR that is equal to the RF can prevent producing to the cluster during a rolling update. A minISR of 2 allows three-way replicated topics to be available when one replica is offline.
- Ensure client connection strings include multiple brokers. Having multiple brokers in a client's connection string allows for failover when a specific broker is offline for an update. For information about how to get a connection string with multiple brokers, see Getting the bootstrap brokers for an Amazon MSK Cluster.
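For example, a console producer invocation that lists several brokers might look like the following sketch. The broker endpoints are hypothetical placeholders; you can retrieve the real bootstrap string for your cluster with the aws kafka get-bootstrap-brokers command:

```shell
# Hypothetical broker endpoints; fetch the real ones with:
#   aws kafka get-bootstrap-brokers --cluster-arn <your-cluster-arn>
BOOTSTRAP="b-1.mycluster.abc123.kafka.us-east-1.amazonaws.com:9092,\
b-2.mycluster.abc123.kafka.us-east-1.amazonaws.com:9092,\
b-3.mycluster.abc123.kafka.us-east-1.amazonaws.com:9092"

# Listing all brokers lets the client fail over to another broker
# when one is offline during a rolling update.
kafka-console-producer.sh --broker-list "$BOOTSTRAP" --topic MyTopic
```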
Monitor disk space
To avoid running out of disk space for messages, create a CloudWatch alarm that watches the KafkaDataLogsDiskUsed metric. When the value of this metric reaches or exceeds 85%, perform one or more of the following actions:
- Increase broker storage. For information on how to do this, see Scaling up broker storage.
- Reduce the message retention period or log size. For information on how to do that, see Adjust data retention parameters.
- Delete unused topics.
For information on how to set up and use alarms, see Using Amazon CloudWatch Alarms. For a full list of Amazon MSK metrics, see Monitoring an Amazon MSK Cluster.
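As a sketch, such an alarm can be created with the AWS CLI. The cluster name, broker ID, and SNS topic ARN below are placeholder assumptions, and you may prefer a different period, statistic, or number of evaluation periods:

```shell
# Alarm when KafkaDataLogsDiskUsed for one broker reaches or exceeds 85%.
# Repeat (or script) per broker; all names and ARNs here are examples.
aws cloudwatch put-metric-alarm \
  --alarm-name msk-broker-1-disk-85 \
  --namespace AWS/Kafka \
  --metric-name KafkaDataLogsDiskUsed \
  --dimensions Name="Cluster Name",Value=MyCluster Name="Broker ID",Value=1 \
  --statistic Maximum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 85 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:my-alerts-topic
```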
Adjust data retention parameters
Consuming messages doesn't remove them from the log. To free up disk space regularly, you can explicitly specify a retention time period, which is how long messages stay in the log. You can also specify a retention log size. When either the retention time period or the retention log size is reached, Apache Kafka starts removing inactive segments from the log.
To specify a retention policy at the cluster level, set one or more of the following parameters: log.retention.hours, log.retention.minutes, log.retention.ms, or log.retention.bytes. For more information, see Custom MSK Configurations.
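For example, a custom MSK configuration that limits retention to seven days or 100 GiB per partition (whichever is reached first) might include properties like the following; the values are illustrative assumptions, not recommendations:

```
# Retain messages for at most 7 days.
log.retention.hours=168
# Retain at most 100 GiB per partition (log.retention.bytes applies per partition).
log.retention.bytes=107374182400
```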
You can also specify retention parameters at the topic level:
- To specify a retention time period per topic, use the following command.

  kafka-configs.sh --zookeeper ZooKeeperConnectionString --alter --entity-type topics --entity-name TopicName --add-config retention.ms=DesiredRetentionTimePeriod

- To specify a retention log size per topic, use the following command.

  kafka-configs.sh --zookeeper ZooKeeperConnectionString --alter --entity-type topics --entity-name TopicName --add-config retention.bytes=DesiredRetentionLogSize
The retention parameters that you specify at the topic level take precedence over cluster-level parameters.
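To confirm which overrides are set on a topic, you can describe its configuration with the same tool. ZooKeeperConnectionString and TopicName are placeholders, as above:

```shell
# List the topic-level configuration overrides (cluster-level defaults
# inherited by the topic are not shown in this output).
kafka-configs.sh --zookeeper ZooKeeperConnectionString \
  --describe --entity-type topics --entity-name TopicName
```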
Don't add non-MSK brokers
If you use Apache ZooKeeper commands to add brokers, these brokers don't get added to your MSK cluster, and your Apache ZooKeeper will contain incorrect information about the cluster. This might result in data loss. For supported cluster operations, see Amazon MSK: How It Works.
Enable in-transit encryption
For information about encryption in transit and how to enable it, see Encryption in Transit.
Reassign partitions
To move partitions to different brokers on the same cluster, you can use the partition
reassignment tool
named kafka-reassign-partitions.sh
. For example, after you add new brokers to expand a cluster, you can
rebalance that cluster by reassigning partitions to the new brokers. For information
about how to add brokers to a cluster, see Expanding an Amazon MSK Cluster.
For information about the partition reassignment tool, see Expanding
your cluster
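As a sketch of a typical reassignment workflow, assuming a hypothetical topic MyTopic and newly added brokers with IDs 4, 5, and 6 (newer Apache Kafka versions accept --bootstrap-server in place of --zookeeper):

```shell
# 1. Describe which topics to move (topic name is an example).
cat > topics.json <<'EOF'
{"topics": [{"topic": "MyTopic"}], "version": 1}
EOF

# 2. Generate a candidate assignment onto the new brokers (IDs are examples).
kafka-reassign-partitions.sh --zookeeper ZooKeeperConnectionString \
  --topics-to-move-json-file topics.json \
  --broker-list "4,5,6" --generate

# 3. Save the proposed assignment from step 2 as reassignment.json,
#    then execute the reassignment.
kafka-reassign-partitions.sh --zookeeper ZooKeeperConnectionString \
  --reassignment-json-file reassignment.json --execute

# 4. Check progress until all reassignments complete.
kafka-reassign-partitions.sh --zookeeper ZooKeeperConnectionString \
  --reassignment-json-file reassignment.json --verify
```

Reassignment copies data between brokers, so consider running it during off-peak hours and throttling replication if the cluster is busy.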