Migrate to an Amazon MSK Cluster
Amazon MSK Replicator can be used for MSK cluster migration. See What is Amazon MSK Replicator?. Alternatively, you can use Apache MirrorMaker 2.0 to migrate from a non-MSK cluster to an Amazon MSK cluster. For an example of how to do this, see Migrate an on-premises Apache Kafka cluster to Amazon MSK by using MirrorMaker. For information about how to use MirrorMaker, see Mirroring data between clusters.
An outline of the steps to follow when using MirrorMaker to migrate to an MSK cluster:

- Create the destination MSK cluster.
- Start MirrorMaker from an Amazon EC2 instance within the same Amazon VPC as the destination cluster.
- Inspect the MirrorMaker lag.
- After MirrorMaker catches up, redirect producers and consumers to the new cluster using the MSK cluster bootstrap brokers.
- Shut down MirrorMaker.
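The steps above can be sketched from the EC2 instance as follows. The broker addresses, consumer group name, and cluster ARN are placeholders; the sketch assumes the standard Apache Kafka CLI tools are on the path and that `consumer.properties` points at the source cluster while `producer.properties` points at the destination MSK cluster.

```shell
# Start MirrorMaker, copying all topics from source to destination.
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist ".*"

# Inspect replication lag (LAG column) for the MirrorMaker consumer group.
kafka-consumer-groups.sh \
  --bootstrap-server source-broker:9092 \
  --describe --group mirrormaker-group

# Once lag is near zero, fetch the destination bootstrap brokers so that
# producers and consumers can be redirected to the new cluster.
aws kafka get-bootstrap-brokers --cluster-arn <destination-cluster-arn>
```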
MirrorMaker 1.0 best practices
This list of best practices applies to MirrorMaker 1.0.
- Run MirrorMaker on the destination cluster. This way, if a network problem happens, the messages are still available in the source cluster. If you run MirrorMaker on the source cluster, events buffered in the producer might be lost when a network issue occurs.
- If encryption is required in transit, run MirrorMaker in the source cluster.
- For consumers, set enable.auto.commit=false.
- For producers, set:
  - max.in.flight.requests.per.connection=1
  - retries=Integer.MAX_VALUE
  - acks=all
  - max.block.ms=Long.MAX_VALUE
- For high producer throughput:
  - Buffer messages and fill message batches: tune buffer.memory, batch.size, and linger.ms.
  - Tune the socket buffers: receive.buffer.bytes and send.buffer.bytes.
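The consumer and producer settings above can be collected into the two property files that MirrorMaker 1.0 reads. This is a sketch with placeholder broker addresses and illustrative tuning values; note that the consumer auto-commit property is spelled enable.auto.commit in Apache Kafka.

```properties
# consumer.properties -- points at the source cluster
bootstrap.servers=source-broker:9092
group.id=mirrormaker-group
enable.auto.commit=false

# producer.properties -- points at the destination MSK cluster
bootstrap.servers=destination-broker:9092
max.in.flight.requests.per.connection=1
retries=2147483647                      # Integer.MAX_VALUE
acks=all
max.block.ms=9223372036854775807        # Long.MAX_VALUE
# Throughput tuning -- illustrative starting points, not recommendations
buffer.memory=67108864
batch.size=65536
linger.ms=10
receive.buffer.bytes=4194304
send.buffer.bytes=4194304
```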
- To avoid data loss, turn off auto commit at the source so that MirrorMaker can control the commits, which it typically does after it receives the ack from the destination cluster. If the producer has acks=all and the destination cluster has min.insync.replicas set to more than 1, the messages are persisted on more than one broker at the destination before the MirrorMaker consumer commits the offset at the source.
- If order is important, you can set retries to 0. Alternatively, for a production environment, set max in-flight connections to 1 to ensure that the batches sent out are not committed out of order if a batch fails in the middle. This way, each batch sent is retried until the next batch is sent out. If max.block.ms is not set to the maximum value and the producer buffer is full, there can be data loss (depending on some of the other settings). This can block and back-pressure the consumer.
- For high throughput:
  - Increase buffer.memory.
  - Increase batch.size.
  - Tune linger.ms to allow the batches to fill. This also allows for better compression, less network bandwidth usage, and less storage on the cluster, which results in increased retention.
  - Monitor CPU and memory usage.
- For high consumer throughput:
  - Increase the number of threads/consumers per MirrorMaker process (num.streams).
  - Increase the number of MirrorMaker processes across machines first before increasing threads, to allow for high availability.
  - Increase the number of MirrorMaker processes first on the same machine and then on different machines (with the same group ID).
  - Isolate topics that have very high throughput and use separate MirrorMaker instances for them.
- For management and configuration:
  - Use AWS CloudFormation and configuration management tools such as Chef and Ansible.
  - Use Amazon EFS mounts to keep all configuration files accessible from all Amazon EC2 instances.
  - Use containers for easy scaling and management of MirrorMaker instances.
- Typically, it takes more than one consumer to saturate a producer in MirrorMaker, so set up multiple consumers. First, set them up on different machines to provide high availability. Then, scale individual machines up to having a consumer for each partition, with consumers equally distributed across machines.
- For high-throughput ingestion and delivery, tune the receive and send buffers because their defaults might be too low. For maximum performance, ensure that the total number of streams (num.streams) matches all of the topic partitions that MirrorMaker is trying to copy to the destination cluster.
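One way to match num.streams to the total partition count is to count the partitions reported by the topic describe command. This is a sketch with a placeholder broker address; it assumes the standard Apache Kafka CLI tools are on the path and counts every partition line in the describe output.

```shell
# Count total partitions across the topics being mirrored.
PARTITIONS=$(kafka-topics.sh --bootstrap-server source-broker:9092 \
  --describe | grep -c "Partition: ")

# Start MirrorMaker with one stream per partition.
kafka-mirror-maker.sh \
  --consumer.config consumer.properties \
  --producer.config producer.properties \
  --whitelist ".*" \
  --num.streams "$PARTITIONS"
```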
Advantages of MirrorMaker 2.*
- Makes use of the Apache Kafka Connect framework and ecosystem.
- Detects new topics and partitions.
- Automatically syncs topic configuration between clusters.
- Supports "active/active" cluster pairs, as well as any number of active clusters.
- Provides new metrics, including end-to-end replication latency across multiple data centers and clusters.
- Emits the offsets required to migrate consumers between clusters and provides tooling for offset translation.
- Supports a high-level configuration file for specifying multiple clusters and replication flows in one place, compared to low-level producer/consumer properties for each MirrorMaker 1.* process.
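As an illustration of that high-level configuration file, a minimal one-direction MirrorMaker 2.* setup might look like the following. The cluster aliases and broker endpoints are placeholders; the file is passed to connect-mirror-maker.sh.

```properties
# mm2.properties -- placeholder aliases and endpoints
clusters = source, destination
source.bootstrap.servers = source-broker:9092
destination.bootstrap.servers = destination-broker:9092

# Replicate all topics from source to destination only.
source->destination.enabled = true
source->destination.topics = .*
destination->source.enabled = false

# Replication factors for the internal MM2 topics.
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
```

Run it with `connect-mirror-maker.sh mm2.properties`; one configuration file drives all clusters and flows, instead of separate consumer and producer property files per MirrorMaker 1.* process.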