MirrorMaker 2.0 (MM2)
Note
This whitepaper is only meant to be used as a reference for deploying MirrorMaker 2.0 (MM2) on Kafka Connect to migrate data between multiple Amazon MSK clusters. AWS does not offer any support for it.
MirrorMaker 2.0 (MM2) is a multi-cluster data replication engine based on the Kafka Connect framework. MM2 is a combination of an Apache Kafka source connector and a sink connector. You can use a single MM2 cluster to migrate data between multiple clusters. MM2 automatically detects new topics and partitions, while also ensuring the topic configurations are synced between clusters.
MM2 is available as part of the Apache Kafka 2.4.0 release and supersedes all capabilities of MM1. AWS recommends using MM2.
MM2 remote topics
Replication policies help consumers perform migrations between self-managed Apache Kafka clusters and Amazon MSK clusters with no downtime. MM2 comes with a default replication policy that can be customized by providing a custom replication policy class.
Remote topics in MM2 are replicated topics on the target cluster. These topics reference the source cluster using a naming convention as shown in the following figure.
For the default replication policy, topicA is the source topic. Source is a source cluster alias used in the configuration properties file and the remote topic name is source.topicA. This distinguishes between a replicated remote topic and a topic created on the destination or target cluster.
MM2 ensures ordering in partitions when it replicates the topic from source cluster to the remote topic in the destination cluster. The following diagram shows an example of remote topics created on source and destination cluster when bi-directional replication is configured in MM2.
MM2 internal topics
MM2 uses the following internal topics for replication purposes:
Heartbeat topic: Emitted from the source cluster and replicated to demonstrate connectivity through connectors. This can be used by downstream consumers to verify that the connector is running and the corresponding source cluster is available. Messages in this topic contain information on the source cluster, target cluster, and timestamp when the heartbeat was created.
Checkpoint topic: Emitted in the target cluster by the
connector and contains consumer offsets for each consumer group in the source cluster. The
connector will periodically query the source cluster for all committed offsets of consumer
groups (except for replicated topics) and emit a message to this topic. Information in this
message includes the consumer group id, topic, partition, upstream offset, downstream
offset, metadata, and timestamp. Consumers use the checkpoint topic via MirrorClient
Offset sync topics: Encodes cluster-to-cluster offset mapping for each replicated topic-partition. Messages in this topic contain topic, partition, upstream offset, and downstream offset.
MM2 Connectors
MM2 has the following connectors for enabling complex flows between multiple Apache Kafka clusters and across data centers via existing Kafka Connect clusters.
-
MirrorSourceConnector
replicates a set of topics from a single source cluster into the primary cluster. -
MirrorCheckpointConnector
emits consumer offset checkpoints and syncs the offset with__consumer_offsets
(as of Apache Kafka 2.7.0). -
MirrorHeartbeatConnector
emits heartbeats.
MM2 Deployment Methods
MirrorMaker 2.0 can be deployed in several modes:
Deploying a dedicated MM2 cluster
AWS provides the self-service Amazon MSK workshopMirrorSourceConnector
, MirrorCheckpointConnector
, and
MirrorHeartbeatConnector
connectors based on the provided MM2 properties
file. See the following example file.
#Apache Kafka clusters clusters = #comma separated list of Apache Kafka cluster aliases source.bootstrap.servers = apache-kafka-source:9092 target.bootstrap.servers = apache-kafka-target:9092 source -> target.enabled = true #Source and target cluster configuration source.config.storage.replication.factor = 1 target.config.storage.replication.factor = 1 source.offset.storage.replication.factor = 1 target.offset.storage.replication.factor = 1 #Mirror Maker configuration offset-sync.topic.replication.factor = 1 heartbeat.topic.replication.factor = 1 checkpoint.topic.replication.factor = 1 topics = .* groups = .* tasks.max = 1 replication.factor = 1 refresh.topics.enabled = true sync.topic.configs.enabled = true #Enable heartbeats and checkpoints source->target.emit.heartbeats.enabled = true source->target.emit.checkpoints.enabled = true
Ensure MM2 has successfully connected to the source and target clusters using the
kafka-topics.sh
script to list and compare the topics created on both
clusters. See the following example.
#Source Topics ./kafka-topics.sh --bootstrap-server <source bootstrap string> --list __consumer_offsets heartbeats mm2-configs.target.internal mm2-offset-syncs.target.internal mm2-offsets.target.internal mm2-status.target.internal #Target Topics ./kafka-topics.sh --bootstrap-server <target bootstrap string> --list __consumer_offsets heartbeats mm2-configs.source.internal mm2-offsets.source.internal mm2-status.source.internal source.checkpoints.internal source.heartbeats
Deploying MM2 on a Kafka Connect cluster
Customers that already have an existing Kafka Connect cluster running on EC2 instances
or containers can run MM2 on the same cluster by starting the
MirrorSourceConnector
, MirrorCheckpointConnector
, and
MirrorHeartbeatConnector
connectors.
The AWS provided self-service Amazon
MSK workshop
In legacy mode
After legacy MirrorMaker is deprecated, the existing
./bin/kafka-mirror-maker.sh
scripts will be updated to run MM2 in legacy
mode:
$ ./bin/kafka-mirror-maker.sh --consumer consumer.properties --producer producer.properties