
Migrate an on-premises Apache Kafka cluster to Amazon MSK by using MirrorMaker

Created by Han Zhang (AWS) and Tanner Pratt (AWS)

Environment: PoC or pilot	Source: On-premises or self-managed Apache Kafka cluster	Target: Amazon Managed Streaming for Apache Kafka (Amazon MSK)
R Type: Replatform	Workload: Open-source; All other workloads	Technologies: Analytics; Big data; Migration
AWS services: Amazon MSK

Summary

This pattern provides guidance for migrating an on-premises, self-managed, or hosted Apache Kafka cluster to Amazon Managed Streaming for Apache Kafka (Amazon MSK). You can also use this pattern to migrate from one Amazon MSK cluster to another.

Apache Kafka includes the MirrorMaker feature, which replicates data between two Kafka clusters. MirrorMaker consists of a collection of consumers, which are part of a consumer group. The consumers read data from the topics in the source cluster and then pass this data to producers, which write the data to the target cluster.

The Amazon MSK documentation contains a high-level overview of the process to use MirrorMaker version 1.0 to migrate on-premises Kafka clusters to Amazon MSK. This pattern supplements this information by offering comprehensive, step-by-step instructions for using MirrorMaker version 2.0.

Prerequisites and limitations

Prerequisites

An active AWS account
A Kafka source cluster that is one of the following:
- In an on-premises data center
- Self-managed in the cloud
- Hosted through a partner

Limitations

To use MirrorMaker version 2.0, the source cluster must run Apache Kafka version 2.4.0 or later. For earlier versions, follow the MirrorMaker version 1.0 instructions in the Amazon MSK documentation.

Product versions

MirrorMaker version 2.0
Apache Kafka version 2.4.0 or later. For more information about the versions of Apache Kafka that Amazon MSK supports, see Supported Apache Kafka versions.

Architecture

Source technology stack

On-premises or self-managed Kafka cluster

Target technology stack

Amazon MSK cluster

Target architecture

MirrorMaker reads the data on the source cluster and replicates it to the target Amazon MSK cluster

The diagram shows the following process:

MirrorMaker reads the data from the topics and consumer groups in the source Kafka cluster.
MirrorMaker replicates the data and consumer information to the target Amazon MSK cluster.

Tools

AWS services

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need and quickly scale them up or down.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that helps you build and run applications that use Apache Kafka to process streaming data.
Amazon Virtual Private Cloud (Amazon VPC) helps you launch AWS resources into a virtual network that you’ve defined. This virtual network resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

Other tools

Apache Kafka is an open-source event streaming platform. In this pattern, you use the MirrorMaker feature of Kafka to perform the cross-cluster migration.

Best practices

You can run MirrorMaker in either the source or the target environment, but we recommend that you run it as close as possible to the target cluster. For more information, see Best Practice: Consume from Remote, Produce to Local in the Apache Kafka documentation.

Epics

Task	Description	Skills required
Create a VPC.	Create a VPC in the target AWS account. For instructions, see Create a VPC. Create three private subnets in different Availability Zones in the new VPC. For instructions, see Create a subnet. Using different Availability Zones provides high availability and fault tolerance. Note: If you are using a public internet connection to migrate the Kafka cluster, create public subnets and enable public access to the Amazon MSK cluster.	AWS systems administrator, DevOps engineer, Cloud administrator
Create the Amazon MSK cluster.	Create an Amazon MSK cluster. For instructions, see Creating a cluster using the AWS Management Console or Creating a cluster using the AWS CLI. Configure the cluster to use the VPC and subnets that you created previously.	AWS systems administrator, DevOps engineer, Cloud administrator
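
The cluster creation task above can also be scripted with the AWS CLI. The following is a minimal sketch, not this pattern's prescribed method. The subnet IDs, security group ID, cluster name, instance type, and Kafka version are placeholder assumptions that you must replace with your own values. The `aws kafka create-cluster` call is wrapped in a function that is defined but not invoked, so that you can review the generated file first.

```shell
# Placeholder values (assumptions): replace them with IDs from your own VPC.
SUBNET_1="subnet-aaaa1111"
SUBNET_2="subnet-bbbb2222"
SUBNET_3="subnet-cccc3333"
SECURITY_GROUP="sg-dddd4444"
CLUSTER_NAME="msk-migration-target"   # hypothetical cluster name
KAFKA_VERSION="3.5.1"                 # assumption: choose a version that Amazon MSK currently supports

# Broker settings: one private subnet per Availability Zone.
cat > brokernodegroupinfo.json <<EOF
{
  "InstanceType": "kafka.m5.large",
  "ClientSubnets": ["${SUBNET_1}", "${SUBNET_2}", "${SUBNET_3}"],
  "SecurityGroups": ["${SECURITY_GROUP}"]
}
EOF

# Defined but not called here. Run create_msk_cluster after you review the file.
# The broker count is a multiple of the number of Availability Zones (three).
create_msk_cluster() {
  aws kafka create-cluster \
    --cluster-name "${CLUSTER_NAME}" \
    --kafka-version "${KAFKA_VERSION}" \
    --number-of-broker-nodes 3 \
    --broker-node-group-info file://brokernodegroupinfo.json
}
```

Keeping the broker settings in a separate JSON file makes it easier to review the subnet and security group choices before any resources are created.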

Task	Description	Skills required
Install MirrorMaker.	Launch an EC2 instance. Connect to your EC2 instance. On the EC2 instance, download and extract the latest Kafka release. For instructions, see Quick Start (Kafka documentation). Note: This pattern installs MirrorMaker 2.0 as a dedicated MirrorMaker cluster on an Amazon EC2 instance, an option that is acceptable for development environments. For more information about other deployment options for MirrorMaker 2.0, see the Additional information section of this pattern.	AWS systems administrator, Cloud administrator, DevOps engineer
Specify Kafka cluster information.	In the Kafka client installation `bin` folder, create a mm2.properties file and configure it for your source Kafka cluster. For instructions, see Running a dedicated MirrorMaker cluster (Kafka documentation).	AWS systems administrator, Cloud administrator, DevOps engineer
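Start MirrorMaker.	From the Kafka installation directory, enter the following command to start MirrorMaker and pass it the mm2.properties file. If you created the file in the `bin` folder, pass `bin/mm2.properties` instead. `$ ./bin/connect-mirror-maker.sh mm2.properties`	AWS systems administrator, Cloud administrator, DevOps engineer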
Monitor the progress.	Check the progress by inspecting the lag between the last offset for each topic and the current offset for the topic MirrorMaker is consuming. For instructions, see Monitoring Geo-Replication in the Kafka documentation.	AWS systems administrator, Cloud administrator, DevOps engineer
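
As an illustration of the "Specify Kafka cluster information" and "Start MirrorMaker" tasks, the following sketch writes a minimal mm2.properties file for one-way replication. The cluster aliases `source` and `target`, both bootstrap addresses, and the replication factor are placeholder assumptions. Confirm each property against the Kafka documentation for your version, and add any security settings that your clusters require.

```shell
# Write a minimal MirrorMaker 2.0 configuration to the current directory.
# The quoted 'EOF' keeps the shell from expanding anything in the file body.
cat > mm2.properties <<'EOF'
# Cluster aliases (assumed names) and how to reach each cluster.
clusters = source, target
source.bootstrap.servers = onprem-broker-1.example.com:9092
target.bootstrap.servers = b-1.example.kafka.us-east-1.amazonaws.com:9092

# Replicate every topic from source to target, and nothing in the other direction.
source->target.enabled = true
source->target.topics = .*
target->source.enabled = false

# Replication factor for mirrored topics and for MirrorMaker's internal topics.
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
EOF

# Start MirrorMaker from the Kafka installation directory. The command is
# commented out so that you can review the file first:
# ./bin/connect-mirror-maker.sh mm2.properties
```

By default, MirrorMaker 2.0 prefixes mirrored topic names with the source cluster alias (here, `source.`), so consumers on the target cluster must subscribe to the prefixed names unless you configure a different replication policy.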

Task	Description	Skills required
Stop the consumer applications.	Stop all consumer applications that consume data from the source cluster.	App developer
Start the consumer applications.	Change the applications' bootstrap configuration to point to the target cluster. Then begin consuming on the target cluster.	App developer
Stop the producers on the source cluster.	When the consumer applications are successfully consuming on the target cluster, stop the producers on the source cluster.	App developer
Start the producers on the target cluster.	Change the bootstrap servers in the producers' configuration to point to the target cluster. Before you start the producers, wait for MirrorMaker to finish mirroring all data from the source cluster.	App developer
Stop MirrorMaker.	After producers have moved to the target cluster, stop MirrorMaker.	AWS systems administrator, Cloud administrator, DevOps engineer
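
How you change the bootstrap configuration in the "Start the consumer applications" and "Start the producers on the target cluster" tasks depends on how each application stores its settings. As a minimal sketch, assuming a properties file named client.properties (a hypothetical name) and a placeholder Amazon MSK bootstrap address, the following rewrites the `bootstrap.servers` entry and keeps a backup of the original file.

```shell
# Placeholder target address (assumption): use your Amazon MSK bootstrap brokers.
TARGET_BOOTSTRAP="b-1.example.kafka.us-east-1.amazonaws.com:9092"

# Sample client configuration that still points to the source cluster.
# In practice this file already exists; it is created here only so that
# the sketch runs on its own.
cat > client.properties <<'EOF'
bootstrap.servers=onprem-broker-1.example.com:9092
group.id=orders-consumer
EOF

# Replace the bootstrap.servers line in place.
# The original file is saved as client.properties.bak.
sed -i.bak "s|^bootstrap\.servers=.*|bootstrap.servers=${TARGET_BOOTSTRAP}|" client.properties

# Show the result for review before you restart the application.
cat client.properties
```

Keeping the .bak copy gives you a quick way to point the application back to the source cluster if you need to roll back during the cutover.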

Related resources

AWS resources

Migrating clusters using MirrorMaker (Amazon MSK documentation)
Amazon MSK migration labs (AWS workshop studio)

Other resources

MirrorMaker 2.0 (Apache Kafka Improvement Proposals)
Geo-Replication: Cross-Cluster Data Mirroring (Apache Kafka documentation)

Additional information

This pattern runs MirrorMaker 2.0 as a dedicated MirrorMaker cluster on Amazon EC2. This option is acceptable for development environments. Although it is not discussed in this pattern, you can also run MirrorMaker 2.0 in a Kafka Connect cluster. This deployment option uses a framework within the Kafka ecosystem that improves scaling and maintenance. You deploy the connector into a Kafka Connect cluster with the associated configuration to run the application. The connector can run in standalone mode for development or testing or in distributed mode for production. For more information, see Running MirrorMaker in a Connect cluster (Apache Kafka documentation). For more information about other MirrorMaker 2.0 deployment options, see Walkthrough: Running MirrorMaker 2.0 (Kafka documentation).
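
For the Kafka Connect option, a connector is typically registered by sending a JSON configuration to the Connect REST API. The following is a minimal sketch, not a production configuration. The connector name, cluster aliases, bootstrap addresses, task count, and Connect worker URL are placeholder assumptions.

```shell
# Write a MirrorSourceConnector configuration (all values are placeholders).
cat > mm2-source-connector.json <<'EOF'
{
  "name": "mm2-source-to-msk",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "source",
    "target.cluster.alias": "target",
    "source.cluster.bootstrap.servers": "onprem-broker-1.example.com:9092",
    "target.cluster.bootstrap.servers": "b-1.example.kafka.us-east-1.amazonaws.com:9092",
    "topics": ".*",
    "tasks.max": "4",
    "replication.factor": "3",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter"
  }
}
EOF

# Register the connector with a Connect worker (assumed to listen on localhost:8083).
# The command is commented out because it needs a running Connect cluster:
# curl -X POST -H "Content-Type: application/json" \
#   --data @mm2-source-connector.json http://localhost:8083/connectors
```

This connector replicates topic data only. Replicating consumer group offsets requires a separate MirrorCheckpointConnector that is registered in the same way.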
