Migrate an on-premises Apache Kafka cluster to Amazon MSK by using MirrorMaker - AWS Prescriptive Guidance

Migrate an on-premises Apache Kafka cluster to Amazon MSK by using MirrorMaker

Created by Han Zhang (AWS) and Tanner Pratt (AWS)

Environment: PoC or pilot

Source: On-premises or self-managed Apache Kafka cluster

Target: Amazon Managed Streaming for Apache Kafka (Amazon MSK)

R Type: Replatform

Workload: Open-source; All other workloads

Technologies: Analytics; Big data; Migration

AWS services: Amazon MSK

Summary

This pattern provides guidance for migrating an on-premises, self-managed, or hosted Apache Kafka cluster to Amazon Managed Streaming for Apache Kafka (Amazon MSK). You can also use this pattern to migrate from one Amazon MSK cluster to another.

Apache Kafka includes the MirrorMaker feature, which replicates data between two Kafka clusters. MirrorMaker consists of a collection of consumers, which are part of a consumer group. The consumers read data from the topics in the source cluster and then pass this data to producers, which write the data to the target cluster.

The Amazon MSK documentation contains a high-level overview of the process to use MirrorMaker version 1.0 to migrate on-premises Kafka clusters to Amazon MSK. This pattern supplements this information by offering comprehensive, step-by-step instructions for using MirrorMaker version 2.0.

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • A Kafka source cluster that is one of the following:

    • In an on-premises data center

    • Self-managed in the cloud

    • Hosted through a partner

Limitations

  • To use MirrorMaker version 2.0, the source cluster must run Apache Kafka version 2.4.0 or later. For earlier versions, follow the instructions in the Amazon MSK documentation for MirrorMaker version 1.0.

Product versions

  • MirrorMaker version 2.0

  • Apache Kafka version 2.4.0 or later. For more information about the versions of Apache Kafka that Amazon MSK supports, see Supported Apache Kafka versions.

Architecture

Source technology stack

  • On-premises or self-managed Kafka cluster

Target technology stack

  • Amazon MSK cluster

Target architecture

MirrorMaker reads the data on the source cluster and replicates it to the target Amazon MSK cluster

The diagram shows the following process:

  1. MirrorMaker reads the data from the topics and consumer groups in the source Kafka cluster.

  2. MirrorMaker replicates the data and consumer information to the target Amazon MSK cluster.

Tools

AWS services

Other tools

  • Apache Kafka is an open-source event streaming platform. In this pattern, you use the MirrorMaker feature of Kafka to perform the cross-cluster migration.

Best practices

You can run MirrorMaker in either the source or the target environment, but we recommend that you run it as close as possible to the target cluster. For more information, see Best Practice: Consume from Remote, Produce to Local in the Apache Kafka documentation.

Epics

Task | Description | Skills required

Create a VPC.

  1. Create a VPC in the target AWS account. For instructions, see Create a VPC.

  2. Create three private subnets in different Availability Zones in the new VPC. For instructions, see Create a subnet. Using different Availability Zones provides high availability and fault tolerance.

    Note: If you are using a public internet connection to migrate the Kafka cluster, create public subnets and enable public access to the Amazon MSK cluster.

AWS systems administrator, DevOps engineer, Cloud administrator
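The VPC and subnet steps above can also be scripted with the AWS CLI. The following is a minimal sketch, not the pattern's prescribed method: the function name `create_migration_network`, the `10.0.0.0/16` address range, and the per-subnet `/24` blocks are assumptions chosen for illustration. It creates one VPC and then one subnet for each Availability Zone that you pass as an argument.

```shell
# Sketch: create a VPC and one subnet per Availability Zone.
# Assumptions: the CIDR ranges and the function name are illustrative only.
create_migration_network() {
  # Create the VPC and capture its ID.
  vpc_id=$(aws ec2 create-vpc \
    --cidr-block 10.0.0.0/16 \
    --query 'Vpc.VpcId' --output text)

  # Create one /24 subnet in each Availability Zone given as an argument.
  index=0
  for az in "$@"; do
    aws ec2 create-subnet --vpc-id "$vpc_id" --cidr-block "10.0.${index}.0/24" --availability-zone "$az" --query 'Subnet.SubnetId' --output text
    index=$((index + 1))
  done
}
```

For example, `create_migration_network us-east-1a us-east-1b us-east-1c` would print three subnet IDs, one per zone. The subnets are private unless you add an internet gateway and routes separately.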

Create the Amazon MSK cluster.

Create an Amazon MSK cluster. For instructions, see Creating a cluster using the AWS Management Console or Creating a cluster using the AWS CLI. Configure the cluster to use the VPC and subnets that you created previously.

AWS systems administrator, DevOps engineer, Cloud administrator
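If you choose the AWS CLI route for creating the cluster, the call might look like the following sketch. The subnet IDs, security group ID, cluster name, instance type, and Kafka version are all hypothetical placeholders; substitute the values from your own VPC and a version that Amazon MSK currently supports. The broker settings go into a JSON file that `aws kafka create-cluster` reads.

```shell
# Sketch: describe the broker nodes, then create the cluster.
# All IDs below are hypothetical; replace them with values from your VPC.
cat > brokernodegroupinfo.json <<'EOF'
{
  "InstanceType": "kafka.m5.large",
  "ClientSubnets": ["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
  "SecurityGroups": ["sg-dddd4444"]
}
EOF

# The cluster name and Kafka version are assumptions for illustration.
create_msk_cluster() {
  aws kafka create-cluster \
    --cluster-name "migration-target" \
    --kafka-version "3.6.0" \
    --number-of-broker-nodes 3 \
    --broker-node-group-info file://brokernodegroupinfo.json
}
```

Calling `create_msk_cluster` submits the request. After the cluster becomes active, `aws kafka get-bootstrap-brokers --cluster-arn <your cluster ARN>` returns the broker addresses that clients and MirrorMaker connect to.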

Task | Description | Skills required

Install MirrorMaker.

  1. Launch an EC2 instance.

  2. Connect to your EC2 instance.

  3. On the EC2 instance, download and extract the latest Kafka release. For instructions, see Quick Start (Kafka documentation).

Note: In this pattern, you install MirrorMaker 2.0 as a dedicated MirrorMaker cluster on an Amazon EC2 instance. This option is acceptable for development environments. For more information about other deployment options for MirrorMaker 2.0, see the Additional information section of this pattern.

AWS systems administrator, Cloud administrator, DevOps engineer

Specify Kafka cluster information.

In the Kafka client installation bin folder, create an mm2.properties file and configure it with the connection details for your source and target Kafka clusters. For instructions, see Running a dedicated MirrorMaker cluster (Kafka documentation).

AWS systems administrator, Cloud administrator, DevOps engineer
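As a rough illustration of what the mm2.properties file can contain, the following sketch writes a one-way source-to-target configuration. The broker addresses are hypothetical placeholders, and the cluster aliases `source` and `target` are arbitrary labels chosen here. Treat the property values as a starting point to check against the Kafka documentation for your version, not as a complete production configuration.

```shell
# Hypothetical broker addresses; override them with your own values.
SOURCE_BROKERS="${SOURCE_BROKERS:-onprem-broker-1:9092,onprem-broker-2:9092}"
TARGET_BROKERS="${TARGET_BROKERS:-b-1.example.kafka.us-east-1.amazonaws.com:9092}"

cat > mm2.properties <<EOF
# Aliases for the two clusters (the names are arbitrary labels).
clusters = source, target
source.bootstrap.servers = ${SOURCE_BROKERS}
target.bootstrap.servers = ${TARGET_BROKERS}

# Replicate in one direction only, from source to target.
source->target.enabled = true
target->source.enabled = false

# Regular expression that selects the topics to mirror.
source->target.topics = .*

# Replication factor for mirrored topics and MirrorMaker internal topics.
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3

# Emit checkpoints so that consumer group offsets can be translated.
emit.checkpoints.enabled = true

# By default, mirrored topics are named "source.<topic>" on the target.
# Kafka 3.0 and later include a policy that keeps the original names:
# replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy
EOF
```

The topic naming policy matters for cutover: if you keep the default, consumers that move to the target cluster must subscribe to the prefixed topic names.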

Start MirrorMaker.

Enter the following command to start MirrorMaker and pass the mm2.properties file.

$ ./bin/connect-mirror-maker.sh mm2.properties

AWS systems administrator, Cloud administrator, DevOps engineer

Monitor the progress.

Check the progress by inspecting the replication lag for each topic, which is the difference between the topic's last offset in the source cluster and the offset that MirrorMaker has currently consumed. For instructions, see Monitoring Geo-Replication in the Kafka documentation.

AWS systems administrator, Cloud administrator, DevOps engineer
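One rough way to approximate the lag described above is to compare end offsets between the two clusters. The sketch below assumes that you have saved offset listings from each cluster as lines shaped like `topic:partition:offset`, and that topic names match on both sides (for example, because a name-preserving replication policy is in use). The function name `offset_gap` and the file names are hypothetical. The result is only meaningful when source offsets have not been shifted by retention or compaction; the MirrorMaker metrics described in the Kafka documentation are the more reliable signal.

```shell
# Sketch: per-topic difference between source and target end offsets.
# Input files contain lines shaped like "topic:partition:offset".
# With Kafka 3.0 or later, such listings can typically be produced with:
#   bin/kafka-get-offsets.sh --bootstrap-server "$SOURCE_BROKERS" > source-offsets.txt
#   bin/kafka-get-offsets.sh --bootstrap-server "$TARGET_BROKERS" > target-offsets.txt
offset_gap() {
  awk -F: '
    FNR == NR { src[$1] += $3; next }   # first file: sum source offsets per topic
    { dst[$1] += $3 }                   # second file: sum target offsets per topic
    END { for (t in src) print t, src[t] - dst[t] }
  ' "$1" "$2" | sort
}
```

Running `offset_gap source-offsets.txt target-offsets.txt` prints one line per source topic with the summed offset difference. A value that trends toward zero and stays there suggests that mirroring has caught up.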

Task | Description | Skills required

Stop the consumer applications.

Stop all consumer applications that consume data from the source cluster.

App developer

Start the consumer applications.

Update each application's bootstrap servers configuration to point to the target cluster. Then begin consuming from the target cluster.

App developer
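How you change the bootstrap configuration depends on the application. As one hedged example, if a client reads its settings from a Java-style properties file, a small helper like the following could rewrite the `bootstrap.servers` entry and keep a backup. The function name `repoint_bootstrap` and the file name in the usage note are hypothetical.

```shell
# Sketch: point a client properties file at a new set of brokers.
# Keeps the previous version of the file as "<file>.bak".
repoint_bootstrap() {
  config_file=$1
  new_servers=$2

  cp "$config_file" "$config_file.bak"

  # Replace the bootstrap.servers line and copy every other line unchanged.
  awk -v servers="$new_servers" '
    /^[[:space:]]*bootstrap\.servers[[:space:]]*[=:]/ {
      print "bootstrap.servers=" servers
      next
    }
    { print }
  ' "$config_file.bak" > "$config_file"
}
```

For example, `repoint_bootstrap consumer.properties "$TARGET_BROKERS"` updates a consumer's configuration. The same helper applies to producer configuration files later in the cutover.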

Stop the producers on the source cluster.

When the consumer applications are successfully consuming on the target cluster, stop the producers on the source cluster.

App developer

Start the producers on the target cluster.

Update the producers' bootstrap servers configuration to point to the target cluster. Before you start the producers, wait for MirrorMaker to finish mirroring all data from the source cluster.

App developer

Stop MirrorMaker.

After producers have moved to the target cluster, stop MirrorMaker.

AWS systems administrator, Cloud administrator, DevOps engineer

Related resources

AWS resources

Other resources

Additional information

This pattern runs MirrorMaker 2.0 as a dedicated MirrorMaker cluster on Amazon EC2. This option is acceptable for development environments. Although it is not discussed in this pattern, you can also run MirrorMaker 2.0 in a Kafka Connect cluster. This deployment option uses a framework within the Kafka ecosystem that improves scaling and maintenance. You deploy the connector into a Kafka Connect cluster with the associated configuration to run the application. The connector can run in standalone mode for development or testing or in distributed mode for production. For more information, see Running MirrorMaker in a Connect cluster (Apache Kafka documentation). For more information about other MirrorMaker 2.0 deployment options, see Walkthrough: Running MirrorMaker 2.0 (Kafka documentation).