
Amazon MSK Migration Guide

Publication date: September 30, 2021 (Document history)

This guide covers options for your Apache Kafka migration to Amazon Managed Streaming for Apache Kafka (Amazon MSK). It provides guidance on the AWS Cloud infrastructure and migration fundamentals. This whitepaper details the Amazon MSK architecture and discusses architectural best practices around the AWS Well-Architected pillars of operational excellence, security, reliability, performance efficiency, and cost optimization.

Introduction

To stay competitive and scale, businesses are reinventing their analytics applications around data streams to reduce time-to-insight and improve agility. Apache Kafka, a distributed data store optimized for ingesting and processing streaming data in real time, is one of the most widely adopted open source streaming platforms. It lets customers decouple data-producing and data-consuming applications and scale them independently. Streaming data is generated continuously by thousands of data sources, which typically send records simultaneously and in small sizes (on the order of kilobytes). Examples include log files generated by customers using your mobile or web applications, ecommerce purchases, in-game player activity, information from social networks, financial trading floors, geospatial services, and telemetry from connected devices or instrumentation in data centers.

Businesses may encounter challenges operating Apache Kafka on their own, or when trying to migrate open source Apache Kafka clusters to AWS. These challenges include a lack of agility deploying clusters, engineering obstacles when setting up self-managed Apache Kafka, and administrative operational overhead.

Operating a self-managed Apache Kafka environment requires customers to spend time on tasks including:

  • Managing capacity to meet demand fluctuations

  • Provisioning, configuring, and replacing servers on failure

  • Orchestrating patch management and software upgrades

  • Architecting for high availability, data durability, and security

  • Monitoring and alerting infrastructure

  • Enabling governance and security at scale

Amazon Managed Streaming for Apache Kafka (Amazon MSK) makes it easy for customers to build and run production applications on Apache Kafka without needing any Apache Kafka infrastructure management expertise. This means customers can spend less time managing infrastructure and more time building applications.

Customers choose to migrate to Amazon MSK for the following benefits:

Eliminate operational overhead -- Amazon MSK handles the operational overhead of your Apache Kafka environment, including the provisioning, configuration, and maintenance of highly available Apache Kafka clusters. Amazon MSK continuously monitors cluster health, automates patching and version upgrades, and shares key performance metrics in the console.

Migrate without changes to application code -- Amazon MSK runs open source Apache Kafka and supports its latest versions, so applications and tools built for Apache Kafka work with Amazon MSK out of the box, with no application code changes required.

Reduce time to production with native AWS integrations -- No other provider offers the breadth and depth of AWS integrations that Amazon MSK does. These native integrations, such as private connectivity to an Amazon VPC or AWS IAM for authentication and authorization, allow you to quickly and easily deploy secure, production-ready applications.

Keep costs low with the most cost-effective provider -- Amazon MSK is the lowest-cost option for running managed Apache Kafka. The typical price, per gigabyte ingested, is as low as 1/13th the cost of the next best provider. Get started with Amazon MSK for less than $2.50 per day.

Amazon MSK integrates natively with Amazon Managed Service for Apache Flink for running Apache Flink applications. Apache Flink is a powerful open source stream processing framework that can scale elastically to process data streams from Amazon MSK. Additionally, AWS Lambda supports Amazon MSK and self-managed Apache Kafka as event sources, which gives customers more choices for building serverless streaming applications.
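As an illustration of the Lambda integration, here is a minimal Python sketch of a function triggered by an Amazon MSK event source. It assumes the event shape AWS documents for MSK triggers: a `records` object keyed by `topic-partition`, where each record carries `topic`, `partition`, `offset`, and a base64-encoded `value`. The names `decode_msk_event` and `handler` are hypothetical (Lambda only requires that the configured handler accept `event` and `context`), the payload is assumed to be UTF-8 text, and the processing step is a placeholder `print`. Verify the event format against the Lambda documentation for your runtime before relying on it.

```python
import base64
import json
from typing import Any, Dict, List


def decode_msk_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten and decode the records in an Amazon MSK Lambda event.

    Assumption: event["records"] maps "topic-partition" keys to lists of
    records whose "value" field is base64-encoded (UTF-8 text in this sketch).
    """
    decoded = []
    for records in event.get("records", {}).values():
        for record in records:
            value = record.get("value")
            decoded.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                # Treat a missing or empty value (e.g. a tombstone) as None.
                "value": base64.b64decode(value).decode("utf-8") if value else None,
            })
    return decoded


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    """Hypothetical Lambda entry point; replace the print with real processing."""
    records = decode_msk_event(event)
    for record in records:
        print(json.dumps(record))
    return {"processed": len(records)}
```

Because one invocation can deliver a batch spanning several partitions, the sketch flattens all partitions into a single list; records stay in offset order within each partition, but there is no ordering guarantee across partitions.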

Amazon MSK also offers multiple levels of security for your Apache Kafka cluster, including virtual private cloud (VPC) network isolation, AWS Identity and Access Management (IAM) for control-plane and data-plane API authorization, encryption of data at rest, Transport Layer Security (TLS) encryption in transit, TLS-based client certificate authentication, SASL/SCRAM authentication secured by AWS Secrets Manager, and support for Apache Kafka access control lists (ACLs) for data-plane authorization.
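To make the client-side authentication options concrete, here is a minimal Python sketch that assembles Apache Kafka Java-client properties for three of them: TLS client certificates, SASL/SCRAM, and IAM. The helper names `msk_client_properties` and `to_properties_file`, the broker address, and the keystore path are hypothetical. The property keys are standard Apache Kafka client settings; the IAM values assume the open source `aws-msk-iam-auth` library is on the client's classpath, and SCRAM uses `SCRAM-SHA-512`, the mechanism Amazon MSK supports. The bootstrap string is assumed to come from your cluster's bootstrap broker list. VPC isolation, encryption at rest, and ACLs are configured on the cluster side and are not shown.

```python
from typing import Dict, Optional


def msk_client_properties(
    bootstrap_servers: str,
    auth: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    keystore_path: Optional[str] = None,
    keystore_password: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: Kafka Java-client properties for one MSK auth mode."""
    props = {"bootstrap.servers": bootstrap_servers}
    if auth == "tls":
        # Mutual TLS: encrypt in transit and present a client certificate.
        # Add ssl.truststore.* if your JVM truststore does not trust the brokers.
        if not (keystore_path and keystore_password):
            raise ValueError("tls auth needs keystore_path and keystore_password")
        props.update({
            "security.protocol": "SSL",
            "ssl.keystore.location": keystore_path,
            "ssl.keystore.password": keystore_password,
        })
    elif auth == "scram":
        # SASL/SCRAM over TLS; on the MSK side the credentials live in
        # AWS Secrets Manager, so fetch them from there rather than hardcoding.
        if not (username and password):
            raise ValueError("scram auth needs username and password")
        props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                f'username="{username}" password="{password}";'
            ),
        })
    elif auth == "iam":
        # IAM over TLS; these classes come from the aws-msk-iam-auth library.
        props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        })
    else:
        raise ValueError(f"unknown auth mode: {auth!r}")
    return props


def to_properties_file(props: Dict[str, str]) -> str:
    """Render the mapping in key=value form for a client.properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder broker address; substitute your cluster's bootstrap brokers.
    print(to_properties_file(msk_client_properties("b-1.example:9098", "iam")))
```

Keeping authentication in configuration like this is what lets existing Kafka applications move to Amazon MSK without code changes: only the properties differ between clusters.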

This guide covers a range of scenarios to assist with your Apache Kafka migration to Amazon MSK. It provides guidance on:

  • AWS Cloud infrastructure and migration fundamentals

  • Migrating self-managed Apache Kafka to Amazon MSK

  • Security, cost, performance, and reliability (disaster recovery and high availability) of Amazon MSK clusters

  • Operating, monitoring, and maintaining an Amazon MSK cluster