Migrating to Apache HBase on Amazon S3 on Amazon EMR

Publication date: May 1, 2021 (Document revisions)

Abstract

This whitepaper provides an overview of Apache HBase on Amazon S3 and guides data engineers and software developers in the migration of an on-premises or HDFS-backed Apache HBase cluster to Apache HBase on Amazon S3. The whitepaper offers a migration plan that includes detailed steps for each stage of the migration, including data migration, performance tuning, and operational guidance.

Are you Well-Architected?

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

For more expert guidance and best practices for your cloud architecture—reference architecture deployments, diagrams, and whitepapers—refer to the AWS Architecture Center.

Introduction

In 2006, Amazon Web Services (AWS) began offering IT infrastructure services to businesses in the form of web services—now commonly known as cloud computing. One of the key benefits of cloud computing is the opportunity to replace upfront capital infrastructure expenses with low variable costs that scale with your business. With the cloud, businesses no longer need to plan for and procure servers and other IT infrastructure weeks or months in advance.

Instead, they can spin up hundreds or thousands of servers in minutes and deliver results faster. Today, AWS provides a highly reliable, scalable, low-cost infrastructure platform in the cloud that powers hundreds of thousands of businesses in 190 countries around the world.

Many businesses have been taking advantage of the unique properties of the cloud by migrating their existing Apache Hadoop workloads, including Apache HBase, to Amazon EMR and Amazon Simple Storage Service (Amazon S3). The ability to separate your durable storage layer from your compute layer, scale compute flexibly, and integrate easily with other AWS services provides immense benefits and opens up many opportunities to reimagine your data architectures.