
Apache HBase

This documentation is for versions 4.x and 5.x of Amazon EMR. For information about Amazon EMR AMI versions 2.x and 3.x, see the Amazon EMR Developer Guide (PDF).

HBase is an open source, non-relational, distributed database developed as part of the Apache Software Foundation's Hadoop project. HBase runs on top of the Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem. It works seamlessly with Hadoop, sharing its file system and serving as a direct input and output to the MapReduce framework and execution engine. HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).

For more information about HBase, see Apache HBase and the HBase documentation on the Apache website. For an example of how to use HBase with Hive, see the AWS Big Data Blog post Combine NoSQL and Massively Parallel Analytics Using Apache HBase and Apache Hive on Amazon EMR.

Amazon EMR offers options to integrate with Amazon Simple Storage Service (Amazon S3) for data persistence and disaster recovery.

  • HBase on Amazon S3: With Amazon EMR version 5.2.0 and later, you can use HBase on Amazon S3 to store a cluster's HBase root directory and metadata directly in Amazon S3. You can later start a new cluster and point it to the root directory location in Amazon S3. Only one cluster at a time can use a given HBase location in Amazon S3, with the exception of a read-replica cluster. For more information, see HBase on Amazon S3 (Amazon S3 Storage Mode).

  • HBase read-replicas: Amazon EMR version 5.7.0 and later with HBase on Amazon S3 supports read-replica clusters. A read-replica cluster provides read-only access to a primary cluster's store files and metadata. For more information, see Using a Read-Replica Cluster.

  • HBase snapshots: As an alternative to HBase on Amazon S3, with Amazon EMR version 4.0 and later you can create snapshots of your HBase data directly in Amazon S3 and then recover data from the snapshots. For more information, see Using HBase Snapshots.
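As an illustration of the first two options, the following minimal Python sketch builds a configuration list of the kind you might supply when creating a cluster. It assumes that Amazon S3 storage mode is selected through the `hbase` and `hbase-site` configuration classifications, using the property names `hbase.emr.storageMode`, `hbase.rootdir`, and `hbase.emr.readreplica.enabled`. Verify these names against the linked topics for your release. The helper name `hbase_s3_configurations` and the bucket path are hypothetical.

```python
import json


def hbase_s3_configurations(root_dir, read_replica=False):
    """Build a configuration list that points HBase at a root directory in Amazon S3.

    The classification and property names are assumptions; check them
    against the Amazon EMR documentation for your release.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")

    # Properties in the assumed "hbase" classification select the storage mode.
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # A read-replica cluster shares the primary cluster's root directory.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"

    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


if __name__ == "__main__":
    # Hypothetical bucket and prefix, used only for illustration.
    configs = hbase_s3_configurations("s3://my-example-bucket/hbase-root")
    print(json.dumps(configs, indent=2))
```

Calling the helper with `read_replica=True` and the same root directory yields the configuration for a read-replica cluster. Only one cluster that is not a read-replica should point at a given root directory at a time.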

HBase Release Information for This Release of Amazon EMR

Application: HBase 1.3.1

Amazon EMR Release Label: emr-5.10.0

Components installed with this application: emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-mapred, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hbase-hmaster, hbase-client, hbase-region-server, hbase-rest-server, hbase-thrift-server, zookeeper-client, zookeeper-server
