This documentation is for versions 4.x and 5.x of Amazon EMR. For information about Amazon EMR AMI versions 2.x and 3.x, see the Amazon EMR Developer Guide (PDF).
HBase is an open source, non-relational, distributed database. It was developed as part of the Apache Software Foundation's Hadoop project and runs on top of the Hadoop Distributed File System (HDFS) to provide non-relational database capabilities for the Hadoop ecosystem. HBase provides a fault-tolerant, efficient way of storing large quantities of sparse data using column-based compression and storage. It also provides fast lookups because large portions of the data are cached in memory, while by default the data itself is still stored on cluster instance storage in HDFS. HBase is optimized for sequential write operations and is highly efficient for batch inserts, updates, and deletes. HBase also supports cell versioning, so you can look up and use several previous versions of a cell or a row.
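To make cell versioning and sparse storage concrete, the following Python sketch models a table as a map from (row key, "family:qualifier") to a list of timestamped values, newest first. It is a hypothetical, in-memory illustration only: `VersionedTable`, `put`, `get`, and `get_at` are names invented here, not the HBase client API. It also simplifies retention by pruning versions beyond the limit immediately, whereas HBase physically removes them later, during compactions. Cells that were never written take no space, which is the sense in which the storage is sparse.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of HBase-style cell versioning.
# This is NOT the HBase client API; names are invented for illustration.
Cell = Tuple[str, str]     # (row key, "family:qualifier")
Version = Tuple[int, str]  # (timestamp, value)


@dataclass
class VersionedTable:
    # Plays the role of a column family's VERSIONS setting.
    max_versions: int = 3
    # Only cells that were actually written are stored (sparse storage).
    cells: Dict[Cell, List[Version]] = field(default_factory=dict)

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        versions = self.cells.setdefault((row, column), [])
        # Writing the same timestamp again replaces that version.
        versions[:] = [v for v in versions if v[0] != ts]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        # Simplification: prune immediately (HBase does this at compaction).
        del versions[self.max_versions:]

    def get(self, row: str, column: str, versions: int = 1) -> List[Version]:
        """Return up to `versions` of the newest versions of one cell."""
        return list(self.cells.get((row, column), [])[:versions])

    def get_at(self, row: str, column: str, ts: int) -> Optional[str]:
        """Return the newest value written at or before timestamp `ts`."""
        for v_ts, value in self.cells.get((row, column), []):
            if v_ts <= ts:
                return value
        return None


if __name__ == "__main__":
    t = VersionedTable(max_versions=2)
    t.put("user1", "info:email", "a@example.com", ts=100)
    t.put("user1", "info:email", "b@example.com", ts=200)
    t.put("user1", "info:email", "c@example.com", ts=300)
    print(t.get("user1", "info:email", versions=5))
    # [(300, 'c@example.com'), (200, 'b@example.com')]
    print(t.get_at("user1", "info:email", ts=250))
    # b@example.com
```

With `max_versions=2`, the value written at timestamp 100 is no longer retained, so a point-in-time lookup before timestamp 200 finds nothing.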
HBase works seamlessly with Hadoop, sharing its file system and serving as a direct input source and output destination for the MapReduce framework and execution engine. HBase also integrates with Apache Hive, enabling SQL-like queries over HBase tables, joins with Hive-based tables, and support for Java Database Connectivity (JDBC).
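Hive reaches HBase tables through its HBase storage handler. The sketch below is a hypothetical Python helper, `hive_hbase_ddl`, that builds the corresponding `CREATE EXTERNAL TABLE` statement: `STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'` selects the handler, `hbase.columns.mapping` maps each Hive column, by position, to the row key (`:key`) or to a `family:qualifier`, and `hbase.table.name` names the HBase table. The table and column names in the usage are assumptions for illustration, and every column is typed `STRING` for simplicity.

```python
from typing import Dict


def hive_hbase_ddl(hive_table: str, hbase_table: str, key_column: str,
                   columns: Dict[str, str]) -> str:
    """Build Hive DDL that exposes an existing HBase table to Hive.

    Hypothetical helper. `columns` maps each Hive column name to an
    HBase "family:qualifier"; `key_column` receives the HBase row key.
    All columns are typed STRING to keep the sketch short.
    """
    hive_cols = [f"{key_column} STRING"] + [f"{name} STRING" for name in columns]
    # ":key" lines up positionally with the first Hive column (the row key).
    mapping = ",".join([":key"] + list(columns.values()))
    return (
        f"CREATE EXTERNAL TABLE {hive_table} ({', '.join(hive_cols)})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


if __name__ == "__main__":
    # Hypothetical HBase table "users" with column family "info".
    print(hive_hbase_ddl("users_hive", "users", "rowkey",
                         {"email": "info:email", "city": "info:city"}))
```

Declaring the table `EXTERNAL` means Hive expects the HBase table to exist already, and dropping the Hive table leaves the HBase data in place. Once defined, the Hive table can be queried and joined with other Hive tables like any other.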
Additionally, HBase on Amazon EMR lets you create snapshots of your HBase data directly in Amazon Simple Storage Service (Amazon S3) and restore from previously created snapshots. Another option is Amazon S3 storage mode, which uses Amazon S3 directly for the HBase root directory and metadata. With Amazon S3 storage mode, you can start a new cluster and seamlessly point it to the existing root directory location in Amazon S3.
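Amazon S3 storage mode is turned on through EMR configuration classifications. The following Python sketch, built around a hypothetical helper `hbase_s3_storage_config`, emits a configurations list that sets `hbase.rootdir` in the `hbase-site` classification to an Amazon S3 location and sets `hbase.emr.storageMode` to `s3` in the `hbase` classification. The bucket and prefix are placeholders. Not every release label covered by this guide supports Amazon S3 storage mode, so check that yours does before relying on it.

```python
import json
from typing import Any, Dict, List


def hbase_s3_storage_config(root_dir: str) -> List[Dict[str, Any]]:
    """Return an EMR configurations list that enables HBase on Amazon S3.

    Hypothetical helper; `root_dir` is the S3 location used as the
    HBase root directory.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")
    return [
        {
            # Point the HBase root directory at Amazon S3.
            "Classification": "hbase-site",
            "Properties": {"hbase.rootdir": root_dir},
        },
        {
            # Switch HBase on EMR into Amazon S3 storage mode.
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
    ]


if __name__ == "__main__":
    # Placeholder bucket and prefix; substitute your own location.
    config = hbase_s3_storage_config("s3://amzn-s3-demo-bucket/hbase-root")
    print(json.dumps(config, indent=2))
```

Saving the printed JSON to a file lets you pass it to `aws emr create-cluster` with `--configurations file://hbase-s3.json`. Because the root directory lives in Amazon S3 rather than on the cluster, a replacement cluster launched with the same `hbase.rootdir` can pick up the existing tables.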
For an example of how to use HBase with Hive, see the AWS Big Data Blog post Combine NoSQL and Massively Parallel Analytics Using Apache HBase and Apache Hive on Amazon EMR.
| Application | Amazon EMR Release Label | Components installed with this application |
|---|---|---|
| HBase | | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-mapred, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hbase-hmaster, hbase-client, hbase-region-server, hbase-rest-server, hbase-thrift-server, zookeeper-client, zookeeper-server |
- Creating a Cluster with HBase Using the Console
- Creating a Cluster with HBase Using AWS CLI
- Amazon S3 Storage Mode for HBase
- Using the HBase Shell
- Access HBase Tables with Hive
- Using HBase Snapshots
- Configure HBase
- View the HBase User Interface
- View HBase Log Files
- Monitor HBase with Ganglia
- Migrating from Previous HBase Versions