Write-ahead logs (WAL) for Amazon EMR
With Amazon EMR releases 6.15 and higher, you can write your Apache HBase write-ahead logs (WAL) to Amazon EMR WAL. With earlier Amazon EMR releases, when you create a cluster with the HBase on Amazon S3 option, the WAL is the only Apache HBase component stored on the cluster's local disk. You can store the other components, such as the root directory, store files (HFiles), table metadata, and data, on Amazon S3.
You can use Amazon EMR WAL to recover data that wasn't flushed to Amazon S3. To fully back up your HBase clusters, opt in to the Amazon EMR WAL service. Behind the scenes, the RegionServer writes your HBase write-ahead logs to Amazon EMR WAL.
If your cluster or its Availability Zone becomes unhealthy or unavailable, you can create a new cluster, point it to the same S3 root directory and Amazon EMR WAL workspace, and automatically recover the data in the WAL within a few minutes. For more information, see Restoring from Amazon EMR WAL.
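As a rough illustration of the recovery setup described above, the following Python sketch builds the cluster configuration for an original cluster and for its replacement, both pointing at the same S3 root directory and WAL workspace. The property names (`hbase.emr.storageMode`, `hbase.emr.wal.enabled`, `emr.wal.workspace`) are assumptions that this page doesn't confirm, so verify them against Enabling Amazon EMR WAL and Amazon EMR WAL workspaces before use. The function name, bucket path, and workspace name are hypothetical.

```python
import json
from typing import Optional


def build_wal_configurations(root_dir: str, workspace: Optional[str] = None) -> list:
    """Build an EMR configurations list for HBase on S3 with EMR WAL enabled.

    Property names are assumptions; check them against the EMR WAL topics.
    """
    hbase_site = {"hbase.rootdir": root_dir}
    if workspace is not None:
        # Assumed property for selecting a named WAL workspace.
        hbase_site["emr.wal.workspace"] = workspace
    return [
        {
            "Classification": "hbase",
            "Properties": {
                "hbase.emr.storageMode": "s3",
                "hbase.emr.wal.enabled": "true",  # assumed opt-in flag
            },
        },
        {"Classification": "hbase-site", "Properties": hbase_site},
    ]


# Hypothetical values: the replacement cluster reuses the same root
# directory and workspace so that it can replay the retained WAL data.
ROOT_DIR = "s3://amzn-s3-demo-bucket/hbase-root"
WORKSPACE = "my-workspace"

original = build_wal_configurations(ROOT_DIR, WORKSPACE)
replacement = build_wal_configurations(ROOT_DIR, WORKSPACE)

print(json.dumps(replacement, indent=2))
```

The point of the sketch is that recovery needs no extra configuration beyond reusing the same two values: the root directory and the workspace.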
Note
Amazon EMR retains your write-ahead log and its data for 30 days from the time you create your cluster. After 30 days, Amazon EMR automatically deletes your Amazon EMR WAL and its data. However, if you launch a new WAL-enabled cluster from the same S3 root directory, you can extend the use of your WAL for 30 days from the launch time of the new cluster. Amazon EMR will still clean up any WAL data from the first cluster after the initial 30-day period. For more information, see Restoring from Amazon EMR WAL.
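The retention rule in this note can be worked through with a small Python sketch. It assumes the 30-day window is counted from each cluster's creation time, as stated above. The function name and dates are hypothetical, and the sketch models only the date arithmetic, not the service's actual cleanup schedule.

```python
from datetime import datetime, timedelta, timezone

# Retention period stated in the note above.
WAL_RETENTION = timedelta(days=30)


def wal_expiry(cluster_created_at: datetime) -> datetime:
    """Return when WAL data written by a cluster becomes eligible for deletion."""
    return cluster_created_at + WAL_RETENTION


# Hypothetical timeline: a second WAL-enabled cluster is launched from the
# same S3 root directory 19 days after the first cluster was created.
first_created = datetime(2024, 1, 1, tzinfo=timezone.utc)
second_created = datetime(2024, 1, 20, tzinfo=timezone.utc)

first_expiry = wal_expiry(first_created)    # 2024-01-31: first cluster's WAL data
second_expiry = wal_expiry(second_created)  # 2024-02-19: extended use of the WAL

print(f"First cluster WAL data is cleaned up after {first_expiry.date()}")
print(f"WAL remains usable until {second_expiry.date()}")
```

In this timeline, launching the second cluster extends the use of the WAL to February 19, but WAL data from the first cluster is still cleaned up after January 31.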
The following sections describe how to set up and use Amazon EMR WAL with your HBase-enabled EMR cluster.
Topics
- Amazon EMR WAL workspaces
- Required permissions for Amazon EMR WAL
- Enabling Amazon EMR WAL
- Restoring from Amazon EMR WAL
- Using security configurations with Amazon EMR WAL
- Access Amazon EMR WAL through AWS PrivateLink
- Understanding Amazon EMR WAL pricing and metrics
- Tagging WAL workspaces
- Considerations and Regions for Amazon EMR WAL
- Amazon EMR WAL (EMRWAL) CLI reference