Menu
Amazon EMR
Amazon EMR Release Guide

Using HBase Snapshots

HBase uses a built-in snapshot functionality to create lightweight backups of tables. In EMR clusters, these backups can be exported to Amazon S3 using EMRFS. You can create a snapshot on the master node using the HBase shell. This topic shows you how to run these commands interactively with the shell or through a step using command-runner.jar with either the AWS CLI or AWS SDK for Java. For more information about other types of HBase backups, see HBase Backup in the HBase documentation.

Create a snapshot using a table

hbase snapshot create -n snapshotName -t tableName

Using command-runner.jar from the AWS CLI:

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Name="HBase Shell Step",Jar="command-runner.jar",\
Args=[ "hbase", "snapshot", "create","-n","snapshotName","-t","tableName"]

AWS SDK for Java:

HadoopJarStepConfig hbaseSnapshotConf = new HadoopJarStepConfig()
  .withJar("command-runner.jar")
  .withArgs("hbase","snapshot","create","-n","snapshotName","-t","tableName");

Note

If your snapshot name is not unique, the create operation fails with a return code of -1 or 255 but you may not see an error message that states what went wrong. To use the same snapshot name, delete it and then re-create it.

Delete a snapshot

hbase shell
>> delete_snapshot 'snapshotName'

View snapshot info

hbase snapshot info -snapshot snapshotName

Export a snapshot to Amazon S3

hbase snapshot export -snapshot snapshotName \
-copy-to s3://bucketName/folder -mappers 2

Using command-runner.jar from the AWS CLI:

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Name="HBase Shell Step",Jar="command-runner.jar",\
Args=[ "hbase", "snapshot", "export","-snapshot","snapshotName","-copy-to","s3://bucketName/folder","-mappers","2"]

AWS SDK for Java:

HadoopJarStepConfig hbaseImportSnapshotConf = new HadoopJarStepConfig()
  .withJar("command-runner.jar")
  .withArgs("hbase","snapshot","export",
      "-snapshot","snapshotName","-copy-to",
      ""s3://bucketName/folder",
      "-mappers","2");

Import snapshot from Amazon S3

Note

Although this is an import, the HBase option used here is still export.

sudo -u hbase hbase snapshot export \
-D hbase.rootdir=s3://bucketName/folder \
-snapshot snapshotName \
-copy-to hdfs://masterPublicDNSName:8020/user/hbase \
-mappers 2

Using command-runner.jar from the AWS CLI:

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Name="HBase Shell Step",Jar="command-runner.jar", \
Args=["sudo","-u","hbase","hbase snapshot export","-snapshot","snapshotName", \
"-D","hbase.rootdir=s3://bucketName/folder", \
"-copy-to","hdfs://masterPublicDNSName:8020/user/hbase","-mappers","2","-chmod","700"]

AWS SDK for Java:

HadoopJarStepConfig hbaseImportSnapshotConf = new HadoopJarStepConfig()
  .withJar("command-runner.jar")
  .withArgs("sudo","-u","hbase","hbase","snapshot","export", "-D","hbase.rootdir=s3://bucketName/folder",
      "-snapshot","snapshotName","-copy-to",
      "hdfs://masterPublicDNSName:8020/user/hbase",
      "-mappers","2","-chuser","hbase");

Restore a table from snapshots within the HBase shell

hbase shell
>> disable tableName
>> restore_snapshot snapshotName
>> enable tableName

HBase currently does not support all snapshot commands found in the HBase shell. For example, there is no HBase command-line option to restore a snapshot, so you must restore it within a shell. This means that command-runner.jar must run a Bash command.

Note

Because the command used here is echo, it is possible that your shell command will still fail even if the command run by Amazon EMR returns a 0 exit code. Check the step logs if you choose to run a shell command as a step.

echo 'disable tableName; \
restore_snapshot snapshotName; \
enable tableName' | hbase shell

Here is the step using the AWS CLI. First, create the following snapshot.json file:

[
  {
    "Name": "restore",
    "Args": ["bash", "-c", "echo $'disable \"tableName\"; restore_snapshot \"snapshotName\"; enable \"tableName\"' | hbase shell"],
    "Jar": "command-runner.jar",
    "ActionOnFailure": "CONTINUE",
    "Type": "CUSTOM_JAR"
  }
]
aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps file://./snapshot.json

AWS SDK for Java:

HadoopJarStepConfig hbaseRestoreSnapshotConf = new HadoopJarStepConfig()
  .withJar("command-runner.jar")
  .withArgs("bash","-c","echo $'disable \"tableName\"; restore_snapshot \"snapshotName\"; enable \"snapshotName\"' | hbase shell");