Copying Data Between DynamoDB and a Native Hive Table
If you have data in a DynamoDB table, you can copy the data to a native Hive table. This will give you a snapshot of the data, as of the time you copied it.
You might decide to do this if you need to perform many HiveQL queries, but do not want to consume provisioned throughput capacity from DynamoDB. Because the data in the native Hive table is a copy of the data from DynamoDB, and not "live" data, your queries should not expect that the data is up-to-date.
The examples in this section are written with the assumption you followed the steps in Tutorial: Working with Amazon DynamoDB and Apache Hive and have an external table that is mastered in DynamoDB (ddb_features).
From Native Hive Table to DynamoDB
You can create a native Hive table and populate it with data from ddb_features, like this:
CREATE TABLE features_snapshot AS SELECT * FROM ddb_features;
You can then refresh the data at any time:
INSERT OVERWRITE TABLE features_snapshot SELECT * FROM ddb_features;
In these examples, the subquery
SELECT * FROM ddb_features
will retrieve all of the data from ddb_features. If you
only want to copy a subset of the data, you can use a
in the subquery.
The following example creates a native Hive table, containing only some of the attributes for lakes and summits:
CREATE TABLE lakes_and_summits AS SELECT feature_name, feature_class, state_alpha FROM ddb_features WHERE feature_class IN ('Lake','Summit');
From DynamoDB to Native Hive Table
Use the following HiveQL statement to copy the data from the native Hive table to ddb_features:
INSERT OVERWRITE TABLE ddb_features SELECT * FROM features_snapshot;