Creating an External Table in Hive
In Tutorial: Working with Amazon DynamoDB and Apache Hive, you created an external Hive table that mapped to a DynamoDB table. When you issued HiveQL statements against the external table, the read and write operations were passed through to the DynamoDB table.
You can think of an external table as a pointer to a data source that is managed and
stored elsewhere. In this case, the underlying data source is a DynamoDB table. (The table
must already exist. You cannot create, update, or delete a DynamoDB table from within
Hive.) You use the
CREATE EXTERNAL TABLE statement to create the external
table. After that, you can use HiveQL to work with data in DynamoDB, as if that data were
stored locally within Hive.
You can use
INSERT statements to insert data into an external table
SELECT statements to select data from it. However, you cannot use
DELETE statements to manipulate data in the
If you no longer need the external table, you can remove it using the
TABLE statement. In this case,
DROP TABLE only removes the
external table in Hive. It does not affect the underlying DynamoDB table or any of its
CREATE EXTERNAL TABLE Syntax
The following shows the HiveQL syntax for creating an external Hive table that maps to a DynamoDB table:
CREATE EXTERNAL TABLE
hive_column2_datatype...) STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES ( "dynamodb.table.name" =
"dynamodb_table", "dynamodb.column.mapping" = "
Line 1 is the start of the
CREATE EXTERNAL TABLE statement, where you
provide the name of the Hive table (hive_table) you want to
Line 2 specifies the columns and data types for hive_table. You need to define columns and data types that correspond to the attributes in the DynamoDB table.
Line 3 is the
STORED BY clause, where you specify a class that
handles data management between the Hive and the DynamoDB table. For DynamoDB,
STORED BY should be set to
Line 4 is the start of the
TBLPROPERTIES clause, where you define the
following parameters for
dynamodb.table.name—the name of the DynamoDB table.
dynamodb.column.mapping—pairs of column names in the Hive table and their corresponding attributes in the DynamoDB table. Each pair is of the form hive_column_name:dynamodb_attribute_name, and the pairs are separated by commas.
Note the following:
The name of the Hive table name does not have to be the same as the DynamoDB table name.
The Hive table column names do not have to be the same as those in the DynamoDB table.
The table specified by
dynamodb.table.namemust exist in DynamoDB.
You must map the key schema attributes for the DynamoDB table. This includes the partition key and the sort key (if present).
You do not have to map the non-key attributes of the DynamoDB table. However, you will not see any data from those attributes when you query the Hive table.
If the data types of a Hive table column and a DynamoDB attribute are incompatible, you will see
NULLin these columns when you query the Hive table.
CREATE EXTERNAL TABLE statement does not perform any
validation on the
TBLPROPERTIES clause. The values you provide for
are only evaluated by the
DynamoDBStorageHandler class when you
attempt to access the table.
Data Type Mappings
The following table shows DynamoDB data types and compatible Hive data types:
|DynamoDB Data Type||Hive Data Type|
The following DynamoDB data types are not supported by the
DynamoDBStorageHandler class, so they cannot be used with
If you want to map a DynamoDB attribute of type Number, you must choose an appropriate Hive type:
BIGINTtype is for 8-byte signed integers. It is the same as the
longdata type in Java.
DOUBLEtype is for 8-bit double precision floating point numbers. It is the same as the
doubletype in Java.
If you have numeric data stored in DynamoDB that has a higher precision than the Hive data type you choose, then accessing the DynamoDB data could cause a loss of precision.
If you export data of type Binary from DynamoDB to (Amazon S3) or HDFS, the data is stored as a Base64-encoded string. If you import data from Amazon S3 or HDFS into the DynamoDB Binary type, you must ensure the data is encoded as a Base64 string.