Menu
Amazon EMR
Developer Guide

Amazon EMR Hive Queries to Accommodate Partial DynamoDB Schemas

This documentation is for AMI versions 2.x and 3.x of Amazon EMR. For information about Amazon EMR releases 4.0.0 and above, see the Amazon EMR Release Guide. For information about managing the Amazon EMR service in 4.x releases, see the Amazon EMR Management Guide.

Amazon EMR Hive provides maximum flexibility when querying DynamoDB tables by allowing you to specify a subset of columns on which you can filter data, rather than requiring your query to include all columns. This partial schema query technique is effective when you have a sparse database schema and want to filter records based on a few columns, such as filtering on time stamps.

The following example shows how to use a Hive query to:

  • Create a DynamoDB table.

  • Select a subset of items (rows) in DynamoDB and further narrow the data to certain columns.

  • Copy the resulting data to Amazon S3.

DROP TABLE dynamodb; DROP TABLE s3; CREATE EXTERNAL TABLE dynamodb(hashKey STRING, recordTimeStamp BIGINT, fullColumn map<String, String>) STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES ( "dynamodb.table.name" = "myTable", "dynamodb.throughput.read.percent" = ".1000", "dynamodb.column.mapping" = "hashKey:HashKey,recordTimeStamp:RangeKey"); CREATE EXTERNAL TABLE s3(map<String, String>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3://bucketname/path/subpath/'; INSERT OVERWRITE TABLE s3 SELECT item fullColumn FROM dynamodb WHERE recordTimeStamp < "2012-01-01";

The following table shows the query syntax for selecting any combination of items from DynamoDB.

Query example Result description
SELECT * FROM table_name; Selects all items (rows) from a given table and includes data from all columns available for those items.
SELECT * FROM table_name WHERE field_name =value; Selects some items (rows) from a given table and includes data from all columns available for those items.
SELECT column1_name, column2_name, column3_name FROM table_name; Selects all items (rows) from a given table and includes data from some columns available for those items.
SELECT column1_name, column2_name, column3_name FROM table_name WHERE field_name =value; Selects some items (rows) from a given table and includes data from some columns available for those items.