Using aggregate functions Using the GROUP BY and HAVING clauses Joining two DynamoDB tables Joining tables from different sources

Querying data in DynamoDB

The following examples show some ways that you can use HiveQL to query data stored in DynamoDB.

These examples refer to the ddb_features table in the tutorial (Step 5: Copy data to DynamoDB).

Using aggregate functions

HiveQL provides built-in functions for summarizing data values. For example, you can use the MAX function to find the largest value for a selected column. The following example returns the elevation of the highest feature in the state of Colorado.


SELECT MAX(elev_in_ft)
FROM ddb_features
WHERE state_alpha = 'CO';

Using the GROUP BY and HAVING clauses

You can use the GROUP BY clause to collect data across multiple records. This is often used with an aggregate function such as SUM, COUNT, MIN, or MAX. You can also use the HAVING clause to discard any results that do not meet certain criteria.

The following example returns a list of the highest elevations from states that have more than five features in the ddb_features table.


SELECT state_alpha, max(elev_in_ft)
FROM ddb_features
GROUP BY state_alpha
HAVING count(*) >= 5;

Joining two DynamoDB tables

The following example maps another Hive table (east_coast_states) to a table in DynamoDB. The SELECT statement is a join across these two tables. The join is computed on the cluster and returned. The join does not take place in DynamoDB.

Consider a DynamoDB table named EastCoastStates that contains the following data:


StateName       StateAbbrev

Maine           ME
New Hampshire   NH
Massachusetts   MA
Rhode Island    RI
Connecticut     CT
New York        NY
New Jersey      NJ
Delaware        DE
Maryland        MD
Virginia        VA
North Carolina  NC
South Carolina  SC
Georgia         GA
Florida         FL

Let's assume the table is available as a Hive external table named east_coast_states:


CREATE EXTERNAL TABLE ddb_east_coast_states (state_name STRING, state_alpha STRING)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "EastCoastStates",
"dynamodb.column.mapping" = "state_name:StateName,state_alpha:StateAbbrev");

The following join returns the states on the East Coast of the United States that have at least three features:


SELECT ecs.state_name, f.feature_class, COUNT(*)
FROM ddb_east_coast_states ecs
JOIN ddb_features f on ecs.state_alpha = f.state_alpha
GROUP BY ecs.state_name, f.feature_class
HAVING COUNT(*) >= 3;

Joining tables from different sources

In the following example, s3_east_coast_states is a Hive table associated with a CSV file stored in Amazon S3. The ddb_features table is associated with data in DynamoDB. The following example joins these two tables, returning the geographic features from states whose names begin with "New."


create external table s3_east_coast_states (state_name STRING, state_alpha STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://bucketname/path/subpath/';


SELECT ecs.state_name, f.feature_name, f.feature_class
FROM s3_east_coast_states ecs
JOIN ddb_features f
ON ecs.state_alpha = f.state_alpha
WHERE ecs.state_name LIKE 'New%';

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Processing HiveQL statements

Copying data to and from Amazon DynamoDB