Amazon DynamoDB
Developer Guide (API Version 2012-08-10)

Processing HiveQL Statements

Hive is an application that runs on Hadoop, which is a batch-oriented framework for running MapReduce jobs. When you issue a HiveQL statement, Hive determines whether it can return the results immediately or whether it must submit a MapReduce job.

For example, consider the ddb_features table (from Tutorial: Working with Amazon DynamoDB and Apache Hive). The following Hive query prints state abbreviations and the number of summits in each:


SELECT state_alpha, count(*) 
FROM ddb_features 
WHERE feature_class = 'Summit' 
GROUP BY state_alpha;

Hive does not return the results immediately. Instead, it submits a MapReduce job, which is processed by the Hadoop framework. Hive waits until the job is complete and then displays the query results:


AK  2
AL  2
AR  2
AZ  3
CA  7
CO  2
CT  2
ID  1
KS  1
ME  2
MI  1
MT  3
NC  1
NE  1
NM  1
NY  2
OR  5
PA  1
TN  1
TX  1
UT  4
VA  1
VT  2
WA  2
WY  3
Time taken: 8.753 seconds, Fetched: 25 row(s)
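
One way to check in advance whether a statement needs a MapReduce job is to prefix it with EXPLAIN, which prints the execution plan without running the query. The following is a minimal sketch that reuses the ddb_features query. The plan text varies by Hive version and execution engine, so no sample output is shown.

```sql
-- Print the execution plan instead of running the query.
-- A plan that contains a Map Reduce stage means Hive will submit a job;
-- a plan that contains only a fetch stage can be answered directly.
EXPLAIN
SELECT state_alpha, count(*)
FROM ddb_features
WHERE feature_class = 'Summit'
GROUP BY state_alpha;
```

Because EXPLAIN does not read any table data, it does not consume DynamoDB read capacity for the scan itself.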

Monitoring and Canceling Jobs

When Hive launches a Hadoop job, it prints output from that job and updates the job completion status as the job progresses. In some cases, the status might not change for a long time. This can happen when you query a large DynamoDB table that has a low provisioned read capacity setting.
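
If slow progress is caused by limited read throughput, it may help to adjust the read rate that Hive requests for the current session. The sketch below assumes the Amazon EMR DynamoDB connector and its dynamodb.throughput.read.percent setting (default 0.5), which this section does not otherwise cover. The value shown is illustrative, not a recommendation.

```sql
-- Assumption: the table is accessed through the EMR DynamoDB connector.
-- Ask Hive to use up to the table's full provisioned read capacity
-- (the connector's default is 0.5, that is, half of it).
SET dynamodb.throughput.read.percent=1.0;
```

A higher value can leave less read capacity for other applications that use the same table, so it is usually worth restoring the previous value after the query finishes.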

If you need to cancel the job before it is complete, press Ctrl+C at any time.