Menu
Amazon EMR
Developer Guide

Query Data in Impala

In this section, you perform queries on the data that you loaded in the Impala tables in the previous steps.

To perform various queries on the test data in the Impala tables

  1. Perform a table scan through the entire customers table by running the following query at the Impala shell prompt:

    Copy
    SELECT COUNT(*) FROM customers WHERE name = 'Harrison SMITH';
  2. Perform an aggregation query that scans a single table, groups the rows, and calculates the size of each group by running the following query at the Impala shell prompt:

    Copy
    SELECT category, count(*) cnt FROM books GROUP BY category ORDER BY cnt DESC LIMIT 10;
  3. Perform a query that joins the books table with the transactions table to determine the top ten book categories that have the maximum total revenue during a certain period of time by running the following query at the Impala shell prompt:

    Copy
    SELECT tmp.book_category, ROUND(tmp.revenue, 2) AS revenue FROM ( SELECT books.category AS book_category, SUM(books.price * transactions.quantity) AS revenue FROM books JOIN [SHUFFLE] transactions ON ( transactions.book_id = books.id AND YEAR(transactions.transaction_date) BETWEEN 2008 AND 2010 ) GROUP BY books.category ) tmp ORDER BY revenue DESC LIMIT 10;
  4. Perform a memory-intensive query that joins three tables by running the following query at the Impala shell prompt:

    Copy
    SELECT tmp.book_category, ROUND(tmp.revenue, 2) AS revenue FROM ( SELECT books.category AS book_category, SUM(books.price * transactions.quantity) AS revenue FROM books JOIN [SHUFFLE] transactions ON ( transactions.book_id = books.id ) JOIN [SHUFFLE] customers ON ( transactions.customer_id = customers.id AND customers.state IN ('WA', 'CA', 'NY') ) GROUP BY books.category ) tmp ORDER BY revenue DESC LIMIT 10;

    Important

    Now that you've completed the tutorial, you should terminate the cluster to ensure that your account does not accrue additional charges. For more information, see Terminate a Cluster.