Tutorial: Launching and Querying Impala Clusters on Amazon EMR


Installing Impala at cluster creation time is not supported in current release versions of Amazon EMR. The examples and tutorial in this section require Amazon EMR release version 3.11.0, 3.10.0, or 3.9.0, which install an older version of Impala (1.2.4).

The instructions in this tutorial include how to:

  • Sign up for Amazon EMR

  • Launch a long-running cluster with Impala installed

  • Connect to the cluster using SSH

  • Generate a test data set

  • Create Impala tables and populate them with data

  • Perform interactive queries on Impala tables

Amazon EMR provides several tools you can use to launch and manage clusters: the console, a CLI, an API, and several SDKs. For more information about these tools, see What Tools are Available for Amazon EMR?.
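
As a rough illustration of the CLI route, the sketch below assembles an `aws emr create-cluster` command line for a long-running cluster with Impala on one of the release versions listed above. It only builds and prints the command; it does not call AWS. The helper name `build_create_cluster_command`, the cluster name, the key pair name `myKey`, the instance type, and the instance count are placeholders chosen for this example, not values from the tutorial. Check the flags against the AWS CLI reference for your installed version before running the printed command.

```python
import shlex

# Release versions this tutorial states it requires.
SUPPORTED_AMI_VERSIONS = ("3.11.0", "3.10.0", "3.9.0")


def build_create_cluster_command(key_name, ami_version="3.11.0",
                                 instance_type="m3.xlarge", instance_count=3):
    """Return the argument list for an `aws emr create-cluster` call.

    Hypothetical helper: every default here is an illustrative assumption.
    """
    if ami_version not in SUPPORTED_AMI_VERSIONS:
        raise ValueError(f"unsupported release for this tutorial: {ami_version}")
    return [
        "aws", "emr", "create-cluster",
        "--name", "Impala tutorial cluster",        # placeholder name
        "--ami-version", ami_version,               # 3.x releases use AMI versions
        "--applications", "Name=Impala",
        "--use-default-roles",
        "--ec2-attributes", f"KeyName={key_name}",  # key pair used later for SSH
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--no-auto-terminate",                      # keep the cluster long-running
    ]


if __name__ == "__main__":
    # Print a shell-ready command for review instead of executing it.
    print(shlex.join(build_create_cluster_command("myKey")))
```

Building the arguments as a list and printing them keeps the example free of side effects: the command can be inspected and adjusted before it is pasted into a shell.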