You can choose to run Hive in several different configurations. You set the
--hadoop-version, --hive-versions, and --ami-version
parameters in the job creation call as shown in the following
table.
The default configuration for Amazon EMR is the latest version of Hive running on the latest AMI version.
The Amazon EMR console does not support Hive versioning and always loads the latest version of Hive.
Versions of the Amazon EMR CLI released on 9 April 2012 and later load the latest version of Hive by default. To use a version of Hive other than the
latest, specify the --hive-versions parameter when you create the cluster. Versions of the Amazon EMR CLI
released prior to 9 April 2012 load the default configuration of Hive.
Calls to the API will launch the default configuration of Hive, unless you specify
--hive-versions as an argument to the step that
loads Hive onto the cluster during the call to RunJobFlow.
| Hive Version | Compatible Hadoop Versions | Hive Version Notes |
|---|---|---|
| 0.8.1.7 | 1.0.3 | |
| 0.8.1.6 | 1.0.3 | |
| 0.8.1.5 | 1.0.3 | |
| 0.8.1.4 | 1.0.3 | Updates the HBase client on Hive clusters to version 0.92.0 to match the version of HBase used on HBase clusters. This fixes issues that occurred when connecting to an HBase cluster from a Hive cluster. |
| 0.8.1.3 | 1.0.3 | Adds support for Hadoop 1.0.3. |
| 0.8.1.2 | 1.0.3, 0.20.205 | Fixes an issue with duplicate data in large clusters. |
| 0.8.1.1 | 1.0.3, 0.20.205 | Adds support for MapR and HBase. |
| 0.8.1 | 1.0.3, 0.20.205 | Introduces new features and improvements. The most significant of these are as follows. For complete information about the changes in Hive 0.8.1, go to the Apache Hive 0.8.1 Release Notes. |
| 0.7.1.4 | 0.20.205 | Prevents the "SET" command in Hive from changing the current database of the current session. |
| 0.7.1.3 | 0.20.205 | Adds the |
| 0.7.1.2 | 0.20.205 | Modifies the way files are named in Amazon S3 for dynamic partitions. It prepends file names in Amazon S3 for dynamic partitions with a unique identifier. Using Hive 0.7.1.2 you can run queries in parallel with |
| 0.7.1.1 | 0.20.205 | Introduces support for accessing Amazon DynamoDB, as detailed in Export, Import, Query, and Join Tables in Amazon DynamoDB Using Amazon EMR. It is a minor version of 0.7.1 developed by the Amazon EMR team. When specified as the Hive version, Hive 0.7.1.1 overwrites the Hive 0.7.1 directory structure and configuration with its own values. Specifically, Hive 0.7.1.1 matches Apache Hive 0.7.1 and uses the Hive server port, database, and log location of 0.7.1 on the cluster. |
| 0.7.1 | 0.20.205, 0.20, 0.18 | Improves Hive query performance for a large number of partitions and for Amazon S3 queries. Changes Hive to skip commented lines. |
| 0.7 | 0.20, 0.18 | Improves Recover Partitions to use less memory, fixes the hashCode method, and introduces the ability to use the HAVING clause to filter on groups by expressions. |
| 0.5 | 0.20, 0.18 | Fixes issues with FileSinkOperator and modifies UDAFPercentile to tolerate null percentiles. |
| 0.4 | 0.20, 0.18 | Introduces the ability to write to Amazon S3, run Hive scripts from Amazon S3, and recover partitions from table data stored in Amazon S3. Also creates a separate namespace for managing Hive variables. |
For additional details about the changes in a version of Hive, go to Supported Hive Versions. For information about Hive patches and functionality developed by the Amazon EMR team, go to Additional Features of Hive in Amazon EMR.
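As a quick way to check a planned combination against the table above, the following shell sketch encodes the table's Hive-to-Hadoop pairs in a lookup function. The function names `hive_hadoop_versions` and `is_compatible` are hypothetical helpers written for this illustration; they are not part of the Amazon EMR CLI, and the data is only as current as the table.

```shell
# Hypothetical helper (not part of the Amazon EMR CLI): prints the Hadoop
# versions that the table lists as compatible with the given Hive version.
hive_hadoop_versions() {
  case "$1" in
    0.8.1.3|0.8.1.4|0.8.1.5|0.8.1.6|0.8.1.7) echo "1.0.3" ;;
    0.8.1|0.8.1.1|0.8.1.2) echo "1.0.3 0.20.205" ;;
    0.7.1.1|0.7.1.2|0.7.1.3|0.7.1.4) echo "0.20.205" ;;
    0.7.1) echo "0.20.205 0.20 0.18" ;;
    0.4|0.5|0.7) echo "0.20 0.18" ;;
    *) echo "unknown Hive version: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: succeeds when the table lists the Hive version ($1)
# and the Hadoop version ($2) together.
is_compatible() {
  versions=$(hive_hadoop_versions "$1") || return 1
  for v in $versions; do
    [ "$v" = "$2" ] && return 0
  done
  return 1
}
```

For example, `is_compatible 0.8.1.4 1.0.3` succeeds, while `is_compatible 0.5 1.0.3` fails because the table lists Hive 0.5 only with Hadoop 0.20 and 0.18.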
To specify the Hive version when creating the cluster
Use the --hive-versions parameter. The following command-line example
creates an interactive Hive cluster running Hadoop 0.20 and Hive 0.7.1.
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Test Hive" \
--hadoop-version 0.20 \
--num-instances 5 --instance-type m1.large \
--hive-interactive \
--hive-versions 0.7.1
Windows users:
ruby elastic-mapreduce --create --alive --name "Test Hive" --hadoop-version 0.20 --num-instances 5 --instance-type m1.large --hive-interactive --hive-versions 0.7.1
The --hive-versions parameter must come after any reference to
the parameters --hive-interactive,
--hive-script, or --hive-site.
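The ordering rule above is easy to break when a command line is assembled by a script. The following is a minimal sketch of a pre-flight check, assuming each parameter is passed as a separate argument. `hive_versions_order_ok` is a hypothetical helper name, not an Amazon EMR CLI feature, and it checks only this one rule.

```shell
# Hypothetical helper (not part of the Amazon EMR CLI): returns 0 when no
# --hive-versions argument appears before the first --hive-interactive,
# --hive-script, or --hive-site argument, and 1 otherwise.
hive_versions_order_ok() {
  seen_hive_step=no
  for arg in "$@"; do
    case "$arg" in
      --hive-interactive|--hive-script|--hive-site)
        seen_hive_step=yes ;;
      --hive-versions)
        [ "$seen_hive_step" = yes ] || return 1 ;;
    esac
  done
  return 0
}
```

A wrapper script could call this with the same arguments it intends to pass to `elastic-mapreduce` and stop early if the check fails.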
To specify the latest Hive version when creating the cluster
Use the --hive-versions parameter with the latest keyword.
The following command-line example
creates an interactive Hive cluster running the latest version of Hive.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Test Hive" \
--hadoop-version 0.20 \
--num-instances 5 --instance-type m1.large \
--hive-interactive \
--hive-versions latest
Windows users:
ruby elastic-mapreduce --create --alive --name "Test Hive" --hadoop-version 0.20 --num-instances 5 --instance-type m1.large --hive-interactive --hive-versions latest
To specify the Hive version for a cluster that is interactive and uses a Hive script
If you have a cluster that uses Hive both interactively and from a script, you must set the Hive version for each type of use. The following command-line example sets both the interactive and the script version of Hive to 0.7.1.2.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --debug --log-uri s3://myawsbucket/perftest/logs/ \
--name "Testing m1.large AMI 1" \
--ami-version latest --hadoop-version 0.20 \
--instance-type m1.large --num-instances 5 \
--hive-interactive --hive-versions 0.7.1.2 \
--hive-script s3://myawsbucket/perftest/hive-script.hql --hive-versions 0.7.1.2
Windows users:
ruby elastic-mapreduce --create --debug --log-uri s3://myawsbucket/perftest/logs/ --name "Testing m1.large AMI 1" --ami-version latest --hadoop-version 0.20 --instance-type m1.large --num-instances 5 --hive-interactive --hive-versions 0.7.1.2 --hive-script s3://myawsbucket/perftest/hive-script.hql --hive-versions 0.7.1.2
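Because the version has to be repeated for each type of use, a script that builds the command can generate the Hive-related arguments from a single version value so the two cannot drift apart. The sketch below assumes a hypothetical helper named `hive_step_args`; it only prints the argument string and does not call the Amazon EMR CLI.

```shell
# Hypothetical helper (not part of the Amazon EMR CLI): prints the Hive
# arguments for a cluster used both interactively and from a script,
# repeating --hive-versions after each type of use.
#   $1 = Hive version, $2 = Amazon S3 location of the Hive script
hive_step_args() {
  version=$1
  script=$2
  printf '%s\n' "--hive-interactive --hive-versions $version --hive-script $script --hive-versions $version"
}
```

Calling `hive_step_args 0.7.1.2 s3://myawsbucket/perftest/hive-script.hql` prints the same Hive arguments used in the example above.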
To load multiple versions of Hive for a given cluster
Use the --hive-versions parameter and separate the version numbers with commas. The following command-line example creates an interactive cluster running Hadoop 0.20 and multiple versions of Hive. With this configuration, you can use any of the installed versions of Hive on the cluster.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Test Hive" \
--hadoop-version 0.20 \
--num-instances 5 --instance-type m1.large \
--hive-interactive \
--hive-versions 0.5,0.7.1
Windows users:
ruby elastic-mapreduce --create --alive --name "Test Hive" --hadoop-version 0.20 --num-instances 5 --instance-type m1.large --hive-interactive --hive-versions 0.5,0.7.1
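If the list of versions comes from a script, the comma-separated value can be assembled before the command is run. The following is a small sketch using a hypothetical helper named `join_hive_versions`, which is not part of the Amazon EMR CLI.

```shell
# Hypothetical helper (not part of the Amazon EMR CLI): joins its arguments
# with commas, the form used with --hive-versions to load multiple versions.
join_hive_versions() {
  joined=""
  for v in "$@"; do
    if [ -z "$joined" ]; then
      joined=$v
    else
      joined="$joined,$v"
    fi
  done
  printf '%s\n' "$joined"
}
```

For example, `join_hive_versions 0.5 0.7.1` prints `0.5,0.7.1`, which can then be passed as the value of --hive-versions.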
To call a specific version of Hive
Add the version number to the call. For example, hive-0.5 or hive-0.7.1.
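A script that must run against one particular installed version can build the versioned command name explicitly. The sketch below assumes a hypothetical helper named `hive_command`; it only derives the name described above (`hive-` followed by the version number) and does not check that the version is installed on the cluster.

```shell
# Hypothetical helper: prints the command name for the given Hive version,
# or plain "hive" when no version is given.
hive_command() {
  if [ -n "${1:-}" ]; then
    printf 'hive-%s\n' "$1"
  else
    printf 'hive\n'
  fi
}
```

For example, `hive_command 0.7.1` prints `hive-0.7.1`.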
Note
If you have multiple versions of Hive loaded on a cluster, calling hive runs the default version of Hive or, if multiple --hive-versions parameters were specified in the cluster creation call, the version that was loaded last. When the comma-separated syntax is used with --hive-versions to load multiple versions, hive runs the default version of Hive.
Note
When running multiple versions of Hive concurrently, all versions of Hive can read the same data. They cannot, however, share metadata. Use an external metastore if you want multiple versions of Hive to read and write to the same location.
You can use the --print-hive-version command to display the version of Hive
currently in use for a given cluster. This is useful after you upgrade to a new
version of Hive, to confirm that the upgrade succeeded, or when you are using
multiple versions of Hive and need to confirm which version is currently running.
The syntax is as follows, where JobFlowID is the identifier of the cluster whose
Hive version you want to check.
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --jobflow JobFlowID --print-hive-version
Windows users:
ruby elastic-mapreduce --jobflow JobFlowID --print-hive-version