Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Supported Hive Versions

The default configuration for Amazon EMR is the latest version of Hive running on the latest AMI version. The following versions of Hive are available:

Hive Version | Compatible Hadoop Versions | Hive Version Notes


Introduces the following features, improvements, and backward incompatibilities. For more information, see Apache Hive 0.13.1 Release Notes and Apache Hive 0.13.0 Release Notes.

  • Vectorized query execution: processes rows in blocks of about a thousand instead of one row at a time.

  • In-memory cache: hot data is kept in memory for fast reads.

  • Faster plan serialization

  • Support for DECIMAL and CHAR datatypes

  • Sub-query for IN, NOT IN, EXISTS and NOT EXISTS (correlated and uncorrelated)

  • JOIN conditions in the WHERE clause
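The difference between row-at-a-time and vectorized processing can be sketched as follows. This is a conceptual Python illustration, not Hive's implementation: the batch size of 1024 and the function names are assumptions chosen to match the "thousand-row blocks" above. Python gains no speed from this structure; the point is that each operator (filter, then aggregate) runs over a whole block before the next operator starts, which is what lets a vectorized engine use tight per-column loops instead of per-row dispatch.

```python
from typing import List

BATCH_SIZE = 1024  # assumption: roughly the "thousand-row blocks" described above


def positive_sum_row_mode(values: List[int]) -> int:
    """Row-at-a-time: evaluate the filter and update the total once per row."""
    total = 0
    for v in values:
        if v > 0:
            total += v
    return total


def positive_sum_vectorized(values: List[int], batch_size: int = BATCH_SIZE) -> int:
    """Batch-at-a-time: run each operator over a whole block before moving on."""
    total = 0
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        # Filter operator: build a selection vector of qualifying positions.
        selected = [i for i, v in enumerate(batch) if v > 0]
        # Aggregate operator: one tight loop over the selected positions only.
        total += sum(batch[i] for i in selected)
    return total


data = list(range(-500, 2500))  # 3,000 rows, processed as three batches
assert positive_sum_row_mode(data) == positive_sum_vectorized(data)
```

Both functions return the same result; only the order in which work is grouped differs.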

Another feature contributed by Amazon EMR:

  • Includes an optimization to Hive windowing functions that allows them to scale to large data sets.

Notable backward incompatibilities:

  • Does not support the -el flag for pushing error logs to an Amazon S3 bucket when a query fails.

  • Does not support the RECOVER PARTITIONS syntax. Use the native MSCK REPAIR TABLE command instead.

  • Window functions nested inside other functions must be rewritten so that the OVER clause applies to the window function itself: round(sum(c),2) over w1 becomes round(sum(c) over w1,2). This syntax changed in Hive 0.12. See HIVE-4214.

  • The default precision and scale of DECIMAL changed. Unlike earlier Hive versions, an unqualified DECIMAL in Hive 0.13 is DECIMAL(10,0).

  • In Apache Hive 0.12 and later, the default SerDe for RCFile-backed tables is LazyBinaryColumnarSerDe. As a result, tables created with Hive 0.12 or later cannot correctly read data files generated with Hive 0.11 unless hive.default.rcfile.serde is set to org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. See HIVE-4475.
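To make the DECIMAL(10,0) default concrete, here is a hedged Python sketch of how a literal might fit into a DECIMAL(precision, scale) column. The rounding mode (half-up) and the rule that values needing more than (precision - scale) integer digits become NULL are assumptions about Hive's behavior, and fit_decimal is a hypothetical helper; verify against your Hive version before relying on these rules.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def fit_decimal(value: str, precision: int = 10, scale: int = 0) -> Optional[Decimal]:
    """Fit a literal into DECIMAL(precision, scale); None stands in for SQL NULL.

    Assumptions: extra fractional digits are rounded half-up, and the result is
    NULL when the rounded value needs more than (precision - scale) integer digits.
    """
    limit = Decimal(10) ** (precision - scale)
    number = Decimal(value)
    if abs(number) >= limit:  # reject early so quantize() never sees huge values
        return None
    rounded = number.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    if abs(rounded) >= limit:  # rounding up can still overflow, e.g. 9999999999.5
        return None
    return rounded
```

Under these assumptions, a value such as 123.45 loses its fractional part in a bare DECIMAL column, so declare an explicit scale (for example DECIMAL(10,2)) where fractions matter.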

Other notes and known issues:

  • When a Hive database is created with a custom location, the CREATE TABLE AS SELECT (CTAS) operation ignores it and takes the location from the hive.metastore.warehouse.dir parameter instead of from the database's properties. See HIVE-3486.

  • When you use LOAD DATA with OVERWRITE to load a different file into a table, the table's existing data is not overwritten. See HIVE-6209.

  • Because Amazon EMR uses HiveServer2, connect with the username hadoop and no password.

The following Hive 0.14.0 patches were backported to this release:



Introduces the following features and improvements. For more information, see Apache Hive 0.11.0 Release Notes.

  • Adds the Parquet library.

  • Fixes a problem related to the Avro serializer/deserializer accepting a schema URL in Amazon S3.

  • Fixes a problem with Hive returning incorrect results with indexing turned on.

  • Changes Hive's log level from DEBUG to INFO.

  • Fixes a problem where tasks do not report progress while deleting files in Amazon S3 dynamic partitions.

  • This Hive version fixes the following issues:



  • Creates symlink /home/hadoop/hive/lib/hive_contrib.jar for backward compatibility.

  • Fixes a problem that prevents installation of Hive 0.11.0 with IAM roles.




Introduces the following features and improvements. For more information, see Apache Hive 0.11.0 Release Notes.

  • Simplifies the hive.metastore.uris and hive.metastore.local configuration settings. (HIVE-2585)

  • Changes the internal representation of binary type to byte[]. (HIVE-3246)

  • Allows HiveStorageHandler.configureTableJobProperties() to signal to its handler whether the configuration is input or output. (HIVE-2773)

  • Adds environment context to metastore Thrift calls. (HIVE-3252)

  • Adds a new, optimized row columnar file format. (HIVE-3874)

  • Implements TRUNCATE. (HIVE-446)

  • Adds LEAD/LAG/FIRST/LAST analytical windowing functions. (HIVE-896)

  • Adds DECIMAL data type. (HIVE-2693)

  • Supports Hive list bucketing/DML. (HIVE-3073)

  • Supports custom separator for file output. (HIVE-3682)

  • Supports ALTER VIEW AS SELECT. (HIVE-3834)

  • Adds a method to retrieve uncompressed/compressed sizes of columns from RC files. (HIVE-3897)

  • Allows updating bucketing/sorting metadata of a partition through the CLI. (HIVE-3903)

  • Allows PARTITION BY/ORDER BY in the OVER clause and in the partition function. (HIVE-4048)

  • Improves GROUP BY syntax. (HIVE-581)

  • Adds more query plan optimization rules. (HIVE-948)

  • Allows CREATE TABLE LIKE command to accept TBLPROPERTIES. (HIVE-3527)

  • Fixes sort-merge join with sub-queries. (HIVE-3633)

  • Supports altering partition column type. (HIVE-3672)

  • De-emphasizes mapjoin hint. (HIVE-3784)

  • Changes object inspectors to initialize based on partition metadata. (HIVE-3833)

  • Merges a map-only job with the map-reduce job that follows it. (HIVE-3952)

  • Optimizes inserts when hive.enforce.bucketing and hive.enforce.sorting are enabled. (HIVE-4240)

  • Fixes ColumnPruner so that it works on LateralView. (HIVE-3226)

  • Fixes from_utc_timestamp and to_utc_timestamp to return correct results. (HIVE-2803)

  • Fixes a NullPointerException error on a join query with authorization enabled. (HIVE-3225)

  • Improves mapjoin filtering in the ON condition. (HIVE-2101)

  • Preserves the filter on an OUTER JOIN condition while merging the join tree. (HIVE-3070)

  • Fixes ConcurrentModificationException on a lateral view used with explode. (HIVE-2540)

  • Fixes an issue where an insert into a table overwrites the existing table, if the table name contains an uppercase character. (HIVE-3062)

  • Fixes an issue where jobs fail when there are multiple aggregates in a query. (HIVE-3732)

  • Fixes a NullPointerException error in nested user-defined aggregation functions (UDAFs). (HIVE-1399)

  • Provides an error message when a user-defined aggregation function (UDAF) is used in place of a user-defined function (UDF). (HIVE-2956)

  • Fixes an issue where Timestamp values without a nanosecond part break the columns that follow them in a row. (HIVE-3090)

  • Fixes an issue where the move task is not picking up changes to hive.exec.max.dynamic.partitions set in the Hive CLI. (HIVE-2918)

  • Adds the ability to atomically add and drop partitions in the metastore. (HIVE-2777)

  • Adds partition pruning pushdown to the database for non-string partitions. (HIVE-2702)

  • Adds support for merging small files in Amazon S3 at the end of a map-only job using the hive.merge.mapfiles parameter. If the output path is in Amazon S3, the hive.merge.smallfiles.avgsize setting is ignored. For more information, see Hive File Merge Behavior with Amazon S3 and Hive Configuration Variables.

  • Improves clean-up of junk files after an INSERT OVERWRITE command.
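The small-file merge rule for map-only jobs can be summarized as a decision function. This is a hedged Python sketch of the behavior described above (hive.merge.mapfiles enables the merge; on HDFS the merge runs only when the average output file is smaller than hive.merge.smallfiles.avgsize; on Amazon S3 that threshold is ignored). The function name, the path prefixes used to recognize S3, and the "nothing to merge" rule for an empty output are assumptions, not Hive code.

```python
from typing import Sequence

S3_PREFIXES = ("s3://", "s3n://")  # assumption: how an S3 output path is recognized


def should_merge_map_outputs(output_path: str, file_sizes: Sequence[int],
                             merge_mapfiles: bool, smallfiles_avgsize: int) -> bool:
    """Return True if an extra merge step would run after a map-only job."""
    if not merge_mapfiles or not file_sizes:
        return False  # hive.merge.mapfiles is off, or there is nothing to merge
    if output_path.startswith(S3_PREFIXES):
        return True  # on Amazon S3, hive.merge.smallfiles.avgsize is ignored
    average = sum(file_sizes) / len(file_sizes)
    return average < smallfiles_avgsize  # HDFS: merge only when files are small
```

In practice, this means that with hive.merge.mapfiles enabled, a map-only job writing to Amazon S3 always pays for the merge step, even when its output files are already large.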
  • Adds support for the new DynamoDB binary data type.

  • Adds the patch HIVE-2955, which fixes an issue where queries that consist only of metadata always return an empty value.

  • Adds the patch HIVE-1376, which fixes an issue where Hive would crash on an empty result set generated by queries with a "WHERE false" clause.

  • Fixes the RCFile interaction with Amazon Simple Storage Service (Amazon S3).

  • Replaces JetS3t with the AWS SDK for Java.

  • Uses BatchWriteItem for puts to DynamoDB.

  • Adds schemaless mapping of DynamoDB tables into a Hive table using a Hive map<string, string> column.
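Using BatchWriteItem means grouping puts into requests of at most 25 items each, the per-call limit of that DynamoDB API. The following Python sketch builds RequestItems payloads in the shape BatchWriteItem expects; it is an illustration of the batching idea, not the Amazon EMR storage handler's code. The function name and the example table name are hypothetical, and the sketch deliberately ignores the 16 MB per-request size limit and the retrying of UnprocessedItems, both of which a real writer must handle.

```python
from typing import Any, Dict, Iterator, List, Sequence

MAX_ITEMS_PER_BATCH = 25  # BatchWriteItem limit on put/delete requests per call


def batch_write_requests(table_name: str, items: Sequence[Dict[str, Any]]
                         ) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Yield RequestItems payloads, each holding at most 25 PutRequests."""
    for start in range(0, len(items), MAX_ITEMS_PER_BATCH):
        chunk = items[start:start + MAX_ITEMS_PER_BATCH]
        yield {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}


# Usage: 60 items for a hypothetical "Orders" table become three calls (25, 25, 10).
orders = [{"id": {"S": str(i)}} for i in range(60)]
sizes = [len(batch["Orders"]) for batch in batch_write_requests("Orders", orders)]
```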

Updates the HBase client on Hive clusters to version 0.92.0 to match the version of HBase used on HBase clusters. This fixes issues that occurred when connecting to an HBase cluster from a Hive cluster.

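Adds support for Hadoop 1.0.3. Compatible Hadoop versions include 0.20.205.

Fixes an issue with duplicate data in large clusters. Compatible Hadoop versions include 0.20.205.

Adds support for MapR and HBase. Compatible Hadoop versions include 0.20.205.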

Introduces new features and improvements. The most significant of these are as follows. For more information about the changes in Hive 0.8.1, go to Apache Hive 0.8.1 Release Notes.

Prevents the "SET" command in Hive from changing the current database of the current session.

Adds the dynamodb.retry.duration option, which you can use to configure the timeout duration for retrying Hive queries against tables in Amazon DynamoDB. This version of Hive also supports the dynamodb.endpoint option, which you can use to specify the Amazon DynamoDB endpoint to use for a Hive table. For more information about these options, see Hive Options.
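Conceptually, a retry duration bounds the total time spent retrying rather than the number of attempts. The Python sketch below shows that pattern with an injectable clock so it can be tested. It is not how Hive implements dynamodb.retry.duration: the fixed delay, the use of seconds, the retry-on-any-exception policy, and the retry_for name are all assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_for(operation: Callable[[], T], duration_s: float, delay_s: float = 1.0,
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `operation` until it succeeds or the time budget would be exceeded."""
    deadline = clock() + duration_s
    while True:
        try:
            return operation()
        except Exception:  # assumption: real code would retry only throttling/transient errors
            if clock() + delay_s > deadline:
                raise  # the next attempt would start past the deadline: give up
            sleep(delay_s)
```

Passing a fake clock and sleep function, as in the tests, lets you check the timeout logic without actually waiting.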

Modifies the way files are named in Amazon S3 for dynamic partitions: each file name is prefixed with a unique identifier. Using this version of Hive, you can run queries in parallel with SET hive.exec.parallel=true. Also fixes an issue with filter pushdown when accessing DynamoDB with sparse data sets.
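One plausible reason for the unique prefix: without it, two queries writing to the same dynamic partition can produce the same task output name (for example 000000_0) and overwrite each other's file in Amazon S3. The Python sketch below shows the idea. The identifier format Hive actually uses is not described here, so the uuid4-based prefix, the underscore separator, and the prefixed_file_name helper are all hypothetical.

```python
import uuid
from typing import Optional


def prefixed_file_name(task_file_name: str, writer_id: Optional[str] = None) -> str:
    """Prefix a task's output file name with an identifier unique to this writer."""
    # Hypothetical format: "<unique id>_<original name>"; a random UUID is used
    # when the caller does not supply an identifier.
    prefix = writer_id if writer_id is not None else uuid.uuid4().hex
    return f"{prefix}_{task_file_name}"
```

Two concurrent writers that both produce 000000_0 now write distinct keys, so neither output replaces the other.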

Introduces support for accessing DynamoDB, as detailed in Export, Import, Query, and Join Tables in DynamoDB Using Amazon EMR. It is a minor version of 0.7.1 developed by the Amazon EMR team. When specified as the Hive version, Hive overwrites the Hive 0.7.1 directory structure and configuration with its own values. Specifically, Hive matches Apache Hive 0.7.1 and uses the Hive server port, database, and log location of 0.7.1 on the cluster. Compatible Hadoop versions include 0.20 and 0.18.

Improves Hive query performance for a large number of partitions and for Amazon S3 queries. Changes Hive to skip commented lines.

Hive 0.7 (compatible Hadoop versions: 0.20, 0.18)

Improves Recover Partitions to use less memory, fixes the hashCode method, and introduces the ability to use the HAVING clause to filter on groups by expressions.

Hive 0.5 (compatible Hadoop versions: 0.20, 0.18)

Fixes issues with FileSinkOperator and modifies UDAFPercentile to tolerate null percentiles.

Hive 0.4 (compatible Hadoop versions: 0.20, 0.18)

Introduces the ability to write to Amazon S3, run Hive scripts from Amazon S3, and recover partitions from table data stored in Amazon S3. Also creates a separate namespace for managing Hive variables.

The AWS CLI does not support installing specific Hive versions. When using the AWS CLI, the latest version of Hive included on the AMI is installed by default.

Display the Hive Version

You can view the version of Hive installed on your cluster using the console. In the console, the Hive version is displayed on the Cluster Details page. In the Configuration Details column, the Applications field displays the Hive version.