Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.Did this page help you?  Yes | No |  Tell us about it...

Upgrade to Hive 0.11 (MySQL on Amazon RDS)

The following procedures explain how to upgrade a Hive metastore stored outside of the cluster, as described in Create a Metastore Outside the Hadoop Cluster.

If your metastore is stored on the master node (the default behavior) please use the procedures in Upgrade to Hive 0.11 (MySQL on the Master Node) instead.

In order to connect to the Amazon RDS database from the master node, you need to add the EC2 Security Group elasticmapreduce-master to the DB Security Groups. You can do this as described in the following procedure.

To configure security groups to enable connections to Amazon RDS from the master node

  1. From the Amazon RDS Console, click DB Security Groups in the left pane.

  2. Select the security group to modify in the center pane. This should be the security group that was used to launch the MySQL database hosting the Hive metastore. The default security group is default.

  3. In the information pane at the bottom, select EC2 Security Group for Connection Type and select ElasticMapReduce-master for EC2 Security Group.

  4. Click Add.

To upgrade the metastore from 0.8 to 0.11 (MySQL on Amazon RDS)

  1. Stop running Hive processes during the upgrade procedure. This ensures that the database is not altered while the upgrade scripts are running.

  2. Back up the Amazon RDS database as described in Creating a DB Snapshot in the Amazon Relational Database Service User Guide.

  3. Use SSH to connect to the master node of the Hive cluster to upgrade. For more information, see Connect to the Master Node Using SSH.

  4. Copy the upgrade scripts from Apache to the master node. You can do this by running the wget utility on the master node. (The view=co request variable in the following URLs ensures that you get the plain-text version of the scripts, not the HTML-encoded version.)

     wget -O 010-HIVE-3072.mysql.sql http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/010-HIVE-3072.mysql.sql?view=co
     wget -O 011-HIVE-3649.mysql.sql http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/011-HIVE-3649.mysql.sql?view=co  
     wget -O 012-HIVE-1362.mysql.sql http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/012-HIVE-1362.mysql.sql?view=co
     wget -O upgrade-0.9.0-to-0.10.0.mysql.sql http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/upgrade-0.9.0-to-0.10.0.mysql.sql?view=co
    
  5. Use the MySQL monitor installed on the master node to connect to the Amazon RDS database. Before you can do this, you must have completed the steps in the preceding procedure, To configure security groups to enable connections to Amazon RDS from the master node.

    From the master node, run the following command to connect to the Amazon RDS database, where is myinstance the name of the database, mydnsnameexample is the custom portion of the DNS name assigned to the database and mymasteruser is the master user on the database. For more information about how to connect to an Amazon RDS instance, go to Connecting to a DB Instance Running the MySQL Database Engine in the Amazon Relational Database Service User Guide.

    mysql -h myinstance.mydnsnameexample.rds.amazonaws.com -P 3306 -u mymasteruser -p
    				
  6. From the MySQL Monitor command prompt, find the name of the database that contains your Hive metastore. You can do this by running the following command.

    mysql> show databases;					
    				

    This returns results such as the following. The database that starts with "hive_" is the one that contains your Hive MetaStore.

    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | InstanceController |
    | hive_081           |
    | mysql              |
    +--------------------+
    4 rows in set (0.00 sec)
    				
  7. Exit MySQL Monitor.

    mysql> exit
    				
  8. Export the current metastore schema from Amazon RDS. The following mysqldump command extracts only the schema content.

    Note

    If your Amazon RDS instance runs mysql 5.6, you will not be able to execute following mysqldump command. This is because the EMR master node's mysqldump version is 5.1 and running following command results in version mismatch errors. A workaround is to run the dump command on a machine that has compatible mysqldump utility installed.

    mysqldump --skip-add-drop-table --no-data hive_081 > my-schema-0.8.1.mysql.sql \
    -u root -h myinstance.mydnsnameexample.rds.amazonaws.com \
    -P 3306 -u mymasteruser -p
    				
  9. Compare your current schema against the official Apache Hive schema listed at http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/. If you are upgrading from 0.8 to 0.11, compare the schema you exported against the version in hive-schema-0.8.0.mysql.sql. The schemas should match. If you have made custom changes to the schema, you may need to roll those back in order for the upgrade scripts to work properly.

    Differences you may find:

    • Missing tables. By default, Hive only creates schema elements when they are used. If you have not created a certain type of Hive catalog object, the corresponding table will not exist in your schema. You must create these missing tables for the upgrade script to succeed. You can do this by hand or by running the official schema DDL script (located at http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/hive-schema-0.8.0.mysql.sql?view=co) against your metastore. This script ignores tables that already exist, and will only create those that are missing.

    • Extra tables. If your schema contains tables named NUCLEUS_TABLES or SEQUENCE_TABLE, these will not affect the upgrade script. You do not need to remove them.

    • Reversed Column Constraint Names. If a table has multiple constraints, the names may be reversed between your schema and the canonical schema. For example, if a table contains PARTITIONS_FK1 and PARTITIONS_FK2 which reference SDS.SD_ID and TBLS.TBL_ID, your schema may instead connect PARTITIONS_FK1 to TBLS.TBL_ID and PARTITIONS_FK2 to SDS.SD_ID. This will not affect the upgrade script and can be ignored.

    • Changes in Column and Constraint Names. If your schema contains tables with unique keys named "UNIQUE<tab_name>" or columns named "IDX" you will need to rename these to "UNIQUE_<tab_name>" and "INTEGER_IDX" before you run the upgrade script. The reason for this is explained in HIVE-1435.

  10. Use the MySQL monitor installed on the master node to connect to the Amazon RDS database.

    mysql -h myinstance.mydnsnameexample.rds.amazonaws.com \
    -P 3306 -u mymasteruser -p				
    				
  11. Run the upgrade-0.9.0-to-0.10.0.mysql.sql script using the source command in the MySQL command line. This is shown in the following example, where hive_081 is the database containing the metastore.

    mysql> use hive_081;
    mysql> source upgrade-0.9.0-to-0.10.0.mysql.sql;
    				

    The script should complete without error.

  12. Exit MySQL Monitor.

    mysql> exit
    				
  13. Export the upgraded metastore schema from Amazon RDS. The following mysqldump command extracts only the schema content.

    Note

    The following command may generate a version mismatch error if your mysql version is incorrect. In that case, refer to the workaround mentioned in the previous steps.

    mysqldump --skip-add-drop-table --no-data hive_081 > metastore_upgraded_backup.sql \
    -u root -h myinstance.mydnsnameexample.rds.amazonaws.com \
    -P 3306 -u mymasteruser -p				
    				
  14. Compare this schema to the official Apache Hive 0.11 schema: http://svn.apache.org/viewvc/hive/branches/branch-0.11/metastore/scripts/upgrade/mysql/hive-schema-0.10.0.mysql.sql. They should match.

  15. Backup the upgraded metastore by backing up the Amazon RDS database as described in Creating a DB Snapshot in the Amazon Relational Database Service User Guide.

To move your upgraded configuration files and metastore to a Hive 0.11 cluster (MySQL on Amazon RDS)

  1. Create a new cluster on Hadoop 1.0.3 (AMI version 2.2 or later). Follow the instructions at Create a Metastore Outside the Hadoop Cluster to have the Hive cluster use the metastore you upgraded to Hive 0.11 in the previous procedure.

  2. Use SSH to connect to the master node. For more information about how to do this, see Connect to the Master Node Using SSH.

  3. Use scp to copy your upgraded configuration files to the new cluster.

  4. Copy custom configuration settings from hive-default.xml and hive-site.xml to the hive-site.xml on the new cluster (found on the master node at hive/conf/.) Take care not to overwrite any settings used by Amazon EMR.