
Apache Phoenix

Apache Phoenix is an application used for OLTP workloads and low-latency SQL. Phoenix uses Apache HBase as its backing store, and you can connect to it using a JDBC driver bundled with Phoenix. For more information, see https://phoenix.apache.org/.

Release Information

Application: Phoenix 4.7.0

Amazon EMR Release Label: emr-5.3.1

Components installed with this application: emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-mapred, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hbase-hmaster, hbase-client, hbase-region-server, phoenix-library, phoenix-query-server, zookeeper-client, zookeeper-server


Creating a Cluster with Phoenix

Install Phoenix by choosing that application when you create the cluster. The following procedure creates a cluster with Phoenix and HBase installed. For more information about launching clusters with the console, see Step 3: Launch an Amazon EMR Cluster in the Amazon EMR Management Guide.

To launch a cluster with Phoenix installed using the console

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Choose Create cluster to use Quick Create.

  3. For Software Configuration, choose Amazon Release Version emr-4.7.0 or later.

  4. For Select Applications, choose either All Applications or Phoenix and HBase.

    Note

    Selecting Phoenix automatically includes and installs the HBase components; HBase is chosen explicitly in these examples only to make the dependency clear.

  5. Select other options as necessary and then choose Create cluster.

To launch a cluster with Phoenix and HBase using the AWS CLI

  • Create the cluster with the following command:

    aws emr create-cluster --name "Cluster with Phoenix" --release-label emr-5.3.1 \
    --applications Name=Phoenix Name=HBase --ec2-attributes KeyName=myKey \
    --instance-type m3.xlarge --instance-count 3 --use-default-roles

    Note

    For Windows, replace the above Linux line continuation character (\) with the caret (^).
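
    For example, the same command on Windows becomes:

    aws emr create-cluster --name "Cluster with Phoenix" --release-label emr-5.3.1 ^
    --applications Name=Phoenix Name=HBase --ec2-attributes KeyName=myKey ^
    --instance-type m3.xlarge --instance-count 3 --use-default-roles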

Configuring Phoenix

You configure Phoenix by setting values in hbase-site.xml using the hbase-site configuration classification when you create your cluster.

For more information, see Configuration and Tuning in the Phoenix documentation.

To change a setting in Phoenix

  • Create a cluster with Phoenix and HBase installed and set phoenix.schema.dropMetaData to false, using the following command:

    aws emr create-cluster --release-label emr-5.3.1 --applications Name=Phoenix \
    Name=HBase --instance-type m3.xlarge --instance-count 2 --configurations file://myConfig.json

    Note

    For Windows, replace the above Linux line continuation character (\) with the caret (^).

    myConfig.json:

    [
      {
        "Classification": "hbase-site",
        "Properties": {
          "phoenix.schema.dropMetaData": "false"
        }
      }
    ]
    

    Note

    If you plan to store your configuration in Amazon S3, you must specify the URL location of the object. For example:

    aws emr create-cluster --release-label emr-5.3.1 --applications Name=Phoenix Name=Hive \
    Name=HBase --instance-type m3.xlarge --instance-count 3 --configurations https://s3.amazonaws.com/mybucket/myfolder/myConfig.json
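
    After the cluster is up, you can confirm that the setting reached hbase-site.xml by connecting to the master node with SSH and inspecting the file. For example:

    # Assumes the default Amazon EMR configuration path /etc/hbase/conf
    grep -A 1 'phoenix.schema.dropMetaData' /etc/hbase/conf/hbase-site.xml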

Phoenix Clients

You connect to Phoenix using either a JDBC client built with full dependencies, or the "thin client" that uses the Phoenix Query Server and can only be run on a master node of a cluster (for example, by using a SQL client, a step, the command line, SSH port forwarding, and so on). The "fat" JDBC client needs access to all nodes of the cluster because it connects to HBase services directly. The "thin" client needs access only to the Phoenix Query Server, which listens on port 8765 by default. There are several scripts within Phoenix that use either client.
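
For a quick check from the master node, you can open an interactive SQL session with either client. The paths below are the ones this guide uses elsewhere; the sqlline.py invocation assumes ZooKeeper is listening on the master node's default port 2181:

    # Fat client: connects to HBase directly through ZooKeeper
    /usr/lib/phoenix/bin/sqlline.py localhost:2181

    # Thin client: connects through the Phoenix Query Server
    /usr/lib/phoenix/bin/sqlline-thin.py http://localhost:8765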

Use an Amazon EMR step to query using Phoenix

The following procedure restores a snapshot from HBase and uses that data to run a Phoenix query. You can extend this example or create a new script that leverages Phoenix's clients to suit your needs.

  1. Create a cluster with Phoenix installed, using the following command:

    aws emr create-cluster --name "Cluster with Phoenix" --log-uri s3://myBucket/myLogFolder --release-label emr-5.3.1 \
    --applications Name=Phoenix Name=HBase --ec2-attributes KeyName=myKey \
    --instance-type m3.xlarge --instance-count 3 --use-default-roles
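
    You can wait until the cluster is up and running before submitting steps, for example:

    aws emr wait cluster-running --cluster-id j-2AXXXXXXGAPLF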
  2. Create the following files, and then upload them to Amazon S3:

    copySnapshot.sh

    sudo su hbase -s /bin/sh -c 'hbase snapshot export \
    -D hbase.rootdir=s3://us-east-1.elasticmapreduce.samples/hbase-demo-customer-data/snapshot/.hbase-snapshot \
    -snapshot customer_snapshot1 \
    -copy-to hdfs://masterDNSName:8020/user/hbase \
    -mappers 2 -chuser hbase -chmod 700'

    runQuery.sh

    aws s3 cp s3://myBucket/phoenixQuery.sql /home/hadoop/
    /usr/lib/phoenix/bin/sqlline-thin.py http://localhost:8765 /home/hadoop/phoenixQuery.sql

    phoenixQuery.sql

    CREATE VIEW "customer" (
    pk VARCHAR PRIMARY KEY, 
    "address"."state" VARCHAR,
    "address"."street" VARCHAR,
    "address"."city" VARCHAR,
    "address"."zip" VARCHAR,
    "cc"."number" VARCHAR,
    "cc"."expire" VARCHAR,
    "cc"."type" VARCHAR,
    "contact"."phone" VARCHAR);
    
    CREATE INDEX my_index ON "customer" ("address"."state") INCLUDE("address"."city", "cc"."expire", "cc"."type");
    
    SELECT "cc"."type" AS credit_card_type, count(*) AS num_customers FROM "customer" WHERE "address"."state" = 'CA' GROUP BY "cc"."type";

    Use the AWS CLI to upload the files to the S3 bucket:

    aws s3 cp copySnapshot.sh s3://myBucket/
    aws s3 cp runQuery.sh s3://myBucket/
    aws s3 cp phoenixQuery.sql s3://myBucket/
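
    Optionally, list the bucket to confirm that all three files uploaded:

    aws s3 ls s3://myBucket/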
  3. Create a table using the following step submitted to the cluster that you created in Step 1:

    createTable.json

    [
      {
        "Name": "Create HBase Table",
        "Args": ["bash", "-c", "echo $'create \"customer\",\"address\",\"cc\",\"contact\"' | hbase shell"],
        "Jar": "command-runner.jar",
        "ActionOnFailure": "CONTINUE",
        "Type": "CUSTOM_JAR"
      }
    ]

    aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
    --steps file://./createTable.json
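
    The add-steps command returns a step ID. You can poll the step's state with describe-step, replacing s-XXXXXXXXXXXXX with the returned ID:

    aws emr describe-step --cluster-id j-2AXXXXXXGAPLF --step-id s-XXXXXXXXXXXXX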
  4. Use script-runner.jar to run the copySnapshot.sh script that you previously uploaded to your S3 bucket:

    aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
    --steps Type=CUSTOM_JAR,Name="HBase Copy Snapshot",ActionOnFailure=CONTINUE,\
    Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://myBucket/copySnapshot.sh"]

    This runs a MapReduce job to copy your snapshot data to the cluster HDFS.
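
    When the step completes, you can verify from the master node that the snapshot files arrived in the target directory:

    # Lists the files copied to HDFS by the previous step
    hdfs dfs -ls -R /user/hbase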

  5. Restore the snapshot that you copied to the cluster using the following step:

    restoreSnapshot.json

    [
      {
        "Name": "restore",
        "Args": ["bash", "-c", "echo $'disable \"customer\"; restore_snapshot \"customer_snapshot1\"; enable \"customer\"' | hbase shell"],
        "Jar": "command-runner.jar",
        "ActionOnFailure": "CONTINUE",
        "Type": "CUSTOM_JAR"
      }
    ]

    aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
    --steps file://./restoreSnapshot.json
  6. Use script-runner.jar to run the runQuery.sh script that you previously uploaded to your S3 bucket:

    aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
    --steps Type=CUSTOM_JAR,Name="Phoenix Run Query",ActionOnFailure=CONTINUE,\
    Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://myBucket/runQuery.sh"]

    The query runs and returns the results to the step's stdout. It may take a few minutes for this step to complete.

  7. Inspect the results of the step's stdout at the log URI that you used when you created the cluster in Step 1. The results should look like the following:

    +------------------------------------------+-----------------------------------+
    |             CREDIT_CARD_TYPE             |              NUM_CUSTOMERS        |
    +------------------------------------------+-----------------------------------+
    | american_express                         | 5728                              |
    | dankort                                  | 5782                              |
    | diners_club                              | 5795                              |
    | discover                                 | 5715                              |
    | forbrugsforeningen                       | 5691                              |
    | jcb                                      | 5762                              |
    | laser                                    | 5769                              |
    | maestro                                  | 5816                              |
    | mastercard                               | 5697                              |
    | solo                                     | 5586                              |
    | switch                                   | 5781                              |
    | visa                                     | 5659                              |
    +------------------------------------------+-----------------------------------+
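
    With the log URI specified in Step 1, the stdout for each step is stored under s3://myBucket/myLogFolder/<cluster-id>/steps/<step-id>/. For example, to list the step logs for this cluster:

    aws s3 ls s3://myBucket/myLogFolder/j-2AXXXXXXGAPLF/steps/ --recursive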