Accessing Amazon Athena with JDBC

Access Amazon Athena using a Java Database Connectivity (JDBC) driver available on Amazon Simple Storage Service.

Athena JDBC Driver#

Using this driver allows you to connect to popular third-party applications such as SQL Workbench. You can also use this driver to run queries programmatically against Athena (Java 8 is required).

For example, you cannot currently run more than one query against Athena in the AWS Management Console. However, with the JDBC driver, you can submit a script that runs more than one query. By default, you can run five (5) queries concurrently from an account. You can request a service limit increase to raise this limit.

Download the Driver#

Current driver (JDBC 4.1 and Java 8 compatible): https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.1.0.jar.

Use the AWS CLI with the following command:

aws s3 cp s3://athena-downloads/drivers/AthenaJDBC41-1.1.0.jar [local_directory]

JDBC URL Format#

The format of the JDBC connection string for Athena is the following:

jdbc:awsathena://athena.REGION.amazonaws.com:443

Current REGION values are us-east-1, us-east-2, and us-west-2.

Driver Class Name#

To use the driver in custom applications, you need to set up your Java class path to the location of the JAR file that you downloaded from s3://athena-downloads/drivers/ in the previous section. This makes the classes within the JAR available for use. The main JDBC driver class is com.amazonaws.athena.jdbc.AthenaDriver.

Credentials#

Credentials are required to gain access to AWS services and resources, such as Athena and the Amazon S3 buckets to access.

For providing credentials in Java code you should use a class which implements the AWSCredentialsProvider interface. You would then set the JDBC property, aws_credentials_provider_class, equal to the class name and make sure that it is included in your classpath. To include constructor parameters, you set the JDBC property, aws_credentials_provider_arguments. For more information, see Using a Credentials Provider.

Another method to supply credentials—used in BI tools like SQL Workbench, for example—would be to supply the credentials used for the JDBC as AWS access key and AWS secret key for the JDBC properties for user and password, respectively.

Note

Users who connect through the JDBC driver and have custom access policies attached to their profiles need permission for policy actions in addition to those in the Amazon Athena SDK. For more information, see Athena Policy Actions.

JDBC Driver Options#

You can configure the following options for the JDBC driver. With version 1.0.1 of the driver, you can also pass parameters using the standard JDBC URL syntax, for example jdbc:awsathena://athena.us-west-1.amazonaws.com:443?max_error_retries=20&connection_timeout=20000

JDBC Options
Property Name Description Default Value Is Required
s3_staging_dir The Amazon S3 location to which your query output is written. The JDBC driver then asks Athena to read the results and provide rows of data back to the user. N/A Yes
query_results_encryption_option The encryption method to use for the directory specified by s3_staging_dir. If not specified, the location is not encrypted. Valid values are SSE_S3, SSE_KMS, and CSE_KMS. For more information, see Configuring Encryption Options. N/A No
query_results_aws_kms_key The Key ID of the AWS customer master key (CMK) to use if query_results_encryption_option specifies SSE-KMS or CSE-KMS. For example, 123abcde-4e56-56f7-g890-1234h5678i9j. N/A No
aws_credentials_provider_class The credentials provider class name, which implements the AWSCredentialsProvider interface. N/A No
aws_credentials_provider_arguments Arguments for the credentials provider constructor as comma-separated values. N/A No
max_error_retries The maximum number of retries that the JDBC client attempts to make a request to Athena. 10 No
connection_timeout The maximum amount of time, in milliseconds, to make a successful connection to Athena before an attempt is terminated. 10,000 No
socket_timeout The maximum amount of time, in milliseconds, to wait for a socket in order to send data to Athena. 10,000 No
retry_base_delay Minimum delay amount, in milliseconds, between retrying attempts to connect Athena. 100 No
retry_max_backoff_time Maximum delay amount, in milliseconds, between retrying attempts to connect Athena. 1000 No
log_path Local path of the Athena JDBC driver logs. If no log path is provided, then no log files are created. N/A No
log_level Log level of the Athena JDBC driver logs. Valid values: INFO, DEBUG, WARN, ERROR, ALL, OFF, FATAL, TRACE. N/A No

Use Athena with SQL Workbench#

Follow these instructions as a general guideline for how to access Athena with a JDBC driver.

Prerequisites#

This tutorial assumes that:

  • You have downloaded and installed SQL Workbench for your operating system.
  • You have set up Athena according to Setting Up.
  • The AWS JAVA SDK is included in your classpath, specifically the aws-java-sdk-core module, which includes the authorization packages (com.amazonaws.auth.*) referenced in the example.

Configuring SQL Workbench#

  1. Download the Athena driver and place it in the SQL Workbench directory.

  2. Open SQL Workbench.

  3. Configure an Athena driver by clicking File, Manage Drivers....

  4. For Name, type something like "Athena JDBC Driver".

  5. For Library, type the path for the location to which you downloaded your driver. For example, on a Linux machine, this might look like: /usr/local/SqlWorkBench-121/AthenaJDBC41-1.0.0.jar.

  6. For Classname, enter the full class name: com.amazonaws.athena.jdbc.AthenaDriver.

  7. For Sample URL, enter the URL, replacing REGION with your desired region. Currently, the supported regions are us-east-1, us-east-2, and us-west-2.

    jdbc:awsathena://athena.REGION.amazonaws.com:443

  8. Click OK.

    _images/workbenchdriver.png
  9. Set up a connection by clicking File, Connect window.

  10. Create a new connection profile and call it "Athena".

  11. Under Driver, select the Athena driver (com.amazonaws.athena.jdbc.AthenaDriver).

  12. For URL, enter the connection string. For example, in us-east-1, this would be jdbc:awsathena://athena.us-east-1.amazonaws.com:443/.

  13. For Username and Password, enter your AWS access key and secret key, respectively.

  14. Under Extended Properties, enter a desired value for s3_staging_dir that is in the same region where you are working, and then click OK. This setting is necessary to place the query results in Amazon S3 so you can download them locally using the JDBC driver. For more information about other options, see JDBC Driver Options.

  15. You can leave other options at their default condition and click OK.

    _images/connectwindow.png

Querying Data#

In the Statement window, you can enter a series of queries on your data. You can also use a CREATE statement to add new tables. The JDBC uses the default database but you can also create databases and use them. In this case, you should use the database identifier as a namespace prefix to your table name when writing your queries, to distinguish between tables in the default and custom databases.

Although you can enter a series of queries in the Statement window, keep in mind that you can only run five (5) simultaneous queries per account.

  1. Create a table in the default database using the example in the Getting Started chapter. Here's the CREATE TABLE statement:

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
       `Date` DATE,
       Time STRING,
       Location STRING,
       Bytes INT,
       RequestIP STRING,
       Method STRING,
       Host STRING,
       Uri STRING,
       Status INT,
       Referrer STRING,
       os STRING,
       Browser STRING,
       BrowserVersion STRING
       ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
       WITH SERDEPROPERTIES (
       "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"
       ) LOCATION 's3://athena-examples/cloudfront/plaintext/';
    
Click Execute.
  1. Run a simple query such as SELECT DISTINCT os FROM cloudfront_logs, and view the result.

Exploring Data#

There is also a more slightly interactive way to view your Athena data using the Database Explorer tab. On this tab, you can select tables and view information such as the CREATE TABLE statement or a window into the data.

  1. On the Database Explorer tab, select the default schema (database).

  2. Select the cloudfront_logs table. This loads the Columns tab, which shows the table schema.

    _images/workbenchcolumns.png
  3. Open other tabs. Two other tabs are of immediate interest:

    • SQL Source shows something similar to the output of EXPLAIN CREATE TABLE cloudfront_logs

      _images/workbenchsqlsource.png
    • Data shows rows returned from your table. It may take time to load the data.

      _images/workbenchdata.png

Using the JDBC Driver with the JDK#

The following code examples demonstrate how to use the JDBC driver in an application.

Creating a Driver#

Properties info = new Properties();
info.put("user", "AWSAccessKey");
info.put("password", "AWSSecretAccessKey");
info.put("s3_staging_dir", "s3://S3 Bucket Location/");

Class.forName("com.amazonaws.athena.jdbc.AthenaDriver");

Connection connection = DriverManager.getConnection("jdbc:awsathena://athena.us-east-1.amazonaws.com:443/", info);

Using a Credentials Provider#

The following examples demonstrate different ways to use a credentials provider that implements the AWSCredentialsProvider interface with the JDBC.

Credentials provider with a single argument

Properties myProps = new Properties();
myProps.put("aws_credentials_provider_class","com.amazonaws.auth.PropertiesFileCredentialsProvider");
myProps.put("aws_credentials_provider_arguments","/Users/myUser/.athenaCredentials");

In this case, the filename called /Users/myUser/.athenaCredentials should contain the following:

accessKey = ACCESSKEY
secretKey = SECRETKEY

You replace the right hand part of the assignments with your account's AWS access and secret keys.

Credentials provider with a multiple arguments

This example shows a credentials provider that uses an access and secret key as well as a session token, for example. The signature of the class looks like the following:

public CustomSessionCredentialsProvider(String accessId, String secretKey, String token)
{
  //...
}

You would then set the properties as follows:

Properties myProps = new Properties();
myProps.put("aws_credentials_provider_class","com.amazonaws.athena.jdbc.CustomSessionCredentialsProvider");
String providerArgs = "My_Access_Key," + "My_Secret_Key," + "My_Token";
myProps.put("aws_credentials_provider_arguments",providerArgs);

Using InstanceProfileCredentialsProvider

If you use the InstanceProfileCredentialsProvider, you don't need to supply any credential provider arguments because they are provided using the EC2 instance profile for the instance on which you are running your application. You would still set the aws_credentials_provider_class property to this class name, however.

Executing a SELECT Query#

Statement statement = connection.createStatement();
ResultSet queryResults = statement.executeQuery("SELECT * FROM cloudfront_logs");

Executing CREATE/ALTER Statements#

Statement statement = connection.createStatement();
ResultSet queryResults = statement.executeQuery("CREATE EXTERNAL TABLE tableName ( Col1 String ) LOCATION ‘s3://bucket/tableLocation");

Full Example Listing Tables#

import java.sql.*;
import java.util.Properties;
import com.amazonaws.athena.jdbc.AthenaDriver;
import com.amazonaws.auth.PropertiesFileCredentialsProvider;


public class AthenaJDBCDemo {

  static final String athenaUrl = "jdbc:awsathena://athena.us-east-1.amazonaws.com:443";

  public static void main(String[] args) {

      Connection conn = null;
      Statement statement = null;

      try {
          Class.forName("com.amazonaws.athena.jdbc.AthenaDriver");
          Properties info = new Properties();
          info.put("s3_staging_dir", "s3://my-athena-result-bucket/test/");
          info.put("log_path", "/Users/myUser/.athena/athenajdbc.log");
          info.put("aws_credentials_provider_class","com.amazonaws.auth.PropertiesFileCredentialsProvider");
          info.put("aws_credentials_provider_arguments","/Users/myUser/.athenaCredentials");
          String databaseName = "default";

          System.out.println("Connecting to Athena...");
          conn = DriverManager.getConnection(athenaUrl, info);

          System.out.println("Listing tables...");
          String sql = "show tables in "+ databaseName;
          statement = conn.createStatement();
          ResultSet rs = statement.executeQuery(sql);

          while (rs.next()) {
              //Retrieve table column.
              String name = rs.getString("tab_name");

              //Display values.
              System.out.println("Name: " + name);
          }
          rs.close();
          conn.close();
      } catch (Exception ex) {
          ex.printStackTrace();
      } finally {
          try {
              if (statement != null)
                  statement.close();
          } catch (Exception ex) {

          }
          try {
              if (conn != null)
                  conn.close();
          } catch (Exception ex) {

              ex.printStackTrace();
          }
      }
      System.out.printf("Finished connectivity test.");
  }
}