Amazon Athena
User Guide

Tables and Databases Creation Process in Athena

You can run DDL statements in the Athena console, using a JDBC or an ODBC driver, or using the Athena Create Table wizard.

When you create a new table schema in Athena, Athena stores the schema in a data catalog and uses it when you run queries.

Athena uses an approach known as schema-on-read, which means a schema is projected on to your data at the time you execute a query. This eliminates the need for data loading or transformation.

Athena does not modify your data in Amazon S3.

Athena uses Apache Hive to define tables and create databases, which are essentially a logical namespace of tables.

When you create a database and table in Athena, you are simply describing the schema and the location where the table data are located in Amazon S3 for read-time querying. Database and table, therefore, have a slightly different meaning than they do for traditional relational database systems because the data isn't stored along with the schema definition for the database and table.

When you query, you query the table using standard SQL and the data is read at that time. You can find guidance for how to create databases and tables using Apache Hive documentation, but the following provides guidance specifically for Athena.

The maximum query string length is 256 KB.

Hive supports multiple data formats through the use of serializer-deserializer (SerDe) libraries. You can also define complex schemas using regular expressions. For a list of supported SerDe libraries, see Supported Data Formats, SerDes, and Compression Formats.

Requirements for Tables in Athena and Data in Amazon S3

When you create a table, you specify an Amazon S3 bucket location for the underlying data using the LOCATION clause. Consider the following:

  • Athena can only query the latest version of data on a versioned Amazon S3 bucket, and cannot query previous versions of the data.

  • You must have the appropriate permissions to work with data in the Amazon S3 location. For more information, see Setting User and Amazon S3 Bucket Permissions.

  • If the data is not encrypted in Amazon S3, it can be stored in a different region from the primary region where you run Athena. Standard inter-region data transfer rates for Amazon S3 apply in addition to standard Athena charges.

  • If the data is encrypted in Amazon S3, it must be stored in the same region, and the user or principal who creates the table in Athena must have the appropriate permissions to decrypt the data. For more information, see Configuring Encryption Options.

  • Athena does not support different storage classes within the bucket specified by the LOCATION clause, does not support the GLACIER storage class, and does not support Requester Pays buckets. For more information, see Storage Classes, Changing the Storage Class of an Object in Amazon S3, and Requester Pays Buckets in the Amazon Simple Storage Service Developer Guide.

  • If you issue queries against Amazon S3 buckets with a large number of objects and the data is not partitioned, such queries may affect the Get request rate limits in Amazon S3 and lead to Amazon S3 exceptions. To prevent errors, partition your data. Additionally, consider tuning your Amazon S3 request rates. For more information, see Request Rate and Performance Considerations.

Functions Supported

The functions supported in Athena queries are those found within Presto. For more information, see Presto 0.172 Functions and Operators in the Presto documentation.

Transactional Data Transformations Are Not Supported

Athena does not support transaction-based operations (such as the ones found in Hive or Presto) on table data. For a full list of keywords not supported, see Unsupported DDL.

Operations That Change Table States Are ACID

When you create, update, or delete tables, those operations are guaranteed ACID-compliant. For example, if multiple users or clients attempt to create or alter an existing table at the same time, only one will be successful.

All Tables Are EXTERNAL

If you use CREATE TABLE without the EXTERNAL keyword, Athena issues an error; only tables with the EXTERNAL keyword can be created. We recommend that you always use the EXTERNAL keyword. When you drop a table in Athena, only the table metadata is removed; the data remains in Amazon S3.

UDF and UDAF Are Not Supported

User-defined functions (UDF or UDAFs) and stored procedures are not supported.

To create a table using the AWS Glue Data Catalog

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. Choose AWS Glue Data Catalog. You can now create a table with the AWS Glue Crawler. For more information, see Using AWS Glue Crawlers.

To create a table using the wizard

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. Under the database display in the Query Editor, choose Add table, which displays a wizard.

  3. Follow the steps for creating your table.

To create a database using Hive DDL

A database in Athena is a logical grouping for tables you create in it.

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. Choose Query Editor.

  3. Enter CREATE DATABASE myDataBase and choose Run Query.

  4. Select your database from the menu. It is likely to be an empty database.

To create a table using Hive DDL

The Athena Query Editor displays the current database. If you create a table and don't specify a database, the table is created in the database chosen in the Databases section on the Catalog tab.

  1. In the database that you created, create a table by entering the following statement and choosing Run Query:

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs ( `Date` Date, Time STRING, Location STRING, Bytes INT, RequestIP STRING, Method STRING, Host STRING, Uri STRING, Status INT, Referrer STRING, OS String, Browser String, BrowserVersion String ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$" ) LOCATION 's3://athena-examples/cloudfront/plaintext/';
  2. If the table was successfully created, you can then run queries against your data.