Creating tables in Athena - Amazon Athena

Creating tables in Athena

You can run DDL statements in the Athena console, using a JDBC or an ODBC driver, or using the Athena Create table form.

When you create a new table schema in Athena, Athena stores the schema in a data catalog and uses it when you run queries.

Athena uses an approach known as schema-on-read, which means a schema is projected on to your data at the time you run a query. This eliminates the need for data loading or transformation.

Athena does not modify your data in Amazon S3.

Athena uses Apache Hive to define tables and create databases, which are essentially a logical namespace of tables.

When you create a database and table in Athena, you are simply describing the schema and the location where the table data are located in Amazon S3 for read-time querying. Database and table, therefore, have a slightly different meaning than they do for traditional relational database systems because the data isn't stored along with the schema definition for the database and table.

When you query, you query the table using standard SQL and the data is read at that time. You can find guidance for how to create databases and tables using Apache Hive documentation, but the following provides guidance specifically for Athena.

The maximum query string length is 256 KB.

Hive supports multiple data formats through the use of serializer-deserializer (SerDe) libraries. You can also define complex schemas using regular expressions. For a list of supported SerDe libraries, see Supported SerDes and data formats.

Considerations and limitations

Following are some important limitations and considerations for tables in Athena.

Requirements for tables in Athena and data in Amazon S3

When you create a table, you specify an Amazon S3 bucket location for the underlying data using the LOCATION clause. Consider the following:

  • Athena can only query the latest version of data on a versioned Amazon S3 bucket, and cannot query previous versions of the data.

  • You must have the appropriate permissions to work with data in the Amazon S3 location. For more information, see Access to Amazon S3.

  • Athena supports querying objects that are stored with multiple storage classes in the same bucket specified by the LOCATION clause. For example, you can query data in objects that are stored in different Storage classes (Standard, Standard-IA and Intelligent-Tiering) in Amazon S3.

  • Athena supports Requester Pays buckets. For information how to enable Requester Pays for buckets with source data you intend to query in Athena, see Create a workgroup.

  • Athena does not support querying the data in the S3 Glacier flexible retrieval or S3 Glacier Deep Archive storage classes. Objects in the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes are ignored. As an alternative, you can use the Amazon S3 Glacier Instant Retrieval storage class, which is queryable by Athena. For more information, see Amazon S3 Glacier instant retrieval storage class.

    For information about storage classes, see Storage classes, Changing the storage class of an object in amazon S3, Transitioning to the GLACIER storage class (object archival) , and Requester Pays buckets in the Amazon Simple Storage Service User Guide.

  • If you issue queries against Amazon S3 buckets with a large number of objects and the data is not partitioned, such queries may affect the Get request rate limits in Amazon S3 and lead to Amazon S3 exceptions. To prevent errors, partition your data. Additionally, consider tuning your Amazon S3 request rates. For more information, see Request rate and performance considerations.

  • If you use the AWS Glue CreateTable API operation or the AWS CloudFormation AWS::Glue::Table template to create a table for use in Athena without specifying the TableType property and then run a DDL query like SHOW CREATE TABLE or MSCK REPAIR TABLE, you can receive the error message FAILED: NullPointerException Name is null.

    To resolve the error, specify a value for the TableInput TableType attribute as part of the AWS Glue CreateTable API call or AWS CloudFormation template. Possible values for TableType include EXTERNAL_TABLE or VIRTUAL_VIEW.

    This requirement applies only when you create a table using the AWS Glue CreateTable API operation or the AWS::Glue::Table template. If you create a table for Athena by using a DDL statement or an AWS Glue crawler, the TableType property is defined for you automatically.

Functions supported

The functions supported in Athena queries correspond to those in Trino and Presto. For information about individual functions, see the functions and operators section in the Trino or Presto documentation.

Transactional data transformations are not supported

Athena does not support transaction-based operations (such as the ones found in Hive or Presto) on table data. For a full list of keywords not supported, see Unsupported DDL.

Operations that change table states are ACID

When you create, update, or delete tables, those operations are guaranteed ACID-compliant. For example, if multiple users or clients attempt to create or alter an existing table at the same time, only one will be successful.

Tables are EXTERNAL

Except when creating Iceberg tables, always use the EXTERNAL keyword. If you use CREATE TABLE without the EXTERNAL keyword for non-Iceberg tables, Athena issues an error. When you drop a table in Athena, only the table metadata is removed; the data remains in Amazon S3.

Creating tables using AWS Glue or the Athena console

You can create tables in Athena by using AWS Glue, the add table form, or by running a DDL statement in the Athena query editor.

To create a table using the AWS Glue crawler

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. In the query editor, next to Tables and views, choose Create, and then choose AWS Glue crawler.

  3. Follow the steps on the Add crawler page of the AWS Glue console to add a crawler.

    For more information, see Using AWS Glue crawlers.

To create a table using the Athena create table form

  1. Open the Athena console at https://console.aws.amazon.com/athena/.

  2. In the query editor, next to Tables and views, choose Create, and then choose S3 bucket data.

  3. In the Create Table From S3 bucket data form, enter the information to create your table, and then choose Create table. For more information about the fields in the form, see Adding a table using a form.

To create a table using Hive DDL

  1. From the Database menu, choose the database for which you want to create a table. If you don't specify a database in your CREATE TABLE statement, the table is created in the database that is currently selected in the query editor.

  2. Enter a statement like the following in the query editor, and then choose Run, or press Ctrl+ENTER.

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs ( `Date` Date, Time STRING, Location STRING, Bytes INT, RequestIP STRING, Method STRING, Host STRING, Uri STRING, Status INT, Referrer STRING, OS String, Browser String, BrowserVersion String ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ( "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$" ) LOCATION 's3://athena-examples-MyRegion/cloudfront/plaintext/';

Showing table information

After you have created a table in Athena, its name displays in the Tables list on the left. To show information about the table and manage it, choose the vertical three dots next to the table name in the Athena console.

  • Preview table – Shows the first 10 rows of all columns by running the SELECT * FROM "database_name"."table_name" LIMIT 10 statement in the Athena query editor.

  • Generate table DDL – Generates a DDL statement that you can use to re-create the table by running the SHOW CREATE TABLE table_name statement in the Athena query editor.

  • Load partitions – Runs the MSCK REPAIR TABLE table_name statement in the Athena query editor. This option is available only if the table has partitions.

  • Insert into editor – Inserts the name of the table into the query editor at the current editing location.

  • Delete table – Displays a confirmation dialog box asking if you want to delete the table. If you agree, runs the DROP TABLE table_name statement in the Athena query editor.

  • Table properties – Shows the table name, database name, time created, and whether the table has encrypted data.