Amazon Athena
User Guide

Considerations and Limitations for CTAS Queries

The following table describes what you need to know about CTAS queries in Athena:

Item What You Need to Know
CTAS query syntax

The CTAS query syntax differs from the syntax of CREATE [EXTERNAL] TABLE used for creating tables. See CREATE TABLE AS.

Note

Table, database, or column names for CTAS queries should not contain quotes or backticks. To ensure this, check that your table, database, or column names do not represent reserved words, and do not contain special characters (which require enclosing them in quotes or backticks). For more information, see Names for Tables, Databases, and Columns.

CTAS queries vs views

CTAS queries write new data to a specified location in Amazon S3, whereas views do not write any data.

Location of CTAS query results

The location for storing CTAS query results in Amazon S3 must be empty. A CTAS query checks that the path location (prefix) in the bucket is empty and never overwrites the data if the location already has data in it. To use the same location again, delete the data in the key prefix location in the bucket, otherwise your CTAS query will fail.

You can specify the location for storing your CTAS query results. If omitted and if your workgroup does not override client-side settings, Athena uses this location by default: s3://aws-athena-query-results-<account>-<region>/<query-name-or-unsaved>/<year>/<month/<date>/<query-id>/.

If your workgroup overrides client-side settings, this means that the workgroup's query result location is used for your CTAS queries. If you specify a different results location, your query will fail. To obtain the results location specified for the workgroup, view workgroup's details.

If the workgroup in which a query will run is configured with an enforced query results location, do not specify an external_location for the CTAS query. Athena issues an error and fails a query that specifies an external_location in this case. For example, this query fails, if you override client-side settings for query results location, enforcing the workgroup to use its own location: CREATE TABLE <DB>.<TABLE1> WITH (format='Parquet', external_location='s3://my_test/test/') AS SELECT * FROM <DB>.<TABLE2> LIMIT 10;

Formats for storing query results

The results of CTAS queries are stored in Parquet by default if you don't specify a data storage format. You can store CTAS results in PARQUET, ORC, AVRO, JSON, and TEXTFILE. CTAS queries do not require specifying a SerDe to interpret format transformations. See Example: Writing Query Results to a Different Format.

Compression formats

GZIP compression is used for CTAS query results by default. For Parquet and ORC, you can also specify SNAPPY. See Example: Specifying Data Storage and Compression Formats.

Partition and Bucket Limits

You can partition and bucket the results data of a CTAS query. For more information, see Bucketing vs Partitioning. Athena supports writing to 100 unique partition and bucket combinations. For example, if no buckets are defined in the destination table, you can specify a maximum of 100 partitions. If you specify five buckets, 20 partitions (each with five buckets) is allowed. If you exceed this count, an error occurs.

Include partitioning and bucketing predicates at the end of the WITH clause that specifies properties of the destination table. For more information, see Example: Creating Bucketed and Partitioned Tables and Bucketing vs Partitioning.

Encryption

You can encrypt CTAS query results in Amazon S3, similar to the way you encrypt other query results in Athena. For more information, see Configuring Encryption Options.

Data types

Column data types for a CTAS query are the same as specified for the original query.