Considerations and Limitations for CTAS Queries - Amazon Athena

Considerations and Limitations for CTAS Queries

The following table describes what you need to know about CTAS queries in Athena:

Item What You Need to Know
CTAS query syntax

The CTAS query syntax differs from the syntax of CREATE [EXTERNAL] TABLE used for creating tables. See CREATE TABLE AS.

Note

Table, database, or column names for CTAS queries should not contain quotes or backticks. To ensure this, check that your table, database, or column names do not represent reserved words, and do not contain special characters (which require enclosing them in quotes or backticks). For more information, see Names for Tables, Databases, and Columns.

CTAS queries vs views

CTAS queries write new data to a specified location in Amazon S3, whereas views do not write any data.

Location of CTAS query results

If your workgroup overrides the client-side setting for query results location, Athena creates your table in the location s3://<workgroup-query-results-location>/tables/<query-id>/. To see the query results location specified for the workgroup, view the workgroup's details.

If your workgroup does not override the query results location, you can use the syntax WITH (external_location ='s3://location/') in your CTAS query to specify where your CTAS query results are stored.

Note

The external_location property must specify a location that is empty. A CTAS query checks that the path location (prefix) in the bucket is empty and never overwrites the data if the location already has data in it. To use the same location again, delete the data in the key prefix location in the bucket.

If you omit the external_location syntax and are not using the workgroup setting, Athena uses your client-side setting for the query results location and creates your table in the location s3://<client-query-results-location>/<Unsaved-or-query-name>/<year>/<month/<date>/tables/<query-id>/.

Locating Orphaned Files

If a CTAS or INSERT INTO statement fails, it is possible that orphaned data are left in the data location. Because Athena does not delete any data (even partial data) from your bucket, you might be able to read this partial data in subsequent queries. To locate orphaned files for inspection or deletion, you can use the data manifest file that Athena provides to track the list of files to be written. For more information, see Identifying Query Output Files and DataManifestLocation.

Formats for storing query results

The results of CTAS queries are stored in Parquet by default if you don't specify a data storage format. You can store CTAS results in PARQUET, ORC, AVRO, JSON, and TEXTFILE. Multi-character delimiters are not supported for the CTAS TEXTFILE format. CTAS queries do not require specifying a SerDe to interpret format transformations. See Example: Writing Query Results to a Different Format.

Compression formats

GZIP compression is used for CTAS query results by default. For Parquet and ORC, you can also specify SNAPPY. See Example: Specifying Data Storage and Compression Formats.

Partition and Bucket Limits

You can partition and bucket the results data of a CTAS query. For more information, see Bucketing vs Partitioning. Athena supports writing to 100 unique partition and bucket combinations. For example, if no buckets are defined in the destination table, you can specify a maximum of 100 partitions. If you specify five buckets, 20 partitions (each with five buckets) are allowed. If you exceed this count, an error occurs.

Include partitioning and bucketing predicates at the end of the WITH clause that specifies properties of the destination table. For more information, see Example: Creating Bucketed and Partitioned Tables and Bucketing vs Partitioning.

For information about working around the 100-partition limitation, see Using CTAS and INSERT INTO to Create a Table with More Than 100 Partitions.

Encryption

You can encrypt CTAS query results in Amazon S3, similar to the way you encrypt other query results in Athena. For more information, see Encrypting Query Results Stored in Amazon S3.

Data types

Column data types for a CTAS query are the same as specified for the original query.