Considerations and limitations for CTAS queries - Amazon Athena

Considerations and limitations for CTAS queries

The following sections describe considerations and limitations to keep in mind when you use CREATE TABLE AS SELECT (CTAS) queries in Athena.

CTAS query syntax

The CTAS query syntax differs from the syntax of CREATE [EXTERNAL] TABLE used for creating tables. See CREATE TABLE AS.

CTAS queries vs views

CTAS queries write new data to a specified location in Amazon S3, whereas views do not write any data.

Location of CTAS query results

If your workgroup overrides the client-side setting for query results location, Athena creates your table in the location s3://<workgroup-query-results-location>/tables/<query-id>/. To see the query results location specified for the workgroup, view the workgroup's details.

If your workgroup does not override the query results location, you can use the syntax WITH (external_location ='s3://location/') in your CTAS query to specify where your CTAS query results are stored.

Note

The external_location property must specify a location that is empty. A CTAS query checks that the path location (prefix) in the bucket is empty and never overwrites the data if the location already has data in it. To use the same location again, delete the data in the key prefix location in the bucket.

If you omit the external_location syntax and are not using the workgroup setting, Athena uses your client-side setting for the query results location and creates your table in the location s3://<client-query-results-location>/<Unsaved-or-query-name>/<year>/<month/<date>/tables/<query-id>/.

Locating Orphaned Files

If a CTAS or INSERT INTO statement fails, it is possible that orphaned data are left in the data location. Because Athena in some cases does not delete data or partial data from your bucket, you might be able to read this partial data in subsequent queries. To locate orphaned files for inspection or deletion, you can use the data manifest file that Athena provides to track the list of files to be written. For more information, see Identifying query output files and DataManifestLocation.

ORDER BY clauses ignored

In a CTAS query, Athena ignores ORDER BY clauses in the SELECT portion of the query.

According to the SQL specification (ISO 9075 Part 2), the ordering of the rows of a table specified by a query expression is guaranteed only for the query expression that immediately contains the ORDER BY clause. Tables in SQL are in any case inherently unordered, and implementing the ORDER BY in sub query clauses would both cause the query to perform poorly and not result in ordered output. Thus, in Athena CTAS queries, there is no guarantee that the order specified by the ORDER BY clause will be preserved when the data is written.

Formats for storing query results

The results of CTAS queries are stored in Parquet by default if you don't specify a data storage format. You can store CTAS results in PARQUET, ORC, AVRO, JSON, and TEXTFILE. Multi-character delimiters are not supported for the CTAS TEXTFILE format. CTAS queries do not require specifying a SerDe to interpret format transformations. See Example: Writing query results to a different format.

Compression formats

GZIP compression is used for CTAS query results in JSON and TEXTFILE formats. For Parquet, you can use GZIP or SNAPPY, and the default is GZIP. For ORC, you can use LZ4, SNAPPY, ZLIB, or ZSTD, and the default is ZLIB. For CTAS examples that specify compression, see Example: Specifying data storage and compression formats. For more information about compression in Athena, see Athena compression support.

Partition and bucket limits

You can partition and bucket the results data of a CTAS query. For more information, see Partitioning and bucketing in Athena. When creating a partitioned table using CTAS, Athena has a limit of writing 100 partitions.

Include partitioning and bucketing predicates at the end of the WITH clause that specifies properties of the destination table. For more information, see Example: Creating bucketed and partitioned tables and Partitioning and bucketing in Athena.

For information about working around the 100-partition limitation, see Using CTAS and INSERT INTO to work around the 100 partition limit.

Encryption

You can encrypt CTAS query results in Amazon S3, similar to the way you encrypt other query results in Athena. For more information, see Encrypting Athena query results stored in Amazon S3.

Expected bucket owner

For CTAS statements, the expected bucket owner setting does not apply to the destination table location in Amazon S3. The expected bucket owner setting applies only to the Amazon S3 output location that you specify for Athena query results. For more information, see Specifying a query result location using the Athena console.

Data types

Column data types for a CTAS query are the same as specified for the original query.