Considerations and limitations for CTAS queries
The following sections describe considerations and limitations to keep in mind when you
use CREATE TABLE AS SELECT
(CTAS) queries in Athena.
CTAS query syntax
The CTAS query syntax differs from the syntax of CREATE [EXTERNAL] TABLE
used for creating tables. See CREATE TABLE AS.
Note
Table, database, or column names for CTAS queries should not contain quotes or backticks. To ensure this, check that your table, database, or column names do not represent reserved words, and do not contain special characters (which require enclosing them in quotes or backticks). For more information, see Names for tables, databases, and columns.
CTAS queries vs views
CTAS queries write new data to a specified location in Amazon S3, whereas views do not write any data.
Location of CTAS query results
If your workgroup overrides the
client-side setting for query results location, Athena creates your table in
the location
s3://
.
To see the query results location specified for the workgroup, view the workgroup's details.<workgroup-query-results-location>
/tables/<query-id>
/
If your workgroup does not override the query results location, you can use the syntax
WITH (external_location ='s3://
in your CTAS query to specify where your CTAS query results are stored. location
/')
Note
The external_location
property must specify a location that is empty.
A CTAS query checks that the path location (prefix) in the bucket is empty and never
overwrites the data if the location already has data in it. To use the same location
again, delete the data in the key prefix location in the bucket.
If you omit the external_location
syntax and are not using the workgroup
setting, Athena uses your client-side setting
for the query results location and creates your table in the location
s3://
.
<client-query-results-location>
/<Unsaved-or-query-name>
/<year>/<month/<date>
/tables/<query-id>
/
Locating Orphaned Files
If a CTAS
or INSERT INTO
statement fails, it is possible
that orphaned data are left in the data location. Because Athena does not delete any
data (even partial data) from your bucket, you might be able to read this partial data
in subsequent queries. To locate orphaned files for inspection or deletion, you can use
the data manifest file that Athena provides to track the list of files to be written.
For more information, see Identifying query output
files and DataManifestLocation.
ORDER BY clauses ignored
In a CTAS query, Athena ignores ORDER BY
clauses in the
SELECT
portion of the query.
According to the SQL specification (ISO 9075 Part 2), the ordering of the rows of a
table specified by a query expression is guaranteed only for the query expression that
immediately contains the ORDER BY
clause. Tables in SQL are in any case
inherently unordered, and implementing the ORDER BY
in sub query clauses
would both cause the query to perform poorly and not result in ordered output. Thus, in
Athena CTAS queries, there is no guarantee that the order specified by the ORDER
BY
clause will be preserved when the data is written.
Formats for storing query results
The results of CTAS queries are stored in Parquet by default if you don't specify a
data storage format. You can store CTAS results in PARQUET
,
ORC
, AVRO
, JSON
, and TEXTFILE
.
Multi-character delimiters are not supported for the CTAS TEXTFILE
format.
CTAS queries do not require specifying a SerDe to interpret format transformations. See
Example: Writing query results to a different format.
Compression formats
GZIP
compression is used for CTAS query results in JSON and TEXTFILE
formats. For Parquet, you can use GZIP
or SNAPPY
, and the
default is GZIP
. For ORC, you can use LZ4
,
SNAPPY
, ZLIB
, or ZSTD
, and the default is
ZLIB
. For CTAS examples that specify compression, see Example: Specifying data storage and compression formats. For more information about compression in
Athena, see Athena compression support.
Partition and Bucket Limits
You can partition and bucket the results data of a CTAS query. For more information, see Partitioning and bucketing in Athena. Athena supports writing to 100 unique partition and bucket combinations. For example, if no buckets are defined in the destination table, you can specify a maximum of 100 partitions. If you specify five buckets, 20 partitions (each with five buckets) are allowed. If you exceed this count, an error occurs.
Include partitioning and bucketing predicates at the end of the WITH
clause that specifies properties of the destination table. For more information, see
Example: Creating bucketed and partitioned tables and Partitioning and bucketing in Athena.
For information about working around the 100-partition limitation, see Using CTAS and INSERT INTO to work around the 100 partition limit.
Encryption
You can encrypt CTAS query results in Amazon S3, similar to the way you encrypt other query results in Athena. For more information, see Encrypting Athena query results stored in Amazon S3 Encrypting Athena query results when using JDBC or ODBC.
Expected bucket owner
For CTAS statements, the expected bucket owner setting does not apply to the destination table location in Amazon S3. The expected bucket owner setting applies only to the Amazon S3 output location that you specify for Athena query results. For more information, see Specifying a query result location using the Athena console.
Data types
Column data types for a CTAS query are the same as specified for the original query.