Use ZSTD compression levels
The Zstandard real-time data compression
algorithm
You can use ZSTD compression levels to adjust the compression ratio and speed according to your requirements. The ZSTD library supports compression levels from 1 to 22. Athena uses ZSTD compression level 3 by default.
Compression levels provide granular trade-offs between compression speed and the amount of
compression achieved. Lower compression levels provide faster speed but larger file sizes.
For example, you can use level 1 if speed is most important and level 22 if size is most
important. Level 3 is suitable for many use cases and is the default. Use levels greater
than 19 with caution as they require more memory. The ZSTD library also offers negative
compression levels that extend the range of compression speed and ratios. For more
information, see the Zstandard
Compression RFC
The abundance of compression levels offers substantial opportunities for fine tuning. However, make sure that you measure your data and consider the tradeoffs when deciding on a compression level. We recommend using the default level of 3 or a level in the range from 6 to 9 for a reasonable tradeoff between compression speed and compressed data size. Reserve levels 20 and greater for cases where size is most important and compression speed is not a concern.
Considerations and limitations
When using ZSTD compression level in Athena, consider the following points.
-
The ZSTD
compression_level
property is supported only in Athena engine version 3. -
The ZSTD
compression_level
property is supported for theALTER TABLE
,CREATE TABLE
,CREATE TABLE AS
(CTAS), andUNLOAD
statements. -
The
compression_level
property is optional. -
The
compression_level
property is supported only for ZSTD compression. -
Possible compression levels are 1 through 22.
-
The default compression level is 3.
For information about Apache Hive ZSTD compression support in Athena, see Use Hive table compression. For information about Apache Iceberg ZSTD compression support in Athena, see Use Iceberg table compression.
Specify ZSTD compression levels
To specify the ZSTD compression level for the ALTER TABLE
, CREATE
TABLE
, CREATE TABLE AS
, and UNLOAD
statements, use
the compression_level
property. To specify ZSTD compression itself, you
must use the individual compression property that the syntax for the statement
uses.
In the ALTER TABLE SET
TBLPROPERTIES statement SET
TBLPROPERTIES
clause, specify ZSTD compression using
'write.compression' = ' ZSTD'
or 'parquet.compression' =
'ZSTD'
. Then use the compression_level
property to specify a
value from 1 to 22 (for example, 'compression_level' = '5'
). If you do
not specify a compression level property, the compression level defaults to
3.
Example
The following example modifies the table existing_table
to use
Parquet file format with ZSTD compression and ZSTD compression level 4. Note
that in the TBLPROPERTIES
clause the compression level value must
be entered as a string rather an integer and therefore must be enclosed in
either single or double quotes.
ALTER TABLE existing_table SET TBLPROPERTIES ('parquet.compression' = 'ZSTD', 'compression_level' = '4')
In the CREATE TABLE statement
TBLPROPERTIES
clause, specify 'write.compression' =
'ZSTD'
or 'parquet.compression' = 'ZSTD'
, and then use
compression_level =
and specify a value from 1 to 22 as a string. If the compression_level
compression_level
property is not specified, the default compression level is 3.
Example
The following example creates a table in Parquet file format using ZSTD compression and ZSTD compression level 4.
CREATE EXTERNAL TABLE new_table ( `col0` string COMMENT '', `col1` string COMMENT '' ) STORED AS PARQUET LOCATION 's3://amzn-s3-demo-bucket/' TBLPROPERTIES ('write.compression' = 'ZSTD', 'compression_level' = '4')
In the CREATE TABLE AS statement
WITH
clause, specify write_compression = 'ZSTD'
, or
parquet_compression = 'ZSTD'
, and then use compression_level
=
and specify a value from 1
to 22 as an integer. If the compression_level
compression_level
property is not
specified, the default compression level is 3.
Example
The following CTAS example specifies Parquet as the file format using ZSTD
compression with compression level 4. Note that, in the WITH
clause, the value for compression level must be specified as an integer, not as
a string.
CREATE TABLE new_table WITH ( format = 'PARQUET', write_compression = 'ZSTD', compression_level = 4) AS SELECT * FROM old_table
In the UNLOAD statement WITH
clause, specify compression = 'ZSTD'
, and then use
compression_level =
and specify a value from 1 to 22 as an integer. If the
compression_level
compression_level
property is not specified, the default
compression level is 3.
Example
The following example unloads the query results to the specified location using the Parquet file format, ZSTD compression, and ZSTD compression level 4.
UNLOAD (SELECT * FROM old_table) TO 's3://amzn-s3-demo-bucket/' WITH (format = 'PARQUET', compression = 'ZSTD', compression_level = 4)