Loading Tables with Automatic Compression
You can apply compression encodings to columns in tables manually, based on your own evaluation of the data, or you can use the COPY command to analyze and apply compression automatically. We strongly recommend using the COPY command to apply automatic compression.
You can use automatic compression when you create and load a brand new table. The COPY command will perform a compression analysis. You can also perform a compression analysis without loading data or changing the compression on a table by running the ANALYZE COMPRESSION command against an already populated table. For example, you can run the ANALYZE COMPRESSION command when you want to analyze compression on a table for future use, while preserving the existing DDL.
Automatic compression balances overall performance when choosing compression encodings. Range-restricted scans might perform poorly if sort key columns are compressed much more highly than other columns in the same query. As a result, automatic compression will choose a less efficient compression encoding to keep the sort key columns balanced with other columns. However, ANALYZE COMPRESSION does not take sort keys into account, so it might recommend a different encoding for the sort key than what automatic compression would choose. If you use ANALYZE COMPRESSION, consider changing the encoding to RAW for sort keys.
How Automatic Compression Works
By default, the COPY command applies automatic compression whenever you run the COPY command with an empty target table and all of the table columns either have RAW encoding or no encoding.
To apply automatic compression to an empty table, regardless of its current compression encodings, run the COPY command with the COMPUPDATE option set to ON. To disable automatic compression, run the COPY command with the COMPUPDATE option set to OFF.
You cannot apply automatic compression to a table that already contains data.
Automatic compression analysis requires enough rows in the load data (at least 100,000 rows per slice) to allow sampling to take place.
Automatic compression performs these operations in the background as part of the load transaction:
An initial sample of rows is loaded from the input file. Sample size is based on the value of the COMPROWS parameter. The default is 100,000.
Compression options are chosen for each column.
The sample rows are removed from the table.
If enough data is being loaded to provide a meaningful sample, the table is re-created with the chosen compression encodings.
The entire input file is loaded and compressed using the new encodings.
After you run the COPY command, the table is fully loaded, compressed, and ready for use. If you load more data later, appended rows are compressed according to the existing encoding.
If you only want to perform a compression analysis, run ANALYZE COMPRESSION, which is more efficient than running a full COPY. Then you can evaluate the results to decide whether to use automatic compression or recreate the table manually.
Automatic compression is supported only for the COPY command. Alternatively, you can manually apply compression encoding when you create the table. For information about manual compression encoding, see Choosing a Column Compression Type.
Automatic Compression Example
In this example, assume that the TICKIT database contains a copy of the LISTING table called BIGLIST, and you want to apply automatic compression to this table when it is loaded with approximately 3 million rows.
To load and automatically compress the table
Ensure that the table is empty. You can apply automatic compression only to an empty table:
Load the table with a single COPY command. Although the table is empty, some earlier encoding might have been specified. To ensure that Amazon Redshift performs a compression analysis, set the COMPUPDATE parameter to ON.
copy biglist from 's3://mybucket/biglist.txt' credentials '
<aws-auth-args>' delimiter '|' COMPUPDATE ON;
Because no COMPROWS option is specified, the default and recommended sample size of 100,000 rows per slice is used.
Look at the new schema for the BIGLIST table in order to review the automatically chosen encoding schemes.
select "column", type, encoding from pg_table_def where tablename = 'biglist'; Column | Type | Encoding ---------------+-----------------------------+---------- listid | integer | delta sellerid | integer | delta32k eventid | integer | delta32k dateid | smallint | delta +numtickets | smallint | delta priceperticket | numeric(8,2) | delta32k totalprice | numeric(8,2) | mostly32 listtime | timestamp without time zone | none
Verify that the expected number of rows were loaded:
select count(*) from biglist; count --------- 3079952 (1 row)
When rows are later appended to this table using COPY or INSERT statements, the same compression encodings will be applied.