Byte-dictionary encoding
In byte dictionary encoding, a separate dictionary of unique values is created for each block of column values on disk. (An Amazon Redshift disk block occupies 1 MB.) The dictionary contains up to 256 one-byte values that are stored as indexes to the original data values. If more than 256 values are stored in a single block, the extra values are written into the block in raw, uncompressed form. The process repeats for each disk block.
This encoding is very effective on low cardinality string columns. This encoding is optimal when the data domain of a column is fewer than 256 unique values.
For columns with the string data type (CHAR and VARCHAR) encoded with BYTEDICT, Amazon Redshift performs vectorized scans and predicate evaluations that operate over compressed data directly. These scans use hardware-specific single instruction and multiple data (SIMD) instructions for parallel processing. This significantly speeds up the scanning of string columns. Byte-dictionary encoding is especially space-efficient if a CHAR/VARCHAR column holds long character strings.
Suppose that a table has a COUNTRY column with a CHAR(30) data type. As data is loaded, Amazon Redshift creates the dictionary and populates the COUNTRY column with the index value. The dictionary contains the indexed unique values, and the table itself contains only the one-byte subscripts of the corresponding values.
Note
Trailing blanks are stored for fixed-length character columns. Therefore, in a CHAR(30) column, every compressed value saves 29 bytes of storage when you use the byte-dictionary encoding.
The following table represents the dictionary for the COUNTRY column.
Unique data value | Dictionary index | Size (fixed length, 30 bytes per value) |
---|---|---|
England | 0 | 30 |
United States of America | 1 | 30 |
Venezuela | 2 | 30 |
Sri Lanka | 3 | 30 |
Argentina | 4 | 30 |
Japan | 5 | 30 |
Total | 180 |
The following table represents the values in the COUNTRY column.
Original data value | Original size (fixed length, 30 bytes per value) | Compressed value (index) | New size (bytes) |
---|---|---|---|
England | 30 | 0 | 1 |
England | 30 | 0 | 1 |
United States of America | 30 | 1 | 1 |
United States of America | 30 | 1 | 1 |
Venezuela | 30 | 2 | 1 |
Sri Lanka | 30 | 3 | 1 |
Argentina | 30 | 4 | 1 |
Japan | 30 | 5 | 1 |
Sri Lanka | 30 | 3 | 1 |
Argentina | 30 | 4 | 1 |
Total | 300 | 10 |
The total compressed size in this example is calculated as follows: 6 different entries are stored in the dictionary (6 * 30 = 180), and the table contains 10 1-byte compressed values, for a total of 190 bytes.