メニュー
Amazon Redshift
データベース開発者ガイド (API Version 2012-12-01)

ステップ 5: 圧縮エンコードを確認する

圧縮は、データの格納時にそのサイズを小さくする列レベルの操作です。圧縮によってストレージスペースが節約され、ストレージから読み込まれるデータのサイズが小さくなり、ディスク I/O の量が減少するので、クエリパフォーマンスが向上します。

デフォルトでは、Amazon Redshift は非圧縮の raw 形式でデータを格納します。Amazon Redshift データベースでテーブルを作成するとき、列に対して圧縮タイプ (エンコード) を定義できます。詳細については、「圧縮エンコード」を参照してください。

テーブルの作成時に圧縮エンコードをテーブルの列に手動で適用するか、または COPY コマンドを使用して自動的にロードデータが分析されて圧縮エンコードが適用されるようにできます。

圧縮エンコードを確認するには

  1. Find how much space each column uses.

    Query the STV_BLOCKLIST system view to find the number of 1 MB blocks each column uses. The MAX aggregate function returns the highest block number for each column. This example uses col < 17 in the WHERE clause to exclude system-generated columns.

    Execute the following command.

    Copy
    select col, max(blocknum) from stv_blocklist b, stv_tbl_perm p where (b.tbl=p.id) and name ='lineorder' and col < 17 group by name, col order by col;

    Your results will look similar to the following.

    Copy
    col | max ----+----- 0 | 572 1 | 572 2 | 572 3 | 572 4 | 572 5 | 572 6 | 1659 7 | 715 8 | 572 9 | 572 10 | 572 11 | 572 12 | 572 13 | 572 14 | 572 15 | 572 16 | 1185 (17 rows)
  2. Experiment with the different encoding methods.

    In this step, you create a table with identical columns, except that each column uses a different compression encoding. Then you insert a large number of rows, using data from the p_name column in the PART table, so that every column has the same data. Finally, you will examine the table to compare the effects of the different encodings on column sizes.

    1. Create a table with the encodings that you want to compare.

      Copy
      create table encodingshipmode ( moderaw varchar(22) encode raw, modebytedict varchar(22) encode bytedict, modelzo varchar(22) encode lzo, moderunlength varchar(22) encode runlength, modetext255 varchar(22) encode text255, modetext32k varchar(22) encode text32k);
    2. Insert the same data into all of the columns using an INSERT statement with a SELECT clause. The command will take a couple minutes to execute.

      Copy
      insert into encodingshipmode select lo_shipmode as moderaw, lo_shipmode as modebytedict, lo_shipmode as modelzo, lo_shipmode as moderunlength, lo_shipmode as modetext255, lo_shipmode as modetext32k from lineorder where lo_orderkey < 200000000;
    3. Query the STV_BLOCKLIST system table to compare the number of 1 MB disk blocks used by each column.

      Copy
      select col, max(blocknum) from stv_blocklist b, stv_tbl_perm p where (b.tbl=p.id) and name = 'encodingshipmode' and col < 6 group by name, col order by col;

      The query returns results similar to the following. Depending on how your cluster is configured, your results will be different, but the relative sizes should be similar.

      Copy
      col | max –------+----- 0 | 221 1 | 26 2 | 61 3 | 192 4 | 54 5 | 105 (6 rows)

      The columns show the results for the following encodings:

      • Raw

      • Bytedict

      • LZO

      • Runlength

      • Text255

      • Text32K

      You can see that Bytedict encoding on the second column produced the best results for this data set, with a compression ratio of better than 8:1. Different data sets will produce different results, of course.

  3. Use the ANALYZE COMPRESSION command to view the suggested encodings for an existing table.

    Execute the following command.

    Copy
    analyze compression lineorder;

    Your results should look similar to the following.

    Copy
    Table | Column | Encoding -----------+------------------+------------------- lineorder lo_orderkey delta lineorder lo_linenumber delta lineorder lo_custkey raw lineorder lo_partkey raw lineorder lo_suppkey raw lineorder lo_orderdate delta32k lineorder lo_orderpriority bytedict lineorder lo_shippriority runlength lineorder lo_quantity delta lineorder lo_extendedprice lzo lineorder lo_ordertotalprice lzo lineorder lo_discount delta lineorder lo_revenue lzo lineorder lo_supplycost delta32k lineorder lo_tax delta lineorder lo_commitdate delta32k lineorder lo_shipmode bytedict

    Notice that ANALYZE COMPRESSION chose BYTEDICT encoding for the lo_shipmode column.

    For an example that walks through choosing manually applied compression encodings, see 例: CUSTOMER テーブルの圧縮エンコードの選択.

  4. Apply automatic compression to the SSB tables.

    By default, the COPY command automatically applies compression encodings when you load data into an empty table that has no compression encodings other than RAW encoding. For this tutorial, you will let the COPY command automatically select and apply optimal encodings for the tables as part of the next step, Recreate the test data set.

    For more information, see 自動圧縮ありでテーブルをロードする.

次のステップ

ステップ 6: テストデータセットを再作成する