メニュー
Amazon Redshift
データベース開発者ガイド (API Version 2012-12-01)

ステップ 1: テストデータセットを作成する

データウェアハウスデータベースは一般的に、スタースキーマ設計を使用します。この設計では、中央のファクトテーブルにデータベースのコアデータが格納され、複数のディメンションテーブルにファクトテーブルの内容を示す属性情報が格納されます。ファクトテーブルは、ディメンションのプライマリキーと一致する外部キーを使用して各ディメンションテーブルを結合します。

スタースキーマベンチマーク (SSB)

このチュートリアルでは、スタースキーマベンチマーク (SSB) スキーマに基づいた 5 つのテーブルをセットとして使用します。以下の図に示しているのは SSB データモデルです。

テストデータセットを作成するには

ソートキー、分散スタイル、または圧縮エンコードを使用しないでテーブルのセットを作成します。続いて、SSB データセットのデータをテーブルにロードします。

  1. (Optional) Launch a cluster.

    If you already have a cluster that you want to use, you can skip this step. Your cluster should have at least two nodes. For the exercises in this tutorial, you will use a four-node cluster.

    To launch a dc1.large cluster with four nodes, follow the steps in Amazon Redshift 入門ガイド, but select Multi Node for Cluster Type and set Number of Compute Nodes to 4.

    Follow the steps to connect to your cluster from a SQL client and test a connection. You do not need to complete the remaining steps to create tables, upload data, and try example queries.

  2. Create the SSB test tables using minimum attributes.

    注記

    If the SSB tables already exist in the current database, you will need to drop the tables first. See ステップ 6: テストデータセットを再作成する for the DROP TABLE commands.

    For the purposes of this tutorial, the first time you create the tables, they will not have sort keys, distribution styles, or compression encodings.

    Execute the following CREATE TABLE commands.

    Copy
    CREATE TABLE part ( p_partkey INTEGER NOT NULL, p_name VARCHAR(22) NOT NULL, p_mfgr VARCHAR(6) NOT NULL, p_category VARCHAR(7) NOT NULL, p_brand1 VARCHAR(9) NOT NULL, p_color VARCHAR(11) NOT NULL, p_type VARCHAR(25) NOT NULL, p_size INTEGER NOT NULL, p_container VARCHAR(10) NOT NULL ); CREATE TABLE supplier ( s_suppkey INTEGER NOT NULL, s_name VARCHAR(25) NOT NULL, s_address VARCHAR(25) NOT NULL, s_city VARCHAR(10) NOT NULL, s_nation VARCHAR(15) NOT NULL, s_region VARCHAR(12) NOT NULL, s_phone VARCHAR(15) NOT NULL ); CREATE TABLE customer ( c_custkey INTEGER NOT NULL, c_name VARCHAR(25) NOT NULL, c_address VARCHAR(25) NOT NULL, c_city VARCHAR(10) NOT NULL, c_nation VARCHAR(15) NOT NULL, c_region VARCHAR(12) NOT NULL, c_phone VARCHAR(15) NOT NULL, c_mktsegment VARCHAR(10) NOT NULL ); CREATE TABLE dwdate ( d_datekey INTEGER NOT NULL, d_date VARCHAR(19) NOT NULL, d_dayofweek VARCHAR(10) NOT NULL, d_month VARCHAR(10) NOT NULL, d_year INTEGER NOT NULL, d_yearmonthnum INTEGER NOT NULL, d_yearmonth VARCHAR(8) NOT NULL, d_daynuminweek INTEGER NOT NULL, d_daynuminmonth INTEGER NOT NULL, d_daynuminyear INTEGER NOT NULL, d_monthnuminyear INTEGER NOT NULL, d_weeknuminyear INTEGER NOT NULL, d_sellingseason VARCHAR(13) NOT NULL, d_lastdayinweekfl VARCHAR(1) NOT NULL, d_lastdayinmonthfl VARCHAR(1) NOT NULL, d_holidayfl VARCHAR(1) NOT NULL, d_weekdayfl VARCHAR(1) NOT NULL ); CREATE TABLE lineorder ( lo_orderkey INTEGER NOT NULL, lo_linenumber INTEGER NOT NULL, lo_custkey INTEGER NOT NULL, lo_partkey INTEGER NOT NULL, lo_suppkey INTEGER NOT NULL, lo_orderdate INTEGER NOT NULL, lo_orderpriority VARCHAR(15) NOT NULL, lo_shippriority VARCHAR(1) NOT NULL, lo_quantity INTEGER NOT NULL, lo_extendedprice INTEGER NOT NULL, lo_ordertotalprice INTEGER NOT NULL, lo_discount INTEGER NOT NULL, lo_revenue INTEGER NOT NULL, lo_supplycost INTEGER NOT NULL, lo_tax INTEGER NOT NULL, lo_commitdate INTEGER NOT NULL, lo_shipmode VARCHAR(10) NOT NULL );
  3. Load the tables using SSB sample data.

    The sample data for this tutorial is provided in an Amazon S3 buckets that give read access to all authenticated AWS users, so any valid AWS credentials that permit access to Amazon S3 will work.

    1. Create a new text file named loadssb.sql containing the following SQL.

      Copy
      copy customer from 's3://awssampledbuswest2/ssbgz/customer' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' gzip compupdate off region 'us-west-2'; copy dwdate from 's3://awssampledbuswest2/ssbgz/dwdate' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' gzip compupdate off region 'us-west-2'; copy lineorder from 's3://awssampledbuswest2/ssbgz/lineorder' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' gzip compupdate off region 'us-west-2'; copy part from 's3://awssampledbuswest2/ssbgz/part' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' gzip compupdate off region 'us-west-2'; copy supplier from 's3://awssampledbuswest2/ssbgz/supplier' credentials 'aws_access_key_id=<Your-Access-Key-ID>;aws_secret_access_key=<Your-Secret-Access-Key>' gzip compupdate off region 'us-west-2';
    2. Replace <Your-Access-Key-ID> and <Your-Secret-Access-Key> with your own AWS account credentials. The segment of the credentials string that is enclosed in single quotes must not contain any spaces or line breaks.

    3. Execute the COPY commands either by running the SQL script or by copying and pasting the commands into your SQL client.

      注記

      The load operation will take about 10 to 15 minutes for all five tables.

      Your results should look similar to the following.

      Copy
      Load into table 'customer' completed, 3000000 record(s) loaded successfully. 0 row(s) affected. copy executed successfully Execution time: 10.28s (Statement 1 of 5 finished) ... ... Script execution finished Total script execution time: 9m 51s
  4. Sum the execution time for all five tables, or else note the total script execution time. You’ll record that number as the load time in the benchmarks table in Step 2, following.

  5. To verify that each table loaded correctly, execute the following commands.

    Copy
    select count(*) from LINEORDER; select count(*) from PART; select count(*) from CUSTOMER; select count(*) from SUPPLIER; select count(*) from DWDATE;

    The following results table shows the number of rows for each SSB table.

    Table Name Rows
    LINEORDER 600,037,902
    PART 1,400,000
    CUSTOMER 3,000,000
    SUPPLIER 1,000,000
    DWDATE 2,556

次のステップ

ステップ 2: システムパフォーマンスをテストしてベースラインを確定する