Launch your AWS CloudFormation stack and then query your data in Amazon S3

After you create an Amazon Redshift cluster and connect to it, you can install the Redshift Spectrum DataLake AWS CloudFormation template and then query your data.

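CloudFormation installs the Redshift Spectrum Getting Started DataLake template and creates a stack that includes the following:

A role named myspectrum_role that is associated with your Redshift cluster
An external schema named myspectrum_schema
An external table named sales in an Amazon S3 bucket
A Redshift table named event that is loaded with data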

To launch your Redshift Spectrum Getting Started DataLake CloudFormation stack

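1. Choose Launch CFN stack. The CloudFormation console opens with the DataLake.yml template selected.

   You can also download and customize the Redshift Spectrum Getting Started DataLake CloudFormation (CFN) template, then open the CloudFormation console (https://console.aws.amazon.com/cloudformation) and create a stack with the customized template.
2. Choose Next.
3. Under Parameters, enter your Amazon Redshift cluster name, database name, and database user name.
4. Choose Next.

   The stack options appear.
5. Choose Next to accept the default settings.
6. Review the information and, under Capabilities, choose I acknowledge that AWS CloudFormation might create IAM resources.
7. Choose Create stack.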
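The console steps above can also be scripted. The following is a minimal sketch using the AWS SDK for Python (boto3), assuming you have downloaded (and possibly customized) the template to a local file such as DataLake.yml. The ParameterKey names are hypothetical placeholders; read the real keys from the Parameters section of the template. CAPABILITY_IAM is the API equivalent of the IAM acknowledgment checkbox in the console.

```python
"""Hypothetical sketch: create the DataLake stack with boto3 instead of the console."""

# Assumption: placeholder ParameterKey names. Replace them with the keys
# declared in the Parameters section of the DataLake template.
CLUSTER_PARAM = "ClusterName"    # hypothetical key
DATABASE_PARAM = "DatabaseName"  # hypothetical key
DBUSER_PARAM = "DbUser"          # hypothetical key


def build_create_stack_args(stack_name, template_body, cluster, database, db_user):
    """Return keyword arguments for CloudFormation's CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": CLUSTER_PARAM, "ParameterValue": cluster},
            {"ParameterKey": DATABASE_PARAM, "ParameterValue": database},
            {"ParameterKey": DBUSER_PARAM, "ParameterValue": db_user},
        ],
        # API equivalent of "I acknowledge that AWS CloudFormation might create
        # IAM resources". If the template gives an IAM resource an explicit
        # name, CloudFormation requires CAPABILITY_NAMED_IAM instead.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def create_stack_from_file(stack_name, template_path, cluster, database, db_user):
    """Read a local template, call CreateStack, and wait for CREATE_COMPLETE."""
    import boto3  # imported here so build_create_stack_args works without boto3

    with open(template_path, encoding="utf-8") as f:
        body = f.read()
    args = build_create_stack_args(stack_name, body, cluster, database, db_user)
    cfn = boto3.client("cloudformation")
    cfn.create_stack(**args)
    # Raises botocore.exceptions.WaiterError if creation fails or rolls back.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

For example, `create_stack_from_file("DataLake", "DataLake.yml", "examplecluster", "dev", "awsuser")`, where the cluster, database, and user values are hypothetical. Building the arguments separately from the API call lets you inspect the request before anything is created in your account.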

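If an error occurs while the stack is being created, check the following:

View the CloudFormation Events tab for information that can help you resolve the error.
Delete the DataLake CloudFormation stack before you try the operation again.
Make sure that you are connected to your Amazon Redshift database.
Make sure that you entered the correct Amazon Redshift cluster name, database name, and database user name.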
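The first two checks can also be done from a script. This sketch assumes boto3 and the stack name you chose; it collects the same events shown on the Events tab, keeps only those whose status ends in _FAILED, and includes a helper that deletes the stack so you can retry. The helper function names are illustrative, not part of any AWS tool.

```python
def failed_events(stack_events):
    """Return (logical id, status, reason) for each *_FAILED event, oldest first."""
    failures = [
        (e["LogicalResourceId"], e["ResourceStatus"], e.get("ResourceStatusReason", ""))
        for e in stack_events
        if e["ResourceStatus"].endswith("_FAILED")
    ]
    # DescribeStackEvents lists the newest event first; reversing puts the
    # earliest failure, often the root cause, at the top.
    return list(reversed(failures))


def report_failures(stack_name):
    """Print every failed resource event for the stack."""
    import boto3

    cfn = boto3.client("cloudformation")
    events = []
    for page in cfn.get_paginator("describe_stack_events").paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    for logical_id, status, reason in failed_events(events):
        print(f"{logical_id}: {status}: {reason}")


def delete_stack(stack_name):
    """Delete the stack and wait until the deletion has finished."""
    import boto3

    cfn = boto3.client("cloudformation")
    cfn.delete_stack(StackName=stack_name)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
```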

Query your data in Amazon S3

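You can query an external table with the same SELECT statements that you use to query other Amazon Redshift tables, including statements that join tables, aggregate data, and filter on predicates.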

The following query returns the number of rows in the myspectrum_schema.sales external table.


select count(*) from myspectrum_schema.sales;

count 
------
172462
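If you prefer to run the query from a script rather than a SQL client, the Amazon Redshift Data API is one option. The following minimal sketch assumes boto3, a provisioned cluster, and temporary credentials obtained through DbUser; the cluster identifier, database, and user are the values you supplied as stack parameters. It polls until the statement completes, then reads the single count value.

```python
import time

COUNT_SQL = "select count(*) from myspectrum_schema.sales;"


def first_long_value(result):
    """Extract the single integer from a GetStatementResult response."""
    # Records is a list of rows; each row is a list of typed field dicts.
    return result["Records"][0][0]["longValue"]


def run_count(cluster_id, database, db_user):
    """Run COUNT_SQL through the Redshift Data API and return the row count."""
    import boto3

    client = boto3.client("redshift-data")
    stmt = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=COUNT_SQL
    )
    # The Data API is asynchronous: poll until the statement reaches a final state.
    while True:
        desc = client.describe_statement(Id=stmt["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", desc["Status"]))
    return first_long_value(client.get_statement_result(Id=stmt["Id"]))
```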

Join an external table with a local table

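The following example joins the external table myspectrum_schema.sales with the local table event to find the ten events with the highest total sales, counting only sales where pricepaid is greater than 30.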


select top 10 myspectrum_schema.sales.eventid, sum(myspectrum_schema.sales.pricepaid) from myspectrum_schema.sales, event
where myspectrum_schema.sales.eventid = event.eventid
and myspectrum_schema.sales.pricepaid > 30
group by myspectrum_schema.sales.eventid
order by 2 desc;

eventid | sum     
--------+---------
    289 | 51846.00
   7895 | 51049.00
   1602 | 50301.00
    851 | 49956.00
   7315 | 49823.00
   6471 | 47997.00
   2118 | 47863.00
    984 | 46780.00
   7851 | 46661.00
   5638 | 46280.00
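To experiment with the join, filter, and aggregate logic without a cluster, this hypothetical sketch runs an equivalent query against toy in-memory SQLite tables. It uses explicit JOIN ... ON and LIMIT in place of Redshift's TOP, and the table contents are invented; it does not reproduce the results above and says nothing about how Redshift Spectrum executes the query.

```python
import sqlite3

# Equivalent of the Redshift query above, rewritten for SQLite
# (LIMIT instead of TOP, explicit JOIN instead of a comma join).
QUERY = """
select sales.eventid, sum(sales.pricepaid) as total
from sales
join event on sales.eventid = event.eventid
where sales.pricepaid > 30
group by sales.eventid
order by total desc
limit 10
"""


def top_events(sales_rows, event_ids):
    """Load toy rows into in-memory tables and return (eventid, total) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("create table sales (eventid integer, pricepaid real)")
    con.execute("create table event (eventid integer)")
    con.executemany("insert into sales values (?, ?)", sales_rows)
    con.executemany("insert into event values (?)", [(e,) for e in event_ids])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows
```

With the hypothetical rows `[(1, 50.0), (1, 20.0), (2, 40.0), (2, 45.0), (3, 100.0)]` and events `[1, 2]`, the 20.0 sale is filtered out by the predicate and event 3 is dropped by the join, leaving event 2 (85.0) ahead of event 1 (50.0).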

View the query plan

View the query plan for the previous query. Note the S3 Seq Scan, S3 HashAggregate, and S3 Query Scan steps, which are executed against the data in Amazon S3.


explain
select top 10 myspectrum_schema.sales.eventid, sum(myspectrum_schema.sales.pricepaid) 
from myspectrum_schema.sales, event
where myspectrum_schema.sales.eventid = event.eventid
and myspectrum_schema.sales.pricepaid > 30
group by myspectrum_schema.sales.eventid
order by 2 desc;



QUERY PLAN
-----------------------------------------------------------------------------
XN Limit  (cost=1001055770628.63..1001055770628.65 rows=10 width=31)
  ->  XN Merge  (cost=1001055770628.63..1001055770629.13 rows=200 width=31)
        Merge Key: sum(sales.derived_col2)
        ->  XN Network  (cost=1001055770628.63..1001055770629.13 rows=200 width=31)
              Send to leader
              ->  XN Sort  (cost=1001055770628.63..1001055770629.13 rows=200 width=31)
                    Sort Key: sum(sales.derived_col2)
                    ->  XN HashAggregate  (cost=1055770620.49..1055770620.99 rows=200 width=31)
                          ->  XN Hash Join DS_BCAST_INNER  (cost=3119.97..1055769620.49 rows=200000 width=31)
                                Hash Cond: ("outer".derived_col1 = "inner".eventid)
                                ->  XN S3 Query Scan sales  (cost=3010.00..5010.50 rows=200000 width=31)
                                      ->  S3 HashAggregate  (cost=3010.00..3010.50 rows=200000 width=16)
                                            ->  S3 Seq Scan spectrum.sales location:"s3://redshift-downloads/tickit/spectrum/sales" format:TEXT  (cost=0.00..2150.00 rows=172000 width=16)
                                                  Filter: (pricepaid > 30.00)
                                ->  XN Hash  (cost=87.98..87.98 rows=8798 width=4)
                                      ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=4)
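When you compare plans for several queries, it can help to pull out the steps that run in Amazon S3 automatically. This sketch assumes you have the EXPLAIN output as plain text, one plan row per line. It treats any node whose name begins with "S3" (optionally prefixed by "XN") as an S3 step, a heuristic based on the node names in the plan above rather than on a documented rule.

```python
import re

# A plan row whose node name starts with "S3", optionally prefixed by "XN",
# e.g. "->  S3 Seq Scan spectrum.sales ..." or "->  XN S3 Query Scan sales ...".
S3_NODE = re.compile(r"^\s*->\s+((?:XN )?S3 \S+(?: Scan)?)")
# The quoted S3 prefix that a Spectrum scan reads from.
S3_LOCATION = re.compile(r'location:"([^"]+)"')


def s3_steps(plan_text):
    """Return the S3-related plan node names, outermost node first."""
    steps = []
    for line in plan_text.splitlines():
        match = S3_NODE.match(line)
        if match:
            steps.append(match.group(1))
    return steps


def s3_locations(plan_text):
    """Return every S3 location referenced in the plan."""
    return S3_LOCATION.findall(plan_text)
```

Applied to the plan above, `s3_steps` returns the XN S3 Query Scan, S3 HashAggregate, and S3 Seq Scan nodes, and `s3_locations` returns the single scanned prefix, s3://redshift-downloads/tickit/spectrum/sales.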
