Launch your AWS CloudFormation stack and then query your data in Amazon S3

After you create an Amazon Redshift cluster and connect to it, you can install the Redshift Spectrum DataLake AWS CloudFormation template and then query your data.

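CloudFormation installs the Redshift Spectrum Getting Started DataLake template and creates a stack that includes the following:

A role named myspectrum_role that is associated with your Redshift cluster
An external schema named myspectrum_schema
An external table named sales in an Amazon S3 bucket
A Redshift table named event that is loaded with data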
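
For orientation, the external schema and external table in the stack correspond roughly to statements like the ones below. This is a sketch under stated assumptions, not the contents of the DataLake.yml template: the catalog database name myspectrum_db and the account ID in the role ARN are placeholders, and the column list assumes the standard TICKIT sales layout. The S3 location, the text format, and the row-count hint match what the query plan later on this page reports.

```sql
-- Assumption: 'myspectrum_db' and the account ID 123456789012 are placeholders.
create external schema myspectrum_schema
from data catalog
database 'myspectrum_db'
iam_role 'arn:aws:iam::123456789012:role/myspectrum_role'
create external database if not exists;

-- Assumption: column names and types follow the TICKIT sample sales data.
create external table myspectrum_schema.sales(
  salesid    integer,
  listid     integer,
  sellerid   integer,
  buyerid    integer,
  eventid    integer,
  dateid     smallint,
  qtysold    smallint,
  pricepaid  decimal(8,2),
  commission decimal(8,2),
  saletime   timestamp)
row format delimited
fields terminated by '\t'
stored as textfile
location 's3://redshift-downloads/tickit/spectrum/sales/'
table properties ('numRows'='172000');
```

When you use the CloudFormation stack, you don't need to run these statements yourself. They are shown only to make clear what kind of objects the later queries refer to.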

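To launch your Redshift Spectrum Getting Started DataLake CloudFormation stack:

Choose Launch CFN stack. The CloudFormation console opens with the DataLake.yml template selected.

You can also download and customize the Redshift Spectrum Getting Started DataLake CloudFormation template, then open the CloudFormation console (https://console.aws.amazon.com/cloudformation) and create a stack with the customized template.
Choose Next.
Under Parameters, enter the Amazon Redshift cluster name, the database name, and your database user name.
Choose Next.

The stack options appear.
Choose Next to accept the default settings.
Review the information and, under Capabilities, choose I acknowledge that AWS CloudFormation might create IAM resources.
Choose Create stack.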

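If an error occurs while the stack is being created, see the following information:

View the CloudFormation Events tab for information that can help you resolve the error.
Delete the DataLake CloudFormation stack before you retry the operation.
Make sure that you are connected to your Amazon Redshift database.
Make sure that you entered the correct Amazon Redshift cluster name, database name, and database user name.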
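
After the stack finishes creating, or while you are troubleshooting a failed attempt, one way to check whether the external objects exist is to query the Redshift system views for external schemas and tables. This is a minimal sketch that assumes the schema name myspectrum_schema from the list above. If either query returns no rows, the corresponding object was probably not created.

```sql
-- Check that the external schema is registered.
select schemaname, databasename
from svv_external_schemas
where schemaname = 'myspectrum_schema';

-- Check that the external table exists and see where its data lives.
select schemaname, tablename, location
from svv_external_tables
where schemaname = 'myspectrum_schema';
```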

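Query your data in Amazon S3

You query external tables with the same SELECT statements that you use to query other Amazon Redshift tables. These SELECT statements can join tables, aggregate data, and filter on predicates.

The following query returns the number of rows in the myspectrum_schema.sales external table.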


select count(*) from myspectrum_schema.sales;

count 
------
172462
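
Filtering and aggregating work the same way on an external table. As a small sketch, the following query counts the sales above a price threshold and totals what was paid for them. The threshold of 30 and the output column aliases are illustrative choices. The pricepaid column is the one used in the join example on this page.

```sql
-- Count and total the sales priced above 30 in the external table.
select count(*) as sales_over_30,
       sum(pricepaid) as total_paid
from myspectrum_schema.sales
where pricepaid > 30;
```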

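Join an external table with a local table

The following example joins the external table myspectrum_schema.sales with the local table event to find the total sales for the top 10 events.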


select top 10 myspectrum_schema.sales.eventid, sum(myspectrum_schema.sales.pricepaid)
from myspectrum_schema.sales, event
where myspectrum_schema.sales.eventid = event.eventid
and myspectrum_schema.sales.pricepaid > 30
group by myspectrum_schema.sales.eventid
order by 2 desc;

eventid | sum     
--------+---------
    289 | 51846.00
   7895 | 51049.00
   1602 | 50301.00
    851 | 49956.00
   7315 | 49823.00
   6471 | 47997.00
   2118 | 47863.00
    984 | 46780.00
   7851 | 46661.00
   5638 | 46280.00
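
The same join can also be written with explicit JOIN syntax and table aliases, which some readers find easier to follow than the comma-separated FROM list. This is an equivalent sketch rather than a different technique. The aliases s and e and the column alias total_paid are illustrative, and the query should return the same rows, with the second column named total_paid instead of sum.

```sql
-- Explicit-join form of the previous query, using short aliases.
select top 10 s.eventid, sum(s.pricepaid) as total_paid
from myspectrum_schema.sales as s
join event as e on s.eventid = e.eventid
where s.pricepaid > 30
group by s.eventid
order by total_paid desc;
```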

View the query plan

View the query plan for the previous query. Note the S3 Seq Scan, S3 HashAggregate, and S3 Query Scan steps that run against the data on Amazon S3.


explain
select top 10 myspectrum_schema.sales.eventid, sum(myspectrum_schema.sales.pricepaid) 
from myspectrum_schema.sales, event
where myspectrum_schema.sales.eventid = event.eventid
and myspectrum_schema.sales.pricepaid > 30
group by myspectrum_schema.sales.eventid
order by 2 desc;



QUERY PLAN                                                                                                                                                                                
-----------------------------------------------------------------------------
XN Limit  (cost=1001055770628.63..1001055770628.65 rows=10 width=31)                                                                                                                      
  ->  XN Merge  (cost=1001055770628.63..1001055770629.13 rows=200 width=31)                                                                                                               
        Merge Key: sum(sales.derived_col2)                                                                                                                                                
        ->  XN Network  (cost=1001055770628.63..1001055770629.13 rows=200 width=31)                                                                                                       
              Send to leader                                                                                                                                                              
              ->  XN Sort  (cost=1001055770628.63..1001055770629.13 rows=200 width=31)                                                                                                    
                    Sort Key: sum(sales.derived_col2)                                                                                                                                     
                    ->  XN HashAggregate  (cost=1055770620.49..1055770620.99 rows=200 width=31)                                                                                           
                          ->  XN Hash Join DS_BCAST_INNER  (cost=3119.97..1055769620.49 rows=200000 width=31)                                                                             
                                Hash Cond: ("outer".derived_col1 = "inner".eventid)                                                                                                       
                                ->  XN S3 Query Scan sales  (cost=3010.00..5010.50 rows=200000 width=31)                                                                                  
                                      ->  S3 HashAggregate  (cost=3010.00..3010.50 rows=200000 width=16)                                                                                  
                                            ->  S3 Seq Scan spectrum.sales location:"s3://redshift-downloads/tickit/spectrum/sales" format:TEXT  (cost=0.00..2150.00 rows=172000 width=16)
                                                  Filter: (pricepaid > 30.00)                                                                                                             
                                ->  XN Hash  (cost=87.98..87.98 rows=8798 width=4)                                                                                                        
                                      ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=4)
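
The plan shows what Redshift intends to do. To see what the Redshift Spectrum layer actually scanned, one option is to run the join query itself (not the EXPLAIN) and then query the SVL_S3QUERY_SUMMARY system view in the same session. The sketch below assumes that the join was the most recent query in the session, because pg_last_query_id() returns the ID of the last query that ran there.

```sql
-- Summarize the S3 work done by the most recent query in this session.
select query, segment, elapsed,
       s3_scanned_rows, s3_scanned_bytes,
       s3query_returned_rows, s3query_returned_bytes,
       files
from svl_s3query_summary
where query = pg_last_query_id()
order by query, segment;
```

Comparing s3_scanned_rows with s3query_returned_rows gives a rough sense of how much filtering and aggregation took place in the S3 steps before the data reached the cluster.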
