由于 CloudTrail 日志具有一个已知结构,您可以预先指定该结构的分区方案,因此可以使用 Athena 分区投影功能减少查询运行时间并自动管理分区。当添加新数据时,分区投影会自动添加新分区。这样就不必使用 ALTER TABLE ADD PARTITION
手动添加分区了。
以下示例 CREATE TABLE
语句会自动在 CloudTrail 日志上从指定日期开始到当前为单个 AWS 区域 使用分区投影。在 LOCATION
和 storage.location.template
子句中,将存储桶
、account-id
和 aws-region
占位符替换为对应的相同值。对于 projection.timestamp.range
,将 2020
/01
/01
替换为要使用的开始日期。成功运行查询后,您可以查询表。您无需运行 ALTER TABLE ADD PARTITION
来加载分区。
CREATE EXTERNAL TABLE cloudtrail_logs_pp(
eventversion STRING,
useridentity STRUCT<
type: STRING,
principalid: STRING,
arn: STRING,
accountid: STRING,
invokedby: STRING,
accesskeyid: STRING,
username: STRING,
onbehalfof: STRUCT<
userid: STRING,
identitystorearn: STRING>,
sessioncontext: STRUCT<
attributes: STRUCT<
mfaauthenticated: STRING,
creationdate: STRING>,
sessionissuer: STRUCT<
type: STRING,
principalid: STRING,
arn: STRING,
accountid: STRING,
username: STRING>,
ec2roledelivery:string,
webidfederationdata: STRUCT<
federatedprovider: STRING,
attributes: map<string,string>>
>
>,
eventtime STRING,
eventsource STRING,
eventname STRING,
awsregion STRING,
sourceipaddress STRING,
useragent STRING,
errorcode STRING,
errormessage STRING,
requestparameters STRING,
responseelements STRING,
additionaleventdata STRING,
requestid STRING,
eventid STRING,
readonly STRING,
resources ARRAY<STRUCT<
arn: STRING,
accountid: STRING,
type: STRING>>,
eventtype STRING,
apiversion STRING,
recipientaccountid STRING,
serviceeventdetails STRING,
sharedeventid STRING,
vpcendpointid STRING,
vpcendpointaccountid STRING,
eventcategory STRING,
addendum STRUCT<
reason:STRING,
updatedfields:STRING,
originalrequestid:STRING,
originaleventid:STRING>,
sessioncredentialfromconsole STRING,
edgedevicedetails STRING,
tlsdetails STRUCT<
tlsversion:STRING,
ciphersuite:STRING,
clientprovidedhostheader:STRING>
)
PARTITIONED BY (
`timestamp` string)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://amzn-s3-demo-bucket/AWSLogs/account-id
/CloudTrail
/aws-region
'
TBLPROPERTIES (
'projection.enabled'='true',
'projection.timestamp.format'='yyyy/MM/dd',
'projection.timestamp.interval'='1',
'projection.timestamp.interval.unit'='DAYS',
'projection.timestamp.range'='2020
/01
/01
,NOW',
'projection.timestamp.type'='date',
'storage.location.template'='s3://amzn-s3-demo-bucket/AWSLogs/account-id
/CloudTrail
/aws-region
/${timestamp}')
更多有关分区投影的信息,请参阅 将分区投影与 Amazon Athena 结合使用。