Amazon Athena
User Guide

CloudTrail SerDe

AWS CloudTrail is a service that records AWS API calls and events for AWS accounts. CloudTrail generates encrypted logs and stores them in Amazon S3. You can use Athena to query these logs directly from Amazon S3, specifying the LOCATION of logs.

To query CloudTrail logs in Athena, create table from the logs and use the CloudTrail SerDe to deserialize the logs data.

In addition to using the CloudTrail SerDe, instances exist where you need to use a different SerDe or to extract data from JSON. Certain fields in CloudTrail logs are STRING values that may have a variable data format, which depends on the service. As a result, the CloudTrail SerDe is unable to predictably deserialize them. To query the following fields, identify the data pattern and then use a different SerDe, such as the OpenX JSON SerDe. Alternatively, to get data out of these fields, use JSON_EXTRACT functions. For more information, see Extracting Data From JSON.

  • requestParameters

  • responseElements

  • additionalEventData

  • serviceEventDetails

SerDe Name

CloudTrail SerDe

Library Name

com.amazon.emr.hive.serde.CloudTrailSerde

Examples

The following example uses the CloudTrail SerDe on a fictional set of logs to create a table based on them.

In this example, the fields requestParameters, responseElements, and additionalEventData are included as part of STRUCT data type used in JSON. To get data out of these fields, use JSON_EXTRACT functions. For more information, see Extracting Data From JSON.

CREATE EXTERNAL TABLE cloudtrail_logs ( eventversion STRING, userIdentity STRUCT< type:STRING, principalid:STRING, arn:STRING, accountid:STRING, invokedby:STRING, accesskeyid:STRING, userName:STRING, sessioncontext:STRUCT< attributes:STRUCT< mfaauthenticated:STRING, creationdate:STRING>, sessionIssuer:STRUCT< type:STRING, principalId:STRING, arn:STRING, accountId:STRING, userName:STRING>>>, eventTime STRING, eventSource STRING, eventName STRING, awsRegion STRING, sourceIpAddress STRING, userAgent STRING, errorCode STRING, errorMessage STRING, requestParameters STRING, responseElements STRING, additionalEventData STRING, requestId STRING, eventId STRING, resources ARRAY<STRUCT< ARN:STRING, accountId:STRING, type:STRING>>, eventType STRING, apiVersion STRING, readOnly STRING, recipientAccountId STRING, serviceEventDetails STRING, sharedEventID STRING, vpcEndpointId STRING ) ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde' STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://cloudtrail_bucket_name/AWSLogs/Account_ID/';

The following query returns the logins that occurred over a 24-hour period:

SELECT useridentity.username, sourceipaddress, eventtime, additionaleventdata FROM default.cloudtrail_logs WHERE eventname = 'ConsoleLogin' AND eventtime >= '2017-02-17T00:00:00Z' AND eventtime < '2017-02-18T00:00:00Z';

For more information, see Querying AWS CloudTrail Logs.