Amazon Athena

CloudTrail SerDe

AWS CloudTrail is a service that records AWS API calls and events for AWS accounts. CloudTrail generates encrypted log files and stores them in Amazon S3. You can use Athena to query these log files directly in Amazon S3 by specifying the LOCATION of the log files.

To query CloudTrail logs in Athena, create a table from the log files and use the CloudTrail SerDe to deserialize the log data.

Note

In some cases, you need to use a SerDe other than the CloudTrail SerDe. Certain fields in CloudTrail logs are string values whose data format varies by service. As a result, the CloudTrail SerDe cannot deserialize them predictably. To query the following fields, identify the data pattern and then use a different SerDe, such as the OpenX JSON SerDe.

  • requestParameters
  • responseElements
  • additionalEventData
  • serviceEventDetails
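
As a rough sketch of that approach, the following statement defines a table with the OpenX JSON SerDe for one known data pattern. It is an illustration under stated assumptions, not a verified recipe: the table name my_cloudtrail_s3_requests and the selection of fields are hypothetical, each log file is assumed to hold a single JSON object with a top-level Records array, and requestParameters is assumed to carry a bucketName key for the events of interest.

```sql
-- Hypothetical table name and field selection, for illustration only.
-- Assumes one JSON object per log file with a top-level "Records" array,
-- and a requestParameters object that contains a bucketName key.
CREATE EXTERNAL TABLE my_cloudtrail_s3_requests (
   Records ARRAY<STRUCT<
      eventTime:STRING,
      eventSource:STRING,
      eventName:STRING,
      requestParameters:STRUCT<
         bucketName:STRING>>>
 )
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 LOCATION 's3://MyLogFiles/AWSLogs/123456789012/';
```

Records whose requestParameters follow a different pattern would likely return NULL for bucketName, so a table like this is best scoped to a single service's events.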

SerDe Name

CloudTrail SerDe

Library Name

com.amazon.emr.hive.serde.CloudTrailSerde

Examples

The following example uses the CloudTrail SerDe on a fictional set of log files to create a table based on all log files for an account with the ID 123456789012.

CREATE EXTERNAL TABLE my_cloudtrail_table (
   eventversion STRING,
   userIdentity STRUCT<
      type:STRING,
      principalid:STRING,
      arn:STRING,
      accountid:STRING,
      invokedby:STRING,
      accesskeyid:STRING,
      userName:STRING,
      sessioncontext:STRUCT<
         attributes:STRUCT<
            mfaauthenticated:STRING,
            creationdate:STRING>,
         sessionIssuer:STRUCT<
            type:STRING,
            principalId:STRING,
            arn:STRING,
            accountId:STRING,
            userName:STRING>>>,
   eventTime STRING,
   eventSource STRING,
   eventName STRING,
   awsRegion STRING,
   sourceIpAddress STRING,
   userAgent STRING,
   errorCode STRING,
   errorMessage STRING,
   requestId STRING,
   eventId STRING,
   resources ARRAY<STRUCT<
      ARN:STRING,
      accountId:STRING,
      type:STRING>>,
   eventType STRING,
   apiVersion STRING,
   readOnly BOOLEAN,
   recipientAccountId STRING,
   sharedEventID STRING,
   vpcEndpointId STRING
 )
 ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
 STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
 LOCATION 's3://MyLogFiles/AWSLogs/123456789012/';

The following query returns the logins that occurred over a 24-hour period.

SELECT
 useridentity.username,
 sourceipaddress,
 eventtime
FROM default.my_cloudtrail_table
WHERE eventname = 'ConsoleLogin'
      AND eventtime >= '2017-02-17T00:00:00Z'
      AND eventtime < '2017-02-18T00:00:00Z';
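
The range filter above compares eventtime as a string. This works because the timestamps are ISO 8601 values in UTC, which sort lexicographically in chronological order. As a variation, the following sketch counts logins per user and source IP address over the same period. It assumes the table defined above; the alias login_count is illustrative.

```sql
-- Count console logins per user and source IP for the same 24-hour window.
-- login_count is an illustrative alias, not a column of the table.
SELECT
 useridentity.username,
 sourceipaddress,
 count(*) AS login_count
FROM default.my_cloudtrail_table
WHERE eventname = 'ConsoleLogin'
      AND eventtime >= '2017-02-17T00:00:00Z'
      AND eventtime < '2017-02-18T00:00:00Z'
GROUP BY useridentity.username, sourceipaddress
ORDER BY login_count DESC;
```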

For more information, see Querying AWS CloudTrail Logs.