Amazon Athena
User Guide  | API Reference

Querying Amazon CloudFront Logs

You can configure Amazon CloudFront CDN to export Web distribution access logs to Amazon Simple Storage Service. Use these logs to explore users’ surfing patterns across your web properties served by CloudFront.

Before you begin querying the logs, enable Web distributions access log on your preferred CloudFront distribution. For information, see Access Logs in the Amazon CloudFront Developer Guide.

Make a note of the Amazon S3 bucket to which to save these logs.

Note

This procedure works for the Web distribution access logs in CloudFront. It does not apply to streaming logs from RTMP distributions.

Creating the Table for CloudFront Logs#

To create the CloudFront table#

  1. Copy and paste the following DDL statement into the AWS Management Console for Athena. Modify the LOCATION for the S3 bucket that stores your logs.

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_logs (
      `date` date,
      `time` string,
      `location` string,
      bytes bigint,
      requestip string,
      method string,
      host string,
      uri string,
      status int,
      referrer string,
      useragent string,
      querystring string,
      cookie string,
      resulttype string,
      requestid string,
      hostheader string,
      requestprotocol int,
      requestbytes bigint,
      timetaken bigint,
      xforwardedfor string,
      sslprotocol string,
      sslcipher string,
      responseresulttype string,
      httpversion string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES
     (
     "input.regex" = "^(?!#)([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)\\s+([^ \\t]+)$"
     )
    LOCATION 's3://your_log_bucket/prefix/';
    
  2. Run the query in AWS Management Console for Athena. After the query completes, Athena registers the cloudfront_logs table, making the data in it ready for you to issue queries.

Example Query for CloudFront logs#

The following query adds up the number of bytes served by CloudFront between June 9 and June 11, 2017. Surround the date column name with double quotes because it is a reserved word.

SELECT SUM(bytes) AS total_bytes
FROM cloudfront_logs
WHERE "date" BETWEEN DATE '2017-06-09' AND DATE '2017-06-11'
LIMIT 100;

The previous query uses an expression of type SELECT * FROM cloudfront_logs limit 10;. There are empty rows in the data returned by this query. This is because CloudFront logs include two comment rows in each file. The regular expression used in the original CREATE EXTERNAL TABLE query earlier in this topic does not match those lines. However, the SerDe used to parse the data assumes these unmatched rows are part of the dataset, and leaves them in the final output with empty values. You can eliminate empty values by running SELECT DISTINCT * FROM cloudfront_logs LIMIT 10;

For more information, see the AWS Big Data Blog post Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics.