Amazon Athena
User Guide

Querying Amazon CloudFront Logs

You can configure Amazon CloudFront CDN to export Web distribution access logs to Amazon Simple Storage Service. Use these logs to explore users’ surfing patterns across your web properties served by CloudFront.

Before you begin querying the logs, enable Web distributions access log on your preferred CloudFront distribution. For information, see Access Logs in the Amazon CloudFront Developer Guide.

Make a note of the Amazon S3 bucket to which to save these logs.

Note

This procedure works for the Web distribution access logs in CloudFront. It does not apply to streaming logs from RTMP distributions.

Creating the Table for CloudFront Logs

To create the CloudFront table

  1. Copy and paste the following DDL statement into the Athena console. Modify the LOCATION for the Amazon S3 bucket that stores your logs.

    This query uses the LazySimpleSerDe by default and it is omitted.

    The column date is escaped using backticks (`) because it is a reserved word in Athena. For information, see Reserved Keywords.

    CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs ( `date` DATE, time STRING, location STRING, bytes BIGINT, request_ip STRING, method STRING, host STRING, uri STRING, status INT, referrer STRING, user_agent STRING, query_string STRING, cookie STRING, result_type STRING, request_id STRING, host_header STRING, request_protocol STRING, request_bytes BIGINT, time_taken FLOAT, xforwarded_for STRING, ssl_protocol STRING, ssl_cipher STRING, response_result_type STRING, http_version STRING, fle_status STRING, fle_encrypted_fields INT, c_port INT, time_to_first_byte FLOAT, x_edge_detailed_result_type STRING, sc_content_type STRING, sc_content_len BIGINT, sc_range_start BIGINT, sc_range_end BIGINT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://CloudFront_bucket_name/CloudFront/' TBLPROPERTIES ( 'skip.header.line.count'='2' )
  2. Run the query in Athena console. After the query completes, Athena registers the cloudfront_logs table, making the data in it ready for you to issue queries.

Example Query for CloudFront Logs

The following query adds up the number of bytes served by CloudFront between June 9 and June 11, 2018. Surround the date column name with double quotes because it is a reserved word.

SELECT SUM(bytes) AS total_bytes FROM cloudfront_logs WHERE "date" BETWEEN DATE '2018-06-09' AND DATE '2018-06-11' LIMIT 100;

In some cases, you need to eliminate empty values from the results of CREATE TABLE query for CloudFront. To do so, run:

SELECT DISTINCT * FROM cloudfront_logs LIMIT 10;

For more information, see the AWS Big Data Blog post Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics.