Real-Time Web Analytics with Kinesis Data Analytics

Solution Components

Beacon Code

The Real-Time Web Analytics with Kinesis Data Analytics solution includes beacon code that you add to the web pages served by your web server. This code enables your users' web browsers to send requests to the solution's beacon web servers. The requests include header information that is used to calculate the metrics displayed on the solution's real-time dashboard as counts over 10-second time windows.

    // beacon_url: the beacon URL from the Outputs section of the CloudFormation stack
    var http = new XMLHttpRequest();
    var url = beacon_url;
    http.open("POST", url);
    http.setRequestHeader("event", "click");
    http.setRequestHeader("clientid", "user123");
    http.setRequestHeader("page", window.location.pathname.split("/").slice(-1));
    // Use "referrer": browsers refuse to set the forbidden "Referer" header from script
    http.setRequestHeader("referrer", document.referrer);
    http.setRequestHeader("custom_metric_name", "userAgent");
    http.setRequestHeader("custom_metric_string_value", navigator.userAgent);
    http.send();

Figure 3: Sample beacon code

Metric Headers

This solution includes headers you use to collect website usage metrics that are displayed in the dashboard. The following headers are passed to the beacon server.

  • event: String used to capture the user event (for example, click, pageview, playvideo, conversion, exception, login, logoff)

  • page: Webpage associated with the event (for example, window.location.pathname or window.location.pathname.split("/").slice(-1))

  • referrer: Referring page (for example, document.referrer)
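The second page example keeps only the last path segment. As a minimal sketch, the hypothetical helpers pageFromPath and standardHeaders below (not part of the solution) show what that expression yields and collect the three standard headers into one object whose entries could each be passed to setRequestHeader.

```javascript
// Hypothetical helper: derive the "page" header value from a URL path,
// equivalent to window.location.pathname.split("/").slice(-1).
function pageFromPath(pathname) {
  // slice(-1) returns a one-element array; [0] extracts the string
  return pathname.split("/").slice(-1)[0];
}

// Hypothetical helper: collect the three standard metric headers.
function standardHeaders(event, pathname, referrer) {
  return {
    event: event,
    page: pageFromPath(pathname),
    referrer: referrer,
  };
}

console.log(standardHeaders("pageview", "/docs/intro.html", "https://example.com/"));
```

Note that a path ending in "/" (for example, /docs/) produces an empty page value, so you may want to substitute a default such as index.html in that case.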

The solution also includes headers you can use to include custom metrics on the real-time dashboard. The following headers define the name of the custom metric and the different data types of the values that are passed to the beacon server.

  • custom_metric_name: Defines the name of your custom metric

  • custom_metric_int_value: For custom metrics with integer values

  • custom_metric_float_value: For custom metrics with float values

  • custom_metric_string_value: For custom metrics with string values
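A minimal sketch of choosing the value header by data type follows. customMetricHeaders is a hypothetical helper, and sending exactly one value header per custom metric is an assumption based on the list above.

```javascript
// Hypothetical helper: pair custom_metric_name with the value header
// that matches the JavaScript type of the value.
function customMetricHeaders(name, value) {
  const headers = { custom_metric_name: name };
  if (typeof value === "number" && Number.isInteger(value)) {
    headers.custom_metric_int_value = String(value);
  } else if (typeof value === "number" && Number.isFinite(value)) {
    headers.custom_metric_float_value = String(value);
  } else if (typeof value === "string") {
    headers.custom_metric_string_value = value;
  } else {
    // NaN, Infinity, and non-number, non-string values are rejected
    throw new TypeError("custom metric value must be a finite number or a string");
  }
  return headers;
}

console.log(customMetricHeaders("load_time", 1.25));
```

Because JavaScript has a single number type, a value such as 2.0 is indistinguishable from 2 and is sent through custom_metric_int_value.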

Beacon Web Servers

Real-Time Web Analytics with Kinesis Data Analytics uses two Amazon Elastic Compute Cloud (Amazon EC2) instances as beacon web servers. You can choose from two preset configurations to support your anticipated request traffic: 50K requests per minute (t2.medium instances) or 100K requests per minute (m5.large instances). The Auto Scaling group scales in increments of the selected preset (50K or 100K requests per minute).
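Because scaling happens in fixed increments, estimating capacity is a rounding-up problem. The sketch below assumes that each scaling step adds exactly one preset increment of capacity; PRESETS and incrementsNeeded are hypothetical names, and the solution's actual Auto Scaling policy may behave differently.

```javascript
// Preset capacities in requests per minute, taken from the configuration options.
const PRESETS = {
  "t2.medium": 50000,
  "m5.large": 100000,
};

// Assumption: one scaling step adds one preset increment, so the number of
// increments is traffic / increment rounded up, with a minimum of one.
function incrementsNeeded(requestsPerMinute, instanceType) {
  const increment = PRESETS[instanceType];
  if (increment === undefined) throw new Error("unknown preset: " + instanceType);
  return Math.max(1, Math.ceil(requestsPerMinute / increment));
}

console.log(incrementsNeeded(120000, "t2.medium"), incrementsNeeded(120000, "m5.large"));
```

Under this assumption, 120K requests per minute needs three 50K increments but only two 100K increments.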

Amazon Kinesis Data Analytics Application

This solution includes an Amazon Kinesis Data Analytics application with SQL statements that compute metrics for the built-in dashboard. The application reads records from the Amazon Kinesis Data Firehose delivery stream, and runs the SQL queries to emit specific website clickstream metrics, which are stored in Amazon DynamoDB. For more information, see Appendix A.
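The solution's actual SQL queries are described in Appendix A. To illustrate what a count over a 10-second window means, here is a JavaScript sketch of tumbling-window counting; the record shape { timestamp, event } and the function countByWindow are assumptions, not the solution's code.

```javascript
const WINDOW_MS = 10 * 1000; // 10-second tumbling window

// Count events per (window start, event type). Each record is assumed to
// look like { timestamp: <milliseconds since epoch>, event: "click" }.
function countByWindow(records) {
  const counts = new Map(); // window start (ms) -> Map(event type -> count)
  for (const { timestamp, event } of records) {
    // Round the timestamp down to the start of its 10-second window
    const windowStart = Math.floor(timestamp / WINDOW_MS) * WINDOW_MS;
    if (!counts.has(windowStart)) counts.set(windowStart, new Map());
    const perEvent = counts.get(windowStart);
    perEvent.set(event, (perEvent.get(event) || 0) + 1);
  }
  return counts;
}

console.log(countByWindow([
  { timestamp: 1000, event: "click" },
  { timestamp: 12000, event: "logon" },
]));
```

Because tumbling windows do not overlap, each record is counted in exactly one window.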

Amazon DynamoDB

The Real-Time Web Analytics with Kinesis Data Analytics solution creates an Amazon DynamoDB table named Metrics.

The Metrics table stores the following information on metrics computed by the Amazon Kinesis Data Analytics application:

  • MetricType: The name of the computed metric

  • AmendmentStrategy: How the solution amends metric detail items that have identical timestamps. For more information, see Amendment Strategy.

  • IsSet: Indicates whether a metric detail item contains more than one data point: false for a single data point, true for more than one.

  • IsWholeNumber: Indicates whether the metric's value is an integer (true) or a float (false).
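The IsSet and IsWholeNumber flags can be derived mechanically from a metric's data points. The hypothetical describeMetric helper below sketches that derivation; the { key, value } data point shape and the strategy string "add" are assumptions, and the code only builds a plain object rather than writing to DynamoDB.

```javascript
// Hypothetical helper, not part of the solution: derive the Metrics table
// flags from a metric's data points, each assumed to be { key, value }.
function describeMetric(metricType, amendmentStrategy, dataPoints) {
  return {
    MetricType: metricType,
    AmendmentStrategy: amendmentStrategy,
    // true when the detail item holds more than one data point
    IsSet: dataPoints.length > 1,
    // true when every value is an integer; false if any value is a float
    IsWholeNumber: dataPoints.every((p) => Number.isInteger(p.value)),
  };
}

console.log(describeMetric("event_count", "add", [
  { key: "logon", value: 10 },
  { key: "click", value: 2 },
]));
```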

Amendment Strategy

This solution includes an amendment strategy that defines how the solution handles a new record with the same timestamp as a record that has already been received. You can choose from the following three options:

  • Add: Adds values from the new record to values from the existing record. For example, if a record is received for event_count with a value of logon 10 that has the same timestamp as an existing event_count record that has a value of logon 4, the values will be added together (10 + 4) and the existing record will be updated with the new value: logon 14.

  • Replace: Replaces the existing record with the new record.

  • Replace existing: Merges the new record into the existing record. Values for keys that appear in both records are overwritten by the new values, and keys that appear in only one record are kept. For example, if a record is received for event_count with values of logon 10 and click 2 that has the same timestamp as an existing event_count record with values of logon 4 and logoff 2, the updated record's values will be logon 10, logoff 2, and click 2.
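The three amendment strategies can be expressed as merge functions over key/value records. In this sketch, amend and the strategy names "add", "replace", and "replace_existing" are hypothetical labels, and keeping keys that appear in only one record under Add is an assumption, since the Add example covers only a shared key.

```javascript
// Hypothetical sketch of the amendment strategies; records map keys to numbers.
function amend(strategy, existing, incoming) {
  switch (strategy) {
    case "add": {
      // Sum values for shared keys; keys in only one record are kept (assumption)
      const result = { ...existing };
      for (const [key, value] of Object.entries(incoming)) {
        result[key] = (result[key] || 0) + value;
      }
      return result;
    }
    case "replace":
      // Discard the existing record entirely
      return { ...incoming };
    case "replace_existing":
      // Merge; the new value wins for keys present in both records
      return { ...existing, ...incoming };
    default:
      throw new Error("unknown amendment strategy: " + strategy);
  }
}

console.log(amend("add", { logon: 4 }, { logon: 10 }));
console.log(amend("replace_existing", { logon: 4, logoff: 2 }, { logon: 10, click: 2 }));
```

The two calls reproduce the Add and Replace existing examples above.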

Web Usage Dashboard

The solution features a simple dashboard that loads data from Amazon DynamoDB into line charts every 10 seconds and into bar charts every minute. The dashboard uses Amazon Cognito for user authentication, and its web assets are hosted in an Amazon Simple Storage Service (Amazon S3) bucket.

The dashboard uses the open-source Chart.js JavaScript library to draw charts using HTML5. The index.html file contains the HTML elements that render the charts in the dashboard. The dash.js file in the js folder contains the JavaScript that populates the dashboard with metrics. The Amazon Kinesis Data Analytics application contains the SQL queries that compute metrics. For more information, see Appendix A.

After you successfully launch the solution, you will receive an email with instructions for logging in to the dashboard.

The dashboard can also be customized to include additional metrics. For more information, see Appendix B.


Figure 4: Real-time web usage dashboard

Anomaly Detection

The Real-Time Web Analytics with Kinesis Data Analytics solution uses the built-in anomaly detection of Amazon Kinesis Data Analytics. This solution calculates an anomaly score by comparing the last 256 events to a random sample of the last 100,000 events. When an anomaly score reaches the threshold, the events are displayed in the solution's anomaly detection graph. For example, the solution might record 50 logon events and 0 exception events every 10 seconds. If the solution then records 5 logon events and 30 exception events every 10 seconds, it detects the anomaly and displays it in the graph.
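Amazon Kinesis Data Analytics provides anomaly detection through its RANDOM_CUT_FOREST function, which is too involved for a short example. The sketch below swaps in a much simpler technique, a per-event deviation score against historical 10-second windows, only to illustrate the logon/exception example above. anomalyScore, the window shape, and THRESHOLD are hypothetical, and the scores are not comparable to the solution's.

```javascript
// Simplified stand-in for illustration only; this is NOT Random Cut Forest.
// history: non-empty array of past windows, each mapping event type -> count.
// current: the latest window in the same shape.
function anomalyScore(history, current) {
  if (history.length === 0) throw new Error("history must not be empty");
  const eventTypes = new Set([
    ...history.flatMap((w) => Object.keys(w)),
    ...Object.keys(current),
  ]);
  let score = 0;
  for (const type of eventTypes) {
    const values = history.map((w) => w[type] || 0);
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
    // Adding 1 to the standard deviation keeps the score finite when a count
    // never varied in the history (for example, 0 exceptions in every window).
    const deviation = Math.abs((current[type] || 0) - mean) / (Math.sqrt(variance) + 1);
    score = Math.max(score, deviation); // the most unusual event type sets the score
  }
  return score;
}

const THRESHOLD = 3; // hypothetical threshold
const baseline = Array.from({ length: 6 }, () => ({ logon: 50, exception: 0 }));
const latest = anomalyScore(baseline, { logon: 5, exception: 30 });
console.log(latest, latest >= THRESHOLD); // 45 true
```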

Amazon CloudWatch Dashboard

This solution provides an optional dashboard you can use to monitor the performance of your beacon web servers. The dashboard displays custom operational Amazon CloudWatch metrics for your beacon web servers, including the number of healthy beacon web servers, the average number of network packets processed, aggregate requests, 5XX errors, and Amazon DynamoDB throughput capacity and throttling.


Figure 5: CloudWatch metrics dashboard