Real-Time Web Analytics with Kinesis Data Analytics

Appendix A: Code Components

The Real-Time Web Analytics with Kinesis Data Analytics solution uses four main code components to process and display metrics on the real-time dashboard. Beacon JavaScript code running in the user's web browser sends requests that carry header information to the solution's beacon web server. The Amazon Kinesis Data Analytics application (WebMetricsApplication) runs SQL queries against the in-application streams and emits the results. A JavaScript file (dash.js) populates the chart with the query results, and an HTML file (index.html) renders the chart on the dashboard in real time.

The following example shows the beacon code, SQL, JavaScript, and HTML code for the top_pages metric.

Beacon Code

The beacon code sends header information about pages to the solution's web server, and that information is then sent on to the Kinesis data delivery stream. The beacon_url value is listed in the Outputs of the AWS CloudFormation stack.

var http = new XMLHttpRequest();
var url = beacon_url; // from the Outputs section of the CloudFormation stack
http.open("POST", beacon_url);
http.setRequestHeader("event", "click");
http.setRequestHeader("page", "productpage.html");
http.setRequestHeader("clientid", "user123");
http.send();

The page header values are displayed in the top pages chart as a count over a 10-second window.
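The beacon call above can be wrapped in a small reusable function so that each page supplies only its own values. The following is a minimal sketch, not part of the solution: the names buildBeaconHeaders and sendBeacon are hypothetical, and it assumes the beacon server reads the same three headers (event, page, clientid) used in the example.

```javascript
// Hypothetical helper: collects the three headers that the beacon example sets.
function buildBeaconHeaders(eventName, page, clientId) {
  return { event: eventName, page: page, clientid: clientId };
}

// Hypothetical helper: sends one beacon request.
// beaconUrl is the value taken from the CloudFormation stack Outputs.
function sendBeacon(beaconUrl, eventName, page, clientId) {
  var headers = buildBeaconHeaders(eventName, page, clientId);
  var http = new XMLHttpRequest();
  http.open("POST", beaconUrl);
  Object.keys(headers).forEach(function (name) {
    http.setRequestHeader(name, headers[name]);
  });
  http.send();
}
```

In a browser, a call such as sendBeacon(beacon_url, "click", "productpage.html", "user123") would issue the same request as the original snippet.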

SQL Query

The SQL query calculates the top pages, in 10-second intervals, based on page views. The result is stored in an output in-application stream (DESTINATION_SQL_STREAM) with the name of the metric (top_pages) and the corresponding values.

CREATE OR REPLACE PUMP "PAGEVIEWS_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM" (MetricType, EventTimestamp, MetricItem, UnitValueInt)
SELECT 'top_pages', UNIX_TIMESTAMP(eventTimestamp), page, page_count
FROM (
  SELECT stream
    weblogs."page" as page,
    count(*) as page_count,
    STEP (CHAR_TO_TIMESTAMP('dd/MMM/yyyy:HH:mm:ss z', weblogs."datetime") by INTERVAL '10' SECOND) as eventTimestamp
  FROM "WASA_001" weblogs
  GROUP BY
    STEP (weblogs.ROWTIME BY INTERVAL '10' SECOND),
    STEP (CHAR_TO_TIMESTAMP('dd/MMM/yyyy:HH:mm:ss z', weblogs."datetime") by INTERVAL '10' SECOND),
    weblogs."page"
  HAVING count(*) > 1
  ORDER BY STEP (weblogs.ROWTIME BY INTERVAL '10' SECOND), page_count desc
);
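To make the windowing logic concrete, the following JavaScript sketch mimics what the query computes on a small in-memory array. It assigns each event to a 10-second bucket (the role of STEP ... BY INTERVAL '10' SECOND), counts views per page in each bucket, keeps only counts greater than 1 (the HAVING clause), and orders the rows by bucket and descending count. It is an illustration only, not how Kinesis Data Analytics executes the query. The function name topPages, the { page, epochSeconds } event shape, the use of epoch seconds, and the use of a single timestamp instead of both ROWTIME and the parsed datetime column are simplifying assumptions.

```javascript
// Illustrative only: a 10-second tumbling-window page count in plain JavaScript.
// Assumed event shape: { page: string, epochSeconds: number }.
function topPages(events, windowSeconds) {
  var counts = {}; // key "<windowStart>|<page>" -> number of views
  events.forEach(function (e) {
    // Round the timestamp down to the start of its window, as STEP does.
    var windowStart = Math.floor(e.epochSeconds / windowSeconds) * windowSeconds;
    var key = windowStart + "|" + e.page;
    counts[key] = (counts[key] || 0) + 1;
  });
  return Object.keys(counts)
    .map(function (key) {
      var sep = key.indexOf("|");
      return {
        MetricType: "top_pages",
        EventTimestamp: Number(key.slice(0, sep)),
        MetricItem: key.slice(sep + 1),
        UnitValueInt: counts[key]
      };
    })
    // Equivalent of HAVING count(*) > 1.
    .filter(function (row) { return row.UnitValueInt > 1; })
    // Equivalent of ORDER BY window, page_count desc.
    .sort(function (a, b) {
      return a.EventTimestamp - b.EventTimestamp || b.UnitValueInt - a.UnitValueInt;
    });
}
```

Pages viewed only once in a window are dropped, so a low-traffic page never appears in the chart for that interval.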


JavaScript

The JavaScript (in the dash.js file) populates the chart with the top pages by page views.

switch (mtype) {
  case 'hourly_events':
    makeBarChart(mtype, items);
    break;
  case 'event_anomaly':
    makeAmomalyBarChart(mtype, items);
    break;
  case 'agent_count':
    makePieChart(mtype, items);
    break;
  case 'referral_count': // falls through: same chart type as top_pages
  case 'top_pages':
    makeHorizontalBarChart(mtype, items);
    break;
  case 'visitor_count':
    document.getElementById(mtype).innerHTML = 'Current Visitor Count:' + items[0].UNITVALUEINT;
    makeVisitorLineChart(mtype, items);
    break;
  case 'event_count':
    makeEventLineChart(mtype, items);
    break;
}
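The implementation of makeHorizontalBarChart is not shown in this appendix. As a rough sketch of the kind of preparation such a function might perform, the following hypothetical helper turns metric items into parallel label and value arrays sorted by descending count. The name toBarChartData and the METRICITEM field are assumptions; only UNITVALUEINT appears in the code above.

```javascript
// Hypothetical helper, not taken from dash.js.
// Assumed item shape: { METRICITEM: string, UNITVALUEINT: number }.
function toBarChartData(items) {
  // Copy before sorting so the caller's array is left unchanged.
  var sorted = items.slice().sort(function (a, b) {
    return b.UNITVALUEINT - a.UNITVALUEINT; // largest count first
  });
  return {
    labels: sorted.map(function (item) { return item.METRICITEM; }),
    values: sorted.map(function (item) { return item.UNITVALUEINT; })
  };
}
```

Keeping this step separate from the drawing code makes it possible to check the ordering of the bars without a browser or a charting library.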

HTML Element

The HTML element (in the index.html file) renders the top_pages chart with the results of the SQL query.

<div class="row aws-mb-l">
  <div class="col-xs-10 col-xs-offset-1 col-xs-12">
    <div class="x_title">
      <h3>Pages</h3>
    </div>
    <div class="x_content">
      <canvas id="top_pages" ts="0"></canvas>
    </div>
  </div>
</div>