Gaming Analytics Pipeline

Solution Components

S3Connector Application

The Gaming Analytics Pipeline includes a Kinesis Client Library (KCL) consumer application (S3Connector) that runs on AWS Elastic Beanstalk. The S3Connector validates each event to ensure that it is formatted correctly and that all required fields are present. The application also sanitizes incoming events to prevent downstream issues when the events are loaded into Amazon Redshift. Then, the application batches all valid events in memory, periodically compresses those batches using GZIP, and writes them to Amazon Simple Storage Service (Amazon S3) as JSON files that contain one event per line. For an excerpt from a sample batch file, see Appendix D.
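
The validate, batch, and compress steps can be illustrated with a minimal Python sketch. It is not the S3Connector's actual code: the field names in REQUIRED_FIELDS and the helper names are hypothetical, chosen only to show the one-event-per-line GZIP format described above.

```python
import gzip
import json

# Hypothetical required fields; the solution's real event schema is not shown here.
REQUIRED_FIELDS = ("event_id", "event_type", "event_timestamp")


def is_valid(event):
    """Return True if the event is a dict that contains every required field."""
    return isinstance(event, dict) and all(f in event for f in REQUIRED_FIELDS)


def compress_batch(events):
    """Serialize the valid events as one JSON object per line, then GZIP the result."""
    lines = [json.dumps(e, sort_keys=True) for e in events if is_valid(e)]
    payload = ("\n".join(lines) + "\n") if lines else ""
    return gzip.compress(payload.encode("utf-8"))


def decompress_batch(blob):
    """Inverse of compress_batch; handy for inspecting a batch file."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line]
```

In this sketch invalid events are silently dropped. A real consumer would more likely log or count them so that malformed client data can be investigated.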

To make it easier to find event data from a particular date and time, the solution stores batches as <bucket-name>/events/<YYYY>/<MM>/<DD>/<HH>/<start-sequence-number>–<end-sequence-number>.gzip.
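
As an illustration of this naming scheme, the following Python sketch builds the object key (the part after the bucket name). The function name is hypothetical, and two details are assumptions: the date parts are taken from the timestamp as given (UTC in the usage example), and the separator between the two sequence numbers is written as a plain hyphen.

```python
from datetime import datetime, timezone


def batch_key(when, start_seq, end_seq):
    """Build events/<YYYY>/<MM>/<DD>/<HH>/<start>-<end>.gzip for one batch."""
    # datetime supports strftime codes inside a format spec.
    return "events/{:%Y/%m/%d/%H}/{}-{}.gzip".format(when, start_seq, end_seq)


# Usage: a batch written during the 09:00 hour on 7 March 2018 (UTC).
key = batch_key(datetime(2018, 3, 7, 9, 30, tzinfo=timezone.utc), "100", "250")
```

Because the date and hour are key prefixes, listing a single prefix such as events/2018/03/07/09/ returns all batches for that hour.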

When the batch is successfully written, the S3Connector sends a pointer to the location of the batch to the file stream to initiate the process of loading it into Amazon Redshift.

RedshiftConnector Application

A second KCL application (RedshiftConnector) loads batches of events from Amazon S3 into Amazon Redshift. The application uses the manifest file load feature of the Amazon Redshift COPY command, which lets you specify a JSON-formatted text file that explicitly lists the files to be loaded.
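
To make the manifest mechanism concrete, the sketch below builds a manifest in the documented Amazon Redshift format (an "entries" list of objects with "url" and "mandatory" keys) and a matching COPY statement. The helper names, the table name, and the choice of IAM_ROLE and JSON 'auto' are assumptions for illustration; the RedshiftConnector may use different credentials and JSON options.

```python
import json


def build_manifest(urls):
    """Return manifest JSON that lists each S3 object to load."""
    entries = [{"url": url, "mandatory": True} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def copy_statement(table, manifest_url, iam_role):
    """Return a COPY statement that loads the files listed in the manifest."""
    # MANIFEST tells COPY that the FROM path is a manifest, not a data file.
    # GZIP and JSON 'auto' are assumed options matching GZIP-compressed JSON lines.
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role}' "
        "MANIFEST GZIP JSON 'auto';"
    )
```

Setting "mandatory" to true makes the load fail if a listed file is missing, rather than silently skipping it.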

The RedshiftConnector also deletes duplicate events that may be introduced by Amazon Kinesis-related retries or during backfills of old data. Then, the application inserts the events into the Amazon Redshift tables.
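
The idea behind deduplication can be shown with a small in-memory Python sketch that keeps the first occurrence of each event. This is only an illustration: the event_id key is a hypothetical unique identifier, and the RedshiftConnector performs its deduplication against Amazon Redshift rather than in a Python list.

```python
def drop_duplicates(events, key="event_id"):
    """Return events with later repeats of the same key removed, order preserved."""
    seen = set()
    unique = []
    for event in events:
        identifier = event[key]
        if identifier not in seen:
            seen.add(identifier)
            unique.append(event)
    return unique
```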

CronConnector Application

A third application (CronConnector) performs routine database tasks and maintenance. For example, the CronConnector application vacuums Amazon Redshift tables. Amazon Redshift does not automatically reclaim and reuse the space that is freed when you delete or update rows. Vacuuming reclaims that space and re-sorts the rows. For more information, see Vacuuming Tables in the Amazon Redshift Database Developer Guide.

Data and Heat Map Generators

The Gaming Analytics Pipeline includes a data generator you can use to test the pipeline and a heat map generator that allows you to generate heat maps based on various parameters such as event type and game map. The data generator can generate random data, or it can get events from a sample file included with the solution. Use the sample file if you want to generate a sample heat map.

These generators are deployed as Python scripts on an Amazon Elastic Compute Cloud (Amazon EC2) t2.micro instance running Microsoft Windows Server 2016 with SQL Server 2017 Express. The generators require Python 2.7.x.
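
As a rough illustration of what heat map generation involves, the following sketch counts events per grid cell after filtering by event type and game map. It is not the solution's heat map generator: the field names (event_type, map_id, x, y), the function name, and the cell size are all hypothetical.

```python
from collections import Counter


def heat_map_counts(events, event_type, map_id, cell_size=10.0):
    """Count matching events per (column, row) grid cell."""
    counts = Counter()
    for event in events:
        if event.get("event_type") != event_type or event.get("map_id") != map_id:
            continue
        # Floor division assigns each position to a square cell of side cell_size.
        cell = (int(event["x"] // cell_size), int(event["y"] // cell_size))
        counts[cell] += 1
    return counts
```

The resulting counts can be passed to any plotting library to render the cells as a color grid.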