Appendix A: Demo Application - Real-Time Analytics with Spark Streaming

The Real-Time Analytics with Spark Streaming solution includes a demo application for testing purposes. Deploying this solution with the demo application builds the following environment in the AWS Cloud.


Figure 2: Real-Time Analytics with Spark Streaming architecture with demo application

When you use the demo application, the solution deploys an additional private subnet with a sample data producer and uploads the demo application code to your existing Amazon Simple Storage Service (Amazon S3) bucket. The demo application sends sample data through the NAT gateway to Amazon Kinesis Data Streams. In this architecture, the bastion host provides SSH access to the Amazon EMR cluster and the sample data producer, and the filtered data is stored in Amazon S3.

By default, the data producer runs for only a few seconds, creating one file in Amazon S3. To run the producer manually for a longer period, use the following commands, replacing the placeholder values <mykeypair>, <bastion-host-dns>, <kinesis-producer-IP>, <default-data-stream>, and <your-AWS-Region> with your own values.

    # Copy pem file to bastion host using SCP
    scp -i "<mykeypair>.pem" <mykeypair>.pem ec2-user@<bastion-host-dns>:<mykeypair>.pem

    # Login to bastion host
    ssh -i "<mykeypair>.pem" ec2-user@<bastion-host-dns>

    # Login to Kinesis Producer
    ssh -i "<mykeypair>.pem" ec2-user@<kinesis-producer-IP>

    # Run producer for 600 seconds (10 minutes)
    sudo java -jar /home/ec2-user/kinesis-producer.jar <default-data-stream> <your-AWS-Region> 600
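
The producer command takes three positional values (stream name, AWS Region, and run time in seconds), so a placeholder left unreplaced or a mistyped duration is easy to miss. The following is a minimal sketch of a helper that checks those values and prints the resulting command. The function name build_producer_cmd and its checks are hypothetical and not part of the solution; only the jar path and the argument order are taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the solution): validates the three
# producer arguments and prints the command line to run on the producer host.
# Assumption: the argument order is <stream> <region> <seconds>, as in the
# manual command shown in this appendix.

build_producer_cmd() {
    stream="$1"
    region="$2"
    seconds="$3"

    # All three arguments are required.
    if [ -z "$stream" ] || [ -z "$region" ] || [ -z "$seconds" ]; then
        echo "usage: build_producer_cmd <stream> <region> <seconds>" >&2
        return 1
    fi

    # Reject values that still contain <...> placeholder brackets.
    case "$stream$region$seconds" in
        *"<"*|*">"*)
            echo "error: replace the <...> placeholders first" >&2
            return 1
            ;;
    esac

    # The duration must consist of digits only (whole seconds).
    case "$seconds" in
        *[!0-9]*)
            echo "error: duration must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    printf 'sudo java -jar /home/ec2-user/kinesis-producer.jar %s %s %s\n' \
        "$stream" "$region" "$seconds"
}
```

For example, after the function is defined in your shell, build_producer_cmd my-stream us-east-1 600 prints the 10-minute command for a stream named my-stream (an illustrative name), while passing <default-data-stream> unchanged returns an error instead of a command.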