Automated Deployment - Real-Time Analytics with Spark Streaming

Automated Deployment

Before you launch the automated deployment, please review the implementation considerations and prerequisites discussed in this guide. Follow the step-by-step instructions in this section to configure and deploy the Real-Time Analytics solution into your account.

Time to deploy: Approximately 15 minutes

Prerequisites

Review the application requirements and processing options in Implementation Considerations.

  • Before you deploy the solution, you must upload your working Spark Streaming application to an Amazon Simple Storage Service (Amazon S3) bucket. If you are using a Spark Submit script to launch your custom application, you must have a spark-submit.sh file with the Spark Submit command in an Amazon S3 bucket.

  • This solution runs only one application at a time, and that application must have a unique name (see Single Application Deployment for detailed information).

  • Before you deploy the solution, you must enable the Amazon EMR web interfaces to view Apache Zeppelin, the Spark History UI, and the Resource Manager. For more information, see EMR Web Interfaces.

  • You must also configure dynamic port forwarding to the bastion host so that you can securely access the Amazon EMR web interfaces. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters in the Amazon EMR Management Guide.
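
As an illustration of the spark-submit.sh prerequisite above, the following Python sketch generates the text of such a file. It is only a sketch under stated assumptions: the bucket, JAR path, class name, and application arguments are hypothetical placeholders, and the exact command your application needs may differ. The spark-submit options shown (--master, --deploy-mode, --class) are standard Spark options.

```python
import shlex


def render_spark_submit_script(app_jar_s3_uri, main_class, app_args=()):
    """Return the text of a spark-submit.sh file for a Spark Streaming application."""
    command = [
        "spark-submit",
        "--master", "yarn",          # Amazon EMR runs Spark on YARN
        "--deploy-mode", "cluster",  # run the driver on the cluster
        "--class", main_class,
        app_jar_s3_uri,
        *app_args,
    ]
    # shlex.join quotes each argument so the line is safe to run in a shell.
    return "#!/bin/bash\n" + shlex.join(command) + "\n"


# All values below are hypothetical placeholders, not names used by the solution.
script = render_spark_submit_script(
    "s3://my-bucket/apps/streaming-app.jar",
    "com.example.StreamingApp",
    ["my-app-name", "my-stream"],
)
print(script)
```

Save the generated text as spark-submit.sh and upload it to your Amazon S3 bucket alongside the application artifact.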

What We'll Cover

The procedure for deploying this architecture on AWS consists of the following steps. For detailed instructions, follow the links for each step.

Step 1. Launch the Stack

  • Launch the AWS CloudFormation template into your AWS account.

  • Enter values for required parameters.

  • Review the other template parameters, and adjust if necessary.
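
If you prefer to script Step 1 rather than use the console, the parameter values can be prepared as a list of ParameterKey/ParameterValue pairs, which is the shape accepted by the Parameters argument of the AWS SDK for Python (boto3) CloudFormation create_stack call. The sketch below only builds and validates that list. The parameter names are hypothetical; use the names defined in the solution's template.

```python
def to_cloudformation_parameters(values):
    """Convert a dict of parameter values into CloudFormation's Parameters list."""
    missing = [name for name, value in values.items() if value in (None, "")]
    if missing:
        raise ValueError("Missing required parameter values: " + ", ".join(missing))
    return [
        {"ParameterKey": name, "ParameterValue": str(value)}
        for name, value in values.items()
    ]


# Hypothetical parameter names and values, for illustration only.
stack_parameters = to_cloudformation_parameters({
    "ApplicationName": "my-streaming-app",
    "ArtifactBucket": "my-bucket",
    "KeyName": "my-key-pair",
})
for parameter in stack_parameters:
    print(parameter)
```

Checking for empty values before launching the stack surfaces a missing required parameter early, instead of as a failed stack creation.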

Step 2. Stop a Running Application

  • Navigate to the EMR cluster’s Resource Manager.

  • Stop the running application.
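
This guide stops the application through the Resource Manager web interface. As a possible scripted alternative, not the procedure this guide describes, the YARN ResourceManager also exposes a REST API that changes an application's state with a PUT request to /ws/v1/cluster/apps/{appid}/state. The sketch below only builds that request. The host name and application ID are hypothetical, port 8088 is the default ResourceManager web port, and your cluster may require additional authentication or proxy settings.

```python
import json
import urllib.request


def build_kill_request(resource_manager_url, application_id):
    """Build (but do not send) a YARN REST request that kills an application."""
    url = (
        f"{resource_manager_url.rstrip('/')}"
        f"/ws/v1/cluster/apps/{application_id}/state"
    )
    body = json.dumps({"state": "KILLED"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and application ID, for illustration only.
kill_request = build_kill_request(
    "http://emr-master.example.internal:8088",
    "application_1500000000000_0001",
)
print(kill_request.get_method(), kill_request.full_url)

# To send the request, the Resource Manager must be reachable from your machine:
# urllib.request.urlopen(kill_request)
```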