Step 1. Launch the Stack

The automated AWS CloudFormation templates deploy Real-Time Analytics with Spark Streaming on the AWS Cloud using either your own Spark Streaming application or the AWS-provided demo application.

Note

You are responsible for the cost of the AWS services used while running this solution. For more information, see the Cost section; for full details, see the pricing webpage for each AWS service you will be using in this solution.

  1. Log in to the AWS Management Console and click the button below to launch the solution or demo application AWS CloudFormation template.

    [Real-Time Analytics demo launch button]  [Real-Time Analytics launch button]
    You can also download the template as a starting point for your own implementation.
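    If you script the launch instead of using the console button, the same request can be expressed through the AWS SDK. The following is a minimal sketch only: the stack name, template URL, parameter keys, and parameter values are illustrative placeholders (the real parameter logical IDs come from the template you download), and the actual API call is left as a comment because it requires AWS credentials.

    ```python
    # Sketch: launching the stack programmatically instead of through the console.
    # All names and values below are placeholders -- substitute your own.
    stack_request = {
        "StackName": "real-time-analytics",  # hypothetical stack name
        "TemplateURL": "https://example.com/real-time-analytics.template",  # placeholder
        "Parameters": [
            # Parameter keys are illustrative; use the logical IDs from the template.
            {"ParameterKey": "KeyName", "ParameterValue": "my-key-pair"},
            {"ParameterKey": "RemoteAccessCIDR", "ParameterValue": "203.0.113.0/24"},
        ],
        # The template creates IAM resources and may require CAPABILITY_AUTO_EXPAND,
        # matching the acknowledgment check box on the console Review page.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"],
    }

    # To actually create the stack (requires boto3 and configured credentials):
    #   import boto3
    #   boto3.client("cloudformation").create_stack(**stack_request)
    print(stack_request["StackName"])
    ```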

  2. The template is launched in the US East (N. Virginia) Region by default. To launch the Real-Time Analytics solution in a different AWS Region, use the region selector in the console navigation bar.

    Note

    This solution uses AWS Lambda, which is currently available in specific AWS Regions only. Therefore, you must launch this solution in a region where Lambda is available. For the most current AWS Lambda availability by region, see the AWS service offerings by region.

  3. On the Select Template page, verify that you selected the correct template and choose Next.

  4. On the Specify Details page, assign a name to your Real-Time Analytics solution stack.

  5. Under Parameters, review the parameters for the template, and modify them as necessary. This solution uses the following default values.

    Parameter | Default | Description
    ----------|---------|------------
    Key Name | <Requires input> | Public/private key pair that allows you to connect securely to the bastion host. This is the key pair you created in your preferred AWS Region when you created your AWS account.
    Remote Access CIDR | <Requires input> | The IP address range that is allowed to connect to the bastion host over SSH.
    Availability Zones | <Requires input> | The list of Availability Zones to use for the Amazon VPC subnets.
    Number of AZs | 2 | The number of Availability Zones to use in your VPC. Note: this number must match the number of selections in the Availability Zones parameter.
    VPC CIDR | 10.0.0.0/16 | The CIDR block for the VPC.
    Private Subnet 1A CIDR | 10.0.0.0/19 | The CIDR block for the private subnet in AZ1.
    Private Subnet 2A CIDR | 10.0.32.0/19 | The CIDR block for the private subnet in AZ2.
    Public Subnet 1 CIDR | 10.0.128.0/20 | The CIDR block for the public (DMZ) subnet in AZ1.
    Public Subnet 2 CIDR | 10.0.144.0/20 | The CIDR block for the public (DMZ) subnet in AZ2.
    Kinesis Stream | default-data-stream | The name of the source Amazon Kinesis stream that the template creates.
    Shard count | 2 | The number of shards for your Amazon Kinesis stream.
    Master | r5.xlarge | The Amazon EC2 instance type for the EMR master node.
    Core | r5.xlarge | The Amazon EC2 instance type for the EMR core nodes.
    Artifact Bucket | <Requires input> | The Amazon S3 bucket where your application artifacts are stored. For example, s3://{bucket_location}
    Submit Mode | AppJar | The processing engine of the Spark Streaming application (Zeppelin for JSON templates; AppJar for JAR files). Note: this parameter is set to DemoApp if you are using the demo template.
    Type | None | The submit type of the Spark Streaming application: a submit script or a submit command. Note: use this parameter only if you choose AppJar as your Submit Mode. It does not appear in the demo template.
    Script | <Optional input> | The Amazon S3 location of the script that contains your Spark submit command. For example, s3://{bucket_location/spark_submit.sh}. Note: use this parameter only if you choose AppJar as your Submit Mode; leave it blank if your Submit Type is Command. It does not appear in the demo template.
    Command | <Optional input> | A comma-delimited Spark submit command. For example, --deploy-mode,{cluster/client},--class,{className},--master,{yarn/local[?]},{s3://AppLocation/AppJar},{Appname},{StreamName},{OutputLoc}. Note: use this parameter only if you choose AppJar as your Submit Mode; leave it blank if your Submit Type is Script. It does not appear in the demo template.
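    The network defaults and the comma-delimited Command format above can be sanity-checked locally before launching. A sketch using only the Python standard library; the application JAR location, class name, and stream name in part 2 are hypothetical placeholders, not values from the solution:

    ```python
    import ipaddress

    # 1) Verify the default subnet CIDRs nest inside the VPC CIDR without overlap.
    vpc = ipaddress.ip_network("10.0.0.0/16")
    subnets = [
        ipaddress.ip_network(c)
        for c in ("10.0.0.0/19", "10.0.32.0/19", "10.0.128.0/20", "10.0.144.0/20")
    ]
    assert all(s.subnet_of(vpc) for s in subnets)
    assert not any(
        a.overlaps(b) for i, a in enumerate(subnets) for b in subnets[i + 1:]
    )

    # 2) Build the comma-delimited Command parameter from an ordinary
    #    spark-submit argument list (values are illustrative placeholders).
    spark_args = [
        "--deploy-mode", "cluster",
        "--class", "com.example.StreamingApp",   # hypothetical class name
        "--master", "yarn",
        "s3://my-artifact-bucket/app.jar",       # hypothetical JAR location
        "MyApp", "default-data-stream", "s3://my-artifact-bucket/output/",
    ]
    command_parameter = ",".join(spark_args)
    print(command_parameter)
    ```

    Joining with commas (and no spaces) is what produces the shape shown in the Command example; any argument value containing a comma would need different handling.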

  6. Choose Next.

  7. On the Options page, choose Next.

  8. On the Review page, review and confirm the settings. Check the box acknowledging that the template will create AWS Identity and Access Management (IAM) resources and might require the CAPABILITY_AUTO_EXPAND capability.

  9. Choose Create to deploy the stack.

    You can view the status of the stack in the AWS CloudFormation console in the Status column. You should see a status of CREATE_COMPLETE in roughly 15-20 minutes.

    Note

    This solution includes two AWS Lambda functions that run only during initial configuration or when resources are updated or deleted.

    When running this solution, you will see both Lambda functions in the AWS Lambda console. Do not delete the functions as they are necessary to manage associated resources.
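Stack creation can also be watched from a script rather than the console Status column. A minimal sketch, assuming boto3 and configured credentials for the real call; the status-classification helper is purely local, and the stack name is a placeholder:

```python
# CloudFormation stack statuses that end the create flow; CREATE_COMPLETE is
# the success case, the others indicate a failed or rolled-back launch.
TERMINAL_CREATE_STATUSES = {
    "CREATE_COMPLETE",
    "CREATE_FAILED",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}

def create_finished(status: str) -> bool:
    """Return True once the stack has reached a terminal create status."""
    return status in TERMINAL_CREATE_STATUSES

# To block until a real stack finishes (requires boto3 and AWS credentials):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   cfn.get_waiter("stack_create_complete").wait(StackName="real-time-analytics")

print(create_finished("CREATE_COMPLETE"))
```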