Create a Cluster With Spark

The following procedure creates a cluster with Spark installed using Quick Options in the EMR console. To customize your cluster setup further, use Advanced Options. To install applications programmatically and then run custom applications that you submit as steps, use Step execution mode. With either Advanced Options or Step execution mode, you can choose AWS Glue as your Spark SQL metastore. For more information, see Using the AWS Glue Data Catalog as the Metastore for Spark SQL.

To launch a cluster with Spark installed

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Choose Create cluster to use Quick Create.

  3. For Software Configuration, choose Amazon Release Version or later.

  4. For Select Applications, choose either All Applications or Spark.

  5. Select other options as necessary and then choose Create cluster.

    Note

    To configure Spark when you are creating the cluster, see Configure Spark.

To launch a cluster with Spark installed using the AWS CLI

  • Create the cluster with the following command:

    aws emr create-cluster --name "Spark cluster" --release-label --applications Name=Spark \
    --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles

Note

Linux line continuation characters (\) are included for readability. On Linux, you can keep them or remove them and enter the command on a single line. On Windows, remove them or replace each one with a caret (^).
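The effect of the line continuation character can be checked without contacting AWS. The following is a minimal sketch, not part of the official procedure: show_args is a hypothetical stand-in for the aws command that only prints the arguments it receives, and the --release-label option is left out because its value is not given here. It suggests that a POSIX shell passes the same argument list whether the command is split with a backslash or written on one line.

```shell
# Hypothetical stand-in for the aws CLI: prints each argument it receives
# on its own line instead of calling any AWS service.
show_args() {
  for arg in "$@"; do
    printf '%s\n' "$arg"
  done
}

# Form 1: one logical command split across two lines with a backslash.
with_continuation=$(show_args emr create-cluster --name "Spark cluster" --applications Name=Spark \
  --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles)

# Form 2: the same command on a single line, with the backslash removed.
single_line=$(show_args emr create-cluster --name "Spark cluster" --applications Name=Spark --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles)

# Compare the two argument lists captured above.
if [ "$with_continuation" = "$single_line" ]; then
  echo "identical argument lists"
else
  echo "argument lists differ"
fi
```

The quoted cluster name stays a single argument in both forms, so the quotation marks around "Spark cluster" must be kept when the backslash is removed.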

To launch a cluster with Spark installed using the SDK for Java

Specify Spark as an application by passing an Application object to RunJobFlowRequest.

  • The following Java program excerpt shows how to create a cluster with Spark:

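    import java.util.ArrayList;
    import java.util.List;
    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
    import com.amazonaws.services.elasticmapreduce.model.Application;
    import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
    import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
    import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

    // Excerpt: credentials is created elsewhere in the program.
    AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

    // Declare Spark as an application to install on the cluster.
    Application sparkApp = new Application()
        .withName("Spark");
    List<Application> myApps = new ArrayList<Application>();
    myApps.add(sparkApp);

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("Spark Cluster")
        .withApplications(myApps)
        .withReleaseLabel("")
        .withInstances(new JobFlowInstancesConfig()
            .withEc2KeyName("myKeyName")
            .withInstanceCount(1)
            .withKeepJobFlowAliveWhenNoSteps(true)
            .withMasterInstanceType("m3.xlarge")
            .withSlaveInstanceType("m3.xlarge")
        );

    // Submit the request to create the cluster.
    RunJobFlowResult result = emr.runJobFlow(request);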