
Create a Cluster With Spark

To launch a cluster with Spark installed using the console

The following procedure creates a cluster with Spark installed. For more information about launching clusters with the console, see Step 3: Launch an Amazon EMR Cluster in the Amazon EMR Management Guide.

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Choose Create cluster to use Quick Create.

  3. For Software Configuration, choose Amazon Release Version or later.

  4. For Select Applications, choose either All Applications or Spark.

  5. Select other options as necessary and then choose Create cluster.

    Note

    To configure Spark when you are creating the cluster, see Configure Spark.

To launch a cluster with Spark installed using the AWS CLI

  • Create the cluster with the following command:

    aws emr create-cluster --name "Spark cluster" --release-label  --applications Name=Spark \
    --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --use-default-roles

Note

On Windows, replace the Linux line continuation character (\) in the command above with a caret (^).
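
The command above can also be wrapped in a small shell function that refuses to run when no release label is given. This is a minimal sketch, not part of the AWS CLI: the function name create_spark_cluster and the AWS_CMD override are hypothetical names introduced here, and the remaining arguments are copied from the command above. Setting AWS_CMD=echo prints the arguments instead of calling the AWS CLI, which is one way to check them before launching anything.

```shell
# Hypothetical helper: launch a Spark cluster only when a release label is given.
# AWS_CMD is an assumed override for this sketch; it defaults to the real "aws" CLI.
create_spark_cluster() {
    release_label="$1"

    # Stop early instead of passing an empty value to --release-label.
    if [ -z "$release_label" ]; then
        echo "error: a release label is required" >&2
        return 1
    fi

    # Same arguments as the command shown above, with the label filled in.
    ${AWS_CMD:-aws} emr create-cluster --name "Spark cluster" \
        --release-label "$release_label" --applications Name=Spark \
        --ec2-attributes KeyName=myKey --instance-type m3.xlarge \
        --instance-count 3 --use-default-roles
}
```

For example, AWS_CMD=echo create_spark_cluster "$RELEASE_LABEL" prints the assembled arguments, where RELEASE_LABEL is a variable you set to the release label you chose in the console procedure.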

To launch a cluster with Spark installed using the SDK for Java

Specify Spark as an application by passing an Application object to RunJobFlowRequest.

  • The following Java program excerpt shows how to create a cluster with Spark:

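    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
    import com.amazonaws.services.elasticmapreduce.model.Application;
    import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
    import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
    import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

    // credentials is assumed to be defined elsewhere in the program.
    AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

    // Declare Spark as an application to install on the cluster.
    Application sparkApp = new Application()
        .withName("Spark");

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("Spark Cluster")
        .withApplications(sparkApp)
        .withReleaseLabel("")  // supply the Amazon EMR release label here
        .withInstances(new JobFlowInstancesConfig()
            .withEc2KeyName("myKeyName")
            .withInstanceCount(1)
            .withKeepJobFlowAliveWhenNoSteps(true)
            .withMasterInstanceType("m3.xlarge")
            .withSlaveInstanceType("m3.xlarge")
        );

    // Submit the request; the result carries the ID of the new cluster.
    RunJobFlowResult result = emr.runJobFlow(request);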