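Create a cluster with Apache Spark

The following procedure creates a cluster with Spark installed using Quick Options in the Amazon EMR console.

You can also use Advanced Options to further customize your cluster setup, or submit steps to install applications programmatically and then run custom applications. With either cluster creation option, you can choose to use AWS Glue as your Spark SQL metastore. For more information, see Use the AWS Glue Data Catalog as the metastore for Spark SQL.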

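To launch a cluster with Spark installed

1. Open the Amazon EMR console at https://console.aws.amazon.com/emr.
2. Choose Create cluster to use Quick Create.
3. Enter a Cluster name. Your cluster name can't contain the characters <, >, $, |, or ` (backtick).
4. For Software configuration, choose a Release option.
5. For Applications, choose the Spark application bundle.
6. Select other options as necessary and then choose Create cluster.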
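You can check the cluster-name rule in step 3 before you submit the form or an API request. The following Java sketch is a hypothetical helper, not part of the console or any AWS SDK. It tests only for the five forbidden characters listed above and does not check any other naming constraints.

```java
import java.util.Set;

// Hypothetical helper (not part of any AWS SDK): rejects cluster names that
// contain any of the characters the console disallows: < > $ | and ` (backtick).
final class ClusterNames {
    private static final Set<Character> FORBIDDEN = Set.of('<', '>', '$', '|', '`');

    private ClusterNames() {}

    /** Returns true if the name contains none of the forbidden characters. */
    static boolean isValid(String name) {
        for (char c : name.toCharArray()) {
            if (FORBIDDEN.contains(c)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, ClusterNames.isValid("Spark cluster") returns true, while ClusterNames.isValid("cost$report") returns false.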

Note
To configure Spark when you are creating the cluster, see Configure Spark.
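One common way to configure Spark at cluster creation is a configuration object with the spark-defaults classification, which sets properties in Spark's spark-defaults.conf file. The Java sketch below only builds the JSON for a single classification. The class name is hypothetical, and the spark.executor.memory value in the usage example is illustrative, not a recommendation. The escaping handles only quotes and backslashes, so use a real JSON library for anything beyond simple keys and values.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a one-element configuration list such as
// [{"Classification":"spark-defaults","Properties":{"key":"value"}}].
// Minimal sketch: escapes only backslashes and double quotes.
final class EmrConfigJson {
    private EmrConfigJson() {}

    /** Wraps a string in double quotes, escaping backslashes and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Serializes one classification and its properties as a JSON array. */
    static String classification(String name, Map<String, String> properties) {
        StringJoiner props = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            props.add(quote(e.getKey()) + ":" + quote(e.getValue()));
        }
        return "[{\"Classification\":" + quote(name) + ",\"Properties\":" + props + "}]";
    }
}
```

For example, EmrConfigJson.classification("spark-defaults", Map.of("spark.executor.memory", "4g")) produces [{"Classification":"spark-defaults","Properties":{"spark.executor.memory":"4g"}}]. You could save this output to a file and pass it to aws emr create-cluster with --configurations file://configurations.json (the file name is an example).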

To launch a cluster with Spark installed using the AWS CLI

Create the cluster with the following command.


aws emr create-cluster --name "Spark cluster" --release-label emr-7.1.0 --applications Name=Spark \
--ec2-attributes KeyName=myKey --instance-type m5.xlarge --instance-count 3 --use-default-roles

Note

Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).
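The edit described in the note above is mechanical, so it can be expressed as two string rewrites. This Java sketch is a hypothetical helper. It assumes each continuation backslash is the last character on its line, with no trailing spaces. It does not translate any other shell syntax, and the caret form targets the Windows command prompt.

```java
// Hypothetical helper for adapting the multi-line Linux command above.
final class LineContinuation {
    private LineContinuation() {}

    /** Replaces each backslash that ends a line with a caret (^). */
    static String toWindows(String linuxCommand) {
        // (?m) lets $ match at every line end; \\\\ in Java source is one literal backslash.
        return linuxCommand.replaceAll("(?m)\\\\$", "^");
    }

    /** Removes each continuation backslash together with the line break after it. */
    static String toSingleLine(String linuxCommand) {
        // \R matches any line break, including \r\n.
        return linuxCommand.replaceAll("\\\\\\R", "");
    }
}
```

toSingleLine keeps the space that precedes each backslash, so the joined command stays correctly separated and works on any platform.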

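To launch a cluster with Spark installed using the SDK for Java

Specify Spark as an application in the RunJobFlowRequest. For Amazon EMR release 4.x and later, pass an Application named Spark to withApplications, as the following example does. SupportedProductConfig, used with NewSupportedProducts, applies only to the older 2.x and 3.x AMI releases.

The following example shows how to create a cluster with Spark using Java. The subnet ID, key pair name, and S3 log path are placeholders; replace them with your own values.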



import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class Main {

    public static void main(String[] args) {
        // Load credentials from the "default" profile in ~/.aws/credentials.
        AWSCredentials credentials_profile = null;
        try {
            credentials_profile = new ProfileCredentialsProvider("default").getCredentials();
        } catch (Exception e) {
            throw new AmazonClientException(
                    "Cannot load credentials from .aws/credentials file. " +
                            "Make sure that the credentials file exists and the profile name is specified within it.",
                    e);
        }

        // Build an Amazon EMR client for the us-west-1 Region.
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(credentials_profile))
                .withRegion(Regions.US_WEST_1)
                .build();

        // Create a step to enable debugging in the AWS Management Console.
        StepFactory stepFactory = new StepFactory();
        StepConfig enabledebugging = new StepConfig()
                .withName("Enable debugging")
                .withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

        // Request Spark as an application (Amazon EMR release 4.x and later).
        Application spark = new Application().withName("Spark");

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("Spark Cluster")
                .withReleaseLabel("emr-5.20.0")
                .withSteps(enabledebugging)
                .withApplications(spark)
                .withLogUri("s3://path/to/my/logs/")
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                        .withEc2SubnetId("subnet-12ab3c45")
                        .withEc2KeyName("myEc2Key")
                        .withInstanceCount(3)
                        // Keep the cluster running after all steps finish.
                        .withKeepJobFlowAliveWhenNoSteps(true)
                        .withMasterInstanceType("m4.large")
                        .withSlaveInstanceType("m4.large"));
        RunJobFlowResult result = emr.runJobFlow(request);
        // getJobFlowId() returns just the cluster ID; toString() would print the whole result object.
        System.out.println("The cluster ID is " + result.getJobFlowId());
    }
}
