The AWS SDK for Java provides three packages with Amazon EMR functionality.
For more information about these packages, see the AWS SDK for Java API Reference.
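The following example shows how the SDK can simplify programming with Amazon EMR. The code sample uses the StepFactory
object, a helper class for creating common Amazon EMR step types, to create an interactive Hive cluster with debugging enabled.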
import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class Main {

    public static void main(String[] args) {
        AWSCredentialsProvider credentials_profile = null;
        try {
            // specifies any named profile in .aws/credentials as the credentials provider
            credentials_profile = new ProfileCredentialsProvider("default");
        } catch (Exception e) {
            throw new AmazonClientException(
                    "Cannot load credentials from .aws/credentials file. " +
                    "Make sure that the credentials file exists and that the profile name is defined within it.",
                    e);
        }

        // create an EMR client using the credentials and region specified in order to
        // create the cluster
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
                .withCredentials(credentials_profile)
                .withRegion(Regions.US_WEST_1)
                .build();

        // create a step to enable debugging in the AWS Management Console
        StepFactory stepFactory = new StepFactory();
        StepConfig enabledebugging = new StepConfig()
                .withName("Enable debugging")
                .withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(stepFactory.newEnableDebuggingStep());

        // specify applications to be installed and configured when EMR creates the
        // cluster
        Application hive = new Application().withName("Hive");
        Application spark = new Application().withName("Spark");
        Application ganglia = new Application().withName("Ganglia");
        Application zeppelin = new Application().withName("Zeppelin");

        // create the cluster
        RunJobFlowRequest request = new RunJobFlowRequest()
                .withName("MyClusterCreatedFromJava")
                .withReleaseLabel("emr-5.20.0") // specifies the EMR release version label, we recommend the latest release
                .withSteps(enabledebugging)
                .withApplications(hive, spark, ganglia, zeppelin)
                .withLogUri("s3://path/to/my/emr/logs") // a URI in S3 for log files is required when debugging is enabled
                .withServiceRole("EMR_DefaultRole") // replace the default with a custom IAM service role if one is used
                .withJobFlowRole("EMR_EC2_DefaultRole") // replace the default with a custom EMR role for the EC2 instance
                                                        // profile if one is used
                .withInstances(new JobFlowInstancesConfig()
                        .withEc2SubnetId("subnet-12ab34c56")
                        .withEc2KeyName("myEc2Key")
                        .withInstanceCount(3)
                        .withKeepJobFlowAliveWhenNoSteps(true)
                        .withMasterInstanceType("m4.large")
                        .withSlaveInstanceType("m4.large"));

        RunJobFlowResult result = emr.runJobFlow(request);
        // getJobFlowId() returns the cluster ID itself rather than the whole result object
        System.out.println("The cluster ID is " + result.getJobFlowId());
    }
}
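At a minimum, you must pass a service role and a jobflow role corresponding to EMR_DefaultRole and EMR_EC2_DefaultRole, respectively. You can create these roles by running the AWS CLI command shown below in the same account. First, check whether the roles already exist: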
aws iam list-roles | grep EMR
If both the instance profile (EMR_EC2_DefaultRole) and the service role (EMR_DefaultRole) exist, they appear in the output:
"RoleName": "EMR_DefaultRole",
"Arn": "arn:aws:iam::AccountID:role/EMR_DefaultRole"
"RoleName": "EMR_EC2_DefaultRole",
"Arn": "arn:aws:iam::AccountID:role/EMR_EC2_DefaultRole"
If the default roles do not exist, you can use the following command to create them:
aws emr create-default-roles
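The role check described above can also be scripted. The following is a minimal sketch, assuming the text output of aws iam list-roles has already been captured into a string. The class DefaultRoleCheck and its method missingRoles are hypothetical names introduced for illustration and are not part of the AWS SDK; the sketch only scans for "RoleName" entries with a regular expression and does not parse the full JSON or call AWS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the AWS SDK): scans captured
// `aws iam list-roles` output for the two default EMR role names.
class DefaultRoleCheck {

    // Matches entries such as "RoleName": "EMR_DefaultRole"
    private static final Pattern ROLE_NAME =
            Pattern.compile("\"RoleName\"\\s*:\\s*\"([^\"]+)\"");

    // The two default roles named in this topic
    static final String[] REQUIRED = {"EMR_DefaultRole", "EMR_EC2_DefaultRole"};

    // Returns the required role names that do not appear in the output.
    static List<String> missingRoles(String listRolesOutput) {
        List<String> found = new ArrayList<>();
        Matcher m = ROLE_NAME.matcher(listRolesOutput);
        while (m.find()) {
            found.add(m.group(1));
        }
        List<String> missing = new ArrayList<>();
        for (String role : REQUIRED) {
            if (!found.contains(role)) {
                missing.add(role);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Sample text containing only the service role
        String sample = "\"RoleName\": \"EMR_DefaultRole\", "
                + "\"Arn\": \"arn:aws:iam::AccountID:role/EMR_DefaultRole\"";
        System.out.println("Missing roles: " + missingRoles(sample));
    }
}
```

If the returned list is not empty, running aws emr create-default-roles as described above creates the missing default roles. Matching on exact role names avoids the false positives that a plain grep for EMR could produce when other roles happen to contain that string.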