Amazon EMR 示例使用 AWS CLI - AWS Command Line Interface

本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。

Amazon EMR 示例使用 AWS CLI

下列程式碼範例說明如何透過 AWS Command Line Interface 搭配 Amazon EMR 使用來執行動作和實作常見案例。

Actions 是大型程式的程式碼摘錄,必須在內容中執行。雖然動作會告訴您如何呼叫個別服務函數,但您可以在其相關情境和跨服務範例中查看內容中的動作。

Scenarios (案例) 是向您展示如何呼叫相同服務中的多個函數來完成特定任務的程式碼範例。

每個範例都包含一個連結 GitHub,您可以在其中找到如何在內容中設定和執行程式碼的指示。

主題

動作

下列程式碼範例會示範如何使用add-instance-fleet

AWS CLI

若要將作業執行個體叢集新增至叢集

此範例會將新的工作執行個體叢集新增至指定的叢集。

命令:

aws emr add-instance-fleet --cluster-id 'j-12ABCDEFGHI34JK' --instance-fleet InstanceFleetType=TASK,TargetSpotCapacity=1,LaunchSpecifications={SpotSpecification='{TimeoutDurationMinutes=20,TimeoutAction=TERMINATE_CLUSTER}'},InstanceTypeConfigs=['{InstanceType=m3.xlarge,BidPrice=0.5}']

輸出:

{ "ClusterId": "j-12ABCDEFGHI34JK", "InstanceFleetId": "if-23ABCDEFGHI45JJ" }
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考AddInstanceFleet中的。

下列程式碼範例會示範如何使用add-steps

AWS CLI

1。若要將自訂 JAR 步驟新增至叢集

命令:

aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://mybucket/mytest.jar,Args=arg1,arg2,arg3 Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://mybucket/mytest.jar,MainClass=mymainclass,Args=arg1,arg2,arg3

必要參數:

Jar

可選參數:

Type, Name, ActionOnFailure, Args

輸出:

{ "StepIds":[ "s-XXXXXXXX", "s-YYYYYYYY" ] }

2. 若要將串流步驟新增至叢集

命令:

aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=STREAMING,Name='Streaming Program',ActionOnFailure=CONTINUE,Args=[-files,s3://elasticmapreduce/samples/wordcount/wordSplitter.py,-mapper,wordSplitter.py,-reducer,aggregate,-input,s3://elasticmapreduce/samples/wordcount/input,-output,s3://mybucket/wordcount/output]

必要參數:

Type, Args

可選參數:

Name, ActionOnFailure

JSON 等效(步驟的內容):

[ { "Name": "JSON Streaming Step", "Args": ["-files","s3://elasticmapreduce/samples/wordcount/wordSplitter.py","-mapper","wordSplitter.py","-reducer","aggregate","-input","s3://elasticmapreduce/samples/wordcount/input","-output","s3://mybucket/wordcount/output"], "ActionOnFailure": "CONTINUE", "Type": "STREAMING" } ]

注意:JSON 引數必須包含選項和值作為清單中自己的項目。

命令(使用步驟 JSON):

aws emr add-steps --cluster-id j-XXXXXXXX --steps file://./step.json

輸出:

{ "StepIds":[ "s-XXXXXXXX", "s-YYYYYYYY" ] }

3. 將包含多個檔案的串流步驟新增至叢集 (僅限 JSON)

JSON(多個文件。JSON):

[ { "Name": "JSON Streaming Step", "Type": "STREAMING", "ActionOnFailure": "CONTINUE", "Args": [ "-files", "s3://mybucket/mapper.py,s3://mybucket/reducer.py", "-mapper", "mapper.py", "-reducer", "reducer.py", "-input", "s3://mybucket/input", "-output", "s3://mybucket/output"] } ]

命令:

aws emr add-steps --cluster-id j-XXXXXXXX --steps file://./multiplefiles.json

必要參數:

Type, Args

可選參數:

Name, ActionOnFailure

輸出:

{ "StepIds":[ "s-XXXXXXXX", ] }

4. 若要將 Hive 步驟新增至叢集

命令:

aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=HIVE,Name='Hive program',ActionOnFailure=CONTINUE,Args=[-f,s3://mybucket/myhivescript.q,-d,INPUT=s3://mybucket/myhiveinput,-d,OUTPUT=s3://mybucket/myhiveoutput,arg1,arg2] Type=HIVE,Name='Hive steps',ActionOnFailure=TERMINATE_CLUSTER,Args=[-f,s3://elasticmapreduce/samples/hive-ads/libs/model-build.q,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://mybucket/hive-ads/output/2014-04-18/11-07-32,-d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs]

必要參數:

Type, Args

可選參數:

Name, ActionOnFailure

輸出:

{ "StepIds":[ "s-XXXXXXXX", "s-YYYYYYYY" ] }

5. 若要將 Pig 步驟新增至叢集

命令:

aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=PIG,Name='Pig program',ActionOnFailure=CONTINUE,Args=[-f,s3://mybucket/mypigscript.pig,-p,INPUT=s3://mybucket/mypiginput,-p,OUTPUT=s3://mybucket/mypigoutput,arg1,arg2] Type=PIG,Name='Pig program',Args=[-f,s3://elasticmapreduce/samples/pig-apache/do-reports2.pig,-p,INPUT=s3://elasticmapreduce/samples/pig-apache/input,-p,OUTPUT=s3://mybucket/pig-apache/output,arg1,arg2]

必要參數:

Type, Args

可選參數:

Name, ActionOnFailure

輸出:

{ "StepIds":[ "s-XXXXXXXX", "s-YYYYYYYY" ] }

6. 若要將黑斑羚步驟新增至叢集

命令:

aws emr add-steps --cluster-id j-XXXXXXXX --steps Type=IMPALA,Name='Impala program',ActionOnFailure=CONTINUE,Args=--impala-script,s3://myimpala/input,--console-output-path,s3://myimpala/output

必要參數:

Type, Args

可選參數:

Name, ActionOnFailure

輸出:

{ "StepIds":[ "s-XXXXXXXX", "s-YYYYYYYY" ] }
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考AddSteps中的。

下列程式碼範例會示範如何使用add-tags

AWS CLI

1。若要將標記新增至叢集

命令:

aws emr add-tags --resource-id j-xxxxxxx --tags name="John Doe" age=29 sex=male address="123 East NW Seattle"

輸出:

None

2. 若要列出叢集的標籤

-命令:

aws emr describe-cluster --cluster-id j-XXXXXXYY --query Cluster.Tags

輸出:

[ { "Value": "male", "Key": "sex" }, { "Value": "123 East NW Seattle", "Key": "address" }, { "Value": "John Doe", "Key": "name" }, { "Value": "29", "Key": "age" } ]
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考AddTags中的。

下列程式碼範例會示範如何使用create-cluster-examples

AWS CLI

以下大多數範例假設您已指定 Amazon EMR 服務角色和 Amazon EC2 執行個體設定檔。如果尚未執行此操作,則必須在建立叢集時指定每個必要的 IAM 角色或使用--use-default-roles參數。如需有關指定 IAM 角色的詳細資訊,請參閱 Amazon EMR 管理指南中的為 AWS 服務設定適用於 Amazon EMR 許可的 IAM 角色。

範例 1:建立叢集

下列create-cluster範例會建立簡單的 EMR 叢集。

aws emr create-cluster \ --release-label emr-5.14.0 \ --instance-type m4.large \ --instance-count 2

此命令不會產生輸出。

範例 2:若要建立具有預設值 ServiceRole 和 InstanceProfile 角色的 Amazon EMR 叢集

下列create-cluster範例會建立使用該--instance-groups組態的 Amazon EMR 叢集。

aws emr create-cluster \ --release-label emr-5.14.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

範例 3:若要建立使用執行個體叢集的 Amazon EMR 叢集

下列create-cluster範例會建立使用該--instance-fleets組態的 Amazon EMR 叢集,並為每個叢集指定兩個執行個體類型和兩個 EC2 子網路。

aws emr create-cluster \ --release-label emr-5.14.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetIds=['subnet-ab12345c','subnet-de67890f'] \ --instance-fleets InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs=['{InstanceType=m4.large}'] InstanceFleetType=CORE,TargetSpotCapacity=11,InstanceTypeConfigs=['{InstanceType=m4.large,BidPrice=0.5,WeightedCapacity=3}','{InstanceType=m4.2xlarge,BidPrice=0.9,WeightedCapacity=5}'],LaunchSpecifications={SpotSpecification='{TimeoutDurationMinutes=120,TimeoutAction=SWITCH_TO_ON_DEMAND}'}

範例 4:使用預設角色建立叢集

下列create-cluster範例會使用--use-default-roles參數來指定預設服務角色和執行個體設定檔。

aws emr create-cluster \ --release-label emr-5.9.0 \ --use-default-roles \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 5:建立叢集並指定要安裝的應用程式

下列create-cluster範例使用--applications參數來指定 Amazon EMR 安裝的應用程式。這個例子安裝 Hadoop 的,蜂巢和豬。

aws emr create-cluster \ --applications Name=Hadoop Name=Hive Name=Pig \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 6:若要建立包含 Spark 的叢集

下面的例子安裝星火。

aws emr create-cluster \ --release-label emr-5.9.0 \ --applications Name=Spark \ --ec2-attributes KeyName=myKey \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 7:指定用於叢集執行個體的自訂 AMI

下列create-cluster範例會根據具有 ID 的 Amazon Linux AMI 建立叢集執行個體ami-a518e6df

aws emr create-cluster \ --name "Cluster with My Custom AMI" \ --custom-ami-id ami-a518e6df \ --ebs-root-volume-size 20 \ --release-label emr-5.9.0 \ --use-default-roles \ --instance-count 2 \ --instance-type m4.large

範例 8:若要自訂應用程式組態

下列範例會使用--configurations參數來指定包含 Hadoop 應用程式自訂的 JSON 組態檔。如需詳細資訊,請參閱《Amazon EMR 版本指南》中的設定應用程式

configurations.json 的內容:

[ { "Classification": "mapred-site", "Properties": { "mapred.tasktracker.map.tasks.maximum": 2 } }, { "Classification": "hadoop-env", "Properties": {}, "Configurations": [ { "Classification": "export", "Properties": { "HADOOP_DATANODE_HEAPSIZE": 2048, "HADOOP_NAMENODE_OPTS": "-XX:GCTimeRatio=19" } } ] } ]

下列範例參考configurations.json為本機檔案。

aws emr create-cluster \ --configurations file://configurations.json \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

下列範例會以 Amazon S3 中的檔案configurations.json形式參考。

aws emr create-cluster \ --configurations https://s3.amazonaws.com/myBucket/configurations.json \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 9:使用主要、核心和工作執行個體群組建立叢集

下列create-cluster範例用--instance-groups來指定要用於主要、核心和任務執行個體群組的 EC2 執行個體類型和數量。

aws emr create-cluster \ --release-label emr-5.9.0 \ --instance-groups Name=Master,InstanceGroupType=MASTER,InstanceType=m4.large,InstanceCount=1 Name=Core,InstanceGroupType=CORE,InstanceType=m4.large,InstanceCount=2 Name=Task,InstanceGroupType=TASK,InstanceType=m4.large,InstanceCount=2

範例 10:指定叢集應在完成所有步驟後終止

下列create-cluster範例會用--auto-terminate來指定叢集應在完成所有步驟後自動關閉。

aws emr create-cluster \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 11:指定叢集組態詳細資料,例如 Amazon EC2 key pair、網路組態和安全群組

下列create-cluster範例會使用名為的 Amazon EC2 key pair myKey 和名為的自訂執行個體設定檔建立叢集myProfile。金鑰配對可用來授權叢集節點 (通常是主節點) 的 SSH 連線。如需詳細資訊,請參閱《Amazon EMR 管理指南》中的使用 Amazon EC2 金鑰對進行安全殼層登入資料。

aws emr create-cluster \ --ec2-attributes KeyName=myKey,InstanceProfile=myProfile \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

下列範例會在 Amazon VPC 子網路中建立叢集。

aws emr create-cluster \ --ec2-attributes SubnetId=subnet-xxxxx \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

下列範例會在us-east-1b可用區域中建立叢集。

aws emr create-cluster \ --ec2-attributes AvailabilityZone=us-east-1b \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

下列範例會建立叢集,並僅指定 Amazon EMR 管理的安全群組。

aws emr create-cluster \ --release-label emr-5.9.0 \ --service-role myServiceRole \ --ec2-attributes InstanceProfile=myRole,EmrManagedMasterSecurityGroup=sg-master1,EmrManagedSlaveSecurityGroup=sg-slave1 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

下列範例會建立叢集,並僅指定其他 Amazon EC2 安全群組。

aws emr create-cluster \ --release-label emr-5.9.0 \ --service-role myServiceRole \ --ec2-attributes InstanceProfile=myRole,AdditionalMasterSecurityGroups=[sg-addMaster1,sg-addMaster2,sg-addMaster3,sg-addMaster4],AdditionalSlaveSecurityGroups=[sg-addSlave1,sg-addSlave2,sg-addSlave3,sg-addSlave4] \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

下列範例會建立叢集,並指定 EMR 管理的安全性群組以及其他安全性群組。

aws emr create-cluster \ --release-label emr-5.9.0 \ --service-role myServiceRole \ --ec2-attributes InstanceProfile=myRole,EmrManagedMasterSecurityGroup=sg-master1,EmrManagedSlaveSecurityGroup=sg-slave1,AdditionalMasterSecurityGroups=[sg-addMaster1,sg-addMaster2,sg-addMaster3,sg-addMaster4],AdditionalSlaveSecurityGroups=[sg-addSlave1,sg-addSlave2,sg-addSlave3,sg-addSlave4] \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

下列範例會在 VPC 私有子網路中建立叢集,並使用特定的 Amazon EC2 安全群組啟用 Amazon EMR 服務存取,這是私有子網路中叢集所需的。

aws emr create-cluster \ --release-label emr-5.9.0 \ --service-role myServiceRole \ --ec2-attributes InstanceProfile=myRole,ServiceAccessSecurityGroup=sg-service-access,EmrManagedMasterSecurityGroup=sg-master,EmrManagedSlaveSecurityGroup=sg-slave \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

下列範例使用名為的 JSON 檔案指定安全性群組組態參數,ec2_attributes.json該檔案儲存在本機。注意:JSON 引數必須包含選項和值作為清單中自己的項目。

aws emr create-cluster \ --release-label emr-5.9.0 \ --service-role myServiceRole \ --ec2-attributes file://ec2_attributes.json \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

ec2_attributes.json 的內容:

[ { "SubnetId": "subnet-xxxxx", "KeyName": "myKey", "InstanceProfile":"myRole", "EmrManagedMasterSecurityGroup": "sg-master1", "EmrManagedSlaveSecurityGroup": "sg-slave1", "ServiceAccessSecurityGroup": "sg-service-access", "AdditionalMasterSecurityGroups": ["sg-addMaster1","sg-addMaster2","sg-addMaster3","sg-addMaster4"], "AdditionalSlaveSecurityGroups": ["sg-addSlave1","sg-addSlave2","sg-addSlave3","sg-addSlave4"] } ]

範例 12:啟用偵錯並指定記錄 URI

下列create-cluster範例使用--enable-debugging參數,可讓您使用 Amazon EMR 主控台中的偵錯工具更輕鬆地檢視記錄檔。參--log-uri數是必需的--enable-debugging

aws emr create-cluster \ --enable-debugging \ --log-uri s3://myBucket/myLog \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 13:建立叢集時新增標籤

標籤是索引鍵值配對,可協助您識別和管理叢集。下列create-cluster範例會使用--tags參數為叢集建立三個標籤,一個標籤具有索引鍵名稱name和值Shirley Rodriguez,第二個標籤含索引鍵名稱age和值29,以及第三個標籤 (含索引鍵名稱department和值) Analytics

aws emr create-cluster \ --tags name="Shirley Rodriguez" age=29 department="Analytics" \ --release-label emr-5.32.0 \ --instance-type m5.xlarge \ --instance-count 3 \ --use-default-roles

下列範例會列出套用至叢集的標籤。

aws emr describe-cluster \ --cluster-id j-XXXXXXYY \ --query Cluster.Tags

範例 14:使用啟用加密和其他安全功能的安全性組態

下列create-cluster範例會使用--security-configuration參數來指定 EMR 叢集的安全性組態。您可以將安全組態與 Amazon EMR 4.8.0 版或更新版本搭配使用。

aws emr create-cluster \ --instance-type m4.large \ --release-label emr-5.9.0 \ --security-configuration mySecurityConfiguration

範例 15:建立具有針對執行個體群組設定的其他 EBS 儲存磁碟區的叢集

指定其他 EBS 磁碟區時,必須使用下列引數:VolumeType(SizeInGB如果EbsBlockDeviceConfigs已指定)。

下列create-cluster範例會建立一個叢集,其中包含多個 EBS 磁碟區連接至核心執行個體群組中的 EC2 執行個體。

aws emr create-cluster \ --release-label emr-5.9.0 \ --use-default-roles \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=d2.xlarge 'InstanceGroupType=CORE,InstanceCount=2,InstanceType=d2.xlarge,EbsConfiguration={EbsOptimized=true,EbsBlockDeviceConfigs=[{VolumeSpecification={VolumeType=gp2,SizeInGB=100}},{VolumeSpecification={VolumeType=io1,SizeInGB=100,Iops=100},VolumesPerInstance=4}]}' \ --auto-terminate

下列範例會建立一個叢集,其中包含多個 EBS 磁碟區連接至主執行個體群組中的 EC2 執行個體。

aws emr create-cluster \ --release-label emr-5.9.0 \ --use-default-roles \ --instance-groups 'InstanceGroupType=MASTER, InstanceCount=1, InstanceType=d2.xlarge, EbsConfiguration={EbsOptimized=true, EbsBlockDeviceConfigs=[{VolumeSpecification={VolumeType=io1, SizeInGB=100, Iops=100}},{VolumeSpecification={VolumeType=standard,SizeInGB=50},VolumesPerInstance=3}]}' InstanceGroupType=CORE,InstanceCount=2,InstanceType=d2.xlarge \ --auto-terminate

範例 16:使用自動調整規模政策建立叢集

您可以使用 Amazon EMR 4.0 版及更新版本,將自動擴展政策附加到核心和任務執行個體群組。自動擴展政策會動態新增和移除 EC2 執行個體,以回應 Amazon CloudWatch 指標。如需詳細資訊,請參閱 Amazon EMR 管理指南中的在 Amazon EMR 中使用自動擴展功能 < https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html>`_。

附加自動調度資源調度政策時,您還必須使用指定自動擴展的預設角色--auto-scaling-role EMR_AutoScaling_DefaultRole

下列create-cluster範例會使用含有內嵌 JSON 結構的引數來指定CORE執行個體群組的自動資源調度資源調度政策,該AutoScalingPolicy參數會指定資源調整原則組態。具有內嵌 JSON 結構的執行個體群組必須將整個引數集合用單引號括住。對於沒有內嵌 JSON 結構的執行個體群組而言,使用單引號是可選的。

aws emr create-cluster --release-label emr-5.9.0 \ --use-default-roles --auto-scaling-role EMR_AutoScaling_DefaultRole \ --instance-groups InstanceGroupType=MASTER,InstanceType=d2.xlarge,InstanceCount=1 'InstanceGroupType=CORE,InstanceType=d2.xlarge,InstanceCount=2,AutoScalingPolicy={Constraints={MinCapacity=1,MaxCapacity=5},Rules=[{Name=TestRule,Description=TestDescription,Action={Market=ON_DEMAND,SimpleScalingPolicyConfiguration={AdjustmentType=EXACT_CAPACITY,ScalingAdjustment=2}},Trigger={CloudWatchAlarmDefinition={ComparisonOperator=GREATER_THAN,EvaluationPeriods=5,MetricName=TestMetric,Namespace=EMR,Period=3,Statistic=MAXIMUM,Threshold=4.5,Unit=NONE,Dimensions=[{Key=TestKey,Value=TestValue}]}}}]}'

下列範例使用 JSON 檔案來指定叢集中所有執行個體群組的組態。instancegroupconfig.jsonJSON 檔案會指定核心執行個體群組的自動擴展政策設定。

aws emr create-cluster \ --release-label emr-5.9.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups file://myfolder/instancegroupconfig.json \ --auto-scaling-role EMR_AutoScaling_DefaultRole

instancegroupconfig.json 的內容:

[ { "InstanceCount": 1, "Name": "MyMasterIG", "InstanceGroupType": "MASTER", "InstanceType": "m4.large" }, { "InstanceCount": 2, "Name": "MyCoreIG", "InstanceGroupType": "CORE", "InstanceType": "m4.large", "AutoScalingPolicy": { "Constraints": { "MinCapacity": 2, "MaxCapacity": 10 }, "Rules": [ { "Name": "Default-scale-out", "Description": "Replicates the default scale-out rule in the console for YARN memory.", "Action": { "SimpleScalingPolicyConfiguration": { "AdjustmentType": "CHANGE_IN_CAPACITY", "ScalingAdjustment": 1, "CoolDown": 300 } }, "Trigger": { "CloudWatchAlarmDefinition": { "ComparisonOperator": "LESS_THAN", "EvaluationPeriods": 1, "MetricName": "YARNMemoryAvailablePercentage", "Namespace": "AWS/ElasticMapReduce", "Period": 300, "Threshold": 15, "Statistic": "AVERAGE", "Unit": "PERCENT", "Dimensions": [ { "Key": "JobFlowId", "Value": "${emr.clusterId}" } ] } } } ] } } ]

範例 17:建立叢集時新增自訂 JAR 步驟

下列create-cluster範例透過指定存放在 Amazon S3 中的 JAR 檔案來新增步驟。步驟將工作提交到叢集。JAR 檔案中定義的主要功能會在佈建 EC2 執行個體、執行任何引導動作以及安裝應用程式之後執行。這些步驟是使用指定的Type=CUSTOM_JAR

自訂 JAR 步驟需要Jar=參數,該參數會指定 JAR 的路徑和檔案名稱。可選參數為TypeNameActionOnFailureArgs、和MainClass。如果未指定主類別,JAR 檔案應該Main-Class在其資訊清單檔案中指定。

aws emr create-cluster \ --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://myBucket/mytest.jar,Args=arg1,arg2,arg3 Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://myBucket/mytest.jar,MainClass=mymainclass,Args=arg1,arg2,arg3 \ --release-label emr-5.3.1 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 18:若要在建立叢集時新增串流步驟

下列create-cluster範例會將串流步驟新增至叢集,該步驟會在所有步驟執行後終止。串流步驟需要參數TypeArgs。串流步驟選擇性參數為NameActionOnFailure

下列範例會指定內嵌步驟。

aws emr create-cluster \ --steps Type=STREAMING,Name='Streaming Program',ActionOnFailure=CONTINUE,Args=[-files,s3://elasticmapreduce/samples/wordcount/wordSplitter.py,-mapper,wordSplitter.py,-reducer,aggregate,-input,s3://elasticmapreduce/samples/wordcount/input,-output,s3://mybucket/wordcount/output] \ --release-label emr-5.3.1 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

下列範例會使用名為的本機儲存 JSON 組態檔multiplefiles.json。JSON 組態會指定多個檔案。若要在一個步驟中指定多個檔案,您必須使用 JSON 組態檔來指定步驟。JSON 引數必須包含選項和值作為清單中自己的項目。

aws emr create-cluster \ --steps file://./multiplefiles.json \ --release-label emr-5.9.0 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

multiplefiles.json 的內容:

[ { "Name": "JSON Streaming Step", "Args": [ "-files", "s3://elasticmapreduce/samples/wordcount/wordSplitter.py", "-mapper", "wordSplitter.py", "-reducer", "aggregate", "-input", "s3://elasticmapreduce/samples/wordcount/input", "-output", "s3://mybucket/wordcount/output" ], "ActionOnFailure": "CONTINUE", "Type": "STREAMING" } ]

範例 19:若要在建立叢集時新增 Hive 步驟

下列範例會在建立叢集時新增 Hive 步驟。蜂巢步驟需要參數TypeArgs. 蜂巢步驟可選參數是NameActionOnFailure

aws emr create-cluster \ --steps Type=HIVE,Name='Hive program',ActionOnFailure=CONTINUE,ActionOnFailure=TERMINATE_CLUSTER,Args=[-f,s3://elasticmapreduce/samples/hive-ads/libs/model-build.q,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://mybucket/hive-ads/output/2014-04-18/11-07-32,-d,LIBS=s3://elasticmapreduce/samples/hive-ads/libs] \ --applications Name=Hive \ --release-label emr-5.3.1 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

範例 20:若要在建立叢集時新增 Pig 步驟

下列範例會在建立叢集時新增 Pig 步驟。豬步驟所需的參數是TypeArgs。豬步驟可選參數是NameActionOnFailure

aws emr create-cluster \ --steps Type=PIG,Name='Pig program',ActionOnFailure=CONTINUE,Args=[-f,s3://elasticmapreduce/samples/pig-apache/do-reports2.pig,-p,INPUT=s3://elasticmapreduce/samples/pig-apache/input,-p,OUTPUT=s3://mybucket/pig-apache/output] \ --applications Name=Pig \ --release-label emr-5.3.1 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

範例 21:若要新增啟動程序動作

下列create-cluster範例會執行兩個啟動程序動作,定義為存放在 Amazon S3 中的指令碼。

aws emr create-cluster \ --bootstrap-actions Path=s3://mybucket/myscript1,Name=BootstrapAction1,Args=[arg1,arg2] Path=s3://mybucket/myscript2,Name=BootstrapAction2,Args=[arg1,arg2] \ --release-label emr-5.3.1 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large \ --auto-terminate

範例 22:啟用 EMRFS 一致檢視並自訂和設定 RetryCount RetryPeriod

下列create-cluster範例會指定 EMRFS 一致檢視的重試計數和重試期間。Consistent=true 是必要引數。

aws emr create-cluster \ --instance-type m4.large \ --release-label emr-5.9.0 \ --emrfs Consistent=true,RetryCount=6,RetryPeriod=30

下列範例會使用名為的本機儲存 JSON 組態檔,指定與前一個範例相同的 EMRFS 組態。emrfsconfig.json

aws emr create-cluster \ --instance-type m4.large \ --release-label emr-5.9.0 \ --emrfs file://emrfsconfig.json

emrfsconfig.json 的內容:

{ "Consistent": true, "RetryCount": 6, "RetryPeriod": 30 }

範例 23:若要建立已設定 Kerberos 的叢集

下列create-cluster範例使用已啟用 Kerberos 的安全性組態建立叢集,並使用建立叢集的 Kerberos 參數。--kerberos-attributes

下列命令會指定叢集內嵌的 Kerberos 屬性。

aws emr create-cluster \ --instance-type m3.xlarge \ --release-label emr-5.10.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --security-configuration mySecurityConfiguration \ --kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=123,CrossRealmTrustPrincipalPassword=123

以下命令指定相同的屬性,但引用名為的本地存儲 JSON 文件kerberos_attributes.json。在此範例中,檔案會儲存在執行命令的相同目錄中。您也可以參考儲存在 Amazon S3 中的組態檔案。

aws emr create-cluster \ --instance-type m3.xlarge \ --release-label emr-5.10.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --security-configuration mySecurityConfiguration \ --kerberos-attributes file://kerberos_attributes.json

kerberos_attributes.json 的內容:

{ "Realm": "EC2.INTERNAL", "KdcAdminPassword": "123", "CrossRealmTrustPrincipalPassword": "123", }

下列create-cluster範例會建立使用--instance-groups組態並具有受管擴展政策的 Amazon EMR 叢集。

aws emr create-cluster \ --release-label emr-5.30.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large --managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=2,MaximumCapacityUnits=4,UnitType=Instances}'

下列create-cluster範例會建立 Amazon EMR 叢集,該叢集使用「-log-encryption-kms-key--id」來定義用於日誌加密的 KMS 金鑰識別碼。

aws emr create-cluster \ --release-label emr-5.30.0 \ --log-uri s3://myBucket/myLog \ --log-encryption-kms-key-id arn:aws:kms:us-east-1:110302272565:key/dd559181-283e-45d7-99d1-66da348c4d33 \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m4.large

下列create-cluster範例會建立 Amazon EMR 叢集,該叢集使用「-placement-group-configs」組態,使用放置策略將主節點放置在 EC2 置放群組內的高可用SPREAD性 (HA) 叢集中。

aws emr create-cluster \ --release-label emr-5.30.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups InstanceGroupType=MASTER,InstanceCount=3,InstanceType=m4.largeInstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \ --placement-group-configs InstanceRole=MASTER

下列create-cluster範例會建立 Amazon EMR 叢集,該叢集使用「-auto-termination-policy」組態為叢集設定自動閒置終止閾值。

aws emr create-cluster \ --release-label emr-5.34.0 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large \ --auto-termination-policy IdleTimeout=100

下列create-cluster範例會建立一個 Amazon EMR 叢集,該叢集使用「-os-release-label」定義用於叢集啟動的 Amazon Linux 版本

aws emr create-cluster \ --release-label emr-6.6.0 \ --os-release-label 2.0.20220406.1 \ --service-role EMR_DefaultRole \ --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \ --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m4.large InstanceGroupType=CORE,InstanceCount=1,InstanceType=m4.large

範例 24:若要指定 EBS 根磁碟區屬性:使用 EMR 版本 6.15.0 及更新版本建立的叢集執行個體的大小、iops 和輸送量

下列create-cluster範例會建立 Amazon EMR 叢集,該叢集使用根磁碟區屬性為 EC2 執行個體設定根磁碟區規格。

aws emr create-cluster \ --name "Cluster with My Custom AMI" \ --custom-ami-id ami-a518e6df \ --ebs-root-volume-size 20 \ --ebs-root-volume-iops 3000 \ --ebs-root-volume-throughput 125 \ --release-label emr-6.15.0 \ --use-default-roles \ --instance-count 2 \ --instance-type m4.large

下列程式碼範例會示範如何使用create-default-roles

AWS CLI

1。若要為 EC2 建立預設的 IAM 角色

命令:

aws emr create-default-roles

輸出:

If the role already exists then the command returns nothing. If the role does not exist then the output will be: [ { "RolePolicy": { "Version": "2012-10-17", "Statement": [ { "Action": [ "cloudwatch:*", "dynamodb:*", "ec2:Describe*", "elasticmapreduce:Describe*", "elasticmapreduce:ListBootstrapActions", "elasticmapreduce:ListClusters", "elasticmapreduce:ListInstanceGroups", "elasticmapreduce:ListInstances", "elasticmapreduce:ListSteps", "kinesis:CreateStream", "kinesis:DeleteStream", "kinesis:DescribeStream", "kinesis:GetRecords", "kinesis:GetShardIterator", "kinesis:MergeShards", "kinesis:PutRecord", "kinesis:SplitShard", "rds:Describe*", "s3:*", "sdb:*", "sns:*", "sqs:*" ], "Resource": "*", "Effect": "Allow" } ] }, "Role": { "AssumeRolePolicyDocument": { "Version": "2008-10-17", "Statement": [ { "Action": "sts:AssumeRole", "Sid": "", "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" } } ] }, "RoleId": "AROAIQ5SIQUGL5KMYBJX6", "CreateDate": "2015-06-09T17:09:04.602Z", "RoleName": "EMR_EC2_DefaultRole", "Path": "/", "Arn": "arn:aws:iam::176430881729:role/EMR_EC2_DefaultRole" } }, { "RolePolicy": { "Version": "2012-10-17", "Statement": [ { "Action": [ "ec2:AuthorizeSecurityGroupIngress", "ec2:CancelSpotInstanceRequests", "ec2:CreateSecurityGroup", "ec2:CreateTags", "ec2:DeleteTags", "ec2:DescribeAvailabilityZones", "ec2:DescribeAccountAttributes", "ec2:DescribeInstances", "ec2:DescribeInstanceStatus", "ec2:DescribeKeyPairs", "ec2:DescribePrefixLists", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroups", "ec2:DescribeSpotInstanceRequests", "ec2:DescribeSpotPriceHistory", "ec2:DescribeSubnets", "ec2:DescribeVpcAttribute", "ec2:DescribeVpcEndpoints", "ec2:DescribeVpcEndpointServices", "ec2:DescribeVpcs", "ec2:ModifyImageAttribute", "ec2:ModifyInstanceAttribute", "ec2:RequestSpotInstances", "ec2:RunInstances", "ec2:TerminateInstances", "iam:GetRole", "iam:GetRolePolicy", "iam:ListInstanceProfiles", "iam:ListRolePolicies", "iam:PassRole", "s3:CreateBucket", "s3:Get*", "s3:List*", "sdb:BatchPutAttributes", "sdb:Select", "sqs:CreateQueue", "sqs:Delete*", "sqs:GetQueue*", "sqs:ReceiveMessage" ], "Resource": "*", "Effect": "Allow" } ] }, "Role": { "AssumeRolePolicyDocument": { "Version": "2008-10-17", "Statement": [ { "Action": "sts:AssumeRole", "Sid": "", "Effect": "Allow", "Principal": { "Service": "elasticmapreduce.amazonaws.com" } } ] }, "RoleId": "AROAI3SRVPPVSRDLARBPY", "CreateDate": "2015-06-09T17:09:10.401Z", "RoleName": "EMR_DefaultRole", "Path": "/", "Arn": "arn:aws:iam::176430881729:role/EMR_DefaultRole" } } ]

下列程式碼範例會示範如何使用create-security-configuration

AWS CLI

1。使用憑證提供者的 PEM 啟用傳輸中加密,以及使用 SSE-S3 進行 S3 加密和 AWS-KMS (適用於本機磁碟金鑰提供者) 建立安全組態

命令:

aws emr create-security-configuration --name MySecurityConfig --security-configuration '{ "EncryptionConfiguration": { "EnableInTransitEncryption" : true, "EnableAtRestEncryption" : true, "InTransitEncryptionConfiguration" : { "TLSCertificateConfiguration" : { "CertificateProviderType" : "PEM", "S3Object" : "s3://mycertstore/artifacts/MyCerts.zip" } }, "AtRestEncryptionConfiguration" : { "S3EncryptionConfiguration" : { "EncryptionMode" : "SSE-S3" }, "LocalDiskEncryptionConfiguration" : { "EncryptionKeyProviderType" : "AwsKms", "AwsKmsKey" : "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012" } } } }'

輸出:

{ "CreationDateTime": 1474070889.129, "Name": "MySecurityConfig" }

JSON 等效項目 (安全性組態 .json 的內容):

{ "EncryptionConfiguration": { "EnableInTransitEncryption": true, "EnableAtRestEncryption": true, "InTransitEncryptionConfiguration": { "TLSCertificateConfiguration": { "CertificateProviderType": "PEM", "S3Object": "s3://mycertstore/artifacts/MyCerts.zip" } }, "AtRestEncryptionConfiguration": { "S3EncryptionConfiguration": { "EncryptionMode": "SSE-S3" }, "LocalDiskEncryptionConfiguration": { "EncryptionKeyProviderType": "AwsKms", "AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012" } } } }

命令(使用安全性配置 .json):

aws emr create-security-configuration --name "MySecurityConfig" --security-configuration file://./security_configuration.json

輸出:

{ "CreationDateTime": 1474070889.129, "Name": "MySecurityConfig" }

2. 在使用叢集專用 KDC 和跨領域信任啟用 Kerberos 的情況下建立安全性組態

命令:

aws emr create-security-configuration --name MySecurityConfig --security-configuration '{ "AuthenticationConfiguration": { "KerberosConfiguration": { "Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": { "TicketLifetimeInHours": 24, "CrossRealmTrustConfiguration": { "Realm": "AD.DOMAIN.COM", "Domain": "ad.domain.com", "AdminServer": "ad.domain.com", "KdcServer": "ad.domain.com" } } } } }'

輸出:

{ "CreationDateTime": 1490225558.982, "Name": "MySecurityConfig" }

JSON 等效項目 (安全性組態 .json 的內容):

{ "AuthenticationConfiguration": { "KerberosConfiguration": { "Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": { "TicketLifetimeInHours": 24, "CrossRealmTrustConfiguration": { "Realm": "AD.DOMAIN.COM", "Domain": "ad.domain.com", "AdminServer": "ad.domain.com", "KdcServer": "ad.domain.com" } } } } }

命令(使用安全性配置 .json):

aws emr create-security-configuration --name "MySecurityConfig" --security-configuration file://./security_configuration.json

輸出:

{ "CreationDateTime": 1490225558.982, "Name": "MySecurityConfig" }

下列程式碼範例會示範如何使用delete-security-configuration

AWS CLI

若要刪除目前區域中的安全性組態

命令:

aws emr delete-security-configuration --name MySecurityConfig

輸出:

None

下列程式碼範例會示範如何使用describe-cluster

AWS CLI

命令:

aws emr describe-cluster --cluster-id j-XXXXXXXX

輸出:

For release-label based uniform instance groups cluster: { "Cluster": { "Status": { "Timeline": { "ReadyDateTime": 1436475075.199, "CreationDateTime": 1436474656.563, }, "State": "WAITING", "StateChangeReason": { "Message": "Waiting for steps to run" } }, "Ec2InstanceAttributes": { "ServiceAccessSecurityGroup": "sg-xxxxxxxx", "EmrManagedMasterSecurityGroup": "sg-xxxxxxxx", "IamInstanceProfile": "EMR_EC2_DefaultRole", "Ec2KeyName": "myKey", "Ec2AvailabilityZone": "us-east-1c", "EmrManagedSlaveSecurityGroup": "sg-yyyyyyyyy" }, "Name": "My Cluster", "ServiceRole": "EMR_DefaultRole", "Tags": [], "TerminationProtected": true, "UnhealthyNodeReplacement": true, "ReleaseLabel": "emr-4.0.0", "NormalizedInstanceHours": 96, "InstanceGroups": [ { "RequestedInstanceCount": 2, "Status": { "Timeline": { "ReadyDateTime": 1436475074.245, "CreationDateTime": 1436474656.564, "EndDateTime": 1436638158.387 }, "State": "RUNNING", "StateChangeReason": { "Message": "", } }, "Name": "CORE", "InstanceGroupType": "CORE", "Id": "ig-YYYYYYY", "Configurations": [], "InstanceType": "m3.large", "Market": "ON_DEMAND", "RunningInstanceCount": 2 }, { "RequestedInstanceCount": 1, "Status": { "Timeline": { "ReadyDateTime": 1436475074.245, "CreationDateTime": 1436474656.564, "EndDateTime": 1436638158.387 }, "State": "RUNNING", "StateChangeReason": { "Message": "", } }, "Name": "MASTER", "InstanceGroupType": "MASTER", "Id": "ig-XXXXXXXXX", "Configurations": [], "InstanceType": "m3.large", "Market": "ON_DEMAND", "RunningInstanceCount": 1 } ], "Applications": [ { "Name": "Hadoop" } ], "VisibleToAllUsers": true, "BootstrapActions": [], "MasterPublicDnsName": "ec2-54-147-144-78.compute-1.amazonaws.com", "AutoTerminate": false, "Id": "j-XXXXXXXX", "Configurations": [ { "Properties": { "fs.s3.consistent.retryPeriodSeconds": "20", "fs.s3.enableServerSideEncryption": "true", "fs.s3.consistent": "false", "fs.s3.consistent.retryCount": "2" }, "Classification": "emrfs-site" } ] } } For release-label based instance fleet cluster: { "Cluster": { "Status": { "Timeline": { "ReadyDateTime": 1487897289.705, "CreationDateTime": 1487896933.942 }, "State": "WAITING", "StateChangeReason": { "Message": "Waiting for steps to run" } }, "Ec2InstanceAttributes": { "EmrManagedMasterSecurityGroup": "sg-xxxxx", "RequestedEc2AvailabilityZones": [], "RequestedEc2SubnetIds": [], "IamInstanceProfile": "EMR_EC2_DefaultRole", "Ec2AvailabilityZone": "us-east-1a", "EmrManagedSlaveSecurityGroup": "sg-xxxxx" }, "Name": "My Cluster", "ServiceRole": "EMR_DefaultRole", "Tags": [], "TerminationProtected": false, "UnhealthyNodeReplacement": false, "ReleaseLabel": "emr-5.2.0", "NormalizedInstanceHours": 472, "InstanceCollectionType": "INSTANCE_FLEET", "InstanceFleets": [ { "Status": { "Timeline": { "ReadyDateTime": 1487897212.74, "CreationDateTime": 1487896933.948 }, "State": "RUNNING", "StateChangeReason": { "Message": "" } }, "ProvisionedSpotCapacity": 1, "Name": "MASTER", "InstanceFleetType": "MASTER", "LaunchSpecifications": { "SpotSpecification": { "TimeoutDurationMinutes": 60, "TimeoutAction": "TERMINATE_CLUSTER" } }, "TargetSpotCapacity": 1, "ProvisionedOnDemandCapacity": 0, "InstanceTypeSpecifications": [ { "BidPrice": "0.5", "InstanceType": "m3.xlarge", "WeightedCapacity": 1 } ], "Id": "if-xxxxxxx", "TargetOnDemandCapacity": 0 } ], "Applications": [ { "Version": "2.7.3", "Name": "Hadoop" } ], "ScaleDownBehavior": "TERMINATE_AT_INSTANCE_HOUR", "VisibleToAllUsers": true, "BootstrapActions": [], "MasterPublicDnsName": "ec2-xxx-xx-xxx-xx.compute-1.amazonaws.com", "AutoTerminate": false, "Id": "j-xxxxx", "Configurations": [] } } For ami based uniform instance group cluster: { "Cluster": { "Status": { "Timeline": { "ReadyDateTime": 1399400564.432, "CreationDateTime": 1399400268.62 }, "State": "WAITING", "StateChangeReason": { "Message": "Waiting for steps to run" } }, "Ec2InstanceAttributes": { "IamInstanceProfile": "EMR_EC2_DefaultRole", "Ec2AvailabilityZone": "us-east-1c" }, "Name": "My Cluster", "Tags": [], "TerminationProtected": true, "UnhealthyNodeReplacement": true, "RunningAmiVersion": "2.5.4", "InstanceGroups": [ { "RequestedInstanceCount": 1, "Status": { "Timeline": { "ReadyDateTime": 1399400558.848, "CreationDateTime": 1399400268.621 }, "State": "RUNNING", "StateChangeReason": { "Message": "" } }, "Name": "Master instance group", "InstanceGroupType": "MASTER", "InstanceType": "m1.small", "Id": "ig-ABCD", "Market": "ON_DEMAND", "RunningInstanceCount": 1 }, { "RequestedInstanceCount": 2, "Status": { "Timeline": { "ReadyDateTime": 1399400564.439, "CreationDateTime": 1399400268.621 }, "State": "RUNNING", "StateChangeReason": { "Message": "" } }, "Name": "Core instance group", "InstanceGroupType": "CORE", "InstanceType": "m1.small", "Id": "ig-DEF", "Market": "ON_DEMAND", "RunningInstanceCount": 2 } ], "Applications": [ { "Version": "1.0.3", "Name": "hadoop" } ], "BootstrapActions": [], "VisibleToAllUsers": false, "RequestedAmiVersion": "2.4.2", "LogUri": "s3://myLogUri/", "AutoTerminate": false, "Id": "j-XXXXXXXX" } }
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考DescribeCluster中的。

下列程式碼範例會示範如何使用describe-step

AWS CLI

以下命令描述了具有集群 ID 的集群s-3LZC0QUT43AM中的步驟 ID 的步驟j-3SD91U2E1L2QX

aws emr describe-step --cluster-id j-3SD91U2E1L2QX --step-id s-3LZC0QUT43AM

輸出:

{ "Step": { "Status": { "Timeline": { "EndDateTime": 1433200470.481, "CreationDateTime": 1433199926.597, "StartDateTime": 1433200404.959 }, "State": "COMPLETED", "StateChangeReason": {} }, "Config": { "Args": [ "s3://us-west-2.elasticmapreduce/libs/hive/hive-script", "--base-path", "s3://us-west-2.elasticmapreduce/libs/hive/", "--install-hive", "--hive-versions", "0.13.1" ], "Jar": "s3://us-west-2.elasticmapreduce/libs/script-runner/script-runner.jar", "Properties": {} }, "Id": "s-3LZC0QUT43AM", "ActionOnFailure": "TERMINATE_CLUSTER", "Name": "Setup hive" } }
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考DescribeStep中的。

下列程式碼範例會示範如何使用get

AWS CLI

下列步驟會從具有叢集 ID 的叢集中的主要執行個體下載hadoop-examples.jar歸檔j-3SD91U2E1L2QX

aws emr get --cluster-id j-3SD91U2E1L2QX --key-pair-file ~/.ssh/mykey.pem --src /home/hadoop-examples.jar --dest ~
  • 如需 API 詳細資訊,請參閱取得AWS CLI命令參考

下列程式碼範例會示範如何使用list-clusters

AWS CLI

下列命令會列出目前區域中所有作用中的 EMR 叢集:

aws emr list-clusters --active

輸出:

{ "Clusters": [ { "Status": { "Timeline": { "ReadyDateTime": 1433200405.353, "CreationDateTime": 1433199926.596 }, "State": "WAITING", "StateChangeReason": { "Message": "Waiting after step completed" } }, "NormalizedInstanceHours": 6, "Id": "j-3SD91U2E1L2QX", "Name": "my-cluster" } ] }
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考ListClusters中的。

下列程式碼範例會示範如何使用list-instance-fleets

AWS CLI

取得叢集中執行個體叢集的組態詳細資料

此範例會列出指定叢集中執行個體叢集的詳細資訊。

命令:

list-instance-fleets --cluster-id 'j-12ABCDEFGHI34JK'

輸出:

{ "InstanceFleets": [ { "Status": { "Timeline": { "ReadyDateTime": 1488759094.637, "CreationDateTime": 1488758719.817 }, "State": "RUNNING", "StateChangeReason": { "Message": "" } }, "ProvisionedSpotCapacity": 6, "Name": "CORE", "InstanceFleetType": "CORE", "LaunchSpecifications": { "SpotSpecification": { "TimeoutDurationMinutes": 60, "TimeoutAction": "TERMINATE_CLUSTER" } }, "ProvisionedOnDemandCapacity": 2, "InstanceTypeSpecifications": [ { "BidPrice": "0.5", "InstanceType": "m3.xlarge", "WeightedCapacity": 2 } ], "Id": "if-1ABC2DEFGHIJ3" }, { "Status": { "Timeline": { "ReadyDateTime": 1488759058.598, "CreationDateTime": 1488758719.811 }, "State": "RUNNING", "StateChangeReason": { "Message": "" } }, "ProvisionedSpotCapacity": 0, "Name": "MASTER", "InstanceFleetType": "MASTER", "ProvisionedOnDemandCapacity": 1, "InstanceTypeSpecifications": [ { "BidPriceAsPercentageOfOnDemandPrice": 100.0, "InstanceType": "m3.xlarge", "WeightedCapacity": 1 } ], "Id": "if-2ABC4DEFGHIJ4" } ] }

下列程式碼範例會示範如何使用list-instances

AWS CLI

以下命令列出具有叢集 ID 的叢集中的所有執行個體j-3C6XNQ39VR9WL

aws emr list-instances --cluster-id j-3C6XNQ39VR9WL

輸出:

For a uniform instance group based cluster { "Instances": [ { "Status": { "Timeline": { "ReadyDateTime": 1433200400.03, "CreationDateTime": 1433199960.152 }, "State": "RUNNING", "StateChangeReason": {} }, "Ec2InstanceId": "i-f19ecfee", "PublicDnsName": "ec2-52-52-41-150.us-west-2.compute.amazonaws.com", "PrivateDnsName": "ip-172-21-11-216.us-west-2.compute.internal", "PublicIpAddress": "52.52.41.150", "Id": "ci-3NNHQUQ2TWB6Y", "PrivateIpAddress": "172.21.11.216" }, { "Status": { "Timeline": { "ReadyDateTime": 1433200400.031, "CreationDateTime": 1433199949.102 }, "State": "RUNNING", "StateChangeReason": {} }, "Ec2InstanceId": "i-1feee4c2", "PublicDnsName": "ec2-52-63-246-32.us-west-2.compute.amazonaws.com", "PrivateDnsName": "ip-172-31-24-130.us-west-2.compute.internal", "PublicIpAddress": "52.63.246.32", "Id": "ci-GAOCMKNKDCV7", "PrivateIpAddress": "172.21.11.215" }, { "Status": { "Timeline": { "ReadyDateTime": 1433200400.031, "CreationDateTime": 1433199949.102 }, "State": "RUNNING", "StateChangeReason": {} }, "Ec2InstanceId": "i-15cfeee3", "PublicDnsName": "ec2-52-25-246-63.us-west-2.compute.amazonaws.com", "PrivateDnsName": "ip-172-31-24-129.us-west-2.compute.internal", "PublicIpAddress": "52.25.246.63", "Id": "ci-2W3TDFFB47UAD", "PrivateIpAddress": "172.21.11.214" } ] } For a fleet based cluster: { "Instances": [ { "Status": { "Timeline": { "ReadyDateTime": 1487810810.878, "CreationDateTime": 1487810588.367, "EndDateTime": 1488022990.924 }, "State": "TERMINATED", "StateChangeReason": { "Message": "Instance was terminated." } }, "Ec2InstanceId": "i-xxxxx", "InstanceFleetId": "if-xxxxx", "EbsVolumes": [], "PublicDnsName": "ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com", "InstanceType": "m3.xlarge", "PrivateDnsName": "ip-xx-xx-xxx-xx.ec2.internal", "Market": "SPOT", "PublicIpAddress": "xx.xx.xxx.xxx", "Id": "ci-xxxxx", "PrivateIpAddress": "10.47.191.80" } ] }
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考ListInstances中的。

下列程式碼範例會示範如何使用list-security-configurations

AWS CLI

列出目前區域中的安全性組態

命令:

aws emr list-security-configurations

輸出:

{ "SecurityConfigurations": [ { "CreationDateTime": 1473889697.417, "Name": "MySecurityConfig-1" }, { "CreationDateTime": 1473889697.417, "Name": "MySecurityConfig-2" } ] }

下列程式碼範例會示範如何使用list-steps

AWS CLI

以下命令列出了具有集群 ID 的集群中的所有步驟j-3SD91U2E1L2QX

aws emr list-steps --cluster-id j-3SD91U2E1L2QX
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考ListSteps中的。

下列程式碼範例會示範如何使用modify-cluster-attributes

AWS CLI

下列命令會將具有 ID 的 EMR 叢集的可見度設定j-301CDNY0J5XM4給所有使用者:

aws emr modify-cluster-attributes --cluster-id j-301CDNY0J5XM4 --visible-to-all-users

下列程式碼範例會示範如何使用modify-instance-fleet

AWS CLI

變更執行個體叢集的目標容量

此範例會針對指定的執行個體叢集,將隨需和 Spot 目標容量變更為 1。

命令:

aws emr modify-instance-fleet --cluster-id 'j-12ABCDEFGHI34JK' --instance-fleet InstanceFleetId='if-2ABC4DEFGHIJ4',TargetOnDemandCapacity=1,TargetSpotCapacity=1

下列程式碼範例會示範如何使用put

AWS CLI

下列指令會healthcheck.sh將名為的檔案上傳至叢集中具有叢集 ID 的主要執行個體j-3SD91U2E1L2QX

aws emr put --cluster-id j-3SD91U2E1L2QX --key-pair-file ~/.ssh/mykey.pem --src ~/scripts/healthcheck.sh --dest /home/hadoop/bin/healthcheck.sh
  • 有關 API 的詳細信息,請參閱放入AWS CLI命令參考

下列程式碼範例會示範如何使用remove-tags

AWS CLI

以下命令從具有集群 ID 的集群prod中刪除帶有密鑰的標籤j-3SD91U2E1L2QX

aws emr remove-tags --resource-id j-3SD91U2E1L2QX --tag-keys prod
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考RemoveTags中的。

下列程式碼範例會示範如何使用schedule-hbase-backup

AWS CLI

注意:此命令只能與 AMI 版本 2.x 和 3.x 版本上的 HBase 一起使用

1。要安排一個完整的 HBase 備份

命令:

aws emr schedule-hbase-backup --cluster-id j-XXXXXXYY --type full --dir s3://myBucket/backup --interval 10 --unit hours --start-time 2014-04-21T05:26:10Z --consistent

輸出:

None

2. 若要排程增量 HBase 備份

命令:

aws emr schedule-hbase-backup --cluster-id j-XXXXXXYY --type incremental --dir s3://myBucket/backup --interval 30 --unit minutes --start-time 2014-04-21T05:26:10Z --consistent

輸出:

None

下列程式碼範例會示範如何使用socks

AWS CLI

下列指令會在叢集 ID 的叢集中開啟與主執行個體的 socks 連線j-3SD91U2E1L2QX

aws emr socks --cluster-id j-3SD91U2E1L2QX --key-pair-file ~/.ssh/mykey.pem

key pair 檔案選項會取得私密金鑰檔案的本機路徑。

  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考中的襪子

下列程式碼範例會示範如何使用ssh

AWS CLI

下列指令會開啟與叢集 ID 之叢集中主執行個體的 ssh 連線j-3SD91U2E1L2QX

aws emr ssh --cluster-id j-3SD91U2E1L2QX --key-pair-file ~/.ssh/mykey.pem

key pair 檔案選項會取得私密金鑰檔案的本機路徑。

輸出:

ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=10 -i /home/local/user/.ssh/mykey.pem hadoop@ec2-52-52-41-150.us-west-2.compute.amazonaws.com Warning: Permanently added 'ec2-52-52-41-150.us-west-2.compute.amazonaws.com,52.52.41.150' (ECDSA) to the list of known hosts. Last login: Mon Jun 1 23:15:38 2015 __| __|_ ) _| ( / Amazon Linux AMI ___|\___|___| https://aws.amazon.com/amazon-linux-ami/2015.03-release-notes/ 26 package(s) needed for security, out of 39 available Run "sudo yum update" to apply all updates. -------------------------------------------------------------------------------- Welcome to Amazon Elastic MapReduce running Hadoop and Amazon Linux. Hadoop is installed in /home/hadoop. Log files are in /mnt/var/log/hadoop. Check /mnt/var/log/hadoop/steps for diagnosing step failures. The Hadoop UI can be accessed via the following commands: ResourceManager lynx http://ip-172-21-11-216:9026/ NameNode lynx http://ip-172-21-11-216:9101/ -------------------------------------------------------------------------------- [hadoop@ip-172-31-16-216 ~]$
  • 如需 API 詳細資訊,請參閱AWS CLI 命令參考中的 SSH