本文為英文版的機器翻譯版本,如內容有任何歧義或不一致之處,概以英文版為準。
使用作業提交器分類
概要
Amazon EMR on EKS StartJobRun
請求會建立作業提交器 Pod (也稱為 job-runner Pod) 以產生 Spark 驅動程式。可以使用 emr-job-submitter
分類為作業提交器 Pod 設定節點選取器。
emr-job-submitter
分類下提供下列設定:
jobsubmitter.node.selector.[
labelKey
]-
新增至作業提交器 Pod 的節點選取器,使用索引鍵
和值作為組態的值。例如,可將labelKey
jobsubmitter.node.selector.identifier
設定為myIdentifier
,且作業提交器 Pod 將擁有一個節點選取器,它具有myIdentifier
的索引鍵識別符值。要新增多個節點選取器索引鍵,請使用此字首設定多個組態。
作為最佳實務,建議作業提交器 Pod 將節點放置在隨需執行個體上,而非 Spot 執行個體上。這是因為如果作業提交器 Pod 遭受 Spot 執行個體中斷,則作業將會失敗。也可以將作業提交器 Pod 置於單一可用區域中,或使用套用至節點的任何 Kubernetes 標籤。
作業提交器分類範例
本節內容
適用於作業提交器 Pod 的具有隨需節點放置的 StartJobRun
請求
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "
virtual-cluster-id
", "executionRoleArn": "execution-role-arn
", "releaseLabel": "emr-6.11.0-latest
", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix
/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter.json
適用於作業提交器 Pod 的具有單一可用區域節點放置的 StartJobRun
請求
cat >spark-python-in-s3-nodeselector-job-submitter-az.json << EOF { "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "
virtual-cluster-id
", "executionRoleArn": "execution-role-arn
", "releaseLabel": "emr-6.11.0-latest
", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix
/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false" } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone
" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } } EOF aws emr-containers start-job-run --cli-input-json file:///spark-python-in-s3-nodeselector-job-submitter-az.json
適用於作業提交器 Pod 的具有單一可用區域和 Amazon EC2 執行個體類型放置的 StartJobRun
請求
{ "name": "spark-python-in-s3-nodeselector", "virtualClusterId": "
virtual-cluster-id
", "executionRoleArn": "execution-role-arn
", "releaseLabel": "emr-6.11.0-latest
", "jobDriver": { "sparkSubmitJobDriver": { "entryPoint": "s3://S3-prefix
/trip-count.py", "sparkSubmitParameters": "--conf spark.driver.cores=5 --conf spark.kubernetes.pyspark.pythonVersion=3 --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6 --conf spark.sql.shuffle.partitions=1000" } }, "configurationOverrides": { "applicationConfiguration": [ { "classification": "spark-defaults", "properties": { "spark.dynamicAllocation.enabled":"false", } }, { "classification": "emr-job-submitter", "properties": { "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone
", "jobsubmitter.node.selector.node.kubernetes.io/instance-type":"m5.4xlarge
" } } ], "monitoringConfiguration": { "cloudWatchMonitoringConfiguration": { "logGroupName": "/emr-containers/jobs", "logStreamNamePrefix": "demo" }, "s3MonitoringConfiguration": { "logUri": "s3://joblogs" } } } }