Using YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS

With Amazon EMR on EKS, you can use the Spark operator or spark-submit to run Spark jobs with Kubernetes custom schedulers. This tutorial covers how to run Spark jobs with a YuniKorn scheduler on a custom queue and with gang scheduling.

Overview

Apache YuniKorn can help manage Spark scheduling with app-aware scheduling, so that you have fine-grained control over resource quotas and priorities. With gang scheduling, YuniKorn schedules an app only when the app's minimal resource request can be satisfied. For more information, see What is gang scheduling on the Apache YuniKorn documentation site.

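Queues are where YuniKorn enforces those quotas and priorities. As a minimal illustrative sketch only (not a step in this tutorial), a queue hierarchy can be declared in a queues.yaml configuration and loaded into the yunikorn-configs ConfigMap that the YuniKorn Helm chart creates; the root.spark queue name and the quota values below are assumptions for demonstration:

    # Hypothetical queue definition: a root.spark queue with guaranteed and
    # maximum resource quotas. Queue names and quota values are illustrative.
    cat <<EOF >yunikorn-queues.yaml
    partitions:
      - name: default
        queues:
          - name: root
            queues:
              - name: spark
                submitacl: "*"
                resources:
                  guaranteed:
                    vcore: 4
                    memory: 8Gi
                  max:
                    vcore: 16
                    memory: 32Gi
    EOF
    # Load the definition into the scheduler's ConfigMap (assumes the default
    # yunikorn-configs ConfigMap name used by the YuniKorn Helm chart).
    kubectl create configmap yunikorn-configs \
      --from-file=queues.yaml=yunikorn-queues.yaml \
      --namespace yunikorn \
      --dry-run=client -o yaml | kubectl apply -f -
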
Create your cluster and set up YuniKorn

Use the following steps to deploy an Amazon EKS cluster. You can change the AWS Region (region) and Availability Zones (availabilityZones).

  1. Define the Amazon EKS cluster:

    cat <<EOF >eks-cluster.yaml
    ---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: emr-eks-cluster
      region: eu-west-1
    vpc:
      clusterEndpoints:
        publicAccess: true
        privateAccess: true
    iam:
      withOIDC: true
    nodeGroups:
      - name: spark-jobs
        labels: { app: spark }
        instanceType: m5.xlarge
        desiredCapacity: 2
        minSize: 2
        maxSize: 3
        availabilityZones: ["eu-west-1a"]
    EOF
  2. Create the cluster:

    eksctl create cluster -f eks-cluster.yaml
  3. Create the namespace spark-job where you will run your Spark jobs:

    kubectl create namespace spark-job
  4. Next, create a Kubernetes role and role binding. This is required for the service account that the Spark job run uses.

    1. Define the service account, role, and role binding for your Spark jobs.

      cat <<EOF >emr-job-execution-rbac.yaml
      ---
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: spark-sa
        namespace: spark-job
      automountServiceAccountToken: false
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: spark-role
        namespace: spark-job
      rules:
        - apiGroups: ["", "batch", "extensions"]
          resources: ["configmaps", "serviceaccounts", "events", "pods", "pods/exec", "pods/log", "pods/portforward", "secrets", "services", "persistentvolumeclaims"]
          verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: spark-sa-rb
        namespace: spark-job
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: spark-role
      subjects:
        - kind: ServiceAccount
          name: spark-sa
          namespace: spark-job
      EOF
    2. Apply the Kubernetes role and role binding definition with the following command. You can verify the created resources with the optional checks shown after this list:

      kubectl apply -f emr-job-execution-rbac.yaml
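
To confirm the setup so far, you can run the following read-only checks. These are optional and assume only the labels and names created in the steps above:

    # Nodes from the spark-jobs node group carry the app=spark label.
    kubectl get nodes --selector app=spark

    # The service account, role, and role binding in the spark-job namespace.
    kubectl get serviceaccount,role,rolebinding --namespace spark-job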

Install and set up YuniKorn

  1. Create a namespace yunikorn to deploy the YuniKorn scheduler with the following kubectl command:

    kubectl create namespace yunikorn
  2. To install the scheduler, run the following Helm commands. You can confirm the deployment with the optional check shown after this list:

    helm repo add yunikorn https://apache.github.io/yunikorn-release
    helm repo update
    helm install yunikorn yunikorn/yunikorn --namespace yunikorn
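
After the Helm release installs, the scheduler and admission controller Pods should reach Running status. As an optional check (the exact Pod names are generated by the chart):

    kubectl get pods --namespace yunikorn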

Run a Spark application with the YuniKorn scheduler and the Spark operator

  1. If you haven't already, complete the steps in the following sections to get set up:

    1. Create your cluster and set up YuniKorn

    2. Install and set up YuniKorn

    3. Set up the Spark operator for Amazon EMR on EKS

    4. Install the Spark operator

      When you run the helm install spark-operator-demo command, include the following arguments:

      --set batchScheduler.enable=true --set webhook.enable=true
  2. Create a SparkApplication definition file spark-pi.yaml:

    To use YuniKorn as a scheduler for your jobs, you must add certain annotations and labels to your application definition. The annotations and labels specify the queue for your job and the scheduling strategy that you want to use.

    In the following example, the schedulingPolicyParameters annotation sets up gang scheduling for the application. The example then creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the Pods are scheduled to start the job execution. Finally, the task group definition specifies the use of node groups with the "app": "spark" label, as defined in the Create your cluster and set up YuniKorn section.

    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-pi
      namespace: spark-job
    spec:
      type: Scala
      mode: cluster
      image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
      sparkVersion: "3.3.1"
      restartPolicy:
        type: Never
      volumes:
        - name: "test-volume"
          hostPath:
            path: "/tmp"
            type: Directory
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 3.3.1
        annotations:
          yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard"
          yunikorn.apache.org/task-group-name: "spark-driver"
          yunikorn.apache.org/task-groups: |-
            [{
              "name": "spark-driver",
              "minMember": 1,
              "minResource": {
                "cpu": "1200m",
                "memory": "1Gi"
              },
              "nodeSelector": {
                "app": "spark"
              }
            },
            {
              "name": "spark-executor",
              "minMember": 1,
              "minResource": {
                "cpu": "1200m",
                "memory": "1Gi"
              },
              "nodeSelector": {
                "app": "spark"
              }
            }]
        serviceAccount: spark-sa
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
      executor:
        cores: 1
        instances: 1
        memory: "512m"
        labels:
          version: 3.3.1
        annotations:
          yunikorn.apache.org/task-group-name: "spark-executor"
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
  3. Submit the Spark application with the following command. This also creates a SparkApplication object named spark-pi:

    kubectl apply -f spark-pi.yaml
  4. Check events for the SparkApplication object with the following command:

    kubectl describe sparkapplication spark-pi --namespace spark-job

    The first Pod events show that YuniKorn has scheduled the Pods. When the run finishes, you can inspect the driver output as shown after this list:

    Type    Reason             Age    From       Message
    ----    ------             ----   ----       -------
    Normal  Scheduling         3m12s  yunikorn   spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn   Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn   Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn   Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn   Task spark-operator/
    Normal  Pulling            3m10s  kubelet    Pulling
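
When the application completes, the executors are torn down and the driver Pod remains in Completed status. As an optional check, you can pull the computed value of Pi from the driver log; the Pod name spark-pi-driver assumes the Spark operator's default <app-name>-driver naming convention:

    # SparkPi prints its result to the driver log on completion.
    kubectl logs spark-pi-driver --namespace spark-job | grep "Pi is roughly"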

Run a Spark application with the YuniKorn scheduler and spark-submit

  1. First, complete the steps in the Set up spark-submit for Amazon EMR on EKS section.

  2. Set the values for the following environment variables:

    export SPARK_HOME=spark-home
    export MASTER_URL=k8s://Amazon-EKS-cluster-endpoint
  3. Submit the Spark application with the following command:

    In the following example, the schedulingPolicyParameters annotation sets up gang scheduling for the application. The example then creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the Pods are scheduled to start the job execution. Finally, the task group definition specifies the use of node groups with the "app": "spark" label, as defined in the Create your cluster and set up YuniKorn section.

    $SPARK_HOME/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master $MASTER_URL \
      --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
      --deploy-mode cluster \
      --conf spark.kubernetes.namespace=spark-job \
      --conf spark.kubernetes.scheduler.name=yunikorn \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver" \
      --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{
          "name": "spark-driver",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        },
        {
          "name": "spark-executor",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        }]' \
      local:///usr/lib/spark/examples/jars/spark-examples.jar 20
  4. Check events for the driver Pod with the following command. Replace spark-driver-pod with your generated driver Pod name; one way to find it is shown after this list:

    kubectl describe pod spark-driver-pod --namespace spark-job

    The first Pod events show that YuniKorn has scheduled the Pods:

    Type    Reason             Age    From       Message
    ----    ------             ----   ----       -------
    Normal  Scheduling         3m12s  yunikorn   spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn   Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn   Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn   Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn   Task spark-operator/
    Normal  Pulling            3m10s  kubelet    Pulling
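
With spark-submit, the driver Pod name is generated per run from the application name, so it changes each time. As an optional sketch for locating it, you can filter on the spark-role=driver label that Spark on Kubernetes applies to driver Pods:

    # List the driver Pod(s) in the job namespace by their Spark-assigned label.
    kubectl get pods --namespace spark-job --selector spark-role=driver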