Using YuniKorn as a custom scheduler for Apache Spark on Amazon EMR on EKS

With Amazon EMR on EKS, you can use the Spark operator or spark-submit to run Spark jobs with Kubernetes custom schedulers. This tutorial covers how to run Spark jobs with a YuniKorn scheduler on a custom queue and with gang scheduling.

Overview

Apache YuniKorn can help manage Spark scheduling with app-aware scheduling, so that you have fine-grained control over resource quotas and priorities. With gang scheduling, YuniKorn schedules an app only when the app's minimal resource request can be satisfied. For more information, see What is gang scheduling on the Apache YuniKorn documentation site.

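Queues are where YuniKorn enforces those quotas and priorities. As a minimal illustrative sketch only (not a step in this tutorial), a queue hierarchy can be declared in a queues.yaml configuration and loaded into the yunikorn-configs ConfigMap that the YuniKorn Helm chart creates; the root.spark queue name and the quota values below are assumptions for demonstration:

    # Hypothetical queue definition: a root.spark queue with guaranteed and
    # maximum resource quotas. Queue names and quota values are illustrative.
    cat <<EOF >yunikorn-queues.yaml
    partitions:
      - name: default
        queues:
          - name: root
            queues:
              - name: spark
                submitacl: "*"
                resources:
                  guaranteed:
                    vcore: 4
                    memory: 8Gi
                  max:
                    vcore: 16
                    memory: 32Gi
    EOF
    # Load the definition into the scheduler's ConfigMap (assumes the default
    # yunikorn-configs ConfigMap name used by the YuniKorn Helm chart).
    kubectl create configmap yunikorn-configs \
      --from-file=queues.yaml=yunikorn-queues.yaml \
      --namespace yunikorn \
      --dry-run=client -o yaml | kubectl apply -f -
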
Create your cluster and set up YuniKorn

Use the following steps to deploy an Amazon EKS cluster. You can change the AWS Region (region) and Availability Zones (availabilityZones).

  1. Define the Amazon EKS cluster:

    cat <<EOF >eks-cluster.yaml
    ---
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: emr-eks-cluster
      region: eu-west-1
    vpc:
      clusterEndpoints:
        publicAccess: true
        privateAccess: true
    iam:
      withOIDC: true
    nodeGroups:
      - name: spark-jobs
        labels: { app: spark }
        instanceType: m5.xlarge
        desiredCapacity: 2
        minSize: 2
        maxSize: 3
        availabilityZones: ["eu-west-1a"]
    EOF
  2. Create the cluster:

    eksctl create cluster -f eks-cluster.yaml
  3. Create the namespace spark-job where you will run your Spark jobs:

    kubectl create namespace spark-job
  4. Next, create a Kubernetes role and role binding. This is required for the service account that the Spark job run uses.

    1. Define the service account, role, and role binding for your Spark jobs.

      cat <<EOF >emr-job-execution-rbac.yaml
      ---
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: spark-sa
        namespace: spark-job
      automountServiceAccountToken: false
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: spark-role
        namespace: spark-job
      rules:
        - apiGroups: ["", "batch", "extensions"]
          resources: ["configmaps", "serviceaccounts", "events", "pods", "pods/exec", "pods/log", "pods/portforward", "secrets", "services", "persistentvolumeclaims"]
          verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: spark-sa-rb
        namespace: spark-job
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: spark-role
      subjects:
        - kind: ServiceAccount
          name: spark-sa
          namespace: spark-job
      EOF
    2. Apply the Kubernetes role and role binding definition with the following command. You can verify the created resources with the optional checks shown after this list:

      kubectl apply -f emr-job-execution-rbac.yaml
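
To confirm the setup so far, you can run the following read-only checks. These are optional and assume only the labels and names created in the steps above:

    # Nodes from the spark-jobs node group carry the app=spark label.
    kubectl get nodes --selector app=spark

    # The service account, role, and role binding in the spark-job namespace.
    kubectl get serviceaccount,role,rolebinding --namespace spark-job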

Install and set up YuniKorn

  1. Create a namespace yunikorn to deploy the YuniKorn scheduler with the following kubectl command:

    kubectl create namespace yunikorn
  2. To install the scheduler, run the following Helm commands. You can confirm the deployment with the optional check shown after this list:

    helm repo add yunikorn https://apache.github.io/yunikorn-release
    helm repo update
    helm install yunikorn yunikorn/yunikorn --namespace yunikorn
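
After the Helm release installs, the scheduler and admission controller Pods should reach Running status. As an optional check (the exact Pod names are generated by the chart):

    kubectl get pods --namespace yunikorn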

Run a Spark application with the YuniKorn scheduler and the Spark operator

  1. If you haven't already, complete the steps in the following sections to get set up:

    1. Create your cluster and set up YuniKorn

    2. Install and set up YuniKorn

    3. Set up the Spark operator for Amazon EMR on EKS

    4. Install the Spark operator

      When you run the helm install spark-operator-demo command, include the following arguments:

      --set batchScheduler.enable=true --set webhook.enable=true
  2. Create a SparkApplication definition file spark-pi.yaml:

    To use YuniKorn as a scheduler for your jobs, you must add certain annotations and labels to your application definition. The annotations and labels specify the queue for your job and the scheduling strategy that you want to use.

    In the following example, the schedulingPolicyParameters annotation sets up gang scheduling for the application. The example then creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the Pods are scheduled to start the job execution. Finally, the task group definition specifies the use of node groups with the "app": "spark" label, as defined in the Create your cluster and set up YuniKorn section.

    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-pi
      namespace: spark-job
    spec:
      type: Scala
      mode: cluster
      image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
      imagePullPolicy: Always
      mainClass: org.apache.spark.examples.SparkPi
      mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
      sparkVersion: "3.3.1"
      restartPolicy:
        type: Never
      volumes:
        - name: "test-volume"
          hostPath:
            path: "/tmp"
            type: Directory
      driver:
        cores: 1
        coreLimit: "1200m"
        memory: "512m"
        labels:
          version: 3.3.1
        annotations:
          yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard"
          yunikorn.apache.org/task-group-name: "spark-driver"
          yunikorn.apache.org/task-groups: |-
            [{
              "name": "spark-driver",
              "minMember": 1,
              "minResource": {
                "cpu": "1200m",
                "memory": "1Gi"
              },
              "nodeSelector": {
                "app": "spark"
              }
            },
            {
              "name": "spark-executor",
              "minMember": 1,
              "minResource": {
                "cpu": "1200m",
                "memory": "1Gi"
              },
              "nodeSelector": {
                "app": "spark"
              }
            }]
        serviceAccount: spark-sa
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
      executor:
        cores: 1
        instances: 1
        memory: "512m"
        labels:
          version: 3.3.1
        annotations:
          yunikorn.apache.org/task-group-name: "spark-executor"
        volumeMounts:
          - name: "test-volume"
            mountPath: "/tmp"
  3. Submit the Spark application with the following command. This also creates a SparkApplication object named spark-pi:

    kubectl apply -f spark-pi.yaml
  4. Check events for the SparkApplication object with the following command:

    kubectl describe sparkapplication spark-pi --namespace spark-job

    The first Pod events show that YuniKorn has scheduled the Pods. When the run finishes, you can inspect the driver output as shown after this list:

    Type    Reason             Age    From       Message
    ----    ------             ----   ----       -------
    Normal  Scheduling         3m12s  yunikorn   spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn   Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn   Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn   Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn   Task spark-operator/
    Normal  Pulling            3m10s  kubelet    Pulling
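
When the application completes, the executors are torn down and the driver Pod remains in Completed status. As an optional check, you can pull the computed value of Pi from the driver log; the Pod name spark-pi-driver assumes the Spark operator's default <app-name>-driver naming convention:

    # SparkPi prints its result to the driver log on completion.
    kubectl logs spark-pi-driver --namespace spark-job | grep "Pi is roughly"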

Run a Spark application with the YuniKorn scheduler and spark-submit

  1. First, complete the steps in the Set up spark-submit for Amazon EMR on EKS section.

  2. Set the values for the following environment variables:

    export SPARK_HOME=spark-home
    export MASTER_URL=k8s://Amazon-EKS-cluster-endpoint
  3. Submit the Spark application with the following command:

    In the following example, the schedulingPolicyParameters annotation sets up gang scheduling for the application. The example then creates task groups, or "gangs" of tasks, to specify the minimum capacity that must be available before the Pods are scheduled to start the job execution. Finally, the task group definition specifies the use of node groups with the "app": "spark" label, as defined in the Create your cluster and set up YuniKorn section.

    $SPARK_HOME/bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master $MASTER_URL \
      --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
      --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
      --deploy-mode cluster \
      --conf spark.kubernetes.namespace=spark-job \
      --conf spark.kubernetes.scheduler.name=yunikorn \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutSeconds=30 gangSchedulingStyle=Hard" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver" \
      --conf spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor" \
      --conf spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups='[{
          "name": "spark-driver",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        },
        {
          "name": "spark-executor",
          "minMember": 1,
          "minResource": {
            "cpu": "1200m",
            "memory": "1Gi"
          },
          "nodeSelector": {
            "app": "spark"
          }
        }]' \
      local:///usr/lib/spark/examples/jars/spark-examples.jar 20
  4. Check events for the driver Pod with the following command. Replace spark-driver-pod with your generated driver Pod name; one way to find it is shown after this list:

    kubectl describe pod spark-driver-pod --namespace spark-job

    The first Pod events show that YuniKorn has scheduled the Pods:

    Type    Reason             Age    From       Message
    ----    ------             ----   ----       -------
    Normal  Scheduling         3m12s  yunikorn   spark-operator/org-apache-spark-examples-sparkpi-2a777a88b98b8a95-driver is queued and waiting for allocation
    Normal  GangScheduling     3m12s  yunikorn   Pod belongs to the taskGroup spark-driver, it will be scheduled as a gang member
    Normal  Scheduled          3m10s  yunikorn   Successfully assigned spark
    Normal  PodBindSuccessful  3m10s  yunikorn   Pod spark-operator/
    Normal  TaskCompleted      2m3s   yunikorn   Task spark-operator/
    Normal  Pulling            3m10s  kubelet    Pulling
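
With spark-submit, the driver Pod name is generated per run from the application name, so it changes each time. As an optional sketch for locating it, you can filter on the spark-role=driver label that Spark on Kubernetes applies to driver Pods:

    # List the driver Pod(s) in the job namespace by their Spark-assigned label.
    kubectl get pods --namespace spark-job --selector spark-role=driver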