Place Kubernetes Pods on Amazon EKS by using node affinity, taints, and tolerations - AWS Prescriptive Guidance

Place Kubernetes Pods on Amazon EKS by using node affinity, taints, and tolerations

Created by Hitesh Parikh (AWS) and Raghu Bhamidimarri (AWS)

Environment: PoC or pilot

Technologies: Containers & microservices

Workload: Open-source

AWS services: Amazon EKS

Summary

This pattern demonstrates the use of Kubernetes node affinity, node taints, and Pod tolerations to intentionally schedule application Pods on specific worker nodes in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on the Amazon Web Services (AWS) Cloud.

A taint is a node property that enables nodes to repel a set of Pods. A toleration is a Pod property that enables the Kubernetes scheduler to schedule Pods on nodes that have matching taints.

However, tolerations alone can’t prevent the scheduler from placing a Pod on a worker node that doesn’t have any taints. For example, a compute-intensive Pod with a toleration can unintentionally be scheduled on a general-purpose, untainted node. In that scenario, the node affinity property of a Pod instructs the scheduler to place the Pod on a node that meets the node-selection criteria specified in the node affinity.

Together, taints, tolerations, and node affinity instruct the scheduler to consistently schedule Pods on nodes that have the matching taints and the node labels that match the node affinity node-selection criteria specified on the Pod.
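
For example, you can taint a node manually by running kubectl taint nodes <node_name> dedicated=true:NoSchedule. The following Pod spec fragment is a minimal sketch of how a toleration and node affinity work together; the dedicated taint key and the <node_label_key> and <node_label_value> placeholders are hypothetical and are not the values used later in this pattern.

spec:
  # The toleration allows (but doesn't force) scheduling onto nodes tainted dedicated=true:NoSchedule.
  tolerations:
    - key: "dedicated"
      operator: Equal
      value: "true"
      effect: NoSchedule
  # The node affinity restricts the Pod to nodes that carry the matching label.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: <node_label_key>
                operator: In
                values:
                  - <node_label_value>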

This pattern provides an example Kubernetes deployment manifest file and the steps to create an EKS cluster, deploy an application, and validate Pod placement.

Prerequisites and limitations

Prerequisites 

Limitations 

  • This pattern doesn’t provide the Java code, and it assumes that you are already familiar with Java. To create a basic Java microservice, see Deploy a sample Java microservice on Amazon EKS.

  • The steps in this article create AWS resources that can accrue cost. Make sure that you clean up the AWS resources after you have completed the steps to implement and validate the pattern.

Architecture

Target technology stack  

  • Amazon EKS

  • Java

  • Docker

  • Amazon Elastic Container Registry (Amazon ECR)

Target architecture 

The solution architecture diagram shows an Amazon EKS cluster with Pods from two Deployments (Deployment 1 and Deployment 2) and two node groups (ng1 and ng2) with two nodes each. The Pods and nodes have the following properties.

Deployment 1 Pod

  • Tolerations: key: classified_workload, value: true, effect: NoSchedule; key: machine_learning_workload, value: true, effect: NoSchedule

  • Node affinity: key: alpha.eksctl.io/nodegroup-name = ng1

Deployment 2 Pod

  • Tolerations: none

  • Node affinity: none

Node group 1 (ng1)

  • Node label: alpha.eksctl.io/nodegroup-name = ng1 (set from nodeGroups.name in cluster.yaml)

  • Taints: key: classified_workload, value: true, effect: NoSchedule; key: machine_learning_workload, value: true, effect: NoSchedule

Node group 2 (ng2)

  • Node label: none that matches the Deployment 1 node affinity selector

  • Taints: none

Diagram: Amazon EKS configuration with two Pods and two node groups.

  1. The Deployment 1 Pod has tolerations and node affinity defined, which instructs the Kubernetes scheduler to place the deployment Pods on the Node group 1 (ng1) nodes.

  2. Node group 2 (ng2) doesn’t have a node label that matches the node affinity node selector expression for Deployment 1, so the Pods will not be scheduled on ng2 nodes.

  3. The Deployment 2 Pod doesn’t have any tolerations or node affinity defined in the deployment manifest. The scheduler will reject scheduling Deployment 2 Pods on Node group 1 because of the taints on the nodes.

  4. The Deployment 2 Pods will be placed on Node group 2 instead, because the nodes don’t have any taints.

This pattern demonstrates that by using taints and tolerations, combined with node affinity, you can control placement of Pods on specific sets of worker nodes.

Tools

AWS services

  • Amazon Elastic Kubernetes Service (Amazon EKS) helps you run Kubernetes on AWS without needing to install or maintain your own Kubernetes control plane or nodes.

  • Amazon Elastic Container Registry (Amazon ECR) is a managed container image registry service that’s secure, scalable, and reliable.

Other tools

  • Docker is a set of platform as a service (PaaS) products that use virtualization at the operating-system level to deliver software in containers.

  • kubectl is a command-line interface that helps you run commands against Kubernetes clusters.

Epics

Create the EKS cluster

Create the cluster.yaml file.

Create a file called cluster.yaml with the following code.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-taint-demo
  region: us-west-1
# Unmanaged nodegroups with and without taints.
nodeGroups:
  - name: ng1
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 3
    taints:
      - key: classified_workload
        value: "true"
        effect: NoSchedule
      - key: machine_learning_workload
        value: "true"
        effect: NoSchedule
  - name: ng2
    instanceType: m5.xlarge
    minSize: 2
    maxSize: 3

Skills required: App owner, AWS DevOps, Cloud administrator, DevOps engineer

Create the cluster by using eksctl.

Use the cluster.yaml file to create the EKS cluster by running the following command. Creating the cluster might take a few minutes.

eksctl create cluster -f cluster.yaml
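
After the cluster is created, you can optionally confirm that both node groups exist and that the taints were applied to the ng1 nodes. The following checks are a sketch that assumes the cluster name and Region from cluster.yaml and that jq is installed.

eksctl get nodegroup --cluster eks-taint-demo --region us-west-1
kubectl get nodes -o json | jq '.items[].spec.taints'
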
Skills required: AWS DevOps, AWS systems administrator, App developer
Build and push the Docker image

Create an Amazon ECR private repository.

To create an Amazon ECR repository, see Creating a private repository. Note the URI of the repository.
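
If you prefer the AWS CLI, you can also create the repository with a command similar to the following; <repository_name> and <region> are placeholders.

aws ecr create-repository --repository-name <repository_name> --region <region>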

Skills required: AWS DevOps, DevOps engineer, App developer

Create the Dockerfile.

If you have an existing Docker container image that you want to use to test the pattern, you can skip this step.

To create a Dockerfile, use the following snippet as a reference. If you encounter errors, see the Troubleshooting section.

FROM adoptopenjdk/openjdk11:jdk-11.0.14.1_1-alpine
RUN apk add maven
WORKDIR /code

# Prepare by downloading dependencies
ADD pom.xml /code/pom.xml
RUN ["mvn", "dependency:resolve"]
RUN ["mvn", "verify"]

# Add the source, then compile and package into a fat JAR
ADD src /code/src
RUN ["mvn", "package"]

EXPOSE 4567
CMD ["java", "-jar", "target/eksExample-jar-with-dependencies.jar"]

Skills required: AWS DevOps, DevOps engineer

Create the pom.xml and source files, and build and push the Docker image.

To create the pom.xml file and the Java source file, see the Deploy a sample Java microservice on Amazon EKS pattern.

Use the instructions in that pattern to build and push the Docker image.
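
For reference, the build-and-push sequence generally looks like the following sketch; substitute your account number, AWS Region, and repository name.

# Authenticate Docker to your Amazon ECR registry.
aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account_number>.dkr.ecr.<region>.amazonaws.com

# Build, tag, and push the image.
docker build -t <repository_name> .
docker tag <repository_name>:latest <account_number>.dkr.ecr.<region>.amazonaws.com/<repository_name>:latest
docker push <account_number>.dkr.ecr.<region>.amazonaws.com/<repository_name>:latest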

Skills required: AWS DevOps, DevOps engineer, App developer
Deploy and validate the application

Create the deployment.yaml file.

To create the deployment.yaml file, use the code in the Additional information section.

In the code, the key for node affinity is any label that you create while creating node groups. This pattern uses the default label created by eksctl. For information about customizing labels, see Assigning Pods to Nodes in the Kubernetes documentation.

The value for the node affinity key is the name of the node group that was created by cluster.yaml.
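
To list the eksctl-created label and its value for each node, you can run the following command. The -L flag adds an output column for the given label key.

kubectl get nodes -L alpha.eksctl.io/nodegroup-name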

To get the key and value for the taint, run the following command.

kubectl get nodes -o json | jq '.items[].spec.taints'

For image, use the URI of the Amazon ECR repository that you created in an earlier step.

Skills required: AWS DevOps, DevOps engineer, App developer

Deploy the file.

To deploy to Amazon EKS, run the following command.

kubectl apply -f deployment.yaml
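
Optionally, wait for the rollout to finish before you check Pod placement. The Deployment name microservice-deployment matches the manifest in the Additional information section.

kubectl rollout status deployment/microservice-deployment
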
Skills required: App developer, DevOps engineer, AWS DevOps

Check the deployment.

  1. To check whether the Pods are ready, run the following command.

    kubectl get pods -o wide

    If the Pod is ready, the output should look similar to the following, with STATUS shown as Running.

    NAME         READY   STATUS    RESTARTS   AGE   IP              NODE                                           NOMINATED NODE   READINESS GATES
    <pod_name>   1/1     Running   0          12d   192.168.18.50   ip-192-168-20-110.us-west-1.compute.internal   <none>           <none>

    Note the name of the Pod and the name of the node. You can skip the next step.

  2. (Optional) To get additional details about the Pod and check the tolerations on the Pod, run the following command.

    kubectl describe pod <pod_name>

    An example of the output is in the Additional information section.

  3. To validate that the Pod placement on the node is correct, run the following command.

    kubectl describe node <node_name> | grep -A 1 "Taints"

    Confirm that the taint on the node matches the Pod’s toleration, and that the label on the node matches the node affinity defined in deployment.yaml.

    The Pod with tolerations and node affinity should be placed on a node that has the matching taints and node affinity labels. The previous command returns the taints on the node. The following is an example output.

    kubectl describe node ip-192-168-29-181.us-west-1.compute.internal | grep -A 1 "Taints"
    Taints:             classified_workload=true:NoSchedule
                        machine_learning_workload=true:NoSchedule

    Additionally, run the following command to check that the node on which the Pod is placed has a label matching the node affinity node label.

    kubectl get node <node_name> --show-labels
  4. To verify that the application is doing what it is intended to do, check the Pod logs by running the following command.

    kubectl logs -f <pod_name>

Skills required: App developer, DevOps engineer, AWS DevOps

Create a second deployment YAML file without tolerations and node affinity.

This additional step validates that when no node affinity or tolerations are specified in the deployment manifest file, the resulting Pod is not scheduled on a node that has taints. (It should be scheduled on a node that doesn’t have any taints.) Use the following code to create a new deployment file called deploy_no_taint.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-deployment-non-tainted
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: java-microservice-no-taint
  template:
    metadata:
      labels:
        app.kubernetes.io/name: java-microservice-no-taint
    spec:
      containers:
        - name: java-microservice-container-2
          image: <account_number>.dkr.ecr.<region>.amazonaws.com/<repository_name>:latest
          ports:
            - containerPort: 4567

Skills required: App developer, AWS DevOps, DevOps engineer

Deploy the second deployment YAML file, and validate Pod placement.

  1. Run the following command.

    kubectl apply -f deploy_no_taint.yaml
  2. After the deployment is successful, run the same commands that you ran previously to check the Pod placement in a node group with no taint.

    kubectl describe node <node_name> | grep "Taints"

    The output should be the following.

    Taints: <none>

    This completes the testing.

Skills required: App developer, AWS DevOps, DevOps engineer
Clean up resources

Clean up the resources.

To avoid incurring AWS charges for resources that are left running, run the following command.

eksctl delete cluster --name <Name of the cluster> --region <region-code>
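
To confirm that the cluster was deleted, you can list the clusters that remain in the Region.

eksctl get cluster --region <region-code>
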
Skills required: AWS DevOps, App developer

Troubleshooting

Issue: Some of these commands might not run if your system uses arm64 architecture (especially if you are running this on an M1 Mac). The following line might error out.

FROM adoptopenjdk/openjdk11:jdk-11.0.14.1_1-alpine

Solution: If you get errors when building the Docker image, replace the FROM line with the following line.

FROM bellsoft/liberica-openjdk-alpine-musl:17

Related resources

Additional information

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: java-microservice
  template:
    metadata:
      labels:
        app.kubernetes.io/name: java-microservice
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: alpha.eksctl.io/nodegroup-name
                    operator: In
                    values:
                      - <node-group-name-from-cluster.yaml>
      # Only this Pod has tolerations, so it is eligible for the tainted node group.
      tolerations:
        - key: "<Taint key>"        # classified_workload in our case
          operator: Equal
          value: "<Taint value>"    # true
          effect: "NoSchedule"
        - key: "<Taint key>"        # machine_learning_workload in our case
          operator: Equal
          value: "<Taint value>"    # true
          effect: "NoSchedule"
      containers:
        - name: java-microservice-container
          image: <account_number>.dkr.ecr.<region>.amazonaws.com/<repository_name>:latest
          ports:
            - containerPort: 4567

describe pod example output

Name:         microservice-deployment-in-tainted-nodes-5684cc495b-vpcfx
Namespace:    default
Priority:     0
Node:         ip-192-168-29-181.us-west-1.compute.internal/192.168.29.181
Start Time:   Wed, 14 Sep 2022 11:06:47 -0400
Labels:       app.kubernetes.io/name=java-microservice-taint
              pod-template-hash=5684cc495b
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.13.44
IPs:
  IP:  192.168.13.44
Controlled By:  ReplicaSet/microservice-deployment-in-tainted-nodes-5684cc495b
Containers:
  java-microservice-container-1:
    Container ID:   docker://5c158df8cc160de8f57f62f3ee16b12725a87510a809d90a1fb9e5d873c320a4
    Image:          934188034500.dkr.ecr.us-east-1.amazonaws.com/java-eks-apg
    Image ID:       docker-pullable://934188034500.dkr.ecr.us-east-1.amazonaws.com/java-eks-apg@sha256:d223924aca8315aab20d54eddf3443929eba511b6433017474d01b63a4114835
    Port:           4567/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 14 Sep 2022 11:07:02 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ddvvw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-ddvvw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 classified_workload=true:NoSchedule
                             machine_learning_workload=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>