Place Kubernetes Pods on Amazon EKS by using node affinity, taints, and tolerations
Created by Hitesh Parikh (AWS) and Raghu Bhamidimarri (AWS)
Environment: PoC or pilot | Technologies: Containers & microservices | Workload: Open-source | AWS services: Amazon EKS
Summary
This pattern demonstrates the use of Kubernetes node affinity, node taints, and Pod tolerations to intentionally schedule application Pods on specific worker nodes in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on the Amazon Web Services (AWS) Cloud.
A taint is a node property that enables a node to repel a set of Pods. A toleration is a Pod property that allows the Kubernetes scheduler to schedule that Pod on nodes that have matching taints.
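For illustration, the following fragments show a taint as it appears in a Node object's spec and a Pod toleration that matches it. With operator: Equal, a toleration matches a taint when the key, value, and effect are all equal. The node name, Pod name, container name, and image are hypothetical placeholders. In this pattern, eksctl applies the taints when it creates the node group, so you don't edit the Node object by hand.

```yaml
# Fragment of a Node object (for example, as returned by kubectl get node <node-name> -o yaml)
apiVersion: v1
kind: Node
metadata:
  name: <node-name>              # placeholder
spec:
  taints:
    - key: classified_workload
      value: "true"
      effect: NoSchedule         # Pods without a matching toleration are not scheduled here
---
# Pod whose toleration matches the taint above
apiVersion: v1
kind: Pod
metadata:
  name: toleration-example       # hypothetical name
spec:
  tolerations:
    - key: classified_workload
      operator: Equal
      value: "true"              # quoted so that YAML keeps it a string
      effect: NoSchedule
  containers:
    - name: app                  # hypothetical container name
      image: <image-uri>         # placeholder
```

The toleration only permits scheduling on the tainted node. It doesn't require it, which is the gap that node affinity addresses.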
However, tolerations alone can’t prevent the scheduler from placing a Pod on a worker node that doesn’t have any taints. For example, a compute-intensive Pod with a toleration can unintentionally be scheduled on a general-purpose untainted node. To prevent that, the Pod’s node affinity instructs the scheduler to place the Pod only on nodes that meet the node selection criteria specified in the node affinity.
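The following Pod fragment sketches that node affinity, using the eksctl node group label that this pattern relies on. requiredDuringSchedulingIgnoredDuringExecution makes the rule a hard requirement: if no node carries the label alpha.eksctl.io/nodegroup-name with the value ng1, the Pod stays Pending instead of landing on another node. Label changes after the Pod is running don't evict it. The Pod name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-example    # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule evaluated at scheduling time only
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: alpha.eksctl.io/nodegroup-name   # label that eksctl sets on node group nodes
                operator: In
                values:
                  - ng1
  containers:
    - name: app                  # hypothetical container name
      image: <image-uri>         # placeholder
```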
Together, taints, tolerations, and node affinity instruct the scheduler to place Pods consistently on nodes that have matching taints and whose labels match the node-selection criteria in the Pod’s node affinity.
This pattern provides an example Kubernetes deployment manifest file, and the steps to create an EKS cluster, deploy an application, and validate Pod placement.
Prerequisites and limitations
Prerequisites
An AWS account with credentials configured to create resources in that account
AWS Command Line Interface (AWS CLI)
eksctl
kubectl
Docker, installed for the operating system being used, with the Docker engine started (for information about Docker licensing requirements, see the Docker site)
Java version 11 or later
A Java microservice running in your favorite integrated development environment (IDE), for example AWS Cloud9, IntelliJ IDEA Community Edition, or Eclipse (if you don’t have a Java microservice, see the Deploy a sample Java microservice on Amazon EKS pattern and Microservices with Spring for help with creating the microservice)
Limitations
This pattern doesn’t provide the Java code, and it assumes that you are already familiar with Java. To create a basic Java microservice, see Deploy a sample Java microservice on Amazon EKS.
The steps in this article create AWS resources that can accrue cost. Make sure that you clean up the AWS resources after you have completed the steps to implement and validate the pattern.
Architecture
Target technology stack
Amazon EKS
Java
Docker
Amazon Elastic Container Registry (Amazon ECR)
Target architecture
The solution architecture diagram shows Amazon EKS with two Pods (Deployment 1 and Deployment 2) and two node groups (ng1 and ng2) with two nodes each. The Pods and nodes have the following properties.
Property | Deployment 1 Pod | Deployment 2 Pod | Node group 1 (ng1) | Node group 2 (ng2) |
---|---|---|---|---|
Toleration | key: classified_workload, value: true, effect: NoSchedule; key: machine_learning_workload, value: true, effect: NoSchedule | None | | |
Node affinity | key: alpha.eksctl.io/nodegroup-name = ng1 | None | nodeGroups.name = ng1 | |
Taint | | | key: classified_workload, value: true, effect: NoSchedule; key: machine_learning_workload, value: true, effect: NoSchedule | None |
The Deployment 1 Pod has tolerations and node affinity defined, which instruct the Kubernetes scheduler to place the deployment’s Pods on the Node group 1 (ng1) nodes.
Node group 2 (ng2) doesn’t have a node label that matches the node affinity node selector expression for Deployment 1, so the Pods will not be scheduled on ng2 nodes.
The Deployment 2 Pod doesn’t have any tolerations or node affinity defined in the deployment manifest. The scheduler won’t place Deployment 2 Pods on Node group 1, because those nodes have taints that the Pods don’t tolerate.
The Deployment 2 Pods will be placed on Node group 2 instead, because the nodes don’t have any taints.
This pattern demonstrates that by using taints and tolerations, combined with node affinity, you can control placement of Pods on specific sets of worker nodes.
Tools
AWS services
AWS Command Line Interface (AWS CLI) is an open-source tool that helps you interact with AWS services through commands in your command-line shell.
Amazon Elastic Container Registry (Amazon ECR) is a managed container image registry service that’s secure, scalable, and reliable.
Amazon Elastic Kubernetes Service (Amazon EKS) helps you run Kubernetes on AWS without needing to install or maintain your own Kubernetes control plane or nodes.
Other tools
eksctl is a command-line tool for creating and managing Kubernetes clusters on Amazon EKS.
Epics
Task | Description | Skills required |
---|---|---|
Create the cluster.yaml file. | Create a file called cluster.yaml. | App owner, AWS DevOps, Cloud administrator, DevOps engineer |
Create the cluster by using eksctl. | Run the eksctl create cluster command with the cluster.yaml file to create the EKS cluster. | AWS DevOps, AWS systems administrator, App developer |
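As a sketch of what cluster.yaml might contain for the target architecture, the following eksctl ClusterConfig defines node group ng1 with the two NoSchedule taints and an untainted node group ng2, with two nodes each. The cluster name, Region, and instance type are assumptions; replace them with your own values. For node groups defined under nodeGroups, eksctl labels each node with alpha.eksctl.io/nodegroup-name, which is the label that the node affinity in deployment.yaml matches.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-taints-demo          # hypothetical cluster name
  region: us-west-1              # assumption: use your own Region
nodeGroups:
  - name: ng1                    # tainted node group for the tolerating workload
    instanceType: m5.large       # assumption: choose a suitable instance type
    desiredCapacity: 2
    taints:
      - key: classified_workload
        value: "true"
        effect: NoSchedule
      - key: machine_learning_workload
        value: "true"
        effect: NoSchedule
  - name: ng2                    # general-purpose node group without taints
    instanceType: m5.large       # assumption
    desiredCapacity: 2
```

With a file like this, eksctl create cluster -f cluster.yaml creates the cluster and both node groups.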
Task | Description | Skills required |
---|---|---|
Create an Amazon ECR private repository. | To create an Amazon ECR repository, see Creating a private repository. Note the URI of the repo. | AWS DevOps, DevOps engineer, App developer |
Create the Dockerfile. | If you have an existing Docker container image that you want to use to test the pattern, you can skip this step. To create a Dockerfile, use the following snippet as a reference. If you encounter errors, see the Troubleshooting section.
| AWS DevOps, DevOps engineer |
Create the pom.xml and source files, and build and push the Docker image. | To create the pom.xml and source files, see the Deploy a sample Java microservice on Amazon EKS pattern. Use the instructions in that pattern to build and push the Docker image. | AWS DevOps, DevOps engineer, App developer |
Task | Description | Skills required |
---|---|---|
Create the deployment.yaml file. | To create the deployment.yaml file, use the deployment.yaml code in the Additional information section. In the code, the key for node affinity can be any label that you apply when you create the node groups. This pattern uses the default label that eksctl creates (alpha.eksctl.io/nodegroup-name). For information about customizing labels, see Assigning Pods to Nodes. The value for the node affinity key is the name of the node group that was created by cluster.yaml. To get the key and value for the taint, run the following command.
The image is the URI of the Amazon ECR repository that you created in an earlier step. | AWS DevOps, DevOps engineer, App developer |
Deploy the file. | To deploy to Amazon EKS, run the following command.
| App developer, DevOps engineer, AWS DevOps |
Check the deployment. |
| App developer, DevOps engineer, AWS DevOps |
Create a second deployment .yaml file without toleration and node affinity. | This additional step validates that when no node affinity or tolerations are specified in the deployment manifest file, the resulting Pod is not scheduled on a node with taints. (It should instead be scheduled on a node that doesn’t have any taints.) Use the following code to create a new deployment file called
| App developer, AWS DevOps, DevOps engineer |
Deploy the second deployment .yaml file, and validate Pod placement. |
| App developer, AWS DevOps, DevOps engineer |
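A minimal sketch of such a second deployment file follows. It assumes the hypothetical name microservice-deployment-non-tainted and a hypothetical app label, and it reuses the container image placeholder and port from deployment.yaml. Because the Pod template has neither tolerations nor node affinity, the ng1 taints exclude it, and the scheduler can place it only on the untainted ng2 nodes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-deployment-non-tainted      # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: java-microservice-non-tainted   # hypothetical label
  template:
    metadata:
      labels:
        app.kubernetes.io/name: java-microservice-non-tainted
    spec:
      # No tolerations and no node affinity: the NoSchedule taints on ng1
      # leave only the untainted ng2 nodes eligible for this Pod.
      containers:
        - name: java-microservice-container
          image: <account_number>.dkr.ecr.<region>.amazonaws.com/<repository_name>:latest
          ports:
            - containerPort: 4567
```

After you apply it, the NODE column of kubectl get pods -o wide should show a node from ng2.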
Task | Description | Skills required |
---|---|---|
Clean up the resources. | To avoid incurring AWS charges for resources that are left running, use the following command.
| AWS DevOps, App developer |
Troubleshooting
Issue | Solution |
---|---|
Some of these commands might not run if your system uses arm64 architecture
| If you have errors when running the Dockerfile, replace the
|
Related resources
Assigning Pods to Nodes (Kubernetes documentation)
Taints and Tolerations (Kubernetes documentation)
Additional information
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: java-microservice
  template:
    metadata:
      labels:
        app.kubernetes.io/name: java-microservice
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: alpha.eksctl.io/nodegroup-name
                    operator: In
                    values:
                      - <node-group-name-from-cluster.yaml>
      tolerations: # Only this Pod has tolerations, so only it can be scheduled on the tainted node group
        - key: "<Taint key>" # classified_workload in this pattern
          operator: Equal
          value: "<Taint value>" # "true" (keep the quotes so the value stays a string)
          effect: "NoSchedule"
        - key: "<Taint key>" # machine_learning_workload in this pattern
          operator: Equal
          value: "<Taint value>" # "true"
          effect: "NoSchedule"
      containers:
        - name: java-microservice-container
          image: <account_number>.dkr.ecr.<region>.amazonaws.com/<repository_name>:latest
          ports:
            - containerPort: 4567
describe pod example output
Name:             microservice-deployment-in-tainted-nodes-5684cc495b-vpcfx
Namespace:        default
Priority:         0
Node:             ip-192-168-29-181.us-west-1.compute.internal/192.168.29.181
Start Time:       Wed, 14 Sep 2022 11:06:47 -0400
Labels:           app.kubernetes.io/name=java-microservice-taint
                  pod-template-hash=5684cc495b
Annotations:      kubernetes.io/psp: eks.privileged
Status:           Running
IP:               192.168.13.44
IPs:
  IP:           192.168.13.44
Controlled By:  ReplicaSet/microservice-deployment-in-tainted-nodes-5684cc495b
Containers:
  java-microservice-container-1:
    Container ID:   docker://5c158df8cc160de8f57f62f3ee16b12725a87510a809d90a1fb9e5d873c320a4
    Image:          934188034500.dkr.ecr.us-east-1.amazonaws.com/java-eks-apg
    Image ID:       docker-pullable://934188034500.dkr.ecr.us-east-1.amazonaws.com/java-eks-apg@sha256:d223924aca8315aab20d54eddf3443929eba511b6433017474d01b63a4114835
    Port:           4567/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 14 Sep 2022 11:07:02 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ddvvw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-ddvvw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 classified_workload=true:NoSchedule
                             machine_learning_workload=true:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>