Create a NodeClass
Important
You must start with 0 nodes in your instance group and let Karpenter handle the autoscaling. If you start with more than 0 nodes, Karpenter will scale them down to 0.
A node class (`NodeClass`) defines infrastructure-level settings that apply to groups of nodes in your Amazon EKS cluster, including network configuration, storage settings, and resource tagging. A `HyperPodNodeClass` is a custom `NodeClass` that maps to pre-created instance groups in SageMaker HyperPod, defining constraints around which instance types and Availability Zones are supported for Karpenter's autoscaling decisions.
Considerations for creating a node class
- You can specify up to 10 instance groups in a `NodeClass`.
- If you choose to delete an instance group, we recommend removing it from your `NodeClass` before deleting it from your HyperPod cluster. If an instance group is deleted while it is still used in a `NodeClass`, the `NodeClass` is marked as not `Ready` for provisioning and won't be used for subsequent scaling operations until the instance group is removed from the `NodeClass`.
- When you remove instance groups from a `NodeClass`, Karpenter detects drift on the nodes it managed in those instance groups and disrupts the nodes according to your disruption budget controls.
- Subnets used by an instance group should belong to the same Availability Zone. Subnets are specified using `OverrideVpcConfig` at the instance group level; otherwise, the cluster-level `VpcConfig` is used by default.
- Only on-demand capacity is supported at this time. Instance groups with Training plan or reserved capacity are not supported.
- Instance groups with DeepHealthChecks (DHC) are not supported. This is because a DHC takes around 60-90 minutes to complete, and pods would remain in a Pending state during that time, which can cause over-provisioning.
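As a sketch of the recommended removal order, suppose you want to delete the instance group `auto-c5-4xaz2` used in the sample later in this topic. You would first apply a `NodeClass` spec without that entry, let Karpenter disrupt the affected nodes, and only then delete the group from the HyperPod cluster:

```yaml
# NodeClass spec after removing the instance group to be deleted.
# Apply this first; delete the group from the HyperPod cluster afterwards,
# so the NodeClass never references a group that no longer exists.
spec:
  instanceGroups:
    - auto-c5-xaz1   # auto-c5-4xaz2 removed before deleting it from the cluster
```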
The following steps cover how to create a `NodeClass`:

1. Create a YAML file (for example, `nodeclass.yaml`) with your `NodeClass` configuration.
2. Apply the configuration to your cluster using kubectl.
3. Reference the `NodeClass` in your `NodePool` configuration.

Here's a sample `NodeClass` that uses the ml.c5.xlarge and ml.c5.4xlarge instance types:

```yaml
apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  name: sample-nc
spec:
  instanceGroups:
    # Names of instance groups in the HyperPod cluster.
    # Instance groups need to be pre-created. MaxItems: 10
    - auto-c5-xaz1
    - auto-c5-4xaz2
```
Apply the configuration:

```shell
kubectl apply -f nodeclass.yaml
```

Monitor the `NodeClass` status to ensure the `Ready` condition in the status is set to `True`:

```shell
kubectl get hyperpodnodeclass sample-nc -o yaml
```

```yaml
apiVersion: karpenter.sagemaker.amazonaws.com/v1
kind: HyperpodNodeClass
metadata:
  creationTimestamp: "<timestamp>"
  name: sample-nc
  uid: <resource-uid>
spec:
  instanceGroups:
    - auto-c5-xaz1
    - auto-c5-4xaz2
status:
  conditions:
    # True when all instance groups in the spec are present in the SageMaker cluster
    - lastTransitionTime: "<timestamp>"
      message: ""
      observedGeneration: 3
      reason: InstanceGroupReady
      status: "True"
      type: InstanceGroupReady
    # True if the subnets of the instance groups are discoverable
    - lastTransitionTime: "<timestamp>"
      message: ""
      observedGeneration: 3
      reason: SubnetsReady
      status: "True"
      type: SubnetsReady
    # True when all dependent resources (InstanceGroup, Subnets) are Ready
    - lastTransitionTime: "<timestamp>"
      message: ""
      observedGeneration: 3
      reason: Ready
      status: "True"
      type: Ready
  instanceGroups:
    - instanceTypes:
        - ml.c5.xlarge
      name: auto-c5-xaz1
      subnets:
        - id: <subnet-id>
          zone: <availability-zone-a>
          zoneId: <zone-id-a>
    - instanceTypes:
        - ml.c5.4xlarge
      name: auto-c5-4xaz2
      subnets:
        - id: <subnet-id>
          zone: <availability-zone-b>
          zoneId: <zone-id-b>
```
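Once the `NodeClass` is `Ready`, the remaining overview step is referencing it from a `NodePool`. Here's a minimal sketch, assuming Karpenter's v1 `NodePool` API; the `nodeClassRef` group and kind below are inferred from the `HyperpodNodeClass` apiVersion above, so verify them against the CRDs installed in your cluster:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: sample-np   # hypothetical name for this example
spec:
  template:
    spec:
      nodeClassRef:
        # Assumption: group/kind match the HyperpodNodeClass resource shown above.
        group: karpenter.sagemaker.amazonaws.com
        kind: HyperpodNodeClass
        name: sample-nc
      requirements:
        # Restrict to the instance types backed by the NodeClass's instance groups.
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["ml.c5.xlarge", "ml.c5.4xlarge"]
```

Karpenter then provisions nodes for pending pods only within the instance types and Availability Zones that the referenced instance groups support.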