Create a NodeClass

Important

You must start with 0 nodes in your instance group and let Karpenter handle the autoscaling. If you start with more than 0 nodes, Karpenter will scale them down to 0.

A node class (NodeClass) defines infrastructure-level settings that apply to groups of nodes in your Amazon EKS cluster, including network configuration, storage settings, and resource tagging. A HyperPodNodeClass is a custom NodeClass that maps to pre-created instance groups in SageMaker HyperPod, defining constraints around which instance types and Availability Zones are supported for Karpenter's autoscaling decisions.
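
For example, after the HyperPod Karpenter components and their custom resource definitions are installed in your cluster, you can list any existing HyperpodNodeClass resources with kubectl; the resource name matches the kind used throughout this page:

    # List the HyperpodNodeClass resources currently defined in the cluster
    kubectl get hyperpodnodeclass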

Considerations for creating a node class

  • You can specify up to 10 instance groups in a NodeClass.

  • If you choose to delete an instance group, we recommend removing it from your NodeClass before deleting it from your HyperPod cluster (a removal sketch follows this list). If an instance group is deleted while it is still referenced by a NodeClass, the NodeClass is marked as not Ready for provisioning and won't be used for subsequent scaling operations until the instance group is removed from the NodeClass.

  • When you remove instance groups from a NodeClass, Karpenter will detect drift on the nodes it managed in those instance groups and will disrupt them based on your disruption budget controls.

  • Subnets used by an instance group should belong to the same Availability Zone. Subnets are specified either through OverrideVpcConfig at the instance group level or through the cluster-level VpcConfig; the cluster-level VpcConfig is used by default.

  • Only on-demand capacity is supported at this time. Instance groups with Training plan or reserved capacity are not supported.

  • Instance groups with DeepHealthChecks (DHC) are not supported. A DHC takes around 60-90 minutes to complete, and pods remain in a pending state during that time, which can cause over-provisioning.
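
The following is a minimal sketch of the recommended removal order: edit the NodeClass so it no longer lists the instance group you plan to delete, then delete the instance group from the HyperPod cluster. The NodeClass name (sample-nc) and instance group names match the sample used later on this page; substitute your own values.

    # Keep only the instance groups you intend to retain; here auto-c5-4xaz2 is
    # dropped from the NodeClass before it is deleted from the HyperPod cluster.
    kubectl patch hyperpodnodeclass sample-nc --type=merge \
      -p '{"spec":{"instanceGroups":["auto-c5-az1"]}}'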

The following steps cover how to create a NodeClass.

  1. Create a YAML file (for example, nodeclass.yaml) with your NodeClass configuration.

  2. Apply the configuration to your cluster using kubectl.

  3. Reference the NodeClass in your NodePool configuration (a sketch NodePool appears after the steps below).

  4. Here's a sample NodeClass that uses the ml.c5.xlarge and ml.c5.4xlarge instance types:

    apiVersion: karpenter.sagemaker.amazonaws.com/v1
    kind: HyperpodNodeClass
    metadata:
      name: sample-nc
    spec:
      instanceGroups:
        # name of InstanceGroup in HyperPod cluster. InstanceGroup needs to be pre-created
        # MaxItems: 10
        - auto-c5-az1
        - auto-c5-4xaz2
  5. Apply the configuration:

    kubectl apply -f nodeclass.yaml
  6. Monitor the NodeClass status to ensure that the Ready condition is set to True:

    kubectl get hyperpodnodeclass sample-nc -o yaml
    apiVersion: karpenter.sagemaker.amazonaws.com/v1
    kind: HyperpodNodeClass
    metadata:
      creationTimestamp: "<timestamp>"
      name: sample-nc
      uid: <resource-uid>
    spec:
      instanceGroups:
        - auto-c5-az1
        - auto-c5-4xaz2
    status:
      conditions:
        # true when all IGs in the spec are present in SageMaker cluster, false otherwise
        - lastTransitionTime: "<timestamp>"
          message: ""
          observedGeneration: 3
          reason: InstanceGroupReady
          status: "True"
          type: InstanceGroupReady
        # true if subnets of IGs are discoverable, false otherwise
        - lastTransitionTime: "<timestamp>"
          message: ""
          observedGeneration: 3
          reason: SubnetsReady
          status: "True"
          type: SubnetsReady
        # true when all dependent resources are Ready [InstanceGroup, Subnets]
        - lastTransitionTime: "<timestamp>"
          message: ""
          observedGeneration: 3
          reason: Ready
          status: "True"
          type: Ready
      instanceGroups:
        - instanceTypes:
            - ml.c5.xlarge
          name: auto-c5-az1
          subnets:
            - id: <subnet-id>
              zone: <availability-zone-a>
              zoneId: <zone-id-a>
        - instanceTypes:
            - ml.c5.4xlarge
          name: auto-c5-4xaz2
          subnets:
            - id: <subnet-id>
              zone: <availability-zone-b>
              zoneId: <zone-id-b>
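
To block until the NodeClass is usable instead of inspecting the full YAML, a standard kubectl wait on the Ready condition (shown in the sample output above) should also work:

    # Wait until the Ready condition on the NodeClass becomes True, or time out after 5 minutes
    kubectl wait hyperpodnodeclass/sample-nc --for=condition=Ready --timeout=5m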
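
As noted in step 3, Karpenter only provisions nodes from this NodeClass once a NodePool references it. The following is a minimal sketch, assuming the upstream karpenter.sh/v1 NodePool API that HyperPod's Karpenter-based autoscaling follows; the NodePool name and the requirements block are placeholders you would adapt to your workloads:

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata:
      name: sample-np
    spec:
      template:
        spec:
          # Point the NodePool at the HyperpodNodeClass created above
          nodeClassRef:
            group: karpenter.sagemaker.amazonaws.com
            kind: HyperpodNodeClass
            name: sample-nc
          # Restrict provisioning to the instance types defined in the NodeClass's instance groups
          requirements:
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - ml.c5.xlarge
                - ml.c5.4xlarge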