SageMaker HyperPod references - Amazon SageMaker

SageMaker HyperPod references

Find more information and references about using SageMaker HyperPod in the following topics.

SageMaker HyperPod pricing

The following topics provide information about SageMaker HyperPod pricing. For per-hour prices of SageMaker HyperPod instances, see Amazon SageMaker Pricing.

Capacity requests

You can allocate on-demand or reserved compute capacity with SageMaker for use on SageMaker HyperPod. Creating a cluster on demand allocates available capacity from the SageMaker on-demand capacity pool. Alternatively, to ensure access, you can request reserved capacity by submitting a ticket for a quota increase. SageMaker prioritizes inbound capacity requests and gives you an estimated time for capacity allocation.

Service billing

When you provision compute capacity on SageMaker HyperPod, you are billed for the duration of the capacity allocation. SageMaker HyperPod billing appears in your anniversary bills with a line item for the type of capacity allocation (on-demand or reserved), the instance type, and the time the instance was in use.

To submit a ticket for a quota increase, see SageMaker HyperPod quotas.

SageMaker HyperPod APIs

The following list is a full set of SageMaker HyperPod APIs for submitting action requests in JSON format to SageMaker through AWS CLI or AWS SDK for Python (Boto3).

SageMaker HyperPod forms

To configure the Slurm workload manager on HyperPod, create the Slurm configuration file that HyperPod requires by completing the provided form.

Configuration form for provisioning Slurm nodes on HyperPod

The following code is the Slurm configuration form you should prepare to properly set up Slurm nodes on your HyperPod cluster. You should complete this form and upload it as part of a set of lifecycle scripts during cluster creation. To learn how this form should be prepared throughout HyperPod cluster creation processes, see SageMaker HyperPod lifecycle configuration best practices.

// Save as provisioning_params.json.
{
    "version": "1.0.0",
    "workload_manager": "slurm",
    "controller_group": "string",
    "login_group": "string",
    "worker_groups": [
        {
            "instance_group_name": "string",
            "partition_name": "string"
        }
    ],
    "fsx_dns_name": "string",
    "fsx_mountname": "string"
}

  • version – Required. The version of the HyperPod provisioning parameter form. Keep it as 1.0.0.

  • workload_manager – Required. The workload manager to configure on the HyperPod cluster. Keep it as slurm.

  • controller_group – Required. The name of the HyperPod cluster instance group to assign to the Slurm controller (head) node.

  • login_group – Optional. The name of the HyperPod cluster instance group to assign to the Slurm login node.

  • worker_groups – Required. The list of Slurm worker (compute) node groups to set up on the HyperPod cluster.

    • instance_group_name – Required. The name of the HyperPod instance group to assign to the Slurm worker (compute) nodes.

    • partition_name – Required. The name of the Slurm partition to assign the nodes to.

  • fsx_dns_name – Optional. If you want the Slurm nodes on the HyperPod cluster to communicate with Amazon FSx, specify the FSx DNS name.

  • fsx_mountname – Optional. If you want the Slurm nodes on the HyperPod cluster to communicate with Amazon FSx, specify the FSx mount name.
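
The field rules above can be sketched as a small Python helper that assembles the form and checks the required fields before you upload it with your lifecycle scripts. This is a minimal sketch, not part of the HyperPod tooling: the function name build_provisioning_params and the example group names are hypothetical, and the checks cover only the Required/Optional markers listed above.

```python
import json


def build_provisioning_params(controller_group, worker_groups, login_group=None,
                              fsx_dns_name=None, fsx_mountname=None):
    """Return a dict shaped like the provisioning_params.json form.

    worker_groups is a list of (instance_group_name, partition_name) pairs.
    Optional fields are left out of the result when they are not supplied.
    """
    if not controller_group:
        raise ValueError("controller_group is required")
    if not worker_groups:
        raise ValueError("worker_groups needs at least one entry")

    params = {
        "version": "1.0.0",           # fixed value per the field list
        "workload_manager": "slurm",  # fixed value per the field list
        "controller_group": controller_group,
    }
    if login_group:
        params["login_group"] = login_group

    params["worker_groups"] = []
    for instance_group_name, partition_name in worker_groups:
        if not instance_group_name or not partition_name:
            raise ValueError("each worker group needs both names")
        params["worker_groups"].append({
            "instance_group_name": instance_group_name,
            "partition_name": partition_name,
        })

    if fsx_dns_name:
        params["fsx_dns_name"] = fsx_dns_name
    if fsx_mountname:
        params["fsx_mountname"] = fsx_mountname
    return params


def to_json(params):
    """Serialize the form; write the result to provisioning_params.json."""
    return json.dumps(params, indent=4)
```

For example, build_provisioning_params("controller-group", [("worker-group-1", "dev")]) yields a form with one worker group and no login or FSx fields. The instance group names you pass must match the instance groups defined for the cluster.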

SageMaker HyperPod DLAMI

The SageMaker HyperPod agent runs a SageMaker HyperPod DLAMI, which is built on top of the AWS Deep Learning Base GPU AMI (Ubuntu 20.04).

The SageMaker HyperPod DLAMI is bundled with additional packages that support open source tools such as Slurm and its dependencies, and with SageMaker HyperPod cluster software packages that support features such as cluster health checks and auto-resume. To keep up with the HyperPod software updates that the HyperPod service team distributes through the DLAMI, see Amazon SageMaker HyperPod release notes.

SageMaker HyperPod API permissions reference

Important

Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see Provide Permissions for Tagging SageMaker Resources.

AWS Managed Policies for Amazon SageMaker that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

When you set up access control for running SageMaker HyperPod API operations and write a permissions policy to attach to the IAM users of cloud administrators, use the following table as a reference.

Amazon SageMaker API Operations | Required Permissions (API Actions) | Resources
------------------------------- | ---------------------------------- | ---------
CreateCluster                   | sagemaker:CreateCluster            | arn:aws:sagemaker:region:account-id:cluster/cluster-id
DeleteCluster                   | sagemaker:DeleteCluster            | arn:aws:sagemaker:region:account-id:cluster/cluster-id
DescribeCluster                 | sagemaker:DescribeCluster          | arn:aws:sagemaker:region:account-id:cluster/cluster-id
DescribeClusterNode             | sagemaker:DescribeClusterNode      | arn:aws:sagemaker:region:account-id:cluster/cluster-id
ListClusterNodes                | sagemaker:ListClusterNodes         | arn:aws:sagemaker:region:account-id:cluster/cluster-id
ListClusters                    | sagemaker:ListClusters             | arn:aws:sagemaker:region:account-id:cluster/cluster-id
UpdateCluster                   | sagemaker:UpdateCluster            | arn:aws:sagemaker:region:account-id:cluster/cluster-id
UpdateClusterSoftware           | sagemaker:UpdateClusterSoftware    | arn:aws:sagemaker:region:account-id:cluster/cluster-id

For a complete list of permissions and resource types for SageMaker APIs, see Actions, resources, and condition keys for Amazon SageMaker in the AWS Service Authorization Reference.
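
As an illustration of how the table maps onto a policy document, the following Python sketch assembles an IAM policy that allows the eight actions listed above on a cluster ARN. The function name build_hyperpod_policy is hypothetical, and defaulting the cluster ID to a wildcard is an assumption made for the example. A working administrator policy is likely to need further permissions that this table does not cover, so check the result against the Service Authorization Reference before using it.

```python
import json

# The API actions listed in the permissions table above.
HYPERPOD_ACTIONS = [
    "sagemaker:CreateCluster",
    "sagemaker:DeleteCluster",
    "sagemaker:DescribeCluster",
    "sagemaker:DescribeClusterNode",
    "sagemaker:ListClusterNodes",
    "sagemaker:ListClusters",
    "sagemaker:UpdateCluster",
    "sagemaker:UpdateClusterSoftware",
]


def build_hyperpod_policy(region, account_id, cluster_id="*"):
    """Return an IAM policy document (as a dict) for the HyperPod actions.

    cluster_id defaults to "*" (all clusters in the account and Region);
    that default is an assumption for illustration only.
    """
    resource = f"arn:aws:sagemaker:{region}:{account_id}:cluster/{cluster_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(HYPERPOD_ACTIONS),
                "Resource": resource,
            }
        ],
    }


def policy_json(region, account_id, cluster_id="*"):
    """Serialize the policy so it can be pasted into the IAM console or a file."""
    return json.dumps(build_hyperpod_policy(region, account_id, cluster_id), indent=2)
```

Calling policy_json("us-west-2", "111122223333") produces a single Allow statement scoped to arn:aws:sagemaker:us-west-2:111122223333:cluster/*. The Region and account ID here are placeholders.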

SageMaker HyperPod commands in AWS CLI

The following are the AWS CLI commands for SageMaker HyperPod to run the core HyperPod API operations.

SageMaker HyperPod Python modules in AWS SDK for Python (Boto3)

The following are the methods of the AWS SDK for Python (Boto3) client for SageMaker to run the core HyperPod API operations.
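
As a brief illustration of calling one of these operations from Python, the sketch below collects the name and status of every cluster by following the NextToken pagination of list_clusters. The helper name list_cluster_summaries is hypothetical. It takes the client as an argument so that it can be exercised without AWS access, and it assumes the response fields ClusterSummaries, ClusterName, ClusterStatus, and NextToken.

```python
def list_cluster_summaries(sagemaker_client):
    """Return a list of (cluster_name, cluster_status) for all HyperPod clusters.

    sagemaker_client is expected to behave like boto3.client("sagemaker").
    """
    summaries = []
    kwargs = {}
    while True:
        response = sagemaker_client.list_clusters(**kwargs)
        for cluster in response.get("ClusterSummaries", []):
            summaries.append((cluster["ClusterName"], cluster["ClusterStatus"]))
        token = response.get("NextToken")
        if not token:
            # No further pages to fetch.
            return summaries
        kwargs["NextToken"] = token


# Usage, assuming boto3 is installed and AWS credentials and a Region are configured:
#
#   import boto3
#   for name, status in list_cluster_summaries(boto3.client("sagemaker")):
#       print(name, status)
```

The calling identity needs the sagemaker:ListClusters permission for this call to succeed.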