BatchDeleteClusterNodes
Deletes specific nodes within a SageMaker HyperPod cluster. BatchDeleteClusterNodes accepts a cluster name and a list of node IDs.
Important
-
To safeguard your work, back up your data to Amazon S3 or an FSx for Lustre file system before invoking the API on a worker node group. This will help prevent any potential data loss from the instance root volume. For more information about backup, see Use the backup script provided by SageMaker HyperPod.
-
If you want to invoke this API on an existing cluster, you'll first need to patch the cluster by running the UpdateClusterSoftware API. For more information about patching a cluster, see Update the SageMaker HyperPod platform software of a cluster.
Request Syntax
{
"ClusterName": "string",
"NodeIds": [ "string" ]
}
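As a minimal sketch of how this request body maps onto an SDK call, the helper below forwards the two fields to a client object. It assumes `client` is a boto3 SageMaker client (created with `boto3.client("sagemaker")`), whose `batch_delete_cluster_nodes` method takes the same `ClusterName` and `NodeIds` keys. The helper name `delete_cluster_nodes` is hypothetical and is used here only for illustration.

```python
def delete_cluster_nodes(client, cluster_name, node_ids):
    """Send one BatchDeleteClusterNodes request and return the parsed response.

    `client` is assumed to be a boto3 SageMaker client; this wrapper is an
    illustrative helper, not part of any SDK.
    """
    # Keyword arguments mirror the JSON keys in the request syntax.
    return client.batch_delete_cluster_nodes(
        ClusterName=cluster_name,
        NodeIds=list(node_ids),
    )

# Possible usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   response = delete_cluster_nodes(
#       boto3.client("sagemaker"), "my-cluster", ["i-0123456789abcdef0"]
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, which is useful because the real call removes nodes.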
Request Parameters
For information about the parameters that are common to all actions, see Common Parameters.
The request accepts the following data in JSON format.
- ClusterName
-
The name of the SageMaker HyperPod cluster from which to delete the specified nodes.
Type: String
Length Constraints: Maximum length of 256.
Pattern:
^(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})$
Required: Yes
- NodeIds
-
A list of node IDs to be deleted from the specified cluster.
Note
-
For SageMaker HyperPod clusters using the Slurm workload manager, you cannot remove instances that are configured as Slurm controller nodes.
-
If you need to delete more than 99 instances, contact Support for assistance.
Type: Array of strings
Array Members: Minimum number of 1 item. Maximum number of 3000 items.
Length Constraints: Minimum length of 1. Maximum length of 256.
Pattern:
^i-[a-f0-9]{8}(?:[a-f0-9]{9})?$
Required: Yes
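The constraints above can be checked on the client side before a request is sent. The following is a sketch only: the patterns and limits are copied from the parameter descriptions, while the function name `validate_request` and the wording of the messages are hypothetical. It uses `re.fullmatch` so that the whole value must match, which matters for the ClusterName pattern because its `^` and `$` anchors each apply to only one side of the alternation.

```python
import re

# Patterns copied from the ClusterName and NodeIds descriptions above.
CLUSTER_NAME_PATTERN = re.compile(
    r"^(arn:aws[a-z\-]*:sagemaker:[a-z0-9\-]*:[0-9]{12}:cluster/[a-z0-9]{12})"
    r"|([a-zA-Z0-9](-*[a-zA-Z0-9]){0,62})$"
)
NODE_ID_PATTERN = re.compile(r"^i-[a-f0-9]{8}(?:[a-f0-9]{9})?$")


def validate_request(cluster_name, node_ids):
    """Return a list of problems; an empty list means the request looks well formed."""
    problems = []

    # ClusterName: at most 256 characters, and the whole value must match.
    if len(cluster_name) > 256 or not CLUSTER_NAME_PATTERN.fullmatch(cluster_name):
        problems.append(f"invalid ClusterName: {cluster_name!r}")

    # NodeIds: between 1 and 3000 items, per the documented array constraint.
    if not 1 <= len(node_ids) <= 3000:
        problems.append(f"NodeIds must contain 1 to 3000 items, got {len(node_ids)}")

    # Each node ID: 1 to 256 characters, "i-" followed by 8 or 17 hex digits.
    for node_id in node_ids:
        if not 1 <= len(node_id) <= 256 or not NODE_ID_PATTERN.fullmatch(node_id):
            problems.append(f"invalid node ID: {node_id!r}")

    return problems
```

A check like this only catches malformed input early. The service remains the authority on whether a cluster or node actually exists.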
Response Syntax
{
"Failed": [
{
"Code": "string",
"Message": "string",
"NodeId": "string"
}
],
"Successful": [ "string" ]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
The following data is returned in JSON format by the service.
- Failed
-
A list of errors encountered when deleting the specified nodes.
Type: Array of BatchDeleteClusterNodesError objects
Array Members: Minimum number of 1 item. Maximum number of 3000 items.
- Successful
-
A list of node IDs that were successfully deleted from the specified cluster.
Type: Array of strings
Array Members: Minimum number of 1 item. Maximum number of 3000 items.
Length Constraints: Minimum length of 1. Maximum length of 256.
Pattern:
^i-[a-f0-9]{8}(?:[a-f0-9]{9})?$
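Because a single response can report both deleted nodes and per-node errors, callers usually need to inspect both lists. The sketch below splits a parsed response into the two groups. The function name `summarize_response` is hypothetical, and the sample values in the usage comment are placeholders, not real service output.

```python
def summarize_response(response):
    """Split a BatchDeleteClusterNodes response into successes and failures.

    Returns (successful_ids, failures), where failures maps each failed
    node ID to a (code, message) tuple taken from the Failed entries.
    """
    # Either key may be missing when its list has no entries, so default to empty.
    successful_ids = list(response.get("Successful", []))
    failures = {
        entry["NodeId"]: (entry["Code"], entry["Message"])
        for entry in response.get("Failed", [])
    }
    return successful_ids, failures


# Possible usage with placeholder values:
#   ok, failed = summarize_response({
#       "Successful": ["i-0123456789abcdef0"],
#       "Failed": [{"NodeId": "i-1234567a", "Code": "ExampleCode", "Message": "example"}],
#   })
#   for node_id, (code, message) in failed.items():
#       print(f"{node_id}: {code}: {message}")
```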
Errors
For information about the errors that are common to all actions, see Common Errors.
- ResourceNotFound
-
Resource being accessed is not found.
HTTP Status Code: 400
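A caller may want to tell this error apart from other failures, for example to report a misspelled cluster name. The sketch below assumes the exception follows the shape of botocore's `ClientError`, which exposes the service error code at `error.response["Error"]["Code"]`. The helper name `is_resource_not_found` is hypothetical.

```python
def is_resource_not_found(error):
    """Return True if `error` carries the ResourceNotFound service error code.

    Assumes the botocore ClientError layout: error.response["Error"]["Code"].
    Exceptions without a `response` attribute are treated as other failures.
    """
    response = getattr(error, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ResourceNotFound"


# Possible usage around an SDK call (client is assumed to be a boto3 SageMaker client):
#   try:
#       client.batch_delete_cluster_nodes(ClusterName=name, NodeIds=node_ids)
#   except Exception as error:
#       if is_resource_not_found(error):
#           print(f"Cluster {name!r} was not found")
#       else:
#           raise
```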
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following: