Migrate resources to the new SageMaker Operators for Kubernetes - Amazon SageMaker

Migrate resources to the new SageMaker Operators for Kubernetes

The new Amazon SageMaker Operators for Kubernetes uses AWS Controllers for Kubernetes (ACK) to train, tune, and deploy machine learning models with Amazon SageMaker. ACK is a framework for building Kubernetes custom controllers, where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like databases or message queues using the Kubernetes API. For a list of all ACK service controllers, see Services in the ACK documentation.

Note

The new SageMaker Operators for Kubernetes are not backwards compatible. Use the following steps to migrate your resources.

For more information, see the ACK documentation or the ACK SageMaker controller GitHub repository. For a list of supported SageMaker resources, refer to the ACK API Reference. To get started with hands-on examples, see Tutorials.

Prerequisites

To successfully migrate resources to the new SageMaker Operators for Kubernetes, you must:

  1. Install the new SageMaker Operators for Kubernetes. See Setup in Machine Learning with the ACK SageMaker Controller for step-by-step instructions.

  2. If you are using HostingAutoscalingPolicy resources, install the new Application Auto Scaling Operator. See Setup in Scale SageMaker Workloads with Application Auto Scaling for step-by-step instructions. This step is optional if you are not using HostingAutoScalingPolicy resources.

If permissions are configured correctly, then the ACK SageMaker service controller can determine the specification and status of the AWS resource and reconcile the resource as if the ACK controller originally created it.

Adopt resources

The new SageMaker Operators for Kubernetes provide the ability to adopt resources that were not originally created by the ACK service controller. For more information, see Adopt Existing AWS Resources in the ACK documentation.

The following steps show how the new SageMaker Operators for Kubernetes can adopt an existing SageMaker endpoint. Save the following sample to a file named adopt-endpoint-sample.yaml.

apiVersion: services.k8s.aws/v1alpha1 kind: AdoptedResource metadata: name: adopt-endpoint-sample spec: aws: # resource to adopt, not created by ACK nameOrID: xgboost-endpoint kubernetes: group: sagemaker.services.k8s.aws kind: Endpoint metadata: # target K8s CR name name: xgboost-endpoint

Submit the custom resource (CR) using kubectl apply:

kubectl apply -f adopt-endpoint-sample.yaml

Use kubectl describe to check the status conditions of your adopted resource.

kubectl describe adoptedresource adopt-endpoint-sample

Verify that the ACK.Adopted condition is True. The output should look similar to the following example:

--- kind: AdoptedResource metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"services.k8s.aws/v1alpha1","kind":"AdoptedResource","metadata":{"annotations":{},"name":"xgboost-endpoint","namespace":"default"},"spec":{"aws":{"nameOrID":"xgboost-endpoint"},"kubernetes":{"group":"sagemaker.services.k8s.aws","kind":"Endpoint","metadata":{"name":"xgboost-endpoint"}}}}' creationTimestamp: '2021-04-27T02:49:14Z' finalizers: - finalizers.services.k8s.aws/AdoptedResource generation: 1 name: adopt-endpoint-sample namespace: default resourceVersion: '12669876' selfLink: "/apis/services.k8s.aws/v1alpha1/namespaces/default/adoptedresources/adopt-endpoint-sample" uid: 35f8fa92-29dd-4040-9d0d-0b07bbd7ca0b spec: aws: nameOrID: xgboost-endpoint kubernetes: group: sagemaker.services.k8s.aws kind: Endpoint metadata: name: xgboost-endpoint status: conditions: - status: 'True' type: ACK.Adopted

Check that your resource exists in your cluster:

kubectl describe endpoints.sagemaker xgboost-endpoint

HostingAutoscalingPolicy resources

The HostingAutoscalingPolicy (HAP) resource consists of multiple Application Auto Scaling resources: ScalableTarget and ScalingPolicy. When adopting a HAP resource with ACK, you need to first install the Application Auto Scaling controller. To adopt HAP resources, you need to adopt both ScalableTarget and ScalingPolicy resources. You can find the resource indentifier for these resources in the status of the HostingAutoscalingPolicy resource (status.ResourceIDList).

HostingDeployment resources

The HostingDeployment resource consists of multiple SageMaker resources: Endpoint, EndpointConfig, and each Model. If you adopt a SageMaker endpoint in ACK, you need to adopt the Endpoint, EndpointConfig, and each Model separately. The Endpoint, EndpointConfig, and Model names can be found in status of the HostingDeployment resource (status.endpointName, status.endpointConfigName, and status.modelNames).

For a list of all supported SageMaker resources, refer to the ACK API Reference.

Clean up old resources

After the new SageMaker Operators for Kubernetes adopt your resources, it is time to uninstall old operators and clean up old resources.

Step 1: Uninstall the old operator

To uninstall the old operator, see Delete operators.

Warning

Uninstall the old operator before deleting any old resources.

Step 2: Remove finalizers and delete old resources

Warning

Before deleting old resources, be sure that you have uninstalled the old operator.

After uninstalling the old operator, you must explicitly remove the finalizers to delete old operator resources. The following sample script shows how to delete all training jobs managed by the old operator in a given namespace. You can use a similar pattern to delete additional resources once they are adopted by the new operator.

Note

You must use full resource names to get resources. For example, use kubectl get trainingjobs.sagemaker.aws.amazon.com instead of kubectl get trainingjob.

namespace=sagemaker_namespace training_jobs=$(kubectl get trainingjobs.sagemaker.aws.amazon.com -n $namespace -ojson | jq -r '.items | .[] | .metadata.name') for job in $training_jobs do echo "Deleting $job resource in $namespace namespace" kubectl patch trainingjobs.sagemaker.aws.amazon.com $job -n $namespace -p '{"metadata":{"finalizers":null}}' --type=merge kubectl delete trainingjobs.sagemaker.aws.amazon.com $job -n $namespace done

Tutorials

For more in-depth guides on installing and using the new SageMaker Operator for Kubernetes, see the following tutorials: