SageMaker AI Components for Kubeflow Pipelines

With SageMaker AI components for Kubeflow Pipelines, you can create and monitor native SageMaker AI training, tuning, endpoint deployment, and batch transform jobs from your Kubeflow Pipelines. By running Kubeflow Pipeline jobs on SageMaker AI, you move data processing and training jobs from the Kubernetes cluster to SageMaker AI's machine learning-optimized managed service. This document assumes prior knowledge of Kubernetes and Kubeflow.

What are Kubeflow Pipelines?

Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The Kubeflow Pipelines platform consists of the following:

  • A user interface (UI) for managing and tracking experiments, jobs, and runs.

  • An engine (Argo) for scheduling multi-step ML workflows.

  • An SDK for defining and manipulating pipelines and components.

  • Notebooks for interacting with the system using the SDK.

A pipeline is a description of an ML workflow expressed as a directed acyclic graph. Every step in the workflow is expressed as a Kubeflow Pipeline component, which is a Python module.

For more information on Kubeflow Pipelines, see the Kubeflow Pipelines documentation.

What are Kubeflow Pipeline components?

A Kubeflow Pipeline component is a set of code used to execute one step of a Kubeflow pipeline. Components are represented by a Python module built into a Docker image. When the pipeline runs, the component's container is instantiated on one of the worker nodes on the Kubernetes cluster running Kubeflow, and your logic is executed. Pipeline components can read outputs from the previous components and create outputs that the next component in the pipeline can consume. These components make it fast and easy to write pipelines for experimentation and production environments without having to interact with the underlying Kubernetes infrastructure.

You can use SageMaker AI Components in your Kubeflow pipeline. Rather than encapsulating your logic in a custom container, you simply load the components and describe your pipeline using the Kubeflow Pipelines SDK. When the pipeline runs, your instructions are translated into a SageMaker AI job or deployment. The workload then runs on the fully managed infrastructure of SageMaker AI.
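
To make this concrete, here is a minimal sketch of the pattern, assuming the version 1 training component. The component URL, input names, image URI, and S3 paths are illustrative assumptions; check the component's component.yaml in the kubeflow/pipelines GitHub repository for the exact inputs.

# Minimal sketch: load a SageMaker AI training component and use it in a
# pipeline. The component URL, input names, and S3 paths below are
# illustrative assumptions, not the definitive values.
import kfp
from kfp import components, dsl

# Load the component definition; no custom container is needed.
sagemaker_train_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/aws/sagemaker/train/component.yaml"  # assumed path
)

@dsl.pipeline(name="sagemaker-training-example")
def training_pipeline(role_arn: str):
    # When this step runs, the component submits a training job to
    # SageMaker AI instead of running the training code on the cluster.
    sagemaker_train_op(
        region="us-east-1",
        image="<training-image-uri>",                 # placeholder image URI
        instance_type="ml.m5.xlarge",
        model_artifact_path="s3://my-bucket/output",  # hypothetical bucket
        role=role_arn,  # SageMaker AI execution role (see IAM permissions)
    )

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(training_pipeline, "pipeline.yaml")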

Why use SageMaker AI Components for Kubeflow Pipelines?

SageMaker AI Components for Kubeflow Pipelines offer an alternative to running your compute-intensive jobs on the Kubernetes cluster itself. The components integrate SageMaker AI with the portability and orchestration of Kubeflow Pipelines. Using the SageMaker AI Components for Kubeflow Pipelines, you can create and monitor your SageMaker AI resources as part of a Kubeflow Pipelines workflow. Each job in your pipeline runs on SageMaker AI instead of the local Kubernetes cluster, allowing you to take advantage of key SageMaker AI features such as data labeling, large-scale hyperparameter tuning, distributed training, and one-click secure and scalable model deployment. The job parameters, status, logs, and outputs from SageMaker AI remain accessible from the Kubeflow Pipelines UI.

The SageMaker AI components integrate key SageMaker AI features into your ML workflows, from preparing data to building, training, and deploying ML models. You can create a Kubeflow pipeline built entirely from these components, or integrate individual components into your workflow as needed. Each component is available in one or two versions, and each version leverages a different backend. For more information on those versions, see SageMaker AI Components for Kubeflow Pipelines versions.

There is no additional charge for using SageMaker AI Components for Kubeflow Pipelines. You incur charges for any SageMaker AI resources you use through these components.

SageMaker AI Components for Kubeflow Pipelines versions

SageMaker AI Components for Kubeflow Pipelines come in two versions. Each version leverages a different backend to create and manage resources on SageMaker AI.

  • Version 1 (v1.x or below) of SageMaker AI Components for Kubeflow Pipelines uses Boto3 (the AWS SDK for Python) as its backend; a sketch of the kind of call this backend makes follows this list.

  • Version 2 (v2.0.0-alpha2 and above) of SageMaker AI Components for Kubeflow Pipelines uses the SageMaker AI Operator for Kubernetes (ACK) as its backend.

    AWS introduced AWS Controllers for Kubernetes (ACK) to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a set of AWS service-specific controllers, one of which is the SageMaker AI controller. The SageMaker AI controller makes it easier for machine learning developers and data scientists who use Kubernetes as their control plane to train, tune, and deploy machine learning (ML) models in SageMaker AI. For more information, see SageMaker AI Operators for Kubernetes.
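
To make the backend difference concrete, here is a rough sketch of the kind of Boto3 call a version 1 training component issues on your behalf. All names, ARNs, image URIs, and S3 paths are hypothetical placeholders; a version 2 component instead creates an equivalent SageMaker AI custom resource that the ACK controller reconciles.

# Rough sketch of the Boto3 call a v1 training component makes under the
# hood. All names, ARNs, and S3 paths are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_training_job(
    TrainingJobName="kfp-example-training-job",
    RoleArn="arn:aws:iam::111122223333:role/kfp-example-sagemaker-execution-role",
    AlgorithmSpecification={
        "TrainingImage": "<training-image-uri>",
        "TrainingInputMode": "File",
    },
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/output"},
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)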

Both versions of the SageMaker AI Components for Kubeflow Pipelines are supported. However, version 2 provides additional advantages. In particular, it offers:

  1. A consistent experience for managing your SageMaker AI resources from any application, whether you are using Kubeflow Pipelines, the Kubernetes CLI (kubectl), or other Kubeflow applications such as Notebooks.

  2. The flexibility to manage and monitor your SageMaker AI resources outside of the Kubeflow pipeline workflow, as shown in the sketch after this list.

  3. Zero setup time to use the SageMaker AI components if you deployed the full Kubeflow on AWS release, since the SageMaker AI Operator is part of its deployment.
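
As an illustration of the second point, the following sketch uses the official Kubernetes Python client to list the SageMaker AI training job custom resources managed by the ACK controller, independently of any pipeline run. The API group, version, namespace, and resource plural are assumptions; verify them against the CRDs installed by the controller (for example, with kubectl get crds).

# Sketch: inspect ACK-managed SageMaker AI resources outside of any
# Kubeflow pipeline run. The group/version/plural values are assumptions;
# confirm them against the CRDs installed by the SageMaker AI controller.
from kubernetes import client, config

config.load_kube_config()  # uses your current kubeconfig context
api = client.CustomObjectsApi()

training_jobs = api.list_namespaced_custom_object(
    group="sagemaker.services.k8s.aws",  # assumed ACK API group
    version="v1alpha1",                  # assumed CRD version
    namespace="kubeflow",                # assumed namespace
    plural="trainingjobs",
)
for job in training_jobs.get("items", []):
    print(job["metadata"]["name"], job.get("status", {}))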

List of SageMaker AI Components for Kubeflow Pipelines

You can find all SageMaker AI Components for Kubeflow Pipelines and their available versions in GitHub.

Note

We encourage you to use version 2 of a SageMaker AI component wherever it is available.

IAM permissions

Deploying Kubeflow Pipelines with SageMaker AI components requires the following three layers of authentication:

  • An IAM role granting your gateway node (which can be your local machine or a remote instance) access to the Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

    The user accessing the gateway node assumes this role to:

    • Create an Amazon EKS cluster and install KFP

    • Create IAM roles

    • Create Amazon S3 buckets for your sample input data

    The role requires permissions sufficient to perform the preceding operations.

  • A Kubernetes IAM execution role assumed by Kubernetes pipeline pods (kfp-example-pod-role) or the SageMaker AI Operator for Kubernetes controller pod to access SageMaker AI. This role is used to create and monitor SageMaker AI jobs from Kubernetes.

    The role requires the following permission:

    • AmazonSageMakerFullAccess

    You can limit permissions to the KFP and controller pods by creating and attaching your own custom policy.

  • A SageMaker AI IAM execution role assumed by SageMaker AI jobs to access AWS resources such as Amazon S3 or Amazon ECR (kfp-example-sagemaker-execution-role).

    SageMaker AI jobs use this role to:

    • Access SageMaker AI resources

    • Read input data from Amazon S3

    • Store your output model in Amazon S3

    The role requires the following permissions:

    • AmazonSageMakerFullAccess

    • AmazonS3FullAccess
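
As a concrete example of the third role, here is a minimal Boto3 sketch that creates the SageMaker AI execution role and attaches the two managed policies listed above. The role name follows the example name used in this section, and account-specific values are placeholders.

# Minimal sketch: create the SageMaker AI execution role described above
# and attach the two managed policies. The role name matches the example
# in this section; account-specific values are placeholders.
import json
import boto3

iam = boto3.client("iam")

# Trust policy letting SageMaker AI assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="kfp-example-sagemaker-execution-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
):
    iam.attach_role_policy(
        RoleName="kfp-example-sagemaker-execution-role",
        PolicyArn=policy_arn,
    )

print(role["Role"]["Arn"])  # pass this ARN to your pipeline components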

Converting pipelines to use SageMaker AI

You can convert an existing pipeline to use SageMaker AI by porting your generic Python processing and training containers. If you use SageMaker AI for inference, you also need to attach IAM permissions to your cluster and convert an artifact to a model, as sketched below.
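
For the inference case, converting an artifact to a model amounts to registering the trained model artifact with SageMaker AI. The following sketch does this directly with Boto3; the image URI, artifact path, and role ARN are placeholders, and a SageMaker AI model component can perform the equivalent step inside a pipeline.

# Sketch: register a trained artifact as a SageMaker AI model so it can be
# deployed to an endpoint. All names, URIs, and ARNs are placeholders.
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

sm.create_model(
    ModelName="kfp-example-model",
    PrimaryContainer={
        "Image": "<inference-image-uri>",                      # serving container
        "ModelDataUrl": "s3://my-bucket/output/model.tar.gz",  # trained artifact
    },
    ExecutionRoleArn=(
        "arn:aws:iam::111122223333:role/kfp-example-sagemaker-execution-role"
    ),
)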